AWS TL;DR: DataSync (Data Transfer Service)

AWS DataSync

/tldr: A managed, secure online transfer service for moving large datasets quickly between on-premises storage and AWS storage services, or between AWS storage services.

Online Data Transfer | Hybrid Storage | Migration | Sync & Replication

1. Core Functionality and Setup

DataSync uses a purpose-built transfer protocol that AWS states can be up to 10x faster than open-source tools. It is designed for migrating active data, replicating data for disaster recovery, and archiving cold data to the cloud.

The DataSync Agent

For transfers involving **on-premises storage** (NFS, SMB, self-managed object storage), you must deploy the **DataSync Agent**. The agent is a virtual machine (VMware ESXi, KVM, or Microsoft Hyper-V; it can also run as an Amazon EC2 instance) that runs in your data center, reads from or writes to your local storage, connects securely to AWS, and handles the actual data transfer.

2. The Transfer Workflow (Location to Task)

DataSync operations are defined by three components: the Agent (if needed), the Source/Destination **Locations**, and the **Task**.

Agent (If Needed)

Deployed locally as a VM to access on-premises file systems (NFS/SMB) or S3-compatible object storage. It encrypts data and sends it to the DataSync service.

Locations

Define the source and destination endpoints. This is typically a combination of a private/on-premises source (via the Agent) and an AWS storage destination.

  • **AWS storage (source or destination):** S3, EFS, FSx for Windows File Server, FSx for Lustre, FSx for OpenZFS, FSx for NetApp ONTAP.
  • **On-premises (via the Agent):** NFS, SMB, HDFS, self-managed object storage (S3-compatible).
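
As a rough sketch of how a source and a destination Location might be described, the Python below builds the keyword arguments for boto3's `create_location_nfs` and `create_location_s3` calls. The helper names are hypothetical, and every hostname, path, and ARN is a placeholder assumption. Nothing is sent to AWS here; the code only assembles dictionaries.

```python
# Hypothetical helpers that assemble request parameters for the
# DataSync create_location_nfs / create_location_s3 API calls.

def nfs_location_params(hostname, export_path, agent_arn):
    """Parameters for an on-premises NFS Location reached through an Agent."""
    return {
        "ServerHostname": hostname,
        "Subdirectory": export_path,
        "OnPremConfig": {"AgentArns": [agent_arn]},
    }


def s3_location_params(bucket_arn, access_role_arn, prefix="/"):
    """Parameters for an S3 Location; the IAM role grants DataSync bucket access."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": prefix,
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


# Placeholder values (assumptions, not real resources).
AGENT_ARN = "arn:aws:datasync:us-east-1:111122223333:agent/agent-0123456789abcdef0"
BUCKET_ARN = "arn:aws:s3:::example-migration-bucket"
ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleDataSyncS3Access"

source = nfs_location_params("nas.example.internal", "/exports/projects", AGENT_ARN)
destination = s3_location_params(BUCKET_ARN, ROLE_ARN, prefix="/projects")
```

With boto3 installed and credentials configured, passing these to a real client, for example `boto3.client("datasync").create_location_nfs(**source)`, should return a `LocationArn` that a Task can reference.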

Task

The Task defines the transfer operation, including the source and destination Locations, scheduling (one-time or recurring), and key configuration settings.

  • **Verification:** Integrity checks (checksums) after transfer.
  • **Filtering:** Include/exclude specific files based on patterns.
  • **Deletion:** Option to delete files at the destination if they no longer exist at the source.
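
The three settings above map onto parameters of the `create_task` API call. The sketch below is one possible way to assemble them in Python; `build_task_params` is a hypothetical helper, the ARNs and task name are placeholders, and the chosen option values are illustrative assumptions rather than recommended defaults.

```python
# Hypothetical helper that assembles keyword arguments for DataSync's create_task.

def build_task_params(source_arn, dest_arn, name,
                      exclude_patterns=(), cron=None, delete_removed=False):
    params = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": dest_arn,
        "Name": name,
        "Options": {
            # Verification: checksum-verify only the files this run transferred.
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
            # Deletion: REMOVE deletes destination files missing from the source.
            "PreserveDeletedFiles": "REMOVE" if delete_removed else "PRESERVE",
            # Copy only data that differs between source and destination.
            "TransferMode": "CHANGED",
        },
    }
    if exclude_patterns:
        # Filtering: DataSync expects one string with patterns joined by "|".
        params["Excludes"] = [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(exclude_patterns)}
        ]
    if cron:
        # Scheduling: omit for a one-time task that is started manually.
        params["Schedule"] = {"ScheduleExpression": cron}
    return params


# Placeholder ARNs (assumptions, not real resources).
SRC = "arn:aws:datasync:us-east-1:111122223333:location/loc-0123456789abcdef0"
DST = "arn:aws:datasync:us-east-1:111122223333:location/loc-0fedcba9876543210"

nightly = build_task_params(
    SRC, DST, "nightly-projects-sync",
    exclude_patterns=["*.tmp", "/scratch"],
    cron="cron(0 2 * * ? *)",   # every day at 02:00 UTC
    delete_removed=True,
)
```

A real client call would look like `boto3.client("datasync").create_task(**nightly)`, followed by `start_task_execution` for any run outside the schedule.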

3. Key Differentiators and Use Cases

DataSync provides reliability, speed, and automation crucial for enterprise data management.

Technical Advantages

  • **Incremental Transfers:** Only transfers data that has changed since the last run.
  • **Bandwidth Optimization:** Applies inline compression to reduce the data sent over the network and lets you cap the bandwidth a task may use. All data is encrypted in transit with TLS.
  • **Managed:** Handles all orchestration, retries, error recovery, and logging (via CloudWatch).
  • **Metadata Preservation:** Preserves file system metadata like ownership, timestamps, and permissions.
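
To make the idea of an incremental transfer concrete, here is a small conceptual sketch in Python. It is not DataSync's actual algorithm, which is internal to the service; it only illustrates the general technique of comparing source and destination metadata to decide what to copy. The function name and the `(size, mtime)` tuples are assumptions made for illustration.

```python
# Conceptual illustration only: plan an incremental sync from file metadata.
# Each listing maps a path to a (size_in_bytes, modification_time) tuple.

def plan_sync(source, destination, delete_removed=False):
    """Return (paths_to_copy, paths_to_delete) for one incremental run."""
    # Copy anything that is new or whose metadata differs at the destination.
    to_copy = sorted(
        path for path, meta in source.items() if destination.get(path) != meta
    )
    # Optionally remove destination files that no longer exist at the source.
    to_delete = (
        sorted(path for path in destination if path not in source)
        if delete_removed
        else []
    )
    return to_copy, to_delete


source = {"a.txt": (10, 100), "b.txt": (20, 200), "c.txt": (5, 300)}
destination = {"a.txt": (10, 100), "b.txt": (20, 150), "old.txt": (1, 50)}

to_copy, to_delete = plan_sync(source, destination, delete_removed=True)
print(to_copy)    # ['b.txt', 'c.txt']
print(to_delete)  # ['old.txt']
```

Here `a.txt` is skipped because its metadata matches, `b.txt` is recopied because its timestamp changed, and `c.txt` is copied because it is new, so a second run with no changes would transfer nothing.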

Common Use Cases

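  • **Cloud Migration:** Moving large NAS datasets, up to petabyte scale, to S3 or EFS, typically as an initial full copy followed by incremental runs until cutover.
  • **Data Archival:** Moving cold data from high-cost local storage to the Amazon S3 Glacier storage classes.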
  • **Replication:** Maintaining a synchronized copy of on-premises data in AWS for Disaster Recovery.
  • **In-Cloud Transfer:** Replicating data between AWS accounts, regions, or storage services (e.g., S3 to FSx).

DataSync is your go-to service for moving large datasets online quickly and reliably, without building and maintaining your own copy scripts or working around the throughput limits of general-purpose copy tools.

AWS Fundamentals Series: AWS DataSync