AWS S3 Fundamentals
/tldr: Scalable, highly durable object storage for any kind of data.
1. Core Concepts: Buckets and Objects
S3 is **Object Storage**, not block storage (like EBS), a file system (like EFS), or a database. It stores data as Objects in a flat namespace, with no real directory hierarchy.
The Bucket
- **Global Namespace:** Bucket names must be globally unique across all of AWS (like a website domain name).
- **Region Specific:** A bucket is created in a specific AWS Region.
- **Storage Container:** It serves as a container for objects (data files).
Bucket Name Example: my-globally-unique-app-data-2024
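Before picking a name, it can help to check it locally. The sketch below covers only a basic subset of the naming rules for general-purpose buckets (length, allowed characters, no adjacent periods, not formatted like an IP address); the function name `looks_like_valid_bucket_name` is made up for illustration, and passing the check says nothing about whether the name is already taken.

```python
import ipaddress
import re

# Simplified rules: 3-63 characters; lowercase letters, digits, hyphens, and
# dots only; must begin and end with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes a basic subset of the S3 naming rules."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    if ".." in name:  # adjacent periods are not allowed
        return False
    try:
        ipaddress.IPv4Address(name)  # names formatted like 192.168.5.4 are rejected
        return False
    except ValueError:
        return True


print(looks_like_valid_bucket_name("my-globally-unique-app-data-2024"))
print(looks_like_valid_bucket_name("My_Bucket"))
```

The first call should report a usable name and the second should not, since uppercase letters and underscores are not allowed.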
The Object
- **Key/Value Pair:** Each Object is a value (the data) stored under a Key (its full name within the bucket). Slashes in a key only look like folders; they are simply part of the name.
- **Metadata:** Each object includes system metadata (size, date, owner) and optional user-defined metadata.
- **Size Limit:** Objects can range from 0 bytes up to 5 TB. A single PUT request can upload at most 5 GB; larger objects require multipart upload.
Object Key Example: project-folder/images/user-avatar-123.jpg
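Because keys are flat strings, "folders" only appear when a listing groups keys by a prefix and a delimiter (the real API exposes this through the `Prefix` and `Delimiter` parameters of `ListObjectsV2`). The following is an in-memory sketch of that grouping, not a call to S3; `list_one_level` and every key other than the example above are made up for illustration.

```python
def list_one_level(keys, prefix="", delimiter="/"):
    """Group flat keys the way a prefix/delimiter listing does.

    Returns (objects, common_prefixes): keys directly under `prefix`, and the
    "subfolder" prefixes one level below it.
    """
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:  # a delimiter remains, so the key sits in a deeper "folder"
            common_prefixes.add(prefix + head + sep)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


keys = [
    "project-folder/images/user-avatar-123.jpg",
    "project-folder/images/logo.png",
    "project-folder/readme.txt",
    "other/file.bin",
]

objects, folders = list_one_level(keys, prefix="project-folder/")
print(objects)  # ['project-folder/readme.txt']
print(folders)  # ['project-folder/images/']
```

Nothing in the key list is a directory; the "images" folder exists only because two keys happen to share that prefix.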
2. Durability and Consistency
Extreme Durability
S3 is designed for eleven nines (99.999999999%) of durability. Most storage classes achieve this by storing data redundantly across a minimum of three Availability Zones (AZs) within a Region (the One Zone classes keep data in a single AZ), using techniques like erasure coding.
- **Durability vs. Availability:** Durability is the likelihood that stored data won't be lost over a given year. Availability (S3 Standard is designed for 99.99%) is the likelihood that the service is reachable and able to serve requests.
- **Shared Responsibility:** AWS is responsible for the durability and operation of the platform (a design target, not an absolute guarantee); you are responsible for securing access (IAM, Bucket Policies).
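To get a feel for what these percentages mean, the arithmetic can be worked through directly. This is a rough sketch that assumes the durability figure is an average annual per-object rate and that availability translates linearly into downtime; the function names are made up for illustration.

```python
from fractions import Fraction

# Eleven nines, kept as an exact fraction to avoid floating-point noise.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)
ANNUAL_LOSS_RATE = 1 - DURABILITY  # 1 in 10**11 per object per year


def expected_objects_lost_per_year(object_count: int) -> Fraction:
    """Average number of objects lost per year at eleven nines."""
    return object_count * ANNUAL_LOSS_RATE


def downtime_minutes_per_year(availability: Fraction) -> Fraction:
    """Minutes per (365-day) year that the service may be unavailable."""
    return (1 - availability) * 365 * 24 * 60


print(expected_objects_lost_per_year(10_000_000))                # 1/10000
print(float(downtime_minutes_per_year(Fraction(9999, 10000))))  # 52.56
```

In other words, storing ten million objects corresponds to an average of one lost object every 10,000 years, while 99.99% availability still allows roughly 53 minutes of unavailability per year.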
Read-After-Write Consistency
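S3 provides strong **Read-After-Write Consistency** for object writes and deletes (PUT and DELETE requests), and list operations are strongly consistent as well. Bucket-level configuration changes can still take time to propagate. For objects, this means: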
- When you write a new object, you can immediately read it back.
- When you overwrite or delete an existing object, subsequent read requests will return the latest version (or a "Not Found" error if deleted).
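In code, this guarantee means a write can be followed by a read with no sleep or retry loop in between. The sketch below uses the `put_object` and `get_object` call shapes of a boto3 S3 client, but runs against `InMemoryS3`, a hypothetical stand-in defined here so the example works offline; the stand-in only illustrates the calling pattern and does not demonstrate S3's behavior by itself. The key `notes.txt` is made up.

```python
import io


def write_then_read(s3, bucket: str, key: str, body: bytes) -> bytes:
    """PUT an object, then GET it straight back with no wait or retry loop."""
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


class InMemoryS3:
    """Hypothetical stand-in for a boto3 S3 client, so the sketch runs offline."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


client = InMemoryS3()  # with real AWS access: client = boto3.client("s3")
bucket = "my-globally-unique-app-data-2024"

write_then_read(client, bucket, "notes.txt", b"v1")
latest = write_then_read(client, bucket, "notes.txt", b"v2")  # overwrite, then read
print(latest)
```

Against real S3, the second read is expected to return the overwritten content rather than the stale `v1`.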
3. Storage Classes (Cost Tiers)
You select a storage class based on how often the data is accessed, which directly affects cost. Classes meant for less frequently accessed data charge less per GB stored each month but more to retrieve it.
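The trade-off can be made concrete with a small break-even calculation. This is a simplified sketch: it ignores request, transfer, and minimum-duration charges, assumes the "hot" class has no retrieval fee, and uses placeholder prices in arbitrary units that are not real AWS pricing. The function names are made up for illustration.

```python
def monthly_cost(gb_stored, gb_retrieved, storage_price_per_gb, retrieval_price_per_gb):
    """Storage charge plus retrieval charge for one month."""
    return gb_stored * storage_price_per_gb + gb_retrieved * retrieval_price_per_gb


def break_even_retrieval_gb(gb_stored, hot_storage_price, cold_storage_price,
                            cold_retrieval_price):
    """Monthly retrieval volume at which the colder class stops being cheaper."""
    return gb_stored * (hot_storage_price - cold_storage_price) / cold_retrieval_price


# Placeholder prices in arbitrary units, NOT real AWS pricing.
HOT_STORAGE, COLD_STORAGE, COLD_RETRIEVAL = 2.0, 1.0, 1.0

threshold = break_even_retrieval_gb(1000, HOT_STORAGE, COLD_STORAGE, COLD_RETRIEVAL)
print(threshold)  # retrieving more than this per month favors the hot class
```

Below the threshold, the cheaper storage rate of the colder class wins; above it, retrieval fees outweigh the savings.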
S3 Standard (General Purpose)
**Best For:** Frequently accessed data (e.g., website content, mobile apps, default storage).
*Highest availability, lowest latency, highest storage cost.*
S3 Intelligent-Tiering & S3 Standard-IA
**Best For:** Data accessed less frequently but requiring immediate access when needed (e.g., backups, logs).
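*Intelligent-Tiering automatically moves objects between its own frequent and infrequent access tiers based on access patterns, for a small per-object monitoring fee. Standard-IA has a lower storage price than Standard but charges a per-GB retrieval fee.*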
S3 Glacier Flexible Retrieval & Deep Archive
**Best For:** Long-term archival and compliance data.
*Lowest storage cost, but retrieval requires a delay (minutes to hours for Flexible Retrieval, hours for Deep Archive) and usually incurs a retrieval fee.*
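The selection logic above can be sketched as a small decision function. The thresholds (one read per month, zero reads) and the function name `pick_storage_class` are illustrative assumptions, not AWS guidance; the returned strings are the `StorageClass` identifiers the S3 API uses for these classes (`GLACIER` corresponds to Glacier Flexible Retrieval).

```python
def pick_storage_class(reads_per_month: float, needs_immediate_access: bool,
                       pattern_is_predictable: bool = True) -> str:
    """Map an access pattern to a StorageClass value (thresholds are illustrative)."""
    if not needs_immediate_access:
        # Archive tiers: cheapest storage, retrieval takes minutes to hours.
        return "DEEP_ARCHIVE" if reads_per_month == 0 else "GLACIER"
    if not pattern_is_predictable:
        # Let S3 shift objects between access tiers automatically.
        return "INTELLIGENT_TIERING"
    return "STANDARD" if reads_per_month >= 1 else "STANDARD_IA"


print(pick_storage_class(100, needs_immediate_access=True))   # STANDARD
print(pick_storage_class(0.1, needs_immediate_access=True))   # STANDARD_IA
print(pick_storage_class(0, needs_immediate_access=False))    # DEEP_ARCHIVE
```

With boto3, the chosen value could be passed as the `StorageClass` argument of `put_object` when uploading.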
S3 is the storage backbone for many other AWS services.