SQL TL;DR: Table Partitioning

Table Partitioning

/tldr: Scaling huge tables by splitting them into smaller, independent units.

Database Administration Performance Tuning Scalability

THE CORE IDEA: Divide and Conquer

Partitioning is the process of physically dividing a very large table into multiple smaller, more manageable sub-tables, called **partitions**. These partitions are still logically treated as a single table by the database, but they can be stored and managed independently.

1. Partitioning Strategies

Range Partitioning (Date or Numeric)

**Mechanism:** Data is divided based on a contiguous range of values in the partitioning column.
**Best For:** Time-series data (e.g., sales data partitioned by year or month).
**Benefit:** Highly effective for *partition pruning*—queries only scan the relevant partitions.

-- Example: Partitioning a table by year
PARTITION BY RANGE (order_date) (
    PARTITION p2022 VALUES LESS THAN ('2023-01-01'),
    PARTITION p2023 VALUES LESS THAN ('2024-01-01'),
    PARTITION pMAX VALUES LESS THAN (MAXVALUE)
);

List Partitioning (Discrete Values)

**Mechanism:** Data is divided based on a specific, non-contiguous list of values.
**Best For:** Regional or categorical data (e.g., separating users by Continent or specific State/Country codes).
**Requirement:** The list of values must be known in advance.

-- Example: Partitioning by Region Code
PARTITION BY LIST (region_code) (
    PARTITION p_us VALUES ('US', 'CA'),
    PARTITION p_eu VALUES ('UK', 'FR', 'DE'),
    PARTITION p_asia VALUES ('JP', 'CN')
);

Hash Partitioning (Even Distribution)

**Mechanism:** Data is distributed evenly across a specified number of partitions using a hashing algorithm applied to the partitioning column.
**Best For:** Achieving optimal parallel access, especially when queries don't typically filter on the partitioning key.
**Goal:** Prevents "hot spots" where some partitions are much larger than others.

-- Example: Distributing data evenly across 4 partitions
PARTITION BY HASH (customer_id) PARTITIONS 4;

2. Benefits and Trade-offs

Key Benefits

**Query Performance (Pruning):** The optimizer only scans the relevant partitions (e.g., only the Q4 2023 partition), drastically reducing I/O.
**Maintenance Speed:** You can detach/delete old data (e.g., archive 2022 data) by dropping a partition instantly, which is much faster than running a massive DELETE statement.
**Load Efficiency:** Bulk-loading new data is faster by loading it directly into a new, empty partition rather than inserting into the main table.
**Parallel Access:** Different partitions can be accessed concurrently by different threads, improving performance on multi-core systems.

Trade-offs (Drawbacks)

**Design Complexity:** Requires careful planning and ongoing management. Choosing the wrong partition key can *reduce* performance.
**Single-Partition Queries:** Queries that span many partitions (e.g., a query across all 5 years of data) can sometimes be slower than running on a non-partitioned table.
**Partition Skew:** If not designed well, some partitions may become much larger than others, creating "hot spots" and negating the benefit.
**Indexing:** Local indexes (one per partition) are highly efficient, but global indexes (spanning all partitions) add overhead.

Partitioning is rarely necessary for tables under 100GB or 1 billion rows.