SQL TL;DR: Table Partitioning
Back to SQL TL;DR Hub

Table Partitioning

/tldr: Scaling huge tables by splitting them into smaller, independent units.

Database Administration Performance Tuning Scalability

THE CORE IDEA: Divide and Conquer

Partitioning is the process of physically dividing a very large table into multiple smaller, more manageable sub-tables, called **partitions**. These partitions are still logically treated as a single table by the database, but they can be stored and managed independently.

1. Partitioning Strategies

Range Partitioning (Date or Numeric)

  • **Mechanism:** Data is divided based on a contiguous range of values in the partitioning column.
  • **Best For:** Time-series data (e.g., sales data partitioned by year or month).
  • **Benefit:** Highly effective for *partition pruning*—queries only scan the relevant partitions.
-- Example: Partitioning a table by year
PARTITION BY RANGE (order_date) (
    PARTITION p2022 VALUES LESS THAN ('2023-01-01'),
    PARTITION p2023 VALUES LESS THAN ('2024-01-01'),
    PARTITION pMAX VALUES LESS THAN (MAXVALUE)
);
                    

List Partitioning (Discrete Values)

  • **Mechanism:** Data is divided based on a specific, non-contiguous list of values.
  • **Best For:** Regional or categorical data (e.g., separating users by Continent or specific State/Country codes).
  • **Requirement:** The list of values must be known in advance.
-- Example: Partitioning by Region Code
PARTITION BY LIST (region_code) (
    PARTITION p_us VALUES ('US', 'CA'),
    PARTITION p_eu VALUES ('UK', 'FR', 'DE'),
    PARTITION p_asia VALUES ('JP', 'CN')
);
                    

Hash Partitioning (Even Distribution)

  • **Mechanism:** Data is distributed evenly across a specified number of partitions using a hashing algorithm applied to the partitioning column.
  • **Best For:** Achieving optimal parallel access, especially when queries don't typically filter on the partitioning key.
  • **Goal:** Prevents "hot spots" where some partitions are much larger than others.
-- Example: Distributing data evenly across 4 partitions
PARTITION BY HASH (customer_id) PARTITIONS 4;
                    

2. Benefits and Trade-offs

Key Benefits

  • **Query Performance (Pruning):** The optimizer only scans the relevant partitions (e.g., only the Q4 2023 partition), drastically reducing I/O.
  • **Maintenance Speed:** You can detach/delete old data (e.g., archive 2022 data) by dropping a partition instantly, which is much faster than running a massive DELETE statement.
  • **Load Efficiency:** Bulk-loading new data is faster by loading it directly into a new, empty partition rather than inserting into the main table.
  • **Parallel Access:** Different partitions can be accessed concurrently by different threads, improving performance on multi-core systems.

Trade-offs (Drawbacks)

  • **Design Complexity:** Requires careful planning and ongoing management. Choosing the wrong partition key can *reduce* performance.
  • **Single-Partition Queries:** Queries that span many partitions (e.g., a query across all 5 years of data) can sometimes be slower than running on a non-partitioned table.
  • **Partition Skew:** If not designed well, some partitions may become much larger than others, creating "hot spots" and negating the benefit.
  • **Indexing:** Local indexes (one per partition) are highly efficient, but global indexes (spanning all partitions) add overhead.

Partitioning is rarely necessary for tables under 100GB or 1 billion rows.

This marks the end of the SQL TL;DR series.