Table Partitioning
/tldr: Scaling huge tables by splitting them into smaller, independent units.
Database Administration
Performance Tuning
Scalability
THE CORE IDEA: Divide and Conquer
Partitioning is the process of physically dividing a very large table into multiple smaller, more manageable sub-tables, called **partitions**. These partitions are still logically treated as a single table by the database, but they can be stored and managed independently.
1. Partitioning Strategies
Range Partitioning (Date or Numeric)
- **Mechanism:** Data is divided based on a contiguous range of values in the partitioning column.
- **Best For:** Time-series data (e.g., sales data partitioned by year or month).
- **Benefit:** Highly effective for *partition pruning*—queries only scan the relevant partitions.
-- Example: Partitioning a table by year
PARTITION BY RANGE (order_date) (
PARTITION p2022 VALUES LESS THAN ('2023-01-01'),
PARTITION p2023 VALUES LESS THAN ('2024-01-01'),
PARTITION pMAX VALUES LESS THAN (MAXVALUE)
);
List Partitioning (Discrete Values)
- **Mechanism:** Data is divided based on a specific, non-contiguous list of values.
- **Best For:** Regional or categorical data (e.g., separating users by Continent or specific State/Country codes).
- **Requirement:** The list of values must be known in advance.
-- Example: Partitioning by Region Code
PARTITION BY LIST (region_code) (
PARTITION p_us VALUES ('US', 'CA'),
PARTITION p_eu VALUES ('UK', 'FR', 'DE'),
PARTITION p_asia VALUES ('JP', 'CN')
);
Hash Partitioning (Even Distribution)
- **Mechanism:** Data is distributed evenly across a specified number of partitions using a hashing algorithm applied to the partitioning column.
- **Best For:** Achieving optimal parallel access, especially when queries don't typically filter on the partitioning key.
- **Goal:** Prevents "hot spots" where some partitions are much larger than others.
-- Example: Distributing data evenly across 4 partitions
PARTITION BY HASH (customer_id) PARTITIONS 4;
2. Benefits and Trade-offs
Key Benefits
- **Query Performance (Pruning):** The optimizer only scans the relevant partitions (e.g., only the Q4 2023 partition), drastically reducing I/O.
- **Maintenance Speed:** You can detach/delete old data (e.g., archive 2022 data) by dropping a partition instantly, which is much faster than running a massive
DELETEstatement. - **Load Efficiency:** Bulk-loading new data is faster by loading it directly into a new, empty partition rather than inserting into the main table.
- **Parallel Access:** Different partitions can be accessed concurrently by different threads, improving performance on multi-core systems.
Trade-offs (Drawbacks)
- **Design Complexity:** Requires careful planning and ongoing management. Choosing the wrong partition key can *reduce* performance.
- **Single-Partition Queries:** Queries that span many partitions (e.g., a query across all 5 years of data) can sometimes be slower than running on a non-partitioned table.
- **Partition Skew:** If not designed well, some partitions may become much larger than others, creating "hot spots" and negating the benefit.
- **Indexing:** Local indexes (one per partition) are highly efficient, but global indexes (spanning all partitions) add overhead.
Partitioning is rarely necessary for tables under 100GB or 1 billion rows.