Indexing Strategies (DDL)
/tldr: Optimizing query speed by creating search structures.
THE CORE IDEA: The Library Card Catalog
An index is a data structure, typically a B-Tree, that stores the values of one or more columns from a table in a structured, sorted way. When you search for a value, the database reads the tiny, sorted index first, then goes straight to the data, avoiding a slow, full table scan.
Fast Search: Index Read → Data Row
Slow Search: Full Table Scan
1. Primary Index Types
The main distinction is how the index relates to the physical storage of the table data.
Clustered Index
The table data **is physically ordered** in the database based on the keys of the clustered index.
- **One per Table:** A table can only have one clustered index.
- **Primary Key:** The Primary Key is usually implemented as the clustered index.
- **Best for:** Range scans and sorting (e.g., searching by date).
Non-Clustered Index
A separate, sorted list of index values that points to the physical location of the data (or the clustered index key).
- **Multiple per Table:** A table can have many non-clustered indexes.
- **Best for:** Specific lookups and columns used heavily in the `WHERE` clause.
2. Indexing Syntax and Strategy
The CREATE INDEX Command
-- Basic syntax for creating a non-clustered index
CREATE [UNIQUE] INDEX index_name
ON table_name (column1, column2, ...);
-- Example: Creating an index on Customer Email for fast login lookups
CREATE UNIQUE INDEX idx_customer_email
ON customers (email);
Multi-Column Indexes
Indexes defined on multiple columns (e.g., `(country, state)`). Order matters! SQL can only use the index if the query starts with the leftmost column(s).
-- Query using only 'country' can use this index.
-- Query using only 'state' CANNOT use this index.
CREATE INDEX idx_country_state ON addresses (country, state);
Covering Indexes (Index-Only Scans)
The index contains all the columns needed by the query (in the `SELECT` and `WHERE` clauses). The database never has to look up the data row itself, making the query extremely fast.
-- Index includes 'order_date' (for filtering) AND 'order_total' (for SELECT)
CREATE INDEX idx_order_date_total
ON orders (order_date)
INCLUDE (order_total);
3. Indexing Best Practices & Anti-Patterns
Best Practices
- Index columns heavily used in `WHERE`, `JOIN`, and `ORDER BY` clauses.
- Prefer indexing columns with high **cardinality** (many unique values, e.g., `email`, `SSN`) over low cardinality (few unique values, e.g., `gender`, `is_active`).
- Keep indexes narrow (fewer columns) to improve storage and speed.
Anti-Patterns (When Indexes Hurt)
- **Over-Indexing:** Too many indexes slow down `INSERT`, `UPDATE`, and `DELETE` operations, as every index must be maintained.
- **Using Functions in WHERE:** Applying a function to an indexed column (e.g., `WHERE YEAR(order_date) = 2023`) negates the index, forcing a full scan.
- **Indexing Everything:** Small tables (under 10,000 rows) often do not benefit from indexes, as a full scan is faster than traversing the B-Tree.
Optimize for speed, not quantity.