Normalization & Design (DDL)
/tldr: Structuring tables to ensure data integrity and minimize redundancy.
THE CORE IDEA: The Integrity Blueprint
Normalization is the process of structuring a relational database to minimize **redundancy** and eliminate data **anomalies** (insertion, update, and deletion). It is achieved by decomposing large tables into smaller, interlinked tables.
Key Concepts
1. The Normal Forms (1NF, 2NF, 3NF)
1NF (First Normal Form)
**Rule:** All column values must be **atomic** (indivisible), and there must be no repeating groups or arrays within a single cell.
-- BAD (Non-atomic skill list in one cell)
| EmployeeID | EmployeeSkills |
| 101 | SQL, Python, Java |
-- GOOD (Atomic, moved to a separate table)
| EmployeeID | Skill |
| 101 | SQL |
| 101 | Python|
2NF (Second Normal Form)
**Rule:** Must be in 1NF, AND every non-key attribute must be dependent on the **entire** Primary Key (no partial dependencies). Relevant only when the PK is composite (made of two or more columns).
-- PK is (OrderID, ItemID). ItemName depends only on ItemID (partial dependency).
-- FIX: Move ItemName to a separate 'Items' table keyed by ItemID.
3NF (Third Normal Form)
**Rule:** Must be in 2NF, AND there are no **transitive dependencies**. A transitive dependency occurs when a non-key attribute determines another non-key attribute (PK → A → B).
-- PK is CustomerID. State depends on City, which depends on CustomerID.
-- BAD: CustomerID → City → State
-- FIX: Move City and State to a 'Cities' lookup table keyed by CityID or ZipCode.
2. BCNF and Denormalization
Boyce-Codd Normal Form (BCNF)
BCNF is a stricter version of 3NF.
Rule:
For every functional dependency (X → Y), X must be a **Candidate Key**.
BCNF handles cases where a non-key attribute determines part of a composite candidate key (a rare scenario not solved by 3NF).
Denormalization & Trade-offs
Denormalization intentionally reintroduces redundancy (moving data from a lookup table back into the main table) to improve read performance.
- **Pro:** Faster read queries (fewer JOINs). Great for data warehousing/reporting.
- **Con:** Slower write operations (inserts/updates). Higher storage cost. Risk of data anomalies.
Normalize to write, Denormalize to read.