SQL TL;DR: Normalization & Design (DDL)

Normalization & Design (DDL)

/tldr: Structuring tables to ensure data integrity and minimize redundancy.

Data Architecture Data Integrity Database Design

THE CORE IDEA: The Integrity Blueprint

Normalization is the process of structuring a relational database to minimize **redundancy** and eliminate data **anomalies** (insertion, update, and deletion). It is achieved by decomposing large tables into smaller, interlinked tables.

Key Concepts

Functional Dependency: Column A determines Column B (A → B). Example: `CustomerID` → `CustomerName`.

Determinant: The column(s) on the left side of a functional dependency (A in A → B).

Candidate Key: A minimal set of attributes that uniquely identifies every tuple (row) in a relation.

Primary Key (PK): The candidate key chosen to uniquely identify rows.

1. The Normal Forms (1NF, 2NF, 3NF)

1NF (First Normal Form)

**Rule:** All column values must be **atomic** (indivisible), and there must be no repeating groups or arrays within a single cell.

-- BAD (Non-atomic skill list in one cell)
| EmployeeID | EmployeeSkills |
| 101        | SQL, Python, Java | 

-- GOOD (Atomic, moved to a separate table)
| EmployeeID | Skill |
| 101        | SQL   |
| 101        | Python|

2NF (Second Normal Form)

**Rule:** Must be in 1NF, AND every non-key attribute must be dependent on the **entire** Primary Key (no partial dependencies). Relevant only when the PK is composite (made of two or more columns).

-- PK is (OrderID, ItemID). ItemName depends only on ItemID (partial dependency).
-- FIX: Move ItemName to a separate 'Items' table keyed by ItemID.

3NF (Third Normal Form)

**Rule:** Must be in 2NF, AND there are no **transitive dependencies**. A transitive dependency occurs when a non-key attribute determines another non-key attribute (PK → A → B).

-- PK is CustomerID. State depends on City, which depends on CustomerID.
-- BAD: CustomerID → City → State
-- FIX: Move City and State to a 'Cities' lookup table keyed by CityID or ZipCode.

2. BCNF and Denormalization

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF.

Rule:

For every functional dependency (X → Y), X must be a **Candidate Key**.

BCNF handles cases where a non-key attribute determines part of a composite candidate key (a rare scenario not solved by 3NF).

Denormalization & Trade-offs

Denormalization intentionally reintroduces redundancy (moving data from a lookup table back into the main table) to improve read performance.

**Pro:** Faster read queries (fewer JOINs). Great for data warehousing/reporting.
**Con:** Slower write operations (inserts/updates). Higher storage cost. Risk of data anomalies.

Normalize to write, Denormalize to read.