Backup & Recovery

/tldr: The strategy for minimizing data loss and downtime after a failure.

Data Integrity Disaster Recovery Business Continuity

1. Core Concepts: The Disaster Plan

Backup and Recovery is defined by two critical metrics established by the business:

Recovery Point Objective (RPO)

**What it is:** The maximum amount of data (measured in time) that the business can afford to lose.

*Example: If RPO is 1 hour, the recovered data must be no older than 1 hour before the failure.*

Recovery Time Objective (RTO)

**What it is:** The maximum time allowed for the database system to be offline and restored after a failure.

*Example: If RTO is 4 hours, the database must be fully operational and available to users within 4 hours of the failure.*

The Transaction Log (The Key to RPO)

The **Transaction Log (or Journal)** is the heart of recovery. It records every change made to the database. To achieve a low RPO (minimal data loss), you must back up the transaction log frequently (e.g., every 5 minutes). This allows for Point-in-Time Recovery.

-- SQL Server Example for Log Backup
BACKUP LOG [MyDatabase] TO DISK = 'C:\Backup\Log_20241204.trn';

2. Backup Types: Cost vs. Recovery Time

The three main types of data backups balance backup size and duration against the speed and complexity of restoration.

Full Backup

Copies the entire database structure and all data pages.

**Size:** Largest.
**Recovery:** Fastest and simplest (only requires the full backup file).
**Use Case:** Weekly or initial benchmark.

Differential Backup

Copies all changes made since the *last full backup*.

**Size:** Medium (grows until the next full backup).
**Recovery:** Requires only the **last full** and the **last differential** backup.
**Use Case:** Daily—a good balance between size and recovery time.

Incremental Backup

Copies all changes made since the *last backup of any type* (full or incremental).

**Size:** Smallest (only the changes since the immediate prior backup).
**Recovery:** Slowest and most complex; requires the **last full** backup plus **all subsequent incremental** backups in sequence.
**Use Case:** Hourly—minimizes backup time but maximizes recovery time.

3. The Recovery Process

Recovery involves two main steps, often handled automatically by the database system:

Roll Forward (REDO)

Applies committed transactions from the log to the data files, bringing the database to the point of failure (or the chosen point in time).

Roll Back (UNDO)

Reverses any incomplete or uncommitted transactions found in the log at the moment of failure to ensure database consistency.

Point-in-Time Recovery (PITR)

PITR allows you to restore the database to a specific second *before* an erroneous event (like a huge accidental DELETE). This requires a full backup followed by *all* subsequent transaction log backups, which are applied up to the target time.

Backup is useless without a tested Recovery plan.