Databricks TL;DR: Notebooks and Magic Commands

Databricks Notebooks & Magic Commands

/tldr: An interactive, cell-based environment for multi-language execution and data exploration.

Polyglot, Interactive Computing, Collaboration, Magic Commands

1. Notebook Fundamentals

Databricks Notebooks are a cell-based interactive environment, similar in spirit to Jupyter and Apache Zeppelin notebooks, designed to execute code against a Spark cluster. They support real-time co-editing, and changes can be tracked through built-in version history or Git integration.

Polyglot Capability

Each notebook has a default language (Python, SQL, Scala, or R). It is chosen when the notebook is created and can be changed later. Any cell can override the default with a language magic command such as %python, %sql, %scala, or %r.

Spark Integration

All code executes on the attached Databricks cluster, enabling seamless access to Spark, Delta Lake tables, and the unified file system (DBFS).

2. Magic Commands: The Core Concept

Magic commands start with a single percent sign (%) and must appear on the first line of a cell. They change the cell's language, run shell commands, interact with the file system, or run other notebooks.
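To make the "first line of the cell" rule concrete, here is a minimal sketch of how a cell could be classified by its leading magic command. It is purely illustrative: classify_cell and the two name sets are hypothetical and are not how Databricks itself parses cells.

```python
# Hypothetical illustration only: NOT Databricks's actual cell parser.
LANGUAGE_MAGICS = {"python", "sql", "scala", "r"}
UTILITY_MAGICS = {"sh", "fs", "run"}


def classify_cell(source, default_language="python"):
    """Return (kind, body), where kind is a magic name or the default language."""
    lines = source.lstrip("\n").split("\n", 1)
    first = lines[0].strip()
    if first.startswith("%") and not first.startswith("%%"):
        # Split "%fs ls /path" into the magic name and the rest of the line.
        name, _, rest = first[1:].partition(" ")
        name = name.lower()
        if name in LANGUAGE_MAGICS or name in UTILITY_MAGICS:
            remainder = lines[1] if len(lines) > 1 else ""
            body = "\n".join(part for part in (rest.strip(), remainder) if part)
            return name, body
    # No recognized magic on the first line: the cell uses the default language.
    return default_language, source


print(classify_cell("%sql\nSELECT 1"))            # prints ('sql', 'SELECT 1')
print(classify_cell("x = 1\n%sql\nSELECT 1")[0])  # prints python
```

In the second call the %sql sits on the second line, so the sketch treats the whole cell as Python. This mirrors the rule that a magic command only takes effect on the first line.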

Language Magic Commands

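These commands change the language interpreter for that specific cell only. This supports workflows that mix languages, for example Python for transformation logic, SQL for querying tables, and Scala for code that relies on JVM libraries. Each language runs in its own interpreter, so variables defined in one language are not directly visible in another. Share data across languages through temporary views or tables.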

The dbutils Utility

While not a magic command, dbutils is the primary utility object for common tasks such as reading secrets, managing the file system, and chaining notebooks. It is available in Python, Scala, and R cells, but not in SQL cells.

dbutils.fs.ls("/mnt/data/")
dbutils.secrets.get(scope="my_scope", key="my_key")
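As a sketch of how dbutils.fs.ls results might be consumed, the helper below counts the files under a path and sums their sizes. It assumes each entry exposes .name and .size attributes and that directory names end with "/". summarize_directory and the fake_fs stand-in are hypothetical and exist only so the example runs outside a notebook.

```python
from types import SimpleNamespace


def summarize_directory(fs, path):
    """Return (file_count, total_bytes) for the files directly under path.

    `fs` is expected to behave like dbutils.fs: fs.ls(path) returns entries
    with .name and .size. Entries whose name ends in "/" are treated as
    directories and skipped (an assumption of this sketch).
    """
    files = [entry for entry in fs.ls(path) if not entry.name.endswith("/")]
    return len(files), sum(entry.size for entry in files)


# Local stand-in with made-up entries so the sketch runs without a cluster.
fake_fs = SimpleNamespace(
    ls=lambda path: [
        SimpleNamespace(path=path + "a.csv", name="a.csv", size=120),
        SimpleNamespace(path=path + "b.csv", name="b.csv", size=80),
        SimpleNamespace(path=path + "archive/", name="archive/", size=0),
    ]
)

print(summarize_directory(fake_fs, "/mnt/data/"))  # prints (2, 200)
```

Inside a notebook the same helper would be called as summarize_directory(dbutils.fs, "/mnt/data/"), with no stand-in needed.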

3. Essential Magic Commands

%sql - Execute SQL

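Runs SQL queries against tables and views visible to the cluster. A DataFrame created in another cell is not queryable by name on its own. Register it first as a temporary view (for example with createOrReplaceTempView) or save it as a table, such as a Unity Catalog table.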

%sql
SELECT region, COUNT(*)
FROM prod.gold.customers
GROUP BY 1
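The step of exposing a Python DataFrame to a %sql cell can be sketched as follows. publish_for_sql and the view name customers_tmp are hypothetical. createOrReplaceTempView is the standard PySpark DataFrame method, and the commented usage assumes a notebook where spark is already defined.

```python
def publish_for_sql(df, view_name):
    """Register a DataFrame under a name that later %sql cells can query.

    `df` is expected to be a PySpark DataFrame (anything exposing
    createOrReplaceTempView works for this sketch).
    """
    # Guard against names that would need quoting in SQL.
    if not view_name.isidentifier():
        raise ValueError(f"not a valid view name: {view_name!r}")
    df.createOrReplaceTempView(view_name)
    return view_name


# Usage inside a notebook (assumes the predefined `spark` session):
#   publish_for_sql(spark.table("prod.gold.customers"), "customers_tmp")
# A later cell could then run:
#   %sql
#   SELECT region, COUNT(*) FROM customers_tmp GROUP BY 1
```

A temporary view belongs to the Spark session, so it is gone once the notebook is detached from the cluster. Save a table instead when the result needs to persist.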

%sh - Run Shell Commands

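Executes shell commands on the driver node of the cluster only, not on the workers. Useful for checking file statuses or running Linux utilities. Packages installed this way land on the driver alone. For Python libraries that the whole notebook needs, the %pip magic command is the better fit.

%sh
pip install numpy   # installs numpy on the driver node only
ls -R /databricks/driver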

%fs - File System Utility

A shorthand for dbutils.fs, used to perform basic file system operations on DBFS (Databricks File System). For example, %fs ls /FileStore/ is equivalent to dbutils.fs.ls("/FileStore/").

%fs ls /FileStore/

%fs head /mnt/raw_data/sample.csv

%run - Chain Notebooks

Executes another notebook inline, in the current notebook's scope. All variables, functions, and temporary views defined in the target notebook become available in the current one. The %run command must be in a cell by itself.

%run /path/to/another/notebook_functions
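When the child notebook should run in its own scope instead, dbutils.notebook.run(path, timeout_seconds, arguments) is the alternative way to chain notebooks. It returns the string the child passes to dbutils.notebook.exit. Below is a minimal retry wrapper around that call. run_with_retry and its defaults are hypothetical, and the notebook parameter stands in for dbutils.notebook so the sketch can run anywhere.

```python
def run_with_retry(notebook, path, timeout_seconds=600, arguments=None, max_retries=2):
    """Run a child notebook in its own scope, retrying if the run fails.

    `notebook` is expected to behave like dbutils.notebook, i.e. expose
    run(path, timeout_seconds, arguments) and return the child's exit value.
    """
    attempts = 0
    while True:
        try:
            return notebook.run(path, timeout_seconds, arguments or {})
        except Exception:
            attempts += 1
            if attempts > max_retries:
                # Out of retries: surface the last failure to the caller.
                raise


# Usage inside a notebook:
#   result = run_with_retry(dbutils.notebook, "/path/to/another/notebook_functions")
```

Unlike %run, nothing defined in the child is visible to the caller here. Only the returned string comes back, which suits pipelines where each step should stay isolated.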

Notebooks are the Lakehouse IDE, and Magic Commands are the power tools that make polyglot development possible.

Databricks Fundamentals Series: Notebooks and Magic Commands