Live PySpark Emulator

Mastering Spark requires muscle memory. Use this browser-based sandbox to practice DataFrame API syntax without the overhead of standing up a real cluster.

Step 1: Initialize the Spark API

Copy and paste this shim into the sandbox below to enable .select(), .filter(), .groupBy(), .agg(), and .show() calls backed by pandas.

import pandas as pd

class SparkMock:
    """Minimal pandas-backed stand-in for the PySpark DataFrame API."""
    def __init__(self, df): self._df = df
    def select(self, *cols): return SparkMock(self._df[list(cols)])
    # pandas .query accepts the same "salary > 4000" predicate strings Spark does
    def filter(self, expr): return SparkMock(self._df.query(expr))
    def groupBy(self, col):
        # return a fresh object rather than mutating self, so chained calls stay independent
        g = SparkMock(self._df)
        g._g = col
        return g
    # spec maps column -> aggregation name, e.g. {"salary": "max"}
    def agg(self, spec): return SparkMock(self._df.groupby(self._g).agg(spec).reset_index())
    def show(self): print(self._df.to_string(index=False))

# Create practice data
data = [("James", "Sales", 3000), ("Michael", "Sales", 4600), ("Robert", "Sales", 4100)]
df = SparkMock(pd.DataFrame(data, columns=["name", "dept", "salary"]))

print("✅ Spark API Ready! Try: df.filter('salary > 4000').show()")
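
Step 2: Practice Transformations

Once the shim reports ready, drill the core chains until they feel automatic. These exercises are additions for practice; they run against the three salary rows defined above, so the commented results can be checked by eye.

# Projection: keep only the columns you name
df.select("name", "salary").show()

# Row filtering with a SQL-style predicate string
df.filter('salary > 4000').show()    # Michael (4600) and Robert (4100)

# Aggregation: the dict maps column -> function name
df.groupBy("dept").agg({"salary": "max"}).show()    # Sales -> 4600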

Note: This sandbox uses a lightweight shim to simulate Spark behavior in the browser. It executes eagerly on pandas, so it mimics only the API surface, not Spark's lazy planning or distributed execution. Handle anything beyond toy datasets in a real Databricks/Spark environment.
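
When you graduate to a real cluster, the same chains run unchanged; only the setup differs. Here is a minimal sketch of the real PySpark equivalent (the appName string is an arbitrary placeholder):

from pyspark.sql import SparkSession

# A real SparkSession replaces the shim entirely
spark = SparkSession.builder.appName("practice").getOrCreate()

data = [("James", "Sales", 3000), ("Michael", "Sales", 4600), ("Robert", "Sales", 4100)]
df = spark.createDataFrame(data, ["name", "dept", "salary"])

# Same syntax as the sandbox drills; note that in real Spark the
# aggregated column comes back named max(salary)
df.filter('salary > 4000').show()
df.groupBy("dept").agg({"salary": "max"}).show()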