Spark SQL is the module within Apache Spark that provides a SQL interface for structured data processing. It lets developers and data engineers run SQL queries, in a dialect that closely follows ANSI SQL, against distributed datasets, and mix those queries with the DataFrame API in the same Spark application. In the context of Apache Iceberg, Spark SQL is the primary interface for DDL operations, DML operations, and table maintenance procedures.
DDL Operations on Iceberg Tables
Spark SQL provides the standard CREATE, ALTER, and DROP statements, extended to support Iceberg-specific features such as partition transforms. For example, the following creates an Iceberg table partitioned by the day of its ts column:
CREATE TABLE catalog.db.events (ts TIMESTAMP, user_id STRING) USING ICEBERG PARTITIONED BY (days(ts));
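Iceberg's partition evolution lets you change this partitioning at any time with Spark SQL ALTER TABLE statements (ADD, DROP, or REPLACE PARTITION FIELD, which require Iceberg's Spark SQL extensions to be enabled). The change is a metadata operation: existing data files keep their original layout, only newly written data uses the new partition spec, and nothing has to be rewritten.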
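A minimal sketch of partition evolution on the events table above, assuming `catalog` is a configured Iceberg catalog and Iceberg's Spark SQL extensions are enabled in the session. `ts_day` is the default name Iceberg gives the days(ts) partition field; the bucket count of 16 is an arbitrary illustrative choice, not a recommendation.

```sql
-- Assumes catalog.db.events was created as above (partitioned by days(ts))
-- and that Iceberg's Spark SQL extensions are enabled.

-- Switch from daily to hourly partitions. Existing files keep their daily layout;
-- only data written after this statement uses the new spec.
ALTER TABLE catalog.db.events REPLACE PARTITION FIELD ts_day WITH hours(ts);

-- Additionally spread new data across 16 buckets of user_id (illustrative count).
ALTER TABLE catalog.db.events ADD PARTITION FIELD bucket(16, user_id);

-- Show the table's current partitioning.
DESCRIBE TABLE catalog.db.events;
```

Because Iceberg plans each partition spec separately, queries that filter on ts can still prune files written under both the old daily layout and the new hourly one.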
DML and Time Travel
Spark SQL supports the common data manipulation statements on Iceberg tables: INSERT INTO, INSERT OVERWRITE, UPDATE, DELETE, and MERGE INTO. Spark SQL also exposes Iceberg's time travel capability: SELECT * FROM catalog.db.events TIMESTAMP AS OF '2024-01-01' lets analysts query the table as it was at that moment. Because each snapshot references existing data files rather than copying them, these historical reads need no data duplication, although they only work for snapshots that have not yet been expired.
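The sketch below shows a MERGE INTO upsert followed by the two time-travel forms. It assumes `catalog.db.events_updates` is a hypothetical staging table with the same schema as events and that (user_id, ts) identifies a row; the snapshot id is a placeholder to be replaced with one listed by the snapshots metadata table. The VERSION AS OF and TIMESTAMP AS OF clauses require Spark 3.3 or later.

```sql
-- Upsert rows from a hypothetical staging table into events.
MERGE INTO catalog.db.events AS t
USING catalog.db.events_updates AS s
ON t.user_id = s.user_id AND t.ts = s.ts
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- List the table's snapshots to find an id or commit time to travel to.
SELECT committed_at, snapshot_id, operation
FROM catalog.db.events.snapshots
ORDER BY committed_at;

-- Read the table as of a specific snapshot (placeholder id) ...
SELECT * FROM catalog.db.events VERSION AS OF 1234567890123456789;

-- ... or as of a point in time.
SELECT * FROM catalog.db.events TIMESTAMP AS OF '2024-01-01 00:00:00';
```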
Iceberg Stored Procedures
Perhaps the most powerful Iceberg-specific feature in Spark SQL is its set of stored procedures. They are invoked with the CALL statement, which becomes available once Iceberg's Spark SQL extensions are enabled, and they cover the critical maintenance operations:
CALL catalog.system.rewrite_data_files('db.table') for compaction
CALL catalog.system.expire_snapshots('db.table', TIMESTAMP '...') to clean old snapshots
CALL catalog.system.rollback_to_snapshot('db.table', snapshot_id) for instant data recovery
These procedures make Spark SQL the de facto administrative language for Apache Iceberg tables.
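Iceberg's procedures also accept named arguments, which makes optional parameters explicit. A sketch, assuming the same `catalog` and a table `db.events`; the target file size, cutoff timestamp, retention count, and snapshot id are illustrative placeholders, not recommended values.

```sql
-- Compact small data files with the bin-pack strategy, aiming for ~512 MB files.
CALL catalog.system.rewrite_data_files(
  table => 'db.events',
  strategy => 'binpack',
  options => map('target-file-size-bytes', '536870912')
);

-- Expire snapshots committed before the cutoff, but keep at least the 10 most recent.
-- Expired snapshots can no longer be used for time travel or rollback.
CALL catalog.system.expire_snapshots(
  table => 'db.events',
  older_than => TIMESTAMP '2024-01-01 00:00:00',
  retain_last => 10
);

-- Make an earlier snapshot current again (placeholder id); no data files are rewritten.
CALL catalog.system.rollback_to_snapshot(
  table => 'db.events',
  snapshot_id => 1234567890123456789
);
```

Rollback is fast because it only changes which snapshot the table metadata points to, whereas expire_snapshots also deletes data files that no remaining snapshot references, so its cutoff should be chosen with time-travel needs in mind.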

