A data contract is a formal, machine-readable agreement between a data producer (the team or system that writes data) and its data consumers (the teams or systems that read that data). It defines exactly what the producer commits to delivering: the schema, the data types, the quality expectations, the freshness or update-frequency SLA, and the semantic meaning of fields. Data contracts are a cornerstone of data mesh architectures and modern data product thinking.
What a Data Contract Contains
- Schema Definition: The exact column names, data types, and nullability constraints that the producer commits to maintaining. Any change to the schema must trigger a contract review.
- Quality Expectations: Specific, testable quality rules (e.g., "order_id is unique," "event_time is never null," "revenue is always positive"). These are often implemented using Great Expectations or dbt tests.
- Freshness SLA: The maximum acceptable age of data (e.g., "sales data is updated within 1 hour of transaction"). For Iceberg tables, observability tools can compare the timestamp of the latest snapshot against this SLA.
- Semantic Documentation: Human-readable descriptions of what each field means, what its valid value range is, and how it should be interpreted. This context is critical for AI agents and LLMs generating SQL against the lakehouse.
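The four components above can be sketched as a small, machine-readable structure. The following is a minimal illustration in plain Python, not the format of any particular contract tool: the names `Column`, `DataContract`, `validate`, and `ORDERS_CONTRACT`, as well as the example `orders` fields, are hypothetical and chosen only to mirror the bullets above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Column:
    """One field of the schema definition, with its semantic documentation."""
    name: str
    dtype: type
    nullable: bool
    description: str


@dataclass(frozen=True)
class DataContract:
    """Hypothetical in-memory contract: schema, quality rules, freshness SLA."""
    columns: tuple[Column, ...]
    quality_rules: dict[str, Callable[[list[Row]], bool]]
    max_age: timedelta


def validate(contract: DataContract, rows: list[Row],
             last_updated: datetime, now: datetime) -> list[str]:
    """Return a list of human-readable violations (empty means compliant)."""
    violations: list[str] = []
    expected = {c.name for c in contract.columns}

    # Schema definition: column names, types, and nullability.
    for i, row in enumerate(rows):
        if set(row) != expected:
            violations.append(f"row {i}: columns {sorted(row)} != {sorted(expected)}")
            continue
        for col in contract.columns:
            value = row[col.name]
            if value is None:
                if not col.nullable:
                    violations.append(f"row {i}: {col.name} must not be null")
            elif not isinstance(value, col.dtype):
                violations.append(f"row {i}: {col.name} is not {col.dtype.__name__}")

    # Quality expectations: each rule is a predicate over the whole batch.
    for name, rule in contract.quality_rules.items():
        if not rule(rows):
            violations.append(f"quality rule failed: {name}")

    # Freshness SLA: age of the most recent update.
    age = now - last_updated
    if age > contract.max_age:
        violations.append(f"freshness SLA exceeded: data is {age} old, limit is {contract.max_age}")
    return violations


# Example contract for an assumed `orders` dataset.
ORDERS_CONTRACT = DataContract(
    columns=(
        Column("order_id", str, False, "Unique identifier of the order"),
        Column("event_time", datetime, False, "Time the order was placed"),
        Column("revenue", float, False, "Order revenue; must be greater than zero"),
    ),
    quality_rules={
        "order_id is unique":
            lambda rows: len({r.get("order_id") for r in rows}) == len(rows),
        "revenue is always positive":
            lambda rows: all(
                isinstance(r.get("revenue"), (int, float)) and r["revenue"] > 0
                for r in rows
            ),
    },
    max_age=timedelta(hours=1),
)
```

Calling `validate(ORDERS_CONTRACT, rows, last_updated, now)` on a batch returns an empty list when the batch honors the contract. In practice the same checks would more likely be expressed as Great Expectations suites or dbt tests, with this structure serving only to show how the pieces relate.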
Data Contracts and Iceberg
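Apache Iceberg's schema evolution rules align naturally with data contract enforcement. Iceberg tracks columns by unique field ID rather than by name, so adding, dropping, renaming, and reordering columns are metadata-only operations that do not corrupt existing data, and type changes are limited to safe promotions (for example, int to long or float to double). Iceberg does not, however, know which consumers depend on a column: a rename or drop that is valid for the table can still break downstream queries that reference the old name. Data contract systems can hook into the schema evolution workflow so that any proposed schema change is reviewed against all known downstream consumers before it is applied. Tools such as Soda Core, specifications such as the Open Data Contract Standard (ODCS), and emerging catalog integrations can be used to automate contract validation on writes to an Iceberg table.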
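A review of a proposed schema change against a contract can be sketched as a simple comparison. The example below is a hedged illustration in plain Python and does not call any Iceberg or catalog API: the function `breaking_changes` and the schema dictionaries are hypothetical, and the set of allowed promotions covers only int to long and float to double (Iceberg also permits widening decimal precision, which is omitted here for brevity).

```python
from __future__ import annotations

# Type promotions treated as safe in this sketch (a subset of what Iceberg allows).
SAFE_PROMOTIONS = {("int", "long"), ("float", "double")}


def breaking_changes(contract_schema: dict[str, str],
                     proposed_schema: dict[str, str]) -> list[str]:
    """Compare column name -> type mappings and list contract-breaking changes.

    Added columns are treated as non-breaking. A column that disappears from
    the proposed schema is flagged, whether it was dropped or renamed, because
    consumers that reference it by name would fail either way.
    """
    problems: list[str] = []
    for name, old_type in contract_schema.items():
        if name not in proposed_schema:
            problems.append(f"column '{name}' was removed or renamed")
            continue
        new_type = proposed_schema[name]
        if new_type != old_type and (old_type, new_type) not in SAFE_PROMOTIONS:
            problems.append(
                f"column '{name}': {old_type} -> {new_type} is not a safe promotion"
            )
    return problems
```

A hook of this kind could run in a pull request check or before a schema update is committed, blocking the change whenever the returned list is non-empty. Whether a widening such as int to long is acceptable to every consumer is itself a contract decision, so a stricter policy might flag any type change for review.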

