Analytics Engineering

Analytics Engineering is the practice of applying software engineering rigor to data transformation work. It emerged as a distinct discipline around 2019, largely because dbt (data build tool) made it practical for SQL-fluent analysts to write, test, and version-control transformation models without a full software engineering background. The analytics engineer sits between the data engineer (who builds raw ingestion pipelines) and the data scientist (who builds statistical and machine learning models and analyses), and is responsible for turning raw data into trusted, documented, production-grade datasets.

Before analytics engineering became a named discipline, transformation logic lived in GUI-configured ETL tools, in undocumented one-off SQL scripts, or in calculated fields inside BI tools. None of these were testable, none were version-controlled, and none were accessible to automated systems. Analytics engineering addresses all three gaps.

The Three Core Practices

Version Control for Transformation Logic

Each dbt model is defined in a file (usually a SQL file) stored in a Git repository. Changes go through pull requests with peer review. Git records the full history of every transformation definition, including who changed it, when, and (through commit messages and pull request discussion) why. If a number in a dashboard changes after a model is updated, the root cause can be traced through the commit history.

Automated Testing

dbt's testing framework lets analytics engineers declare data quality assertions in YAML alongside the SQL models. Common tests include "this column should never contain null values" (`not_null`), "customer IDs should be unique in this table" (`unique`), and "order status should only contain values from this defined list" (`accepted_values`). When these tests run as part of every pipeline execution (for example, via `dbt build`), a violated assertion fails the build, preventing bad data from reaching downstream consumers.
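The three example assertions can be illustrated with a small Python sketch. dbt itself compiles each generic test into a SQL query that selects failing records, and a test passes when that query returns nothing; the functions below only mimic that behavior over in-memory rows. The `orders` table, its columns, the allowed status values, and all function names are hypothetical. As in dbt's built-in tests, nulls are ignored by the uniqueness and accepted-values checks and left to the not-null check.

```python
from collections import Counter
from typing import Any, Iterable

Row = dict[str, Any]


def not_null_failures(rows: list[Row], column: str) -> list[Row]:
    """Rows whose value in `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique_failures(rows: list[Row], column: str) -> list[Any]:
    """Non-null values of `column` that appear more than once."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def accepted_values_failures(rows: list[Row], column: str, allowed: Iterable[Any]) -> list[Row]:
    """Rows whose non-null value in `column` is outside the allowed list."""
    allowed_set = set(allowed)
    return [r for r in rows if r.get(column) is not None and r[column] not in allowed_set]


def run_tests(rows: list[Row]) -> dict[str, int]:
    """Run a hypothetical test suite for an `orders` table; each value is a failing-record count."""
    return {
        "not_null(customer_id)": len(not_null_failures(rows, "customer_id")),
        "unique(order_id)": len(unique_failures(rows, "order_id")),
        "accepted_values(status)": len(
            accepted_values_failures(rows, "status", ["placed", "shipped", "completed", "returned"])
        ),
    }


def build_passes(results: dict[str, int]) -> bool:
    """Each check passes only when it finds zero failing records."""
    return all(n == 0 for n in results.values())


# Hypothetical data: one null customer_id, a duplicated order_id, and an unlisted status.
orders = [
    {"order_id": 1, "customer_id": 10, "status": "placed"},
    {"order_id": 2, "customer_id": None, "status": "shipped"},
    {"order_id": 2, "customer_id": 11, "status": "refunded"},
]
results = run_tests(orders)
print(results)
print("build passes:", build_passes(results))
```

Each of the three sample problems trips exactly one check, so the simulated build fails; in a real pipeline that failure is what stops the bad rows from reaching downstream models.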

Inline Documentation

Every dbt model, and every column in that model, can carry a description field in YAML. dbt generates a browsable documentation website from these descriptions, giving analysts a reliable reference for understanding what each dataset contains. For AI agents, this documentation is the direct input to the Data Context Layer: the structured metadata an agent reads before generating SQL, so that it interprets each field correctly.
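As a rough illustration of how column descriptions can become agent context, the sketch below renders one model's documentation as plain text that could be placed in a prompt. The dictionary mirrors the `name`/`description`/`columns` layout of a model entry in a dbt YAML file, assumed to be already parsed into Python; the `orders` model and the `render_model_context` function are hypothetical and not part of dbt or of any Data Context Layer product.

```python
from typing import Any


def render_model_context(model: dict[str, Any]) -> str:
    """Render one documented model as a plain-text block an agent can read before writing SQL."""
    table_desc = (model.get("description") or "").strip() or "(no description)"
    lines = [f"Table {model['name']}: {table_desc}"]
    for col in model.get("columns", []):
        # Make gaps visible instead of silently omitting undocumented columns.
        col_desc = (col.get("description") or "").strip() or "(no description)"
        lines.append(f"  - {col['name']}: {col_desc}")
    return "\n".join(lines)


# Hypothetical model metadata, shaped like a parsed `models:` entry in a dbt YAML file.
orders_model = {
    "name": "orders",
    "description": "One row per customer order.",
    "columns": [
        {"name": "order_id", "description": "Primary key for the order."},
        {"name": "status", "description": "Fulfilment status: placed, shipped, completed, or returned."},
        {"name": "amount"},  # no description yet
    ],
}

print(render_model_context(orders_model))
```

The last line of the output, `- amount: (no description)`, shows what an agent receives for an undocumented field: a column name and nothing that says what it measures.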

The AI Era Requirement

Analytics engineering outputs are now infrastructure for AI as much as for human analysts. An organization planning to deploy AI agents against its lakehouse should treat dbt model documentation completeness the same way it treats data quality: as a non-negotiable production readiness gate. An undocumented column is a potential hallucination source for any AI agent that encounters it.
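One way to make documentation completeness a production readiness gate is a check that fails whenever a model or column lacks a description. The sketch below is an illustration under stated assumptions rather than a dbt feature: it takes model metadata already parsed into Python dicts (a real check might instead read it from the artifacts dbt generates), and `find_undocumented`, `enforce_documentation_gate`, `DocumentationGateError`, and the sample models are all hypothetical.

```python
from typing import Any


class DocumentationGateError(Exception):
    """Raised when a model or column lacks a description."""


def find_undocumented(models: list[dict[str, Any]]) -> list[str]:
    """Return 'model' or 'model.column' for every missing or blank description."""
    missing: list[str] = []
    for model in models:
        if not (model.get("description") or "").strip():
            missing.append(model["name"])
        for col in model.get("columns", []):
            if not (col.get("description") or "").strip():
                missing.append(f"{model['name']}.{col['name']}")
    return missing


def enforce_documentation_gate(models: list[dict[str, Any]]) -> None:
    """Raise if anything is undocumented; left uncaught, this exits the process non-zero and fails a CI job."""
    missing = find_undocumented(models)
    if missing:
        raise DocumentationGateError("Undocumented: " + ", ".join(missing))


# Hypothetical metadata: a blank column description and a model with no description.
models = [
    {
        "name": "orders",
        "description": "One row per customer order.",
        "columns": [
            {"name": "order_id", "description": "Primary key for the order."},
            {"name": "discount_code", "description": ""},
        ],
    },
    {"name": "customers", "columns": [{"name": "customer_id", "description": "Primary key."}]},
]

try:
    enforce_documentation_gate(models)
except DocumentationGateError as err:
    print(err)  # names orders.discount_code and customers
```

Treating a blank string the same as a missing description is a deliberate choice: an empty `description:` key gives an agent no more to work with than no key at all.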
