Master Data Management (MDM) is the practice of creating, maintaining, and governing a single, authoritative version (the "golden record") of each core business entity (customers, products, employees, locations, suppliers) across all systems in the enterprise. Without MDM, the same customer might be represented differently in Salesforce, the ERP, the web analytics platform, and the data lakehouse, making cross-system analysis unreliable.

Why MDM is Critical for Lakehouses

Data lakehouses ingest data from dozens of source systems simultaneously. Each source system has its own representation of core entities: a customer in Salesforce has a different ID format than the same customer in SAP, which differs again from the web analytics cookie ID. Without MDM resolution, analytical queries comparing customers across systems produce misleading results because the same individual is counted multiple times or not matched at all.
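The double-counting problem can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the record values, the ID formats, and the choice of a normalized email address as the match key are assumptions made for illustration. Real MDM systems typically combine several attributes and fuzzy matching rules rather than a single exact key.

```python
# Hypothetical records for the same people arriving from three source systems.
# The ID formats and field names are invented for this sketch.
source_records = [
    {"system": "salesforce", "source_id": "0015g00000XyZAB", "email": "Ana.Diaz@Example.com"},
    {"system": "sap", "source_id": "0000104522", "email": "ana.diaz@example.com "},
    {"system": "web", "source_id": "ck_9f3a1c", "email": "ana.diaz@example.com"},
    {"system": "salesforce", "source_id": "0015g00000QrsTU", "email": "li.wei@example.com"},
]


def match_key(record):
    """Assumed matching rule: trimmed, lowercased email identifies a person."""
    return record["email"].strip().lower()


def resolve(records):
    """Build a crosswalk from (system, source_id) to a master surrogate key."""
    master_ids = {}  # match key -> surrogate key
    crosswalk = {}   # (system, source_id) -> surrogate key
    for record in records:
        key = match_key(record)
        if key not in master_ids:
            master_ids[key] = len(master_ids) + 1  # next surrogate key
        crosswalk[(record["system"], record["source_id"])] = master_ids[key]
    return crosswalk


crosswalk = resolve(source_records)

# Without resolution, every source ID looks like a separate customer.
naive_count = len({(r["system"], r["source_id"]) for r in source_records})
# With resolution, source IDs collapse onto master surrogate keys.
resolved_count = len(set(crosswalk.values()))

print(naive_count, resolved_count)
```

In this toy data set the naive count treats four source IDs as four customers, while the resolved count collapses them to two, because three of the records share one match key.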

MDM Patterns

MDM and Iceberg

The MDM golden-record dimension tables (for example dim_customer_master and dim_product_master) are typically stored as Iceberg tables in the lakehouse's Gold layer. Change data capture (CDC) pipelines keep them synchronized with the MDM system. All fact tables join to these master dimension tables using the MDM-resolved surrogate keys, ensuring consistent customer and product definitions across all analytical queries.
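As a rough sketch of the join pattern, the following Python models the master dimension and two fact tables as in-memory structures. The column names (customer_sk, amount, and so on), the fact table names, and the sample rows are assumptions for illustration only. In a real lakehouse these would be Iceberg tables queried through an engine, not Python dictionaries.

```python
# Hypothetical golden-record dimension, keyed by MDM-resolved surrogate key.
dim_customer_master = {
    1: {"customer_name": "Ana Diaz", "segment": "Enterprise"},
    2: {"customer_name": "Li Wei", "segment": "SMB"},
}

# Hypothetical fact tables from different source systems. Both carry the
# same surrogate key instead of their source-specific customer IDs.
fact_orders = [
    {"order_id": "o-1", "customer_sk": 1, "amount": 120.0},
    {"order_id": "o-2", "customer_sk": 2, "amount": 40.0},
]
fact_web_sessions = [
    {"session_id": "s-1", "customer_sk": 1},
    {"session_id": "s-2", "customer_sk": 1},
    {"session_id": "s-3", "customer_sk": 2},
]


def customer_summary(dim, orders, sessions):
    """Join both fact tables to the master dimension on the surrogate key."""
    summary = {
        sk: {"customer_name": row["customer_name"], "revenue": 0.0, "sessions": 0}
        for sk, row in dim.items()
    }
    for order in orders:
        summary[order["customer_sk"]]["revenue"] += order["amount"]
    for session in sessions:
        summary[session["customer_sk"]]["sessions"] += 1
    return summary


summary = customer_summary(dim_customer_master, fact_orders, fact_web_sessions)
for sk, row in sorted(summary.items()):
    print(sk, row["customer_name"], row["revenue"], row["sessions"])
```

Because both fact tables reference the same surrogate key, order revenue and web sessions land on the same customer row without any per-query ID matching.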
