Enterprise data security for lakehouses encompasses the full spectrum of controls, policies, and technologies that protect analytical data from unauthorized access, data breaches, insider threats, and compliance violations. Unlike perimeter security (network firewalls protecting a server room), cloud lakehouse security must protect data stored in shared-responsibility cloud environments (S3, ADLS), where the cloud provider secures the underlying infrastructure and the organization remains responsible for configuring access, encryption, and monitoring for its own data.
Security Layers in the Iceberg Lakehouse
- Encryption at Rest: All Parquet files stored in object storage such as S3 are encrypted using server-side encryption (on S3, SSE-S3 or SSE-KMS) or client-side encryption. AWS KMS, Azure Key Vault, and Google Cloud KMS provide key management, support automatic key rotation, and record key usage, including decryption requests, in audit logs.
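- Encryption in Transit: All communication between query engines and S3, between client applications and query engines, and between the catalog and engines uses TLS 1.2 or higher. Arrow Flight SQL runs over gRPC and supports TLS, but TLS is a deployment setting rather than a protocol guarantee, so it should be explicitly enabled and verified on every endpoint.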
- Identity and Access Management: AWS IAM, Microsoft Entra ID (formerly Azure AD), or GCP IAM controls which service accounts and users have the object storage permissions to read or write Parquet files. The catalog layer (Polaris, Unity Catalog) adds application-level RBAC on top of cloud IAM, providing fine-grained table and column-level governance beyond what cloud IAM alone supports.
- Audit Logging: Every query execution, access grant, and schema change should be logged. AWS CloudTrail logs S3 API calls; object-level operations such as GetObject and PutObject are recorded only when data events are enabled for the bucket. Catalog platforms log catalog operations. Query engines log query text, user identity, and execution results for compliance reporting.
- Network Security: Private connectivity (AWS PrivateLink, Azure Private Link), VPC endpoints, and IP allowlists restrict which networks can reach the catalog and query engine endpoints, preventing exposure to the public internet.
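
The encryption layers above can be partly enforced at the storage level. The following is a minimal Python sketch, not a definitive configuration, that builds an S3 bucket policy denying non-TLS requests, requests below TLS 1.2, and uploads that do not request SSE-KMS. The function name build_bucket_policy and the bucket name example-lakehouse-bucket are hypothetical; the condition keys aws:SecureTransport, s3:TlsVersion, and s3:x-amz-server-side-encryption are standard AWS policy condition keys. It assumes the lakehouse data lives in a single S3 bucket.

```python
import json


def build_bucket_policy(bucket: str) -> dict:
    """Build an S3 bucket policy that denies insecure transport and non-KMS uploads.

    `bucket` is a hypothetical bucket name supplied by the caller.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that does not arrive over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject TLS versions older than 1.2.
                "Sid": "DenyOldTlsVersions",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"NumericLessThan": {"s3:TlsVersion": "1.2"}},
            },
            {
                # Reject uploads that do not explicitly request SSE-KMS.
                "Sid": "DenyNonKmsUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(json.dumps(build_bucket_policy("example-lakehouse-bucket"), indent=2))
```

Note that the third statement also rejects writers that omit the encryption header and rely on the bucket's default encryption, so a policy like this should be tested against the query engines in use before it is applied.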

