Apache Flink is an open-source, unified stream and batch processing framework. Whereas Apache Spark's streaming engine by default processes a stream as a series of micro-batches (a batch processing model), Flink was designed from the ground up as a stateful streaming engine that processes each event individually as it arrives. As of 2026, the combination of Apache Flink and Apache Iceberg has matured into a widely used architecture for building real-time streaming data lakehouses.
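
The difference between the two processing models can be illustrated with a toy keyed counter. This is only a sketch in plain Python, not Flink or Spark code: the function names, the `(timestamp, key)` event shape, and the 5-second batch interval are all assumptions made for illustration.

```python
from collections import defaultdict


def per_event_counts(events):
    """Update keyed state and emit a result for every event as it arrives."""
    state = defaultdict(int)
    emitted = []
    for ts, key in events:
        state[key] += 1
        # The updated count is visible at the event's own timestamp.
        emitted.append((ts, key, state[key]))
    return emitted


def micro_batch_counts(events, batch_seconds):
    """Buffer events and emit updated counts only at each batch boundary."""
    batches = defaultdict(list)
    for ts, key in events:
        # Integer index of the batch window this event falls into.
        batches[ts // batch_seconds].append(key)

    state = defaultdict(int)
    emitted = []
    for idx in sorted(batches):
        for key in batches[idx]:
            state[key] += 1
        # Results only become visible when the batch closes.
        boundary = (idx + 1) * batch_seconds
        for key in sorted(set(batches[idx])):
            emitted.append((boundary, key, state[key]))
    return emitted


# Hypothetical events: (timestamp in seconds, key)
events = [(0, "a"), (1, "b"), (2, "a"), (7, "a")]
print(per_event_counts(events))
print(micro_batch_counts(events, batch_seconds=5))
```

In this toy model the per-event version exposes each updated count at the event's timestamp, while the micro-batch version exposes it only at the next batch boundary, which is the latency trade-off the paragraph above describes.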

Flink's Iceberg Integration

The Flink-Iceberg connector enables Flink jobs to write directly to Iceberg tables on object storage.

The Streaming-Compaction Cycle

A critical operational reality of Flink-to-Iceberg streaming is the small file problem. Every checkpoint commit (which might occur every 1 to 5 minutes) adds new Parquet files to the table. Production deployments must therefore pair the Flink streaming job with a background compaction service that continuously merges these small files into larger, well-sized files. Committing at 1-to-5-minute intervals still cuts the number of commits by a factor of 60 to 300 compared to per-second commits, while providing near-real-time data availability for most dashboard use cases.
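
The arithmetic behind the commit interval, and the basic idea of compaction, can be sketched in a few lines of Python. This is an illustrative model only, not the connector's or Iceberg's actual compaction logic: `commits_per_day`, `plan_compaction`, the greedy grouping strategy, and the target sizes are all assumptions chosen for the example.

```python
SECONDS_PER_DAY = 86_400


def commits_per_day(interval_seconds):
    """Number of checkpoint commits in a day at a fixed commit interval."""
    return SECONDS_PER_DAY // interval_seconds


def plan_compaction(file_sizes_mb, target_mb=512):
    """Greedily group small files so each group totals at most target_mb.

    Files already at or above the target are left alone, and a group
    containing a single file is dropped because rewriting it gains nothing.
    """
    groups = []
    current, current_size = [], 0
    for size in sorted(file_sizes_mb):
        if size >= target_mb:
            continue  # already large enough
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return [g for g in groups if len(g) > 1]


for interval in (1, 60, 300):
    print(f"{interval:>3}s interval -> {commits_per_day(interval)} commits/day")

# Hypothetical file sizes in MB, with a deliberately small target for readability.
print(plan_compaction([8, 12, 5, 600, 20, 7], target_mb=32))
```

Under these assumptions, per-second commits produce 86,400 commits a day, versus 1,440 at one minute and 288 at five minutes. In a real deployment the grouping and rewriting would be handled by Iceberg's own data file rewrite maintenance rather than hand-written code like this.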
