Apache Spark 4.1 introduces Real-Time Mode (RTM) for Structured Streaming, which brings latency down to the millisecond range without abandoning the microbatch architecture. The key architectural changes are longer-duration epochs with data flowing continuously within each epoch (so checkpointing overhead is not paid on every small batch), concurrent processing stages in which reducers start as soon as mappers produce output, and non-blocking operators that minimize buffering. Together, these let a single engine handle both high-throughput ETL and ultra-low-latency workloads. That removes the need to run Apache Flink alongside Spark for latency-sensitive use cases such as fraud detection or real-time feature engineering. RTM is already in production at Databricks.
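To make the effect of concurrent stages concrete, here is a toy Python model (not Spark code) that compares per-record latency under two schedules. In the barrier schedule, records wait for the epoch to close, and the reduce stage starts only after every record has been mapped. In the pipelined schedule, each record is mapped on arrival and handed straight to the reducer. The function names, the single map task and single reduce task, and the fixed per-record costs are hypothetical simplifications for illustration. The model ignores checkpointing, parallelism, and network transfer.

```python
from typing import List

# Toy latency model (hypothetical, not Spark's scheduler). Times are in
# milliseconds. There is one map task and one reduce task, each handling one
# record at a time with a fixed per-record cost. Arrivals must be sorted.


def barrier_latencies(arrivals: List[int], epoch_end: int,
                      map_cost: int, reduce_cost: int) -> List[int]:
    """Barrier schedule: no work starts until the epoch closes at
    `epoch_end` (all arrivals are assumed to fall before it), and the
    reduce stage starts only after every record has been mapped."""
    map_done = epoch_end + len(arrivals) * map_cost  # stage barrier
    latencies = []
    t = map_done
    for a in arrivals:
        t += reduce_cost  # reducer drains mapped records in arrival order
        latencies.append(t - a)
    return latencies


def pipelined_latencies(arrivals: List[int],
                        map_cost: int, reduce_cost: int) -> List[int]:
    """Pipelined schedule: each record is mapped as soon as it arrives
    and the mapper is idle, then passed straight to the reducer."""
    latencies = []
    map_free = 0     # time at which the map task becomes idle
    reduce_free = 0  # time at which the reduce task becomes idle
    for a in arrivals:
        map_end = max(a, map_free) + map_cost
        map_free = map_end
        reduce_end = max(map_end, reduce_free) + reduce_cost
        reduce_free = reduce_end
        latencies.append(reduce_end - a)
    return latencies


if __name__ == "__main__":
    arrivals = [0, 10, 20, 30]  # four records spread over a 100 ms epoch
    print(barrier_latencies(arrivals, epoch_end=100, map_cost=1, reduce_cost=1))
    # → [105, 96, 87, 78]
    print(pipelined_latencies(arrivals, map_cost=1, reduce_cost=1))
    # → [2, 2, 2, 2]
```

In the barrier case, latency is dominated by waiting for the epoch to close, so it grows with epoch length. In the pipelined case, it is bounded by the per-record processing cost plus any queueing behind earlier records. That contrast is the intuition behind pairing longer epochs with continuous intra-epoch data flow.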