Kafka time lag metrics can silently underreport consumer delays because of two features: log compaction and retention deletion. When a consumer's committed offset points to a message that has been deleted, a fetch at that offset returns the next available message, which carries a newer timestamp, so lag appears smaller than it really is. In extreme cases, a reported lag of 30 seconds can mask an actual lag of 30 minutes. The klag-exporter tool detects both conditions by comparing the requested offset with the offset actually returned and by checking whether the committed offset has fallen below the partition's low watermark. It exposes the results as detection flags on its metrics. Mitigations include increasing min.compaction.lag.ms, scaling slow consumers, and treating flagged time lag values as lower bounds rather than exact measurements.
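The two detection checks can be illustrated with a small simulation. This is a minimal sketch, not klag-exporter's actual code. It makes several assumptions. A partition is modeled as an in-memory list of surviving records plus a low watermark. Time lag is taken to be "now" minus the timestamp of the first surviving record at or after the committed offset. The names `Record`, `TimeLagResult`, `fetch_at_or_after`, `time_lag`, `compaction_detected`, and `retention_detected` are hypothetical and chosen for illustration.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    offset: int
    timestamp_ms: int


@dataclass(frozen=True)
class TimeLagResult:
    lag_ms: int
    compaction_detected: bool  # requested offset is inside the log but was removed
    retention_detected: bool   # committed offset already fell below the low watermark

    @property
    def is_lower_bound(self) -> bool:
        # If either flag is set, the true lag may be much larger than lag_ms.
        return self.compaction_detected or self.retention_detected


def fetch_at_or_after(records: list[Record], offset: int) -> Record | None:
    """Simulate a fetch: the first surviving record whose offset is >= `offset`."""
    i = bisect_left([r.offset for r in records], offset)
    return records[i] if i < len(records) else None


def time_lag(records: list[Record], low_watermark: int,
             committed: int, now_ms: int) -> TimeLagResult | None:
    """Hypothetical time-lag check with both detection flags.

    `records` holds the surviving records, sorted by offset, all >= low_watermark.
    Returns None when no unconsumed record is left (the consumer is caught up).
    """
    record = fetch_at_or_after(records, committed)
    if record is None:
        return None
    below_lwm = committed < low_watermark
    return TimeLagResult(
        lag_ms=now_ms - record.timestamp_ms,
        # Requested vs. returned offset differ although the requested offset
        # is still within the log: compaction left a hole there.
        compaction_detected=(record.offset != committed) and not below_lwm,
        # Retention deleted the segment that held the committed offset.
        retention_detected=below_lwm,
    )


if __name__ == "__main__":
    # Offsets 101-104 (written around t=0) were compacted away. The next
    # surviving record, 105, is only 30 s older than "now" (t = 30 min).
    records = [Record(100, 0), Record(105, 1_770_000), Record(106, 1_790_000)]
    r = time_lag(records, low_watermark=100, committed=101, now_ms=1_800_000)
    print(r.lag_ms, r.compaction_detected, r.is_lower_bound)  # 30000 True True
```

The example reproduces the 30-second-versus-30-minute case in miniature. The computed lag is 30,000 ms even though the consumer is stuck roughly 30 minutes behind. The flag marks that value as a lower bound rather than an exact measurement.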
Table of contents
Quick Recap: How Time Lag Works
Log Compaction Problem
Retention Deletion: When Your Offsets Fall Off the Cliff
Detection: How klag-exporter Catches These Lies
What to Do About It
For Monitoring Dashboards