Iris is an observability platform for ETL jobs that captures critical job metrics, supports real-time monitoring and offline analytics, and surfaces actionable insights. Its architecture combines a metrics collector, a Kafka queue, and the TIG stack (Telegraf, InfluxDB, Grafana), and it collects data using JVM Profiler and sparkMeasure. Iris visualizes real-time data in Grafana and routes metrics to InfluxDB for offline analysis. It also uses K-means clustering to classify jobs and improves the accuracy of Databricks infrastructure cost calculations. Case studies demonstrate how Iris improves performance and cost efficiency. Future plans include developing APIs, adding a Presto listener, and creating a feedback loop for continuous improvement.
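To make the job-classification step concrete, here is a minimal sketch of K-means (Lloyd's algorithm) applied to per-job metrics. The summary does not say which features Iris clusters on or how many clusters it uses, so the feature pair (CPU core-hours, peak memory in GB), the job names, the sample values, and `k=2` are all illustrative assumptions, not Iris's actual configuration.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_assignments == assignments:
            break  # assignments are stable, so the algorithm has converged
        assignments = new_assignments
    return new_assignments, centroids


# Hypothetical per-job metrics: (cpu_core_hours, peak_memory_gb).
jobs = {
    "job_a": (2.0, 4.0),
    "job_b": (2.5, 5.0),
    "job_c": (3.0, 4.5),
    "job_d": (40.0, 64.0),
    "job_e": (42.0, 60.0),
    "job_f": (38.0, 70.0),
}

labels, centroids = kmeans(list(jobs.values()), k=2)
for name, label in zip(jobs, labels):
    print(f"{name}: cluster {label}")
```

On real metrics the features would normally be scaled to comparable ranges first, since K-means relies on Euclidean distance and a large-valued feature would otherwise dominate the clustering.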

17m read time. From engineering.grab.com
Table of contents
Introduction
Understanding the needs
Observability with Iris
Transforming observations into insights
Seeing Iris in action
The future of Iris
Conclusion