Trendyol built Helyx, an in-house data observability platform that detects silent data failures in large-scale warehouses. The system monitors freshness, volume, and schema anomalies by collecting metadata snapshots into InfluxDB, comparing current metrics against 6-week baselines using statistical modeling. A key innovation is tracking `last_volume_changed_time` to catch tables that update metadata but contain stale data. The platform includes auto-normalization, importance scoring for critical assets, contextual threshold adjustments (like relaxed monitoring during overnight batch windows), and Slack-based incident management with self-tuning feedback loops that learn team tolerance levels over time.
Table of contents
How we turned simple metadata into powerful data quality insightsDefining the Problem Space: What Can Go Wrong?System ArchitectureMetric Collector & Time-Series StorageGet Emir Arslan’s stories in your inboxAnomaly Detection ServiceDashboard: Key Metrics at a GlanceIncident Page: Detailed Anomaly ManagementAsset InventoryActionable Observability: Subscriptions & SlackConclusionHome - Trendyol CareersSort: