Trendyol built Helyx, an in-house data observability platform that detects silent data failures in large-scale warehouses. The system monitors freshness, volume, and schema anomalies by collecting metadata snapshots into InfluxDB, comparing current metrics against 6-week baselines using statistical modeling. A key innovation is tracking `last_volume_changed_time` to catch tables that update metadata but contain stale data. The platform includes auto-normalization, importance scoring for critical assets, contextual threshold adjustments (like relaxed monitoring during overnight batch windows), and Slack-based incident management with self-tuning feedback loops that learn team tolerance levels over time.

6m read timeFrom medium.com
Post cover image
Table of contents
How we turned simple metadata into powerful data quality insightsDefining the Problem Space: What Can Go Wrong?System ArchitectureMetric Collector & Time-Series StorageGet Emir Arslan’s stories in your inboxAnomaly Detection ServiceDashboard: Key Metrics at a GlanceIncident Page: Detailed Anomaly ManagementAsset InventoryActionable Observability: Subscriptions & SlackConclusionHome - Trendyol Careers

Sort: