Best of Observability — March 2026

1
Article
Tokio·10w
Introducing dial9: a flight recorder for Tokio
dial9 is a new runtime telemetry tool for Tokio that captures a full timeline of runtime events (individual polls, parks, wakes, and Linux kernel events) rather than just aggregate metrics. Built to diagnose production-only performance issues, it helped identify kernel scheduling delays of 10ms+ on an AWS service, fd_table lock contention causing 100ms+ polls during startup, and a global mutex in backtrace::trace. With under 5% overhead, it can run continuously in production. Setup requires wrapping the Tokio runtime with TracedRuntime, and traces can be viewed in a browser-based viewer or stored to S3.
33
2
2
Article
Supabase·12w
Log Drains: Now available on Pro
Supabase has launched Log Drains for Pro tier users, allowing them to forward logs from all Supabase infrastructure layers (Postgres, Auth, Storage, Edge Functions, Realtime, API Gateway) to external logging backends. Supported destinations include Sentry, Grafana Loki, Datadog, AWS S3, Axiom, and any generic HTTP endpoint. Logs are batched (up to 250 per batch or every second) with optional Gzip compression. Pricing starts at $10 per drain per project plus $0.20 per million events and $0.09 per GB egress. The feature enables centralized observability across the full stack without context-switching between the Supabase console and existing monitoring tools.
23
1
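The batching rule in the Log Drains summary above (flush at 250 events or after one second, with optional Gzip) can be sketched as a small size-or-time batcher. This is only an illustration under stated assumptions: the `Batcher` class, its `send` callback, and the JSON payload encoding are hypothetical and are not Supabase's implementation.

```python
import gzip
import json

MAX_BATCH = 250        # from the summary: up to 250 events per batch
MAX_AGE_SECONDS = 1.0  # from the summary: or every second


class Batcher:
    """Hypothetical size-or-time batcher; not Supabase's actual code."""

    def __init__(self, send, compress=False):
        self.send = send          # callable that receives the encoded payload (bytes)
        self.compress = compress  # optional Gzip compression
        self.events = []
        self.first_ts = None      # arrival time of the oldest buffered event

    def add(self, event, now):
        if not self.events:
            self.first_ts = now
        self.events.append(event)
        # Size trigger: flush as soon as the batch is full.
        if len(self.events) >= MAX_BATCH:
            self.flush()

    def tick(self, now):
        # Time trigger: flush a partial batch once its oldest event is old enough.
        if self.events and now - self.first_ts >= MAX_AGE_SECONDS:
            self.flush()

    def flush(self):
        payload = json.dumps(self.events).encode("utf-8")  # assumed JSON encoding
        if self.compress:
            payload = gzip.compress(payload)
        self.send(payload)
        self.events = []
        self.first_ts = None
```

Passing `now` explicitly keeps the sketch deterministic; a real sender would read a monotonic clock and call `tick` from a timer.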
3
Article
Grafana Labs·12w
OpenTelemetry support for .NET 10: A behind-the-scenes look
Grafana Labs engineers worked with the OpenTelemetry .NET community to deliver native .NET 10 support for the OpenTelemetry instrumentation libraries in the same week as .NET 10's stable release in November 2025. The post covers the value of early preview validation (catching a logging source generator regression before RC2), key changes in the new release including schema URL support for metrics and traces, new ASP.NET Core metrics for authentication, Blazor, and memory pools, and a breaking change where the default trace context propagator switched to W3C. It also flags a binding redirect gotcha for .NET Framework apps using .NET 10 assemblies.
24
1
4
Article
Tinybird·9w
Maple: an open-source observability platform built with Tinybird's TypeScript SDK
David Granzin, a Senior Systems Engineer at Superwall, built Maple — an open-source observability platform for distributed traces, logs, and metrics — using Tinybird's TypeScript SDK on top of managed ClickHouse. With no infrastructure to provision, Tinybird's local-first development, schema visualization, branch environments, and TypeScript SDK cut his estimated 12-week build down to 7 weeks. The TypeScript SDK also enabled AI coding agents to read and modify Tinybird resources directly, accelerating iteration. In the same timeframe, David also built audit log infrastructure for a second project, Hazel, an AI-first Slack alternative.
36
5
Article
monday Engineering·10w
Less Noise, Better Sleep: Data-Driven Approach to Healthier Alerts
Monday.com's engineering team tackled alert fatigue and noisy on-call schedules by treating production alerts as data. They built a PagerDuty integration to log every alert into a monday.com board, enriched raw data with context (true/false positive, root cause, duplicates), reviewed trends quarterly in engineering reviews, and converted findings into prioritized backlog items. Over a year, this framework cut false-positive alerts in half and improved system resiliency across a 20-engineer group.
22
6
Article
OpenTelemetry·10w
How Mastodon Runs OpenTelemetry Collectors in Production
Mastodon, a non-profit decentralized social platform with ~20 staff, shares how a single engineer runs OpenTelemetry Collectors in production across two large Kubernetes deployments handling up to 10 million requests per minute. The setup uses one Collector per Kubernetes namespace, managed via the OpenTelemetry Operator and Argo CD, with no complex gateway tiers. Traffic is controlled through tail-based sampling (0.1% for successful traces, 100% for errors). The full production config is shared, including OTLP ingestion, Kubernetes metadata enrichment, resource detection, and Datadog export. Key lessons: keep architecture simple, use Kubernetes operators for lifecycle management, rely on semantic conventions, and upgrade frequently.
13
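The sampling policy in the Mastodon summary above (keep 0.1% of successful traces and 100% of traces with errors) can be illustrated with a minimal decision function. This is a sketch, not the Collector's tail sampling processor: the span dictionaries, the `status` field, and the hash-based bucketing are assumptions made for the example.

```python
import hashlib

SUCCESS_RATE = 0.001  # 0.1% of successful traces, per the summary
ERROR_RATE = 1.0      # 100% of traces that contain an error


def keep_trace(trace_id: str, spans: list[dict]) -> bool:
    """Decide once the whole trace has arrived, which is what makes it tail-based."""
    # Assumed span shape: a dict with an optional "status" key.
    has_error = any(span.get("status") == "error" for span in spans)
    rate = ERROR_RATE if has_error else SUCCESS_RATE
    # Map the trace ID to a stable number in [0, 1) so the decision is repeatable.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Because the bucket is derived from the trace ID rather than a random draw, every evaluation of the same trace reaches the same decision.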
7
Article
Grafana Labs·12w
Build, buy, or open source? Understanding your options with Grafana’s AI-powered observability
A framework for choosing between building, buying, or using open source for AI-powered observability. The three options are framed as lanes: building in-house offers maximum control but requires significant ongoing investment; open source provides a flexible foundation but leaves teams responsible for AI orchestration, prompt design, and evaluation; and managed solutions like Grafana Cloud offer out-of-the-box AI capabilities including Grafana Assistant and Assistant Investigations. The post argues these aren't mutually exclusive—Grafana's ecosystem allows teams to combine approaches, shift strategies over time, and extend managed AI with custom agents and internal knowledge sources via open APIs.
12
8
Article
KubeSquad·9w
Instrumenting Rust TLS with eBPF
Instrumenting rustls with eBPF for L7 traffic capture is non-trivial because rustls separates TLS operations from socket I/O, unlike OpenSSL. On the write path, the OpenSSL-style correlation still works. On the read path, the order is inverted: the syscall fires before plaintext is available, requiring reverse correlation by stashing the file descriptor on recvfrom and attaching it later when reader.read fires. An additional gotcha is that Rust's Result<usize> return convention stores success/error in rax and byte count in rdx, not rax alone. Symbol detection is also tricky due to unstable Rust name mangling, requiring pattern-based scanning rather than exact name matching. These techniques are implemented in Coroot, an open source eBPF-based observability tool, enabling automatic TLS decryption for hyper, axum, sqlx, tokio-rustls, and other rustls-backed libraries with no code changes.
14
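The read-path reverse correlation in the rustls summary above can be modeled in a few lines: remember the file descriptor when the syscall fires, then attach it once the plaintext becomes available. This is a plain-Python simulation, not eBPF code. The handler names, keying the stash by thread ID, and treating `rax == 0` as success are all assumptions made for illustration.

```python
# Stands in for an eBPF hash map; keying by thread ID is an assumption.
last_fd_by_thread: dict[int, int] = {}


def on_recvfrom(thread_id: int, fd: int) -> None:
    # The syscall fires before any plaintext exists, so only stash the fd.
    last_fd_by_thread[thread_id] = fd


def on_reader_read_return(thread_id: int, rax: int, rdx: int, buf: bytes):
    # Per the summary, success/error is in rax and the byte count in rdx.
    # Assumption: rax == 0 means success.
    if rax != 0:
        return None
    fd = last_fd_by_thread.get(thread_id)
    if fd is None:
        return None  # no earlier recvfrom seen on this thread
    # Attach the stashed fd to the decrypted bytes.
    return {"fd": fd, "plaintext": buf[:rdx]}
```

For example, after `on_recvfrom(7, 42)`, a successful read of 5 bytes on thread 7 yields an event tagged with fd 42.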
9
Article
OpenTelemetry·12w
OTTL context inference comes to the Filter Processor
OpenTelemetry collector-contrib v0.146.0 introduces OTTL context inference to the Filter Processor, previously only available in the Transform Processor. Four new top-level config fields — trace_conditions, metric_conditions, log_conditions, and profile_conditions — let users write flat condition lists without manually organizing them into context blocks like resource, span, or spanevent. The processor automatically infers the execution context from path prefixes, combines conditions with logical OR, and executes them hierarchically (higher-level matches short-circuit lower-level evaluation for performance). Advanced grouping with per-group error_mode settings is also supported. The legacy configuration format remains fully supported.
11
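The idea of inferring a context from path prefixes, as described in the OTTL summary above, can be sketched as a grouping step over a flat list of conditions. This is a simplified illustration and not the Filter Processor's parser: the regular expression, the rule that the lowest-level prefix wins when a condition mentions several, and the sample conditions are assumptions.

```python
import re

# Trace contexts named in the summary, from highest to lowest level.
HIERARCHY = ["resource", "span", "spanevent"]

# "spanevent" is listed before "span" only for readability; the trailing dot
# already prevents "span" from matching inside "spanevent.".
PREFIX = re.compile(r"\b(resource|spanevent|span)\.")


def infer_context(condition: str) -> str:
    """Pick the lowest-level context whose prefix appears in the condition (assumed rule)."""
    prefixes = set(PREFIX.findall(condition))
    if not prefixes:
        raise ValueError(f"no known path prefix in: {condition}")
    return max(prefixes, key=HIERARCHY.index)


def group_by_context(conditions: list[str]) -> dict[str, list[str]]:
    """Turn a flat condition list into per-context groups."""
    groups: dict[str, list[str]] = {name: [] for name in HIERARCHY}
    for condition in conditions:
        groups[infer_context(condition)].append(condition)
    return groups
```

A flat list mixing `resource.`, `span.`, and `spanevent.` paths comes back grouped by level, so a hierarchical evaluator could check the `resource` group first and skip lower levels on a match. The regex does not skip string literals, so a real parser would be needed for conditions whose quoted values contain these prefixes.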
