How Mastodon Runs OpenTelemetry Collectors in Production

Mastodon, a non-profit decentralized social platform with ~20 staff, shares how a single engineer runs OpenTelemetry Collectors in production across two large Kubernetes deployments handling up to 10 million requests per minute. The setup uses one Collector per Kubernetes namespace, managed via the OpenTelemetry Operator and Argo CD, with no complex gateway tiers. Traffic is controlled through tail-based sampling, which keeps 0.1% of successful traces and 100% of traces that contain errors. The full production config is shared, covering OTLP ingestion, Kubernetes metadata enrichment, resource detection, and Datadog export. Key lessons: keep the architecture simple, use Kubernetes operators for lifecycle management, rely on semantic conventions, and upgrade frequently.
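To make the sampling policy in the summary concrete, here is a minimal Python sketch of a tail-based decision: the choice is made only after all spans of a trace are known, every trace containing an error is kept, and roughly 0.1% of the remaining traces are kept. This is an illustration of the idea only, not Mastodon's configuration; in a Collector such a policy would normally be expressed as processor configuration rather than code. The names `Span`, `keep_trace`, and `SUCCESS_KEEP_RATE` are hypothetical, and hashing the trace ID to get a stable keep/drop decision is an assumption made for this sketch.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence

# Assumed from the summary: keep 0.1% of successful traces.
SUCCESS_KEEP_RATE = 0.001


@dataclass(frozen=True)
class Span:
    """Hypothetical, stripped-down span: just a trace ID and an error flag."""
    trace_id: str
    is_error: bool


def keep_trace(spans: Sequence[Span],
               success_keep_rate: float = SUCCESS_KEEP_RATE) -> bool:
    """Tail-based decision: called once all spans of a single trace have arrived."""
    if not spans:
        return False
    # Policy 1: any error in the trace means the whole trace is kept.
    if any(span.is_error for span in spans):
        return True
    # Policy 2: keep a fixed fraction of successful traces. Hashing the
    # trace ID maps it to a stable value in [0, 1), so the same trace
    # always gets the same decision.
    digest = hashlib.sha256(spans[0].trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < success_keep_rate


if __name__ == "__main__":
    failed = [Span("trace-a", False), Span("trace-a", True)]
    print(keep_trace(failed))  # error traces are always kept
```

Because the decision needs the complete trace, all spans of a trace have to reach the same place before it can be made, which is one reason a simple one-Collector-per-namespace layout pairs naturally with this kind of sampling.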

#devops

#kubernetes

#observability

#opentelemetry

Mar 18 • 9m read time • From opentelemetry.io

Table of contents

Mastodon at a glance
Collector architecture: One per namespace, no more
Deployment and lifecycle management
Traffic management through sampling
Configuration: Opinionated, but minimal
Advice for small teams
What’s next
