Mastodon, a non-profit decentralized social platform with ~20 staff, shares how a single engineer runs OpenTelemetry Collectors in production across two large Kubernetes deployments handling up to 10 million requests per minute. The setup uses one Collector per Kubernetes namespace, managed via the OpenTelemetry Operator and Argo CD, with no complex gateway tiers. Traffic is controlled through tail-based sampling (0.1% for successful traces, 100% for errors). The full production config is shared, including OTLP ingestion, Kubernetes metadata enrichment, resource detection, and Datadog export. Key lessons: keep architecture simple, use Kubernetes operators for lifecycle management, rely on semantic conventions, and upgrade frequently.
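The sampling policy described above (keep every error trace, keep roughly 0.1% of successful ones) can be sketched in a few lines of Python. This is only an illustration of the decision rule, not the Collector's tail sampling processor or Mastodon's configuration. The names `keep_trace` and `SUCCESS_SAMPLE_RATE` are hypothetical, and hashing the trace ID is an assumption made here so that the decision is deterministic per trace.

```python
import hashlib

# Assumption: 0.1% of successful traces are kept, as stated in the summary.
SUCCESS_SAMPLE_RATE = 0.001


def keep_trace(trace_id: str, has_error: bool,
               rate: float = SUCCESS_SAMPLE_RATE) -> bool:
    """Decide whether a completed trace is kept (hypothetical helper).

    Tail-based sampling decides after the whole trace has been seen,
    so the error status is already known when this is called.
    """
    if has_error:
        # Error traces are always kept (100%).
        return True
    # Map the trace ID to a stable 64-bit bucket so the same trace
    # always gets the same decision.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the trace if its bucket falls in the lowest `rate` share
    # of the 64-bit range.
    return bucket < int(rate * 2**64)
```

Deciding from a hash of the trace ID rather than a random draw means that repeated evaluations of the same trace agree, which is one common way to keep sampling decisions consistent.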

9m read time
From opentelemetry.io
Table of contents
Mastodon at a glance
Collector architecture: One per namespace, no more
Deployment and lifecycle management
Traffic management through sampling
Configuration: Opinionated, but minimal
Advice for small teams
What’s next
