SQS doesn't propagate OpenTelemetry trace context automatically across async boundaries. When Lambda is triggered via an Event Source Mapping (ESM), the publisher and consumer produce disconnected traces. The fix is to manually inject the W3C traceparent into SQS MessageAttributes on the producer side and extract it on the consumer side. A critical gotcha: Lambda ESM delivers MessageAttributes in camelCase (stringValue, dataType) instead of PascalCase (StringValue, DataType), which silently breaks context extraction. The post covers the full implementation in Python: producer inject, consumer extract with casing handling, ADOT Lambda layer configuration (OTEL_PROPAGATORS=tracecontext,baggage is required), batch processing patterns, and local testing with LocalStack. It also compares parent-child relationships with span links for batch consumers, and contrasts the OTel approach with Datadog, New Relic, and Dynatrace.
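As a rough illustration of the casing gotcha, here is a minimal sketch of the carrier handling on both sides. It uses only the standard library so it runs anywhere. The function names (`carrier_to_message_attributes`, `message_attributes_to_carrier`, `record_to_carrier`) are hypothetical helpers made up for this example, not part of any SDK. In a real service the producer's carrier dict would typically be filled by `opentelemetry.propagate.inject(carrier)`, and the consumer's recovered dict would be handed to `opentelemetry.propagate.extract(carrier)`; both calls are left out here on purpose.

```python
from typing import Any, Dict, Mapping, Optional


def carrier_to_message_attributes(carrier: Mapping[str, str]) -> Dict[str, Dict[str, str]]:
    """Producer side: wrap propagation headers in the PascalCase shape
    that the SQS SendMessage API expects for MessageAttributes."""
    return {
        key: {"DataType": "String", "StringValue": value}
        for key, value in carrier.items()
    }


def message_attributes_to_carrier(
    attributes: Optional[Mapping[str, Mapping[str, Any]]],
) -> Dict[str, str]:
    """Consumer side: flatten message attributes into a plain dict carrier.

    Accepts both the SQS API shape (StringValue) and the Lambda ESM
    event shape (stringValue), so extraction does not silently fail.
    """
    carrier: Dict[str, str] = {}
    for key, attr in (attributes or {}).items():
        value = attr.get("StringValue")
        if value is None:
            value = attr.get("stringValue")  # Lambda ESM casing
        if value is not None:
            carrier[key] = value
    return carrier


def record_to_carrier(record: Mapping[str, Any]) -> Dict[str, str]:
    """Read attributes from either a ReceiveMessage result (MessageAttributes)
    or a Lambda ESM record (messageAttributes)."""
    attributes = record.get("MessageAttributes") or record.get("messageAttributes")
    return message_attributes_to_carrier(attributes)


# Example traceparent value in the W3C Trace Context format.
carrier = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
attrs = carrier_to_message_attributes(carrier)

# The same message as Lambda ESM would deliver it: camelCase keys throughout.
esm_record = {
    "messageAttributes": {
        "traceparent": {"stringValue": carrier["traceparent"], "dataType": "String"}
    }
}
recovered = record_to_carrier(esm_record)
```

Normalising the casing in one small helper keeps the rest of the consumer unaware of whether a message arrived through ESM or through a direct ReceiveMessage call, which also makes the handler easier to exercise in local tests.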
Table of contents
The split that ruins your debugging
Why SQS doesn't propagate context automatically
Architecture
Producer: injecting trace context
Consumer: the ESM format gotcha
Batch processing
Lambda deployment with ADOT
Span links vs parent-child: which to use
Testing locally with LocalStack
How other vendors handle this
What the unified waterfall looks like
Summary
References