A deep dive into optimizing Linux timestamp acquisition for low-latency C++ applications, targeting sub-100ns overhead per span in an OpenTelemetry tracing library. Covers the x86 TSC (timestamp counter), vDSO internals, and seqlock mechanics. Progresses from naive clock_gettime() calls (47ns) through direct TSC reads to a custom vDSO bypass implementation (20.5ns, 57% improvement). Also addresses tail latency caused by kernel timer updates and proposes cached timer variants that eliminate latency spikes entirely, at the cost of coupling to kernel data page layout.
Table of contents
Timing the timers
The TSC
When syscalls aren’t
Faster monotonic clocks
Making our own vDSO
Measuring tails
Stable timers
Conclusion
Appendix: Methodology