Join me on a Friday night on-call investigation into a rogue Node.js service.

Zalando Tech Blog is where Zalando's engineering team shares insights, experiences, and best practices on technology and innovation. It covers topics such as software architecture, data science, and machine learning, and gives developers a view into Zalando's engineering culture, projects, and technology stack.

Zalando

A Node.js service ran into performance problems because it mishandled worker threads, which caused high resource consumption and instability in its Kubernetes environment. The service spawned multiple workers per CPU core rather than sizing the pool to its allocated resources, and it restarted workers aggressively on errors. Together these created a positive feedback loop that overwhelmed both the campaign and translation services. The investigation found that limiting the number of worker threads and allocating resources properly could resolve the issue, and it highlights the importance of careful worker management and better observability in production.
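To make the two fixes in the summary concrete, here is a minimal sketch, not the code from the Zalando post. It caps the worker pool at the container's CPU allocation instead of the host's core count, and it spaces out restarts with exponential backoff so that failing workers cannot feed a restart loop. The helper names `workerCount` and `restartDelayMs`, the `CPU_LIMIT` environment variable, and all default values are assumptions made for illustration.

```javascript
const os = require('node:os');

// Hypothetical helper: size the pool from the container's CPU allocation.
// Inside a container, os.cpus().length may report the host's cores rather
// than the CPU limit, so it is used here only as an upper bound and fallback.
function workerCount(cpuLimit, hostCores, maxWorkers = 4) {
  const budget = Number.isFinite(cpuLimit) && cpuLimit > 0 ? cpuLimit : hostCores;
  // Never exceed the host cores or the hard cap, and always keep one worker.
  return Math.max(1, Math.min(Math.floor(budget), hostCores, maxWorkers));
}

// Hypothetical helper: exponential backoff with a ceiling, so a crashing
// worker is not respawned immediately and repeatedly.
function restartDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// CPU_LIMIT is an assumed environment variable, for example populated from
// the pod's CPU limit. It is not something Node.js sets on its own.
const cpuLimit = Number.parseFloat(process.env.CPU_LIMIT ?? '');
const poolSize = workerCount(cpuLimit, os.cpus().length);
console.log(`starting ${poolSize} worker(s)`);
```

A supervisor would call `restartDelayMs(attempt)` before respawning a failed worker and reset `attempt` once the worker has stayed healthy for a while. The hard cap of 4 is arbitrary and would need tuning per service.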

Node.js and the tale of worker threads

<p>cool stuff from zalando again &lt;3</p>


<p>very good blog with detailed explanation</p>