Running LLMs in production is fundamentally different from running them in a demo. The real challenges are infrastructure-level: unpredictable latency, GPU underutilization caused by poor batching, cost explosion at scale, and slow autoscaling that reacts only after demand spikes. Key optimizations include complexity-based request routing, response …
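To make the routing idea concrete, here is a minimal sketch of complexity-based request routing under stated assumptions: the two endpoints (SMALL_MODEL, LARGE_MODEL), the estimate_complexity heuristic, and the threshold are all hypothetical illustrations, not from the article; production routers typically use token counts or a learned classifier instead of keyword cues.

```python
# A toy complexity-based router: cheap requests go to a small model,
# hard ones to a large model. All names and thresholds are illustrative.

SMALL_MODEL = "small-llm"   # hypothetical: cheap, low-latency endpoint
LARGE_MODEL = "large-llm"   # hypothetical: expensive, high-quality endpoint

def estimate_complexity(prompt: str) -> float:
    """Crude proxy for request complexity: length plus a few keyword cues."""
    score = len(prompt.split()) / 100.0
    for cue in ("analyze", "explain step by step", "compare", "refactor"):
        if cue in prompt.lower():
            score += 0.5
    return score

def route(prompt: str, threshold: float = 0.7) -> str:
    """Pick a model endpoint based on the estimated complexity."""
    return LARGE_MODEL if estimate_complexity(prompt) >= threshold else SMALL_MODEL

if __name__ == "__main__":
    print(route("What time is it in UTC?"))                   # -> small-llm
    print(route("Analyze this log and explain step by step "  # -> large-llm
                "why the service OOMs under load."))
```

The design point is that routing happens before any GPU is touched, so the cheap path never pays large-model latency or cost; the heuristic only has to be good enough that misrouted requests are rare.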

From allthingsopen.org · 6 min read
Demos are easy. Production is a frontier most teams aren't ready to scale.

Table of contents

- The gap: Why LLMs fail in production but look perfect in demos
- What scaling LLMs really means
- The LLM infrastructure stack behind the model
- What actually breaks when you scale LLMs in production
- Practical LLM optimizations that actually work at scale
- The most common LLM scaling mistakes teams make
- The future of LLM infrastructure beyond the model
- Final thought: LLMs do not fail in isolation
