Best of Kubernetes: November 2025

  1.
    Article
    ByteByteGo·22w

    How Disney Hotstar (now JioHotstar) Scaled Its Infra for 60 Million Concurrent Users

    Disney+ Hotstar scaled from 25 million to 61 million concurrent users during the 2023 Cricket World Cup through a comprehensive infrastructure overhaul. Key improvements included separating cacheable from non-cacheable APIs at the CDN layer, migrating from self-managed kOps clusters to Amazon EKS, implementing distributed NAT gateways per subnet, and introducing a Datacenter Abstraction model. This abstraction unified multiple Kubernetes clusters into logical data centers with a centralized Envoy-based API gateway, replacing 200+ individual load balancers. The team also eliminated NodePort limitations by switching to ClusterIP services, standardized service endpoints, and adopted single-manifest deployments. The final architecture distributed 200+ microservices across six optimized EKS clusters, each designed for specific workload types.

  2.
    Article
    ByteByteGo·23w

    How Spotify Built Its Data Platform To Understand 1.4 Trillion Data Points

    Spotify processes 1.4 trillion data points daily through a sophisticated data platform that evolved from a single Hadoop cluster to a multi-product system running on Google Cloud. The platform consists of three core components: data collection (capturing events from millions of devices using client SDKs and Kubernetes operators), data processing (running 38,000+ automated pipelines using BigQuery, Flink, and Apache Beam), and data management (ensuring privacy, security, and compliance). The architecture emphasizes self-service capabilities, allowing product teams to define event schemas and deploy infrastructure automatically while maintaining centralized governance. Built-in anonymization, lineage tracking, and quality checks ensure data trustworthiness across financial reporting, personalized recommendations, and experimentation systems.

  3.
    Article
    Hacker News·22w

    The Grafana trust problem

    An experienced engineer shares their journey with Grafana's observability stack, detailing how frequent architectural changes, deprecations, and increasing complexity have eroded trust. Starting with simple Loki/Prometheus setups, they've witnessed rapid product churn: Grafana Agent deprecated within 2-3 years, OnCall discontinued, and Mimir 3.0 now requiring Kafka. The constant restructuring, incompatibilities with Prometheus Operator standards, and career-driven development pace make it difficult to maintain stable monitoring infrastructure. While acknowledging the technical quality of Grafana products, the author questions their long-term viability and considers alternatives like the kube-prometheus-stack with Thanos.

  4.
    Article
    Serdarcan Buyukdereli·21w

    Life After NGINX: The New Era of Kubernetes Ingress & Gateways

    A comprehensive comparison of Kubernetes ingress and gateway solutions beyond NGINX, evaluating Traefik, Istio, Kong, Cilium, Pomerium, kgateway, HAProxy, and Contour. The guide analyzes each tool across architecture, traffic management, security features, observability, performance, and future-proofing to help DevOps engineers and SREs make informed production decisions. Includes practical YAML examples, a detailed scoring matrix, and insights on Gateway API adoption for long-term infrastructure planning.

  5.
    Article
    Red Hat Developer·24w

    3 MCP servers you should be using (safely)

    Model Context Protocol (MCP) enables AI models to interact with developer tools and services through standardized servers. Three essential MCP servers are highlighted: Kubernetes for cluster management and diagnostics, Context7 for accessing up-to-date technical documentation, and GitHub for repository interactions. Each server requires careful security configuration, including read-only defaults, human approval for write operations, and minimal access tokens to prevent data exfiltration through prompt injection attacks.
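
The read-only default and human approval for writes can be pictured as a small gate in front of tool calls. The following Python sketch is an illustration only: the tool names, the `authorize` function, and the `approve` callback are hypothetical and are not part of any MCP server's actual API.

```python
from typing import Callable

# Hypothetical tool names, chosen for illustration only.
READ_ONLY_TOOLS = {"pods_list", "pods_log"}
WRITE_TOOLS = {"pods_delete", "resources_apply"}


def authorize(tool: str, approve: Callable[[str], bool]) -> bool:
    """Allow reads, ask a human before writes, deny anything unknown."""
    if tool in READ_ONLY_TOOLS:
        return True
    if tool in WRITE_TOOLS:
        # approve() stands in for an interactive human confirmation.
        return approve(tool)
    return False


if __name__ == "__main__":
    deny_all = lambda tool: False
    print(authorize("pods_list", deny_all))    # read: allowed without approval
    print(authorize("pods_delete", deny_all))  # write: blocked unless approved
```

Denying unknown tools by default keeps a newly added write capability from slipping through unreviewed.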

  6.
    Article
    CNCF·24w

    Announcing Vitess 23.0.0

    Vitess 23.0.0 introduces MySQL 8.4.6 as the default version, enhanced observability with new metrics for transaction routing and recovery tracking, and improved operational tooling for VTOrc. The release removes deprecated metrics and APIs, strengthens topology management with better Consul authentication requirements, and includes critical upgrade instructions for Operator users migrating from MySQL 8.0 to 8.4. Key improvements focus on production reliability, monitoring precision, and simplified deployment workflows for horizontally scaled MySQL workloads.

  7.
    Video
    CodeHead·21w

    The Docker Alternative Most People DON'T KNOW

    Podman is a Red Hat-backed container engine that offers a daemonless, rootless alternative to Docker while maintaining command compatibility. It provides native pod support similar to Kubernetes, can export pod definitions directly to Kubernetes YAML, and eliminates security risks associated with Docker's root-privileged daemon. The tool runs OCI-compliant containers, offers a near drop-in replacement for Docker commands, and allows developers to build custom extensions while maintaining a smaller attack surface through its rootless architecture.

  8.
    Article
    Giant Swarm·23w

    Infrastructure for AI is finally getting a standard

    The CNCF launched the Kubernetes AI Conformance Program at KubeCon North America, establishing the first standardized baseline for running AI/ML workloads on Kubernetes. Giant Swarm became one of the first platforms to receive certification, addressing the fragmentation in AI infrastructure that has plagued organizations as they move from experimental models to production. The standard defines consistent capabilities, APIs, and configurations needed for reliable AI/ML workloads, with research showing 82% of organizations building custom AI solutions and 58% using Kubernetes. The certification provides teams with confidence in their infrastructure choices, backed by major industry players like Bloomberg, Zalando, OpenAI, NVIDIA, and Apple already using Kubernetes-based platforms for AI workloads.

  9.
    Article
    CNCF·22w

    Kgateway v2.1 is released!

    Kgateway v2.1 introduces agentgateway integration for AI connectivity with LLMs and AI agents, full conformance with Kubernetes Gateway API 1.3.0, and global policy attachment capabilities. The release adds horizontal pod autoscaling, dynamic forward proxy support, enhanced session affinity options, and improved retry/timeout mechanisms. Notable additions include passive health checks with outlier detection and a new operations dashboard for Grafana. The Envoy-based AI Gateway is being deprecated in favor of agentgateway, which provides native support for AI workloads including MCP tools and inference tasks.

  10.
    Article
    Lobsters·24w

    A prison of my own making

    A developer reflects on how adopting best practices like GitOps, immutable infrastructure, Kubernetes, and declarative systems turned their homelab from a relaxing hobby into an overwhelming burden. They realized that enterprise-grade tooling (NixOS, Fedora Silverblue, CI/CD pipelines) made simple tasks impossibly complex for a solo project. The author shares their decision to simplify by abandoning immutable distros, reducing automation, accepting stateful backups, and prioritizing ease of use over architectural purity.

  11.
    Article
    monday Engineering·20w

    ArgoCD diffs at scale

    Monday.com built a custom diffing tool to review GitOps changes at scale. The solution renders Helm manifests on-the-fly during pull requests, comparing target and head branches to generate diff artifacts displayed in a dedicated UI. This approach addresses challenges with hierarchical configuration overlays (large blast radius, difficult-to-understand merged results, and onboarding complexity) without migrating to rendered manifests. The tool uses real Kubernetes cluster capabilities, supports local overrides for testing, and provides grouping/filtering features for managing hundreds of resource changes across multiple clusters and environments.
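
The comparison step, rendering the same manifest for the target and head branches and diffing the results, can be sketched with Python's standard `difflib`. This is a minimal illustration, not Monday.com's tool: the manifests are hard-coded stand-ins for Helm output, and `diff_rendered` and the file name `web.yaml` are hypothetical.

```python
import difflib


def diff_rendered(target: str, head: str, name: str) -> list[str]:
    """Return unified-diff lines between two rendered manifests."""
    return list(
        difflib.unified_diff(
            target.splitlines(),
            head.splitlines(),
            fromfile=f"target/{name}",
            tofile=f"head/{name}",
            lineterm="",
        )
    )


# Hypothetical rendered output for the same Deployment on two branches.
TARGET = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
"""
HEAD = TARGET.replace("replicas: 2", "replicas: 4")

changes = diff_rendered(TARGET, HEAD, "web.yaml")

if __name__ == "__main__":
    print("\n".join(changes))
```

Diffing the rendered output rather than the overlay files shows reviewers the merged result that would actually reach the cluster.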

  12.
    Article
    Istio·24w

    Announcing Istio 1.28.0

    Istio 1.28.0 introduces Gateway API Inference Extension support with InferencePool v1 for managing AI inference workloads, enhanced ambient multicluster capabilities with waypoint routing across remote networks, and native nftables support in ambient mode. The release promotes dual-stack networking to beta, adds security improvements including enhanced JWT authentication with custom claims and NetworkPolicy support for istiod, and provides full Gateway API v1.4 compatibility with BackendTLSPolicy v1. Additional enhancements include ServiceEntry wildcard host support with DYNAMIC_DNS resolution, persona-based installations with resourceScope options, and improved telemetry with dual B3/W3C header propagation.

  13.
    Article
    Faun·24w

    1/3 Hands on Kubernetes with Minikube

    Minikube provides a single-node Kubernetes cluster for local development and learning. This guide walks through installing Minikube, understanding its architecture including control plane components (API server, etcd, controller manager, scheduler, CoreDNS) and worker node components (kubelet, kube-proxy), exploring default services and namespaces, and accessing the Kubernetes Dashboard. All components run on one node while maintaining the same architecture as production Kubernetes clusters.

  14.
    Article
    vLLM·22w

    Signal-Decision Driven Architecture: Reshaping Semantic Routing at Scale

    vLLM introduces Signal-Decision Architecture, a new approach to semantic routing that replaces fixed classification-based routing with multi-dimensional signal extraction. The architecture combines keyword, embedding, and domain signals with flexible AND/OR logic to enable unlimited routing decisions. It includes built-in plugins for caching, security, and compliance, and uses Kubernetes CRDs for cloud-native deployment. This enables enterprises to scale from 14 fixed categories to hundreds of specialized routing rules with priority-based selection and plugin orchestration.
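
A rough picture of the idea: signals are extracted independently, decisions combine them with AND/OR conditions, and the highest-priority matching decision wins. The Python sketch below is a toy under stated assumptions, not the vLLM implementation. The signal keys, the decision names, the keyword-only `extract_signals`, and the helpers `all_of` and `any_of` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Signals = dict[str, bool]


@dataclass
class Decision:
    name: str
    priority: int
    matches: Callable[[Signals], bool]


def all_of(*keys: str) -> Callable[[Signals], bool]:
    """AND: every named signal must be present and true."""
    return lambda s: all(s.get(k, False) for k in keys)


def any_of(*keys: str) -> Callable[[Signals], bool]:
    """OR: at least one named signal must be true."""
    return lambda s: any(s.get(k, False) for k in keys)


def extract_signals(query: str) -> Signals:
    """Toy substring matching; a real system would add embedding signals."""
    q = query.lower()
    return {
        "kw:sql": "sql" in q,
        "kw:urgent": "urgent" in q,
        "domain:math": any(w in q for w in ("integral", "derivative")),
    }


# Hypothetical routing rules.
DECISIONS = [
    Decision("math-expert", 10, all_of("domain:math")),
    Decision("db-expert", 20, any_of("kw:sql")),
    Decision("fast-path", 30, all_of("kw:urgent", "kw:sql")),
]


def route(signals: Signals, decisions: list[Decision], default: str = "general") -> str:
    """Pick the highest-priority decision whose condition matches."""
    matched = [d for d in decisions if d.matches(signals)]
    if not matched:
        return default
    return max(matched, key=lambda d: d.priority).name


if __name__ == "__main__":
    print(route(extract_signals("Urgent: fix this SQL query"), DECISIONS))
```

Because decisions are data rather than fixed classifier categories, adding a rule means appending one entry to the list.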

  15.
    Article
    JetBrains·23w

    The Go Ecosystem in 2025: Key Trends in Frameworks, Tools, and Developer Practices

    An analysis of Go ecosystem trends in 2025, based on the JetBrains Developer Ecosystem Survey, reports that 2.2 million developers use Go as their primary language. Gin leads web frameworks at 48% adoption, while chi and Fiber gain ground as gorilla/mux declines. The standard library remains dominant, with testify and gomock supplementing testing capabilities. GoLand holds 47% IDE market share, while AI coding assistants see 70% adoption among Go developers. Popular libraries include log/slog for logging, pgx for PostgreSQL, cobra for CLI tools, and golangci-lint for static analysis. The ecosystem shows maturity, with a strong focus on backend services, infrastructure tooling, and Kubernetes development.

  16.
    Article
    CNCF·23w

    Lima becomes a CNCF incubating project

    Lima, a tool for running Linux virtual machines optimized for containers in local development environments, has been promoted to CNCF incubating status. Originally created in 2021 as a containerd demonstration tool for Mac users, Lima now supports multiple container engines (containerd, Docker, Podman, Kubernetes) and has expanded to include AI agent sandboxing use cases. The project has grown significantly since joining CNCF as a sandbox project in 2022, doubling its GitHub stars to 18,200+ and gaining adoption by tools like Colima, Rancher Desktop, and AWS Finch. Version 2.0 introduces a plug-in system for VM drivers, GPU acceleration support, and Model Context Protocol server capabilities.

  17.
    Article
    CNCF·23w

    OpenFGA Becomes a CNCF Incubating Project

    OpenFGA, an authorization engine based on Google's Zanzibar that implements Relationship-Based Access Control (ReBAC), has been promoted to CNCF incubating status. The project centralizes authorization logic through an API-first approach, making it easier to implement complex access control at scale. Since joining CNCF as a sandbox project in 2022, OpenFGA has gained 37 production adopters, expanded to multiple SDKs (Python, Java, Go, .NET, JS), added maintainers from Grafana Labs and GitPod, and integrated with CNCF projects like OpenTelemetry, Helm, and Prometheus. The project has accumulated 4,300+ GitHub stars and 96 contributors, with future plans including new SDKs for Ruby, Rust, and PHP, AuthZen standard support, and performance improvements.
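
Relationship-based access control stores facts as (subject, relation, object) tuples and answers checks by walking them. The Python sketch below illustrates that idea only. It does not use OpenFGA's API or modeling language, and every name in it (the tuples, `IMPLIED_BY`, `check`) is hypothetical.

```python
# Relationship tuples: (subject, relation, object). All names are hypothetical.
TUPLES = {
    ("user:anne", "editor", "folder:reports"),
    ("folder:reports", "parent", "doc:q3"),
    ("user:bob", "viewer", "doc:q3"),
}

# For each relation, the set of relations that grant it (editors can also view).
IMPLIED_BY = {"viewer": {"viewer", "editor"}, "editor": {"editor"}}


def check(user: str, relation: str, obj: str) -> bool:
    """True if the user holds the relation directly, by implication, or via a parent.

    Assumes the parent relationships contain no cycles.
    """
    for rel in IMPLIED_BY.get(relation, {relation}):
        if (user, rel, obj) in TUPLES:
            return True
    # Inherit access from any parent object.
    for subject, rel, target in TUPLES:
        if rel == "parent" and target == obj and check(user, relation, subject):
            return True
    return False


if __name__ == "__main__":
    print(check("user:anne", "viewer", "doc:q3"))  # granted via the parent folder
    print(check("user:bob", "editor", "doc:q3"))   # bob is only a viewer
```

Access is derived from relationships rather than from roles stored on each object, which is why granting Anne one folder tuple covers the documents inside it.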

  18.
    Article
    It's Foss·23w

    22 Linux Books for $25: This Humble Bundle Is Absurdly Good Value

    Humble Bundle is offering 22 Linux technical books from Apress and Springer for $25, covering topics from beginner system administration to advanced Kubernetes orchestration, ARM64 debugging, and embedded Linux development. The collection includes the complete 'Zero to SysAdmin' trilogy, multiple Kubernetes guides, systemd administration, assembly language programming, and certification study materials. All books are DRM-free in PDF and ePub formats, with proceeds supporting Room to Read's literacy programs. The deal expires November 24, 2025.

  19.
    Article
    OpenTelemetry·24w

    OpenTelemetry eBPF Instrumentation Marks the First Release

    OpenTelemetry eBPF Instrumentation (OBI) has reached its first alpha release after being donated by Grafana Labs. OBI provides zero-code, automatic instrumentation for applications across all programming languages by operating at the protocol level using eBPF technology. It captures metrics and traces without requiring code changes, restarts, or performance impact, supporting protocols like HTTP/HTTPS, gRPC, SQL, Redis, MongoDB, and Kafka. While excellent for getting started with observability and instrumenting compiled binaries, it works best when combined with traditional OpenTelemetry SDKs, particularly for complex distributed tracing scenarios in certain languages and frameworks.

  20.
    Article
    Ubuntu·23w

    Canonical Kubernetes officially included in Sylva 1.5

    Canonical Kubernetes has been officially integrated into Sylva 1.5, a European telecommunications cloud-native framework backed by major operators and vendors, including Nokia and Ericsson. The distribution offers up to 12 years of long-term support and is designed for mission-critical telco workloads including 5G core, O-RAN, and edge services. Sylva 1.5 becomes the first release to include Kubernetes 1.32, enabling validated deployment of cloud-native and virtualized network functions across telco infrastructure with guaranteed interoperability and performance.

  21.
    Article
    Flipkart Tech·23w

    When Good Locks Go Bad: Diagnosing a System Meltdown Under Load

    Engineers at Flipkart diagnosed a critical system failure during load testing for their Big Billion Days sale. Their Mirana service crashed under load due to excessive contention on a Redis distributed lock. Initial solutions using queuing failed because they violated the 'fail fast' principle. The team ultimately solved the problem by implementing an AtomicInteger-based semaphore to limit concurrent threads attempting lock acquisition. The key insight was optimizing for actual service performance (200-300ms per request) rather than downstream resource limits, reducing allowed concurrency from 128 to 5 threads per pod and achieving stable throughput of 90 QPS across 9 pods.
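
The fail-fast limiter described here can be sketched as a counter that rejects callers once a cap is reached, instead of queuing them. The summary describes a Java `AtomicInteger`. Python has no direct equivalent, so this sketch swaps in a counter guarded by a `threading.Lock`. The class and function names are hypothetical, and the cap of 5 mirrors the per-pod figure in the summary.

```python
import threading


class FailFastLimiter:
    """Caps concurrent entries and rejects immediately instead of queuing."""

    def __init__(self, max_concurrent: int) -> None:
        self._max = max_concurrent
        self._in_flight = 0
        # The lock stands in for AtomicInteger's atomic compare-and-increment.
        self._lock = threading.Lock()

    def try_enter(self) -> bool:
        with self._lock:
            if self._in_flight >= self._max:
                return False  # fail fast: no waiting, no queue
            self._in_flight += 1
            return True

    def leave(self) -> None:
        with self._lock:
            self._in_flight -= 1


limiter = FailFastLimiter(max_concurrent=5)


def handle_request(work) -> str:
    """Run work() only if a slot is free; otherwise reject at once."""
    if not limiter.try_enter():
        return "rejected"
    try:
        work()  # in the real service this is where the distributed lock is taken
        return "ok"
    finally:
        limiter.leave()
```

Rejecting at the door keeps excess threads from piling up on the distributed lock, which is the contention the summary identifies as the cause of the crash.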