Best of Kubernetes · April 2026

  1. Video
    TechWorld with Nana · 5w

    STOP Learning Kubernetes (Do This First)

    Most DevOps jobs don't require deep Kubernetes expertise — it's often listed as a buzzword in job descriptions. The recommended learning path is: cloud fundamentals first, then infrastructure as code, then Docker container basics, and only then Kubernetes fundamentals (pods, deployments, services). Deep Kubernetes knowledge (operators, CRDs, cluster architecture) is only needed for specialized roles like Kubernetes administrator or platform engineer. The post ends with a promotion for free orientation calls to help engineers structure their DevOps learning path.

  2. Article
    Lobsters · 4w

    GitHub - xataio/xata: Open source, cloud native, Postgres platform with copy-on-write branching and scale-to-zero

    Xata has open-sourced its cloud-native Postgres platform, previously powering its managed cloud service. Built on Kubernetes using CloudNativePG and OpenEBS, it offers copy-on-write branching (enabling TB-scale Postgres copies in seconds), scale-to-zero compute, auto-scaling, high availability, PITR backups, and a serverless SQL driver over HTTP/WebSockets. Primary use cases are internal Postgres-as-a-Service platforms and ephemeral dev/preview/test environments. The platform requires a Kubernetes cluster and is not recommended for single-instance deployments. Licensed under Apache 2.0.

  3. Article
    Grafana Labs · 4w

    Kubernetes Monitoring Helm chart v4: Biggest update ever!

    Grafana's Kubernetes Monitoring Helm chart v4 is a major overhaul addressing real pain points from v3. Key changes include: converting destinations and collectors from lists to maps (enabling proper multi-file merging and named overrides), replacing hard-coded collector names with user-defined collectors using composable presets, making telemetry service deployments explicit to avoid surprise duplicates, splitting the overloaded clusterMetrics feature into three focused features, separating pod log collection methods into distinct features with native OTLP support, replacing the bulk labelsToKeep approach with explicit opt-in label declarations (reducing memory usage), and allowing granular control over individual profiler types. A migration tool is available to convert v3 values files to v4 format automatically.
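
    The list-to-map change is the most visible of these. A rough sketch of what it enables (the field names here are illustrative, not copied from the chart's actual schema — check the v4 values reference):

    ```yaml
    # v3 style: destinations as a list -- a second values file could only
    # replace the whole list, not override a single entry:
    # destinations:
    #   - name: prometheus
    #     url: http://prometheus:9090/api/v1/write

    # v4 style (sketch): destinations keyed by name, so Helm's multi-file
    # merge can override or extend one named entry without repeating the rest
    destinations:
      prometheus:
        type: prometheus
        url: http://prometheus:9090/api/v1/write
    ```

    Maps merge key-by-key across values files, which is what makes named overrides and multi-file composition work.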

  4. Article
    Yelp Engineering · 4w

    Zero downtime Upgrade: Yelp’s Cassandra 4.x Upgrade Story

    Yelp's Database Reliability Engineering team upgraded over a thousand Cassandra nodes from version 3.11 to 4.1 with zero downtime. The post covers their upgrade strategy including benchmarking (4% p99 latency improvement, 11% throughput gain, up to 58% p99 reduction on key clusters), compatibility challenges with Stargate proxy and Cassandra Source Connector, and a three-stage automated upgrade process (pre-flight, flight, post-flight). Key lessons include a Stargate 2.x regression causing slower range queries (resolved by downgrading to 1.x), schema disagreement on CDC-enabled clusters, and the value of running version-specific components in parallel during the transition. The upgrade was performed in-place via rolling restart on Kubernetes, avoiding the cost and complexity of a separate DC approach.

  5. Article
    Faun · 4w

    Kubernetes Is Not DevOps: A Short Story

    A hiring manager shares an interview experience where a candidate knew Kubernetes commands but couldn't explain what happens internally when running kubectl apply. The story illustrates a broader industry trend: engineers learning tools without understanding the underlying systems. True DevOps expertise goes beyond tool familiarity — it requires understanding infrastructure provisioning, distributed systems, automation principles, and reliability design. Kubernetes is just one tool; the fundamentals of systems thinking and automation will outlast any specific technology.

  6. Article
    Istio · 4w

    Announcing Istio 1.28.6

    Istio 1.28.6 is a patch release focused on security fixes and bug corrections. Key additions include Helm v4 server-side apply support, authorized namespace configuration for debug endpoints, and CIDR blocking for JWKS URIs during JWT validation. Notable fixes address a webhook failurePolicy field ownership conflict during helm upgrade, serviceAccount regex matching in AuthorizationPolicy, Gateway API CORS origin parsing, istiod crash with ambient mode and multi-network configs, a retryBudget default percent bug (0.2% instead of 20%), missing size limits on gzip-decompressed WASM binaries, and a race condition causing h2 ping errors.

  7. Article
    InfoWorld · 5w

    Bringing databases and Kubernetes together

    Running databases on Kubernetes is increasingly common, but Day 2 operational challenges like backup, failover, and resilience remain difficult. Cloud DBaaS solves these problems but creates vendor lock-in. Kubernetes Operators can provide equivalent functionality without lock-in, but doing so consistently across all databases is hard. Percona's Everest project has been donated to the CNCF and rebranded as OpenEverest — a fully open source platform for automating database provisioning and management on any Kubernetes infrastructure, with community-driven support for additional databases over time.

  8. Article
    MetalBear · 4w

    What Happens After AI Writes the Code

    AI coding agents introduce 'assumption bugs' — errors based on unverified premises about the real environment (queue message formats, env var names, DB schema) that only surface when code runs against actual dependencies. The typical workflow means these bugs are caught hours later in CI. A 'remocal development' approach, using a tool like mirrord, lets the locally running service connect to a real remote Kubernetes staging cluster in real time, so AI agents can observe failures immediately after generating code, fix them, and push with confidence. mirrord handles shared staging safety via HTTP traffic filtering, database branching, and queue splitting.
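
    The shared-staging filtering can be sketched in a mirrord config file (mirrord accepts JSON, TOML, or YAML; the key names below follow its config schema as commonly documented, and the target and header value are hypothetical — verify against the current mirrord docs):

    ```yaml
    # .mirrord/mirrord.yaml -- sketch under the assumptions above
    target: deployment/checkout     # hypothetical staging workload to impersonate
    feature:
      network:
        incoming:
          mode: steal
          http_filter:
            # only requests carrying this header are routed to the local
            # process; all other staging traffic flows to the real service
            header_filter: "x-mirrord-user: alice"
    ```

    Header-based filtering is what lets multiple engineers (or agents) share one staging cluster without stealing each other's traffic.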

  9. Article
    OpenTelemetry · 3w

    How Skyscanner scales OpenTelemetry: managing collectors across 24 production clusters

    Skyscanner's platform engineering team shares how they manage OpenTelemetry Collector deployments across 24 production Kubernetes clusters running 1,000+ microservices. Key architectural decisions include a centralized DNS endpoint with Istio-based routing, two collector patterns (Gateway ReplicaSet and Agent DaemonSet), and generating platform-level HTTP/gRPC metrics from Istio service mesh spans using the span metrics connector — eliminating the need for application-level instrumentation. The Java-heavy environment uses a shared base Docker image with the OTel Java agent pre-configured, with all instrumentations disabled by default and only a curated set enabled. SDK-generated HTTP/RPC metrics are dropped in favor of lower-cardinality Istio-derived metrics. Rollouts follow a progressive promotion strategy across dev, alpha, beta, and production cluster tiers using Argo CD. Practical advice includes starting simple, adding memory limiters from day one, and using filter processors early to handle false-positive error statuses.
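
    The span-metrics pattern can be sketched in Collector configuration (the exporter and its endpoint are illustrative; only the connector wiring is the point):

    ```yaml
    receivers:
      otlp:
        protocols:
          grpc: {}

    connectors:
      spanmetrics: {}   # derives request-rate and duration metrics from spans

    exporters:
      prometheusremotewrite:
        endpoint: http://metrics-backend.example/api/v1/push   # illustrative

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [spanmetrics]   # connector sits as the traces exporter...
        metrics:
          receivers: [spanmetrics]   # ...and as the metrics receiver
          exporters: [prometheusremotewrite]
    ```

    Because the spans already come from the Istio mesh, the resulting HTTP/gRPC metrics need no application-level instrumentation at all.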

  10. Article
    Atomic Spin · 4w

    K3s: A Better Way to Deploy a Docker App to a Linux Server

    A step-by-step guide to deploying a Dockerized web app on a single Linux server using K3s (lightweight Kubernetes), Helm, Zot (private OCI registry), and CloudNativePG. The guide walks through installing a single-node K3s cluster, setting up a private container registry with Zot, deploying a managed PostgreSQL instance via the CloudNativePG operator, and packaging the app as a Helm chart with automatic database wiring. The full stack runs on 2–4GB RAM and provides a scalable, standards-based alternative to ad-hoc deployment methods like Docker Compose or systemd services.
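
    The database step reduces to one small manifest once the operator is in place. A minimal sketch, assuming K3s is installed via the standard one-liner (`curl -sfL https://get.k3s.io | sh -`) and the CloudNativePG operator is already deployed (the cluster name and size are illustrative):

    ```yaml
    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: app-db
    spec:
      instances: 1      # single replica -- appropriate for a one-node server
      storage:
        size: 2Gi
    ```

    The operator creates the Postgres pod, a Secret with credentials, and Services for read/write access, which the app's Helm chart can then reference.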

  11. Article
    DevOps Daily · 4w

    The platform engineering skill most DevOps engineers undervalue: separating concerns at the plane level

    Platform teams often start with a single cluster handling everything, but this creates hidden coupling between control logic, workloads, CI pipelines, and observability. The solution is separating concerns into independently deployable planes: a control plane for orchestration, a data plane for workloads, a workflow plane for CI/CD, and an observability plane for telemetry. This multi-plane architecture gives each component a clear operational boundary, enables independent scaling, simplifies incident response, and allows the platform to grow from a single cluster to a multi-cloud fleet without a full rewrite. The author draws on experience contributing to OpenChoreo, a CNCF open source project built on this architecture.

  12. Article
    Open Source · 2w

    OpenChoreo: The Open-Source Developer Platform for Kubernetes

    OpenChoreo is a CNCF sandbox open-source Internal Developer Platform (IDP) built on Kubernetes. It features a modular multi-plane architecture separating control, data, build, and observability concerns. Platform engineers can define secure golden paths via programmable Kubernetes-native abstractions, while developers get a Backstage-powered portal, declarative GitOps, and built-in SRE and FinOps agents. The project is community-driven and invites contributions, feature requests, and early adoption.

  13. Article
    Cloud Native Now · 3w

    Kubernetes v1.36 Promotes Stability, Compatibility & Reproducibility

    Kubernetes v1.36 ships 71 enhancements across stable, beta, and alpha tiers. Key highlights include: fine-grained kubelet API authorization reaching GA for least-privilege node security; Resource Health Status expanding to Dynamic Resource Allocation (DRA) for hardware health reporting; new alpha workload-aware scheduling (WAS) with gang scheduling and topology-aware policies to reduce reliance on third-party schedulers for AI/ML workloads; Volume Group Snapshots graduating to GA for crash-consistent multi-volume backups; CSI service account token secret redaction reaching stable to prevent token leakage; and external service account token signing graduating to stable for integration with external key management systems.

  14. Article
    GitLab · 3w

    A guide to the breaking changes in GitLab 19.0

    GitLab 19.0 (releasing May 21, 2026 for self-managed) includes 15 breaking changes, down from 80 in 17.0. High-impact changes include: replacing NGINX Ingress with Gateway API/Envoy Gateway in the Helm chart, removing bundled PostgreSQL/Redis/MinIO from the Helm chart, dropping OAuth ROPC grant support, and requiring PostgreSQL 17 as the minimum version. Medium-impact changes cover dropping Ubuntu 20.04 and SUSE Linux package support, removing Redis 6 support, updating the Auto DevOps builder image, and removing bundled Mattermost. Lower-impact changes include removing Spamcheck, Slack slash commands, legacy container registry storage drivers, and deprecated GraphQL/REST API attributes. Migration guides and deployment windows for GitLab.com, Self-Managed, and Dedicated are provided for each change.
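
    For the Helm chart changes, migration mostly means pointing the chart at externally managed services. A sketch of the values involved (key paths follow the GitLab chart's existing conventions; verify exact names and hosts against the 19.0 migration guides — the endpoints below are illustrative):

    ```yaml
    postgresql:
      install: false    # bundled PostgreSQL removed from the chart in 19.0
    redis:
      install: false    # bundled Redis removed from the chart in 19.0
    global:
      psql:
        host: db.example.internal       # illustrative external Postgres (17+)
      redis:
        host: redis.example.internal    # illustrative external Redis (7+)
      minio:
        enabled: false                  # bundled MinIO removed; use external
                                        # object storage instead
    ```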

  15. Article
    MetalBear · 3w

    New Features We Find Exciting in the Kubernetes 1.36 Release

    Kubernetes v1.36 'Haru' brings several notable changes across stability tiers. Mutating Admission Policies graduate to stable, offering a declarative CEL-based in-process alternative to mutating webhooks for common operations like sidecar injection. User Namespaces also reach stable, mapping container UIDs to unprivileged host UIDs to limit container escape impact. Dynamic Resource Allocation (DRA) gains a prioritized fallback scheduling mechanism via a new `firstAvailable` field in ResourceClaims, device taints and tolerations move to beta for health signaling on degraded hardware, and a new alpha `ResourcePoolStatusRequest` API provides visibility into device availability. Additionally, a new `unusedSince` field on PersistentVolumeClaimStatus helps identify idle PVCs consuming storage.
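
    A rough sketch of what a CEL-based mutation looks like, based on the MutatingAdmissionPolicy API shape from its pre-stable iterations — the API version, policy name, and label are assumptions, so check the v1.36 API reference before relying on any field here:

    ```yaml
    apiVersion: admissionregistration.k8s.io/v1   # confirm exact version in 1.36
    kind: MutatingAdmissionPolicy
    metadata:
      name: add-env-label        # hypothetical policy
    spec:
      matchConstraints:
        resourceRules:
          - apiGroups: ["apps"]
            apiVersions: ["v1"]
            operations: ["CREATE"]
            resources: ["deployments"]
      mutations:
        - patchType: ApplyConfiguration
          applyConfiguration:
            # CEL expression returning the fields to merge into the object,
            # evaluated in-process -- no webhook round trip
            expression: >
              Object{
                metadata: Object.metadata{
                  labels: {"env": "prod"}
                }
              }
    ```

    The in-process CEL evaluation is the draw: it removes the latency, availability, and TLS-management burden of running a mutating webhook for simple defaulting or injection.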

  16. Article
    CNCF · 3w

    From public static void main to Golden Kubestronaut: The Art of unlearning

    A Java developer's decade-long journey from monolithic enterprise development to achieving the CNCF Golden Kubestronaut designation. The author reflects on the mental shifts required: abandoning monolith instincts, embracing horizontal scaling over vertical scaling, and accepting that reliability must be designed rather than hoped for. Key advice includes starting with conceptual foundations (KCNA) before memorizing kubectl commands, deliberately breaking things in safe environments, and engaging with the CNCF community. The post also touches on the emerging shift from automated ops to agentic ops, where engineers define goals and constraints for self-governing systems rather than fixing failures reactively.

  17. Article
    Kubeflow · 5w

    Modernizing Kubeflow Pipelines UI

    The Kubeflow Pipelines UI has been upgraded from React 16 to React 19 across 20+ pull requests, modernizing the entire frontend stack. Key changes include replacing Create React App with Vite, Jest/Enzyme with Vitest/Testing Library, Material-UI v3 with MUI v5, react-vis with Recharts, and react-flow-renderer with @xyflow/react. The migration followed a deps-first, bump-last strategy to minimize risk, resulting in zero breaking changes, zero bundle size increase, and improved performance, accessibility, and security posture.

  18. Article
    Giant Swarm · 6w

    Live migrating hundreds of Kubernetes clusters to Cluster API

    Giant Swarm replaced their custom-built Kubernetes cluster management system with Cluster API (CAPA), live-migrating hundreds of enterprise AWS production clusters without downtime or data loss. The post details the technical mechanics: a CLI-based migration tool, a two-phase process covering CR migration and node transition, and a creative workaround involving forking HashiCorp Vault to extract root CA signing keys for certificate continuity. Key lessons include the importance of aligning team structure with stated priorities, avoiding premature expansion to new providers before completing the core migration, and the strategic value of adopting upstream open source when a custom solution is no longer differentiating. The migration took years, required company-wide involvement, and ultimately freed engineering capacity for higher-value work.