Best of DevOps · January 2026

  1. Video
    CodeHead · 18w

    The Best Way To Learn DevOps in 2026

    Learning DevOps effectively means understanding it as a mindset for reliably moving code to production, not just a collection of tools. Start with Linux fundamentals (processes, networking, system commands) before diving into containers and orchestration. Follow a single application through its entire lifecycle: write it, containerize it, deploy it, break it intentionally, and observe what happens. Implement CI/CD pipelines for consistency, learn cloud infrastructure as code with tools like Terraform, and master observability through logs, metrics, and traces. The key is choosing the simplest architecture that works and only adding complexity when it solves real problems, not to pad a resume.
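
    The "containerize it" step above can be as small as a few lines. A minimal sketch for a hypothetical Python app (the file names and entrypoint are assumptions, not from the video):

```dockerfile
# Minimal image for a hypothetical Python app; file names are illustrative.
FROM python:3.12-slim
WORKDIR /app
# Install dependencies first so Docker can cache this layer between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Replace app.py with your actual entrypoint.
CMD ["python", "app.py"]
```

    Building and running it (`docker build -t myapp . && docker run myapp`) is the point where "break it intentionally and observe what happens" becomes cheap to try.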

  2. Article
    freeCodeCamp · 16w

    Build Your Own Kubernetes Operators with Go and Kubebuilder

    A comprehensive 6-hour video course teaches how to build custom Kubernetes operators and controllers from scratch using Go and Kubebuilder. The course covers controller theory, Kubernetes extensibility, environment setup, API and logic building, hands-on development, and advanced internals including Informers, Caches, Finalizers, and Idempotency. A practical example demonstrates managing AWS EC2 instances directly from Kubernetes, treating Kubernetes as an SDK rather than just a deployment platform.
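
    The controller theory and idempotency the course covers reduce to one pattern: repeatedly drive observed state toward desired state. A toy sketch of that loop in plain Go (the types and names are illustrative, not Kubebuilder's actual API, which builds on sigs.k8s.io/controller-runtime):

```go
package main

import "fmt"

// Toy model of the spec/status split a controller reconciles.
// The type and field names here are illustrative only.
type Spec struct{ Replicas int }
type Status struct{ Running int }

// reconcile drives observed state (status) toward desired state (spec)
// and returns the actions it took. It is idempotent: once converged,
// calling it again performs no work.
func reconcile(spec Spec, status *Status) []string {
	var actions []string
	for status.Running < spec.Replicas {
		status.Running++
		actions = append(actions, "launch instance")
	}
	for status.Running > spec.Replicas {
		status.Running--
		actions = append(actions, "terminate instance")
	}
	return actions
}

func main() {
	st := &Status{Running: 1}
	fmt.Println(reconcile(Spec{Replicas: 3}, st)) // launches two instances
	fmt.Println(reconcile(Spec{Replicas: 3}, st)) // converged: no actions
}
```

    This is the same shape as the course's EC2 example: Kubernetes stores the desired state, and the controller's reconcile loop makes reality match it, safely rerunnable after any failure.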

  3. Article
    Data Engineering Blog · 16w

    A Diary of a Data Engineer

    Data engineering has evolved through multiple epochs from the 1970s to today, but the core loop remains unchanged: ingest, model, transform, serve, break, rebuild. Despite shifting from SSIS and star schemas to dbt and Iceberg, data engineers still solve the same fundamental problems with different tools. The role requires understanding business logic, data modeling fundamentals, and DevOps principles while accepting the paradox of being invisible when things work but scrutinized when they break. Success comes from mastering timeless fundamentals like SQL and dimensional modeling rather than chasing every new framework, talking to business stakeholders to understand why data matters, and building reliable foundations that enable better decision-making.

  4. Article
    swizec.com · 15w

    The future of software engineering is SRE

    As AI makes writing code easier, the real value in software engineering shifts to operational excellence and reliability. Building a working demo is straightforward, but running a service reliably at scale over time requires true engineering expertise. The hard challenges include maintaining uptime, handling failures gracefully, managing security, coordinating distributed teams, and ensuring systems work consistently for years. Site Reliability Engineering (SRE) skills become increasingly critical as the industry moves beyond throwaway demos to services people can trust and depend on.

  5. Article
    InfoQ · 14w

    OpenEverest: Open Source Platform for Database Automation

    Percona announced OpenEverest, an open-source platform for automated database provisioning and management on Kubernetes. Built on Kubernetes operators, it supports MySQL, PostgreSQL, and MongoDB, offering features like automated backups, scaling, and disaster recovery while avoiding vendor lock-in. The platform provides both a web UI and REST API for managing database clusters. Originally launched as Percona Everest, it's transitioning to independent open governance with plans to donate to the CNCF. The latest version adds PostgreSQL 18.1 support and NodePort networking, with future plans to support ClickHouse, Vitess, and observability integrations.

  6. Article
    Zed · 17w

    Run Your Project in a Dev Container, in Zed — Zed's Blog

    Zed v0.218 introduces Dev Container support, allowing developers to work inside Docker-based development environments directly from the editor. Dev Containers solve environment inconsistency problems by defining infrastructure as code in a devcontainer.json file, eliminating manual setup and outdated documentation. Zed implements this by leveraging its existing remote development architecture, running a remote server inside the container that communicates with the local UI. Zed currently uses the devcontainer CLI reference implementation, with plans to add custom Zed extensions, forwardPorts support, and built-in spec definition tools.
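
    The devcontainer.json file is the infrastructure-as-code piece the post describes. A minimal sketch using standard Dev Container spec fields (the image, port, and command are illustrative; note the article says Zed's forwardPorts support is still planned):

```json
{
  "name": "my-project",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "forwardPorts": [3000],
  "postCreateCommand": "npm install"
}
```

    Checked into the repository, this file replaces a "how to set up your machine" wiki page: any editor that speaks the spec can build the same container.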

  7. Article
    Planet Erlang · 18w

    Software Acceleration and Desynchronization

    Software development acceleration creates desynchronization across interconnected work loops. When teams speed up individual tasks like code writing, they risk decoupling from slower but essential feedback cycles around operations, architecture, and organizational knowledge. This desynchronization accumulates as drift between mental models and reality, potentially leading to incidents that force rapid resynchronization. Strategic slowdowns in certain areas can actually accelerate overall system performance by maintaining necessary synchronization points. The drive for continuous acceleration is a self-reinforcing temporal structure that shapes how software organizations function, requiring careful analysis of which loops to speed up and which provide essential stability.

  8. Video
    The Coding Gopher · 16w

    Docker just got some massive upgrades

    Docker released the Docker MCP toolkit, a production-grade implementation of Anthropic's Model Context Protocol that containerizes AI agent capabilities. The system uses three core components: a curated catalog of versioned MCP server images, a gateway that acts as a dynamic proxy managing container lifecycle and routing, and a toolkit for credential management and permissions. This architecture isolates agent tools in containers, providing reproducibility, security through policy enforcement, and composability by allowing multiple MCP servers to run side-by-side without dependency conflicts.

  9. Video
    TechWorld with Nana · 15w

    If I would start DevOps from 0 - How would I start and what would I learn

    A structured learning path for DevOps beginners breaks down into six phases over several months. Start with Linux fundamentals, bash scripting, and git (1-2 months). Move to cloud basics focusing on AWS compute, storage, and networking (1-2 months). Learn infrastructure as code with Terraform (1 month). Master containerization with Docker and Kubernetes (1-2 months). Build CI/CD pipelines with Jenkins, GitHub Actions, or GitLab CI (1-2 months). Finally, cover observability with Prometheus and Grafana (1 month). The key mistake to avoid is learning tools in isolation—instead, combine technologies through hands-on projects that build on each other continuously rather than starting from scratch each time.

  10. Article
    Planet Python · 14w

    The missing 66% of your skillset

    Senior developers need more than programming language expertise. The ecosystem around Python—including dependency management (uv), Git workflows, testing (pytest), quality control (Ruff, type checkers), CI/CD automation (GitHub Actions), deployment (Docker, cloud), and CLI proficiency (Makefiles)—comprises roughly two-thirds of a professional skillset. Mastering these tools is what differentiates engineers from scripters and helps developers escape tutorial hell.

  11. Video
    Kevin Fang · 17w

    Dev Picks the Wrong Database, Takes Down Company

    Element Creations experienced a 24-hour outage of the matrix.org home server after an engineer accidentally deleted the production database while attempting to restore a failed server. The incident began with a hardware failure requiring database migration, but confusion over which server was primary led to running a destructive command on the wrong machine. Recovery took over a day due to slow backup restoration (51TB), a bug in their backup tool that wasn't patched in production, and slow write-ahead log replay. The postmortem emphasizes faster backup restoration strategies, including local snapshots using copy-on-write filesystems like ZFS, and highlights how operational errors during high-pressure situations are nearly inevitable.
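
    The copy-on-write snapshot strategy the postmortem recommends looks like this in generic ZFS terms (the pool and dataset names are hypothetical, not from the incident):

```shell
# Take an instant, space-efficient snapshot before risky maintenance.
zfs snapshot tank/pgdata@pre-maintenance

# If the operation goes wrong, roll the dataset back in seconds
# instead of restoring tens of terabytes from remote backups.
zfs rollback tank/pgdata@pre-maintenance
```

    A local snapshot is not a substitute for off-host backups, but it turns the worst-case recovery from "replay 51TB" into "revert a pointer".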

  12. Article
    roadmap.sh · 15w

    MLOps Roadmap has been updated!

    The roadmap.sh MLOps roadmap has been updated for 2026, offering a step-by-step, structured learning path for anyone looking to learn and master machine learning operations practices.

  13. Article
    selfh.st · 15w

    Self-Host Weekly #155: One Hundred Million

    This weekly newsletter covers Docker management tools, highlighting Dockhand's growing popularity despite initial skepticism. The selfh.st icons project reached 100 million monthly requests. Featured content includes Scanopy for network visualization, a new comic format (libbbf), Snikket's Android redesign, and Raspberry Pi's flash drive. Multiple video tutorials cover Docker management, file sharing, and VPN alternatives.

  14. Article
    AWS Fundamentals · 14w

    The Only Claude Skill Every DevOps Engineer Needs

    The Terraform Claude Skill by Anton Babenko transforms Claude AI into a senior DevOps architect that generates production-ready infrastructure code. Unlike generic AI responses that create technical debt through monolithic files, insecure IAM policies, and poor structure, this skill enforces a four-pillar framework: strict engineering loops, modularity guardrails, expert-level Terraform knowledge, and integrated tooling (tflint, tfsec, infracost). Installation involves cloning the skill into Claude's directory, enabling it to produce modular, secure, cost-aware infrastructure with proper testing strategies and CI/CD pipelines that follow HashiCorp best practices.
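
    The "modularity guardrails" pillar is the standard Terraform pattern of composing small, versioned modules instead of one monolithic file. A generic sketch using a public registry module (the inputs are illustrative, and this snippet is not taken from the skill itself):

```hcl
# Call a versioned, community-maintained module rather than
# hand-rolling every resource in a single main.tf.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "app-vpc"
  cidr = "10.0.0.0/16"
}
```

    Pinning the module version keeps runs reproducible, and tools like tflint, tfsec, and infracost can then review each small module in isolation.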

  15. Article
    PostgreSQL · 17w

    Introducing pgpm: A Package Manager for Modular PostgreSQL

    pgpm is a new package manager for PostgreSQL that enables developers to share and reuse application-level database logic (schemas, tables, functions, policies, triggers) as modular, versioned packages. Unlike traditional PostgreSQL extensions that operate at the system level, pgpm works at the application layer using pure SQL, requiring no superuser access or compilation. It organizes code into workspaces with explicit dependency management, automatic resolution, and deterministic deployment order. The tool supports test-driven development with ephemeral databases and CI/CD integration, drawing inspiration from Sqitch while adding recursive composition and modular packaging capabilities.

  16. Article
    Spacelift · 18w

    Top 13 Open-Source Automation Tools for 2025

    A curated list of 13 open-source automation tools for DevOps and infrastructure teams in 2025, covering infrastructure as code (Spacelift Intent, OpenTofu, Pulumi), configuration management (Ansible, Puppet, Chef, Salt, CFEngine, Rudder), CI/CD (Jenkins), GitOps (Argo CD), monitoring (Prometheus), and workflow orchestration (Apache Airflow). Each tool is described with key features, licensing, and use cases, along with a comparison table highlighting execution models, configuration approaches, and strengths to help teams choose the right tool for their automation needs.

  17. Article
    Datadog · 14w

    Debug PostgreSQL query latency faster with EXPLAIN ANALYZE in Datadog Database Monitoring

    Datadog Database Monitoring now automatically collects PostgreSQL EXPLAIN ANALYZE execution plans to help troubleshoot slow queries. The feature processes plans captured by PostgreSQL's auto_explain extension, correlates them with APM traces, and provides interactive visualizations. Key use cases include identifying incorrect row estimates that cause inefficient join strategies, and analyzing cache hits versus disk reads to determine whether performance issues stem from I/O bottlenecks or query optimization needs.
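
    auto_explain is stock PostgreSQL, so the plans Datadog ingests can also be enabled and inspected by hand. A sketch of the relevant settings (the 500ms threshold and the orders table are illustrative):

```sql
-- Enable auto_explain for this session (in production it is usually
-- preloaded via shared_preload_libraries in postgresql.conf).
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';  -- log plans for queries slower than this
SET auto_explain.log_analyze = on;            -- include actual row counts and timings

-- Or inspect a single query manually, with buffer/cache statistics:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;
```

    The row-estimate and buffers output from these plans is exactly what the article uses to separate bad join strategies from I/O bottlenecks.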

  18. Article
    Cyber Security · 18w

    Free e-book: Intro to Bash Scripting for Developers

    A free e-book, available on GitHub, introduces Bash scripting fundamentals for developers. It covers basic shell scripting concepts and is aimed at readers looking to build command-line automation and scripting skills.
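
    In the spirit of the book's basics, a minimal bash sketch combining a function, variables, and command substitution (the task and names are illustrative, not from the book):

```shell
#!/usr/bin/env bash
# A first automation script: count files with a given extension
# in a directory (non-recursively).
set -euo pipefail

count_ext() {
  local dir="$1" ext="$2"
  # tr strips the leading whitespace some wc implementations emit.
  find "$dir" -maxdepth 1 -type f -name "*.${ext}" | wc -l | tr -d ' '
}

echo "Shell scripts in the current directory: $(count_ext . sh)"
```

    Quoting every variable expansion and using `set -euo pipefail` are the habits that separate reliable automation from fragile one-liners.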

  19. Video
    Mental Outlaw · 18w

    CachyOS Is Coming For Your Server

    CachyOS, an Arch-based Linux distribution known for desktop performance optimizations, is developing a server edition optimized for web servers and databases. While rolling release distros like CachyOS offer performance benefits through compiler optimizations and the latest software, they pose risks for servers due to potential update breakage and security concerns from community repositories. The viability depends on whether performance gains justify the stability trade-offs compared to traditional server distros like Ubuntu and Debian, similar to how Netflix customized FreeBSD for CDN performance improvements.

  20. Article
    PostgreSQL · 15w

    PostgreSQL: pgmoneta 0.20

    pgmoneta 0.20.0 has been released with several improvements including redesigned locking for backup repositories, enhanced S3 support, and Grafana 12 compatibility. New features include a configuration file for pgmoneta-cli, interactive mode for pgmoneta-walinfo, WAL stream filtering support via pgmoneta-walfilter, and an initial Model Context Protocol server. pgmoneta is an open-source backup and restore solution for PostgreSQL 14+ that supports full and incremental backups, multiple compression formats, encryption, WAL shipping, and remote management.

  21. Article
    OctopusDeploy · 17w

    What's new in Argo CD 3.2?

    Argo CD v3.2 GA release introduces significant improvements across multiple areas: UI enhancements including hydration status on app tiles and sortable columns, ApplicationSet controller performance upgrades with better concurrency and error reporting, new health checks for DatadogMetric resources, server-side diff support for more accurate resource comparison, and Hydrator upgrades with custom commit messages and automatic .gitattributes generation. These changes improve usability, observability, and reliability for teams running GitOps at scale.

  22. Article
    Salesforce Engineering · 14w

    Automating Global Rollback for 1.5 Trillion Requests in 10 Minutes

    Salesforce Edge team reduced global rollback time from 8-12 hours to 10 minutes by implementing a blue-green deployment architecture on Kubernetes. The solution maintains two fully scaled deployments simultaneously, with custom autoscaling logic that evaluates CPU across both fleets to ensure capacity parity. Traffic cutover is automated through service label updates combined with explicit TCP connection draining via mutual TLS, enabling rapid recovery while preserving four-nines availability for a platform handling 1.5 trillion monthly requests and 23 petabytes of traffic across 21+ global points of presence.
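
    In generic Kubernetes terms, the label-based cutover described above is a Service selector flip between two identical fleets. A sketch, not taken from Salesforce's actual configuration (all names and ports are illustrative):

```yaml
# A Service pinning traffic to the "blue" fleet; flipping the color
# label (e.g. via kubectl patch) cuts all new traffic over to "green".
apiVersion: v1
kind: Service
metadata:
  name: edge-proxy
spec:
  selector:
    app: edge-proxy
    color: blue   # change to "green" to roll back in one update
  ports:
    - port: 443
      targetPort: 8443
```

    Because both Deployments stay fully scaled, the rollback cost is one label update plus connection draining, not a redeploy.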

  23. Article
    Socket · 16w

    Rust Support in Socket Is Now Generally Available

    Socket has promoted Rust and Cargo support from Beta to General Availability after months of validation. The platform now provides dependency analysis, SBOM generation, and supply chain visibility for Rust projects. During Beta, Socket analyzed thousands of Rust projects and published research on supply chain threats including typosquatting, malicious build scripts, and credential harvesting. The service helps teams identify risks beyond memory safety, focusing on deception, hidden execution paths, and malicious dependencies before they reach production.

  24. Article
    Barion · 16w

    i think i'm cooked

  25. Article
    BigData Boutique blog · 15w

    OpenSearch Kubernetes Operator 3.0 - Stability and Resilience Finally Delivered

    OpenSearch Kubernetes Operator 3.0 Alpha introduces major stability improvements including quorum-safe rolling restarts, multi-namespace support, TLS certificate hot reloading, and gRPC API support. The release addresses critical production issues like upgrade deadlocks, split-brain scenarios, and cluster instability through over 100 changes. Key features include SmartScaler enabled by default, init-container and sidecar support, NFS volumes, and OpenSearch 3.0 compatibility. The API is migrating from opensearch.opster.io to opensearch.org with automatic migration handling. Breaking changes include new security defaults and enabled validation webhooks. The alpha is recommended for testing in lower environments first, with GA release planned after beta testing.