Best of Prometheus — 2025

1
Article
freeCodeCamp·47w
Top Application Monitoring Tools for Developers
Application Performance Monitoring (APM) tools help developers detect issues before users report them. Five key tools are compared: New Relic offers comprehensive full-stack observability with real-time metrics and traces; Datadog excels in cloud-native environments with seamless integrations and powerful alerting; Prometheus + Grafana provides open-source flexibility with custom dashboards and PromQL querying; Sentry specializes in error tracking with detailed stack traces and breadcrumbs; PostHog combines product analytics with session recording and feature flags. For small teams, start with Sentry for errors and Prometheus for metrics, then consider unified solutions like Datadog or New Relic as you scale.
92
1
2
Article
Docker·51w
Learn how to make an AI chatbot from scratch
Docker Model Runner simplifies AI chatbot development by integrating LLM execution into familiar Docker workflows. The tutorial demonstrates building a production-ready chatbot with React frontend, Go backend, and comprehensive observability using Prometheus, Grafana, and Jaeger. Key benefits include local model execution for privacy and cost control, streaming responses, real-time metrics collection, and simplified deployment through Docker Compose. The architecture treats AI models as first-class services, eliminating complex setup while providing detailed performance insights including tokens per second, memory usage, and response latency.
88
3
Article
Community Picks·1y
Aptakube · Kubernetes GUI
Aptakube is a modern, lightweight Kubernetes GUI that enables users to view, compare, and manage workloads across multiple clusters simultaneously. It provides features such as an aggregated log viewer, resource diff, quick actions, port forwarding, and human-friendly resource views. Designed to work seamlessly with existing configurations, Aptakube offers enhanced usability and resource management without requiring additional setup on clusters.
62
4
4
Article
freeCodeCamp·49w
How to Debug CI/CD Pipelines: A Handbook on Troubleshooting with Observability Tools
A comprehensive guide to implementing observability in CI/CD pipelines using free and open-source tools. Covers setting up Grafana Loki and lightweight ELK stacks for log aggregation, creating unified logging strategies with correlation IDs, writing advanced LogQL and KQL queries for troubleshooting, integrating Prometheus metrics with logs, and building Grafana dashboards. Includes practical examples for debugging common pipeline failures like build errors, dependency issues, and flaky tests across GitHub Actions, Jenkins, and GitLab.
61
5
Article
Hacker News·27w
The Grafana trust problem
An experienced engineer shares their journey with Grafana's observability stack, detailing how frequent architectural changes, deprecations, and increasing complexity have eroded trust. Starting with simple Loki/Prometheus setups, they've witnessed rapid product churn—Grafana Agent deprecated within 2-3 years, OnCall discontinued, and Mimir 3.0 now requiring Kafka. The constant restructuring, incompatibilities with Prometheus Operator standards, and career-driven development pace make it difficult to maintain stable monitoring infrastructure. While acknowledging the technical quality of Grafana products, the author questions their long-term viability and considers alternatives like the kube-prometheus-stack with Thanos.
57
4
6
Article
Grafana Labs·29w
Grafana Mimir 3.0 release: performance improvements, a new query engine, and more
Grafana Mimir 3.0 introduces a redesigned architecture that separates read and write operations using Apache Kafka as an asynchronous buffer, eliminating performance bottlenecks between ingestion and queries. The release features the Mimir Query Engine (MQE), which processes queries in a streaming fashion rather than bulk loading, reducing peak memory usage by up to 92%. These improvements deliver 15% lower resource usage in large clusters while maintaining faster query execution and higher reliability. The new ingest storage component ensures query spikes won't slow down data ingestion and vice versa, enabling independent scaling of each path.
47
1
7
Article
Last9·1y
Essential Python Monitoring Techniques You Need to Know
Python is widely used in various applications but requires careful performance monitoring due to its unique characteristics like the Global Interpreter Lock (GIL), dynamic typing, and memory management. Key metrics such as CPU usage, memory, response time, throughput, and error rates are essential for optimal performance. The post provides actionable insights for DevOps engineers and SREs to implement basic and advanced monitoring techniques using libraries like psutil, Prometheus, and OpenTelemetry, along with recommendations on tools for containerized environments.
28
8
Article
Devtron·44w
Setting up Prometheus Stack on Kubernetes
Kubernetes monitoring is essential for maintaining application health in dynamic containerized environments. Prometheus collects and stores time-series metrics while Grafana provides visualization through dashboards. The kube-prometheus-stack offers a complete monitoring solution with service discovery, alerting, and predefined dashboards. Devtron simplifies the setup with integrated monitoring capabilities: users can install Grafana via Stack Manager, deploy Prometheus using Helm charts, and configure endpoints to get real-time application metrics, including CPU usage, throughput, and latency, directly in the Devtron dashboard.
27
9
Article
DevOps·52w
Build an incident response workflow with n8n + Prometheus
Learn how to build an automated incident response workflow using Prometheus for monitoring, n8n for orchestration, and AWS Lambda for executing custom actions. This setup includes sending alerts using Alertmanager and possible integrations with Discord and PagerDuty.
27
2
10
Article
Grafana Labs·1y
Grafana Drilldown: first-class OpenTelemetry support now available for metrics
Grafana Labs has added first-class support for OpenTelemetry in its Metrics Drilldown tool. This integration allows users to seamlessly explore and filter metrics using OpenTelemetry resource attributes alongside Prometheus labels. The new features include automatic query writing, context-aware filtering, and a consolidated interface, significantly simplifying the process of gaining insights into distributed system performance.
27
11
Article
PostgreSQL·1y
PostgreSQL: pgmoneta 0.16
pgmoneta version 0.16.0 is now available, offering new features such as incremental backup support for PostgreSQL 17+, advanced filtering for pgmoneta-walinfo, Prometheus/HTTPS support, Docker/podman images, and various enhancements and bug fixes. pgmoneta provides comprehensive backup and restore solutions, including full and incremental backups, compression options, AES encryption, symlink support, WAL shipping, hot standby, remote management, offline mode, TLS v1.2+ support, daemon mode, and user vault.
22
12
Article
Grafana Labs·1y
Kubernetes Monitoring: One view for observing all your storage volumes
Grafana Cloud has introduced a new Storage tab in its Kubernetes Monitoring solution, providing users with a single view to track volume usage over time, conduct data forensics, and troubleshoot volume provisioning. The tab includes prebuilt panels for tracking various storage metrics, reducing context switching, and making it easier to spot issues with new alert overlays. The view is available at different levels, from pod to cluster, helping users visualize storage metrics comprehensively and make informed decisions about their storage infrastructure.
20
13
Video
Christian Lempa·46w
Grafana Alloy, NEW log + metric collector replaces everything!
Grafana Alloy is a unified telemetry collector that replaces multiple monitoring tools like Promtail, Loki Docker plugin, and cAdvisor. It centralizes log and metric collection from various sources including Linux systems, Docker containers, and system journals. The tool uses a component-based configuration system where over 120 different components can be chained together to collect, process, and forward telemetry data to destinations like Prometheus and Loki. Key benefits include simplified setup, built-in data filtering and transformation capabilities, and elimination of the need for separate collectors for different data sources.
18
14
Article
DevOps·47w
AI-Agent Decision Engine for Self-Healing Server/VPS
An automated workflow system that combines Prometheus monitoring, AI agents, and bash scripts to create self-healing server infrastructure. The system uses a multi-stage decision engine in which AI agents analyze system health data and determine appropriate responses, from simple notifications for minor issues to automated remediation commands for critical problems. The workflow integrates with Discord for notifications and includes safety validation mechanisms to prevent dangerous command execution.
18
15
Article
37signals Dev·1y
Monitoring 10 Petabytes of data in Pure Storage
The post describes moving 10 petabytes of data from AWS S3 to Pure Storage FlashBlade and how the new storage is monitored. It explains how to set up the Pure OpenMetrics exporter for Prometheus, with configuration snippets, and covers the alerting rules used to keep the system reliable. It also mentions Grafana integration for cluster management.
16
16
Article
Last9·1y
Top 13 Kafka Monitoring Tools You Should Know
Running Kafka in production requires effective performance monitoring to detect bottlenecks and troubleshoot issues. Key areas to monitor include broker health, cluster performance, consumer health, producer performance, and ZooKeeper. The post reviews 13 Kafka monitoring tools, including Prometheus & Grafana, Last9, Confluent Control Center, and Datadog, detailing their features, strengths, and user feedback. It concludes with guidance on choosing the right tool for your needs.
15
17
Article
Last9·21w
Why High-Cardinality Metrics Break Everything
High-cardinality metrics promise granular per-request, per-user insights but quietly break production systems in four ways: costs become unpredictable and scale with runtime behavior rather than configuration; queries slow down during incidents when speed matters most; engineers lose trust as sparse, short-lived series create flickering dashboards and inconsistent results; and teams over-instrument without intent, creating multiplicative cardinality explosion. The core issue isn't that high-cardinality is wrong, but that most observability systems don't surface their own limits around storage, indexing, query performance, and data ambiguity. Success requires treating high-cardinality metrics like APIs with explicit ownership, guardrails, pre-deployment cardinality estimation, and systems designed for interactive exploration under pressure rather than brute-force scans.
12
1
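The "multiplicative cardinality explosion" and "pre-deployment cardinality estimation" mentioned in the Last9 summary above can be illustrated with a small back-of-the-envelope calculation. This is only a sketch: the label names, value counts, and the budget figure are hypothetical, and the product of per-label value counts is a worst-case upper bound, not a measurement of what a real system would store.

```python
from math import prod


def estimate_series(label_value_counts: dict[str, int]) -> int:
    """Worst-case series count for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_value_counts.values())


def exceeds_budget(label_value_counts: dict[str, int], budget: int) -> bool:
    """True if the worst-case series count is above a chosen budget."""
    return estimate_series(label_value_counts) > budget


# Hypothetical label sets for a single HTTP request metric.
baseline = {"method": 5, "status": 6, "endpoint": 40}
with_user = {**baseline, "user_id": 10_000}

print(estimate_series(baseline))            # 1200
print(estimate_series(with_user))           # 12000000
print(exceeds_budget(with_user, 100_000))   # True
```

Adding one per-user label multiplies the bound by the number of users, which is why a check like this is most useful before the label ships rather than after.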
18
Article
Spacelift·42w
Kubernetes Observability: Pillars, Tools & Best Practices
Kubernetes observability involves collecting metrics, logs, and traces to understand cluster internal state and performance. The three pillars include metrics for quantitative data, logs for timestamped events, and traces for request paths through microservices. Key tools include Metrics-Server for basic monitoring, Kube-Prometheus-Stack for comprehensive metrics and visualization, ELK stack for log management, and OpenTelemetry for distributed tracing. Implementation challenges include managing multiple data types, monitoring dynamic resources, handling large data volumes, and preventing data silos. Best practices emphasize setting up alerts, consistent resource labeling, application instrumentation, selective data collection, and compliance alignment.
12
19
Article
PostgreSQL·21w
pgSCV 0.15.1 released!
pgSCV 0.15.1, a Prometheus-compatible monitoring agent and metrics exporter for PostgreSQL environments, is now available. The release adds new functionality and bug fixes since version 0.15.0, though the announcement does not detail specific features. The tool aims to provide a unified solution for collecting metrics from PostgreSQL and related services.
11
20
Article
PostgreSQL·36w
PostgreSQL: pgexporter 0.7
pgexporter version 0.7.0 has been released with improvements to core metrics and new features. This Prometheus exporter for PostgreSQL includes extension support developed as part of a Google Summer of Code project. The tool helps monitor PostgreSQL databases by exporting metrics to Prometheus.
11
21
Article
Spacelift·1y
Kubernetes Observability With Kube-State-Metrics: Guide
Kube-State-Metrics is a Kubernetes addon that provides metrics about cluster objects, enabling you to monitor and make decisions based on the current state of your objects. It works by capturing and serving metrics in Prometheus format, which can be queried for monitoring, alerting, and dashboarding. The guide explains how to install Kube-State-Metrics using various methods, best practices for configurations, and example Prometheus queries. It highlights the importance of combining it with Metrics-Server for a comprehensive observability strategy in Kubernetes environments.
11
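For readers unfamiliar with the "Prometheus format" that the Kube-State-Metrics summary above refers to, the sketch below renders a single sample line in the Prometheus text exposition format. It is an illustration only, not code from the guide or from Kube-State-Metrics itself; the helper names are hypothetical, and the metric name and labels are used purely as an example.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline, as the text
    exposition format requires inside label values."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample as: name{label="value",...} value"""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
    return f"{name}{{{pairs}}} {value}"


# Example labels are illustrative, not taken from a real cluster.
print(format_sample(
    "kube_pod_status_ready",
    {"namespace": "default", "pod": "web-0", "condition": "true"},
    1,
))
# kube_pod_status_ready{namespace="default",pod="web-0",condition="true"} 1
```

Each line of this form is one time series sample; Prometheus scrapes a page of such lines over HTTP and the labels become the dimensions available in PromQL queries.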
22
Article
Trendyol Tech·1y
Kafka Consumer Monitoring Tips
Microservice architecture relies heavily on message brokers like Kafka, making consumer monitoring essential for business sustainability. This post discusses how Trendyol monitors Kafka consumers using Prometheus and handles alerts through Alertmanager. It covers key issues like lag, disconnected consumers, chronic rebalancing, skewed partitions, idle consumers, and consumer processing metrics, providing detailed approaches and PromQL expressions to detect and address these problems.
11
23
Article
Last9·1y
Essential Prometheus Queries: Simple to Advanced
Learn essential Prometheus queries from basic to advanced levels to effectively monitor and optimize your systems. This comprehensive guide covers practical examples, performance optimization techniques, and troubleshooting tips, ensuring you can deploy Prometheus effectively for real-world scenarios like Kubernetes pod monitoring, database query tracking, and dynamic baseline comparisons.
10
