Best of Monitoring: September 2024

  1.
    Article
    Last9 · 2y

    PromQL Cheat Sheet: Must-Know PromQL Queries

    PromQL can be challenging, but it is highly effective for monitoring and troubleshooting system performance. This guide offers essential PromQL queries to help you analyze real-time data, detect trends, identify resource-intensive services, track SLOs/SLIs, manage high cardinality, plan capacity, and perform multi-cluster queries. These snippets aim to make your life easier when working with Prometheus dashboards.

  2.
    Article
    Faun · 2y

    Monitoring in Kubernetes: Best Practices

    As Kubernetes adoption rises, effective monitoring is crucial to maintaining the health and performance of containerized applications. The post explains why monitoring matters, distinguishes monitoring from observability, and offers best practices such as focusing on the Four Golden Signals. It highlights the dynamic nature of Kubernetes, which necessitates real-time monitoring to manage resource utilization, prevent outages, and ensure security. It also emphasizes integrating monitoring into CI/CD pipelines and adopting a culture of observability for long-term success.

  3.
    Article
    Hacker News · 2y

    aceberg/WatchYourLAN: Lightweight network IP scanner. Can be used to notify about new hosts and monitor host online/offline history

    WatchYourLAN is a lightweight network IP scanner with a web GUI that notifies users of new hosts and monitors the online/offline history of network devices. Version 2.0 introduces breaking changes and new features, including configuration through files, the GUI, or environment variables, and the ability to send data to InfluxDB2 for Grafana dashboards. It supports notification integrations such as Gotify, email, and Telegram.

  4.
    Article
    Hacker News · 2y

    harsxv/tinystatus: Tiny status page generated by a Python script

    TinyStatus is a customizable Python script that generates a status page to monitor various services. It supports HTTP endpoint monitoring, ping checks, and port checks, with responsive design and incident history tracking. Configuration is done via YAML files, and it provides automatic status updates with customizable intervals. Dependencies include Python 3.7+ and pip. The project is open source under the MIT License.
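
    As a rough illustration of what a port check involves, here is a minimal sketch using only the Python standard library. It is not taken from TinyStatus; the function name `check_port` and the timeout default are assumptions made for this example.

```python
import socket


def check_port(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Hypothetical helper for illustration; not TinyStatus's own code.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Example target; replace with a host and port you are allowed to probe.
    status = "up" if check_port("127.0.0.1", 8080) else "down"
    print(f"127.0.0.1:8080 is {status}")
```

    A status page built on such a check would call it on an interval and record each result, which is the general shape the summary describes.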

  5.
    Article
    Planet Golang · 2y

    An Ode to Logging

    Effective logging is crucial for software development, helping identify errors and streamline debugging. Use proper loggers instead of print statements in production, distinguish log levels (Info, Error, Warn, Trace), and ensure logs are both human- and machine-readable with structured formats like JSON. Group events with Spans to build Traces, and always centralize logs with a retention policy to avoid hefty cloud bills.
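
    The advice on leveled, structured logs can be sketched with Python's standard `logging` module. This is a minimal illustration, not code from the article: the `JsonFormatter` class, the `build_logger` helper, and the `extra={"fields": ...}` convention are assumptions introduced here.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"fields": {...}}
        # to attach structured key/value context to an event.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes INFO and above as JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.info("order placed", extra={"fields": {"order_id": 42}})
    log.error("payment failed", extra={"fields": {"order_id": 42}})
```

    Each line stays readable to a person and parseable by a log pipeline, and the level field lets a centralized store filter or expire noisy entries.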

  6.
    Article
    PostgreSQL · 2y

    Coroot 1.4: Simplify PostgreSQL Monitoring (Open Source)

    Coroot 1.4 enhances PostgreSQL monitoring by offering seamless integration without extra configuration. Key features include L7 protocol decoding for operations, advanced SSL/TLS monitoring using eBPF, broad deployment support across various environments, and cloud-cost analysis. It also provides simple log analysis, low-impact query tracing, in-depth performance analysis, and slow-query identification.

  7.
    Article
    Prometheus · 2y

    Prometheus 3.0 Beta Released

    Prometheus 3.0-beta is now available for testing, featuring a completely rewritten UI, enhancements to Remote Write 2.0, expanded OpenTelemetry support, and experimental Native Histograms. Users are encouraged to test the beta and report any issues for a more stable final release. Notable additions include support for UTF-8 characters in metric and label names, and new configurations for OTLP ingestion.

  8.
    Article
    Grafana Labs · 2y

    Better root cause analysis: Mastering alert insights with the new central history timeline

    Grafana Alerting's new history feature in version 11.2 offers a comprehensive view of alert state transitions for better root cause analysis. The feature includes a filters section, an events chart, and an events table, helping users identify patterns and diagnose issues quickly. Designed for DevOps engineers, system administrators, and SRE teams, it enhances incident response strategies by providing detailed insights into alert rule state changes.

  9.
    Article
    Last9 · 2y

    kube-state-metrics: Your Complete Guide to Simplifying Kubernetes Observability

    kube-state-metrics is an open-source add-on for Kubernetes that generates metrics about the state of various Kubernetes objects by listening to the Kubernetes API server. It complements other monitoring tools like metrics-server by providing insights into the health and status of Kubernetes resources such as pods, deployments, and nodes. It can be installed with Helm, with YAML manifests, or by building from source. Integration with Prometheus allows for advanced querying and visualization using Grafana. Best practices include setting up appropriate RBAC permissions, enabling high availability, and leveraging custom resource metrics for enhanced observability.

  10.
    Article
    ITNEXT · 2y

    An Introduction to the OpenTelemetry Collector

    OpenTelemetry provides open standards for interoperable tools handling telemetry data. The OpenTelemetry Collector, a flexible and extensible deployable binary, acts as a universal translator and pipeline for gathering, processing, and forwarding metrics, traces, and logs. It supports various plugins and can be tailored to different environments, including custom distributions for specific use cases. Deployable on Kubernetes, it can gather telemetry cluster-wide, with specific plugins for Kubernetes entities. The post promises a hands-on tutorial on integrating Kubernetes Cluster Logging with ClickHouse and Grafana using OpenTelemetry.

  11.
    Article
    Last9 · 2y

    Prometheus Recording Rules: A Developer's Guide to Query Optimization

    High-cardinality metrics can overwhelm Prometheus by generating numerous time series, slowing performance and increasing storage needs. Using recording rules can aggregate these metrics, reducing cardinality and optimizing queries. Key benefits include improved query speed, reduced storage needs, easier dashboard creation, and more reliable alerts. It's important to use clear naming conventions, keep rules simple, and monitor their performance. Advanced techniques like chaining rules, using math in queries, and employing subqueries can further enhance efficiency.
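
    To see why aggregation reduces cardinality, the following plain-Python sketch mimics what an aggregating rule along the lines of PromQL's `sum without (pod) (...)` does to a set of series. It does not use Prometheus; the `sum_without` function, the label names, and the sample values are all made up for illustration.

```python
from collections import defaultdict

# Each series is identified by its label set; values are the latest samples.
# All label names and numbers here are hypothetical.
SAMPLES = {
    frozenset({("job", "api"), ("pod", "api-1")}): 12.0,
    frozenset({("job", "api"), ("pod", "api-2")}): 8.0,
    frozenset({("job", "api"), ("pod", "api-3")}): 5.0,
    frozenset({("job", "web"), ("pod", "web-1")}): 3.0,
}


def sum_without(samples, drop):
    """Sum series that become identical once the `drop` labels are removed."""
    totals = defaultdict(float)
    for labels, value in samples.items():
        kept = frozenset((k, v) for k, v in labels if k not in drop)
        totals[kept] += value
    return dict(totals)


if __name__ == "__main__":
    rolled_up = sum_without(SAMPLES, drop={"pod"})
    print(len(SAMPLES), "series before,", len(rolled_up), "after")
    for labels, value in rolled_up.items():
        print(dict(labels), value)
```

    Dropping the per-pod label collapses four series into two, one per job. A recording rule stores that smaller result ahead of time so dashboards and alerts can query it directly.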

  12.
    Article
    Datadog · 2y

    Operator vs. Helm: Finding the best fit for your Kubernetes applications

    Kubernetes operators and Helm charts are both tools for deploying and managing applications in Kubernetes clusters. Helm simplifies deployments using templates and version-controlled packages, making it ideal for repeatable, simple deployments. Operators offer a more automated and flexible approach, suitable for complex applications requiring custom lifecycle management. Datadog migrated its production clusters to the Datadog Operator for its advanced capabilities and consolidation of configurations. Both Helm and operators have their unique benefits, and the choice depends on the application's complexity and operational needs.