Best of Observability — November 2024

1
Video
Community Picks·2y
12 Logging BEST Practices in 12 minutes
Effective logging is crucial for troubleshooting and maintaining system health. Key practices include having a clear plan, understanding log levels (info, warning, error, fatal), using structured logging, capturing detailed log entries, implementing log sampling to reduce storage costs, using canonical log lines, centralizing logs, setting retention policies, securing sensitive data, and choosing efficient logging libraries to minimize performance impact. Additionally, metrics should be used alongside logs for real-time monitoring.
444
7
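Three of the practices listed above (structured logging, log levels, and log sampling) can be sketched together with Python's standard `logging` module. This is an illustrative sketch, not code from the video: the names `JsonFormatter`, `SamplingFilter`, `build_logger`, and `demo`, along with the `fields` key passed through `extra`, are all assumptions made for the example.

```python
import io
import json
import logging
import random


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (structured logging)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


class SamplingFilter(logging.Filter):
    """Keep every WARNING-or-higher record and a fraction of the rest."""

    def __init__(self, rate: float) -> None:
        super().__init__()
        self.rate = rate  # 0.0 drops all low-level records, 1.0 keeps all

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        return random.random() < self.rate


def build_logger(name: str, stream, rate: float) -> logging.Logger:
    """Wire the formatter and the sampling filter onto one stream handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(SamplingFilter(rate))
    logger.addHandler(handler)
    return logger


def demo() -> list:
    """Log one INFO and one ERROR record with sampling rate 0.0."""
    stream = io.StringIO()
    logger = build_logger("demo.checkout", stream, rate=0.0)
    logger.info("cart viewed")  # sampled out at rate 0.0
    logger.error("payment failed", extra={"fields": {"order_id": 42}})
    return stream.getvalue().splitlines()
```

Calling `demo()` returns a single JSON line for the error record. Sampling only below WARNING is one possible design choice: it trims storage for high-volume informational logs while keeping every record that is likely to matter during an incident.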
2
Article
Grafana Labs·2y
Prometheus 3.0 and OpenTelemetry: a practical guide to storing and querying OTel data
Prometheus 3.0 aims to improve integration with OpenTelemetry by addressing challenges such as resource attributes, UTF-8 support, and temporalities. The release includes features like promoting resource attributes to metric labels, a new `info` PromQL function, and stable OTLP support for easier data ingestion and querying. Users can also use the delta-to-cumulative processor in the OTel Collector for better data handling. Future developments will focus on enhancing interoperability and scalability.
29
3
Article
swizec.com·2y
Why you need observability more than tests
Friday deploys can be daunting, but effective observability can quickly identify and resolve issues. Unlike tests, which can miss production-specific problems, observability provides real-time insights through centralized error logging and alerts. This approach facilitated a rapid response to a SQL error following an update, highlighting the importance of default instrumentation, easy log addition, and self-serve alert creation for maintaining system stability.
27
4
Article
Hacker News·2y
hyperdxio/hyperdx at v2
HyperDX enables engineers to debug production issues quickly by offering an easy way to search and visualize logs and traces on any ClickHouse cluster. It supports schema-agnostic setups and offers fast searches, intuitive full-text and property search, and native JSON querying. HyperDX is compatible with OpenTelemetry and supports multiple programming languages and platforms. It is available in beta for local use and as a hosted cloud service. HyperDX aims to simplify observability and address the shortcomings of existing tools by being cost-effective, user-friendly, and easy to integrate.
27
1
5
Article
Last9·2y
Kubernetes Observability with OpenTelemetry Operator
The OpenTelemetry Operator enhances observability in Kubernetes clusters by simplifying the deployment and management of telemetry pipelines. It automates tasks such as scaling collectors, exporting telemetry data, and instrumentation. Key features include various deployment modes, custom pipelines, scalability, and auto-instrumentation. Best practices involve starting small, using CRDs for configurations, monitoring collector health, isolating telemetry pipelines by namespaces, and using metrics for scaling.
22
6
Article
AWS Fundamentals·2y
OpenTelemetry on AWS: Observability at Scale with Open-Source
Learn how to implement an observability stack on AWS using OpenTelemetry, CloudWatch, and AWS X-Ray for serverless applications. This guide walks through configuring AWS Lambda for trace and log collection, and how the AWS Distro for OpenTelemetry provides a secure, production-ready solution for instrumenting applications with minimal code changes.
21
7
Article
MongoDB·2y
MongoDB Database Observability: Integrating with Monitoring Tools
Learn how to integrate MongoDB Atlas with popular observability tools like Datadog, Prometheus, PagerDuty, Microsoft Teams, and Slack to streamline operations and enhance visibility. Follow the guide on configuring these integrations through Atlas UI for a unified view of database and application metrics, ensuring efficient incident response and optimal performance. Explore a use-case scenario with an e-commerce company leveraging MongoDB Atlas, Datadog, and Slack for a seamless observability ecosystem.
20
8
Article
Faun·2y
Break Down of Service Mesh Concepts
A service mesh is an infrastructure layer that manages service-to-service communication in a microservices architecture, improving reliability, security, and transparency. Key components include the data plane, where proxies manage service communication, and the control plane, which configures the data plane. Core capabilities include traffic management, security features like mutual TLS, and observability with metrics, logs, and tracing. The move from monolithic to microservices architectures raises needs around network resilience, policy management, and identity, which a mesh addresses. The sidecar pattern separates business logic from network concerns, which centralizes management and helps with scalability.
17
9
Article
Community Picks·2y
How observability can increase team velocity instead of slowing you down
Observability empowers development teams to move faster by providing real-time insights into system performance and potential issues. By integrating observability early in the development lifecycle, teams can proactively troubleshoot, maintain stability, and enhance productivity without increasing team size. Key practices include monitoring metrics, logs, and traces to diagnose and address problems swiftly, allowing for efficient and high-velocity development.
15
3
10
Article
Coralogix·2y
Why should you care about architectural differentiators?
Understanding the architectural foundation of a product is crucial as it shapes the overall user experience and functionality. In the SaaS market, this is often overlooked in favor of feature comparison. The architectural philosophy of a product, such as Coralogix's commitment to Indexless Observability, plays a significant role in the product's performance and capabilities. This philosophy can be the key differentiator driving the specific features and outcomes offered by the platform.
15
1
11
Article
Last9·1y
A Complete Guide to Using the Grok Debugger
Log parsing is crucial for modern observability, enabling thorough data analysis from servers and applications. The Grok debugger is a powerful tool for interpreting logs but is often underutilized. This guide delves into Grok debugging, explaining its pattern-matching mechanics, common patterns, and practical implementation. Key tips for developing and debugging Grok patterns are also covered, enhancing your log analysis capabilities.
13
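The pattern-matching mechanics mentioned above come down to expanding `%{NAME:field}` tokens into named regular-expression groups. Here is a minimal sketch of that idea in Python, not the Grok debugger's own implementation. The entries in `PATTERNS` are simplified stand-ins for a few well-known Grok pattern names (the real definitions are more permissive), and `grok_to_regex` and `parse` are hypothetical helper names.

```python
import re
from typing import Dict, Optional

# Simplified stand-ins for a few Grok pattern names (assumptions, not the
# definitions shipped with Logstash).
PATTERNS = {
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
    "WORD": r"\w+",
    "NUMBER": r"-?\d+(?:\.\d+)?",
    "URIPATH": r"/\S*",
}

# Matches %{NAME} and %{NAME:field}.
GROK_TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(expression: str) -> re.Pattern:
    """Expand each %{NAME:field} token into a (named) regex group."""

    def expand(match: re.Match) -> str:
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]  # raises KeyError for unknown pattern names
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return re.compile(GROK_TOKEN.sub(expand, expression))


def parse(expression: str, line: str) -> Optional[Dict[str, str]]:
    """Return the captured fields, or None when the line does not match."""
    match = grok_to_regex(expression).search(line)
    return match.groupdict() if match else None


ACCESS_LOG = "%{IP:client} %{WORD:method} %{URIPATH:path} %{NUMBER:status}"
```

With this sketch, `parse(ACCESS_LOG, "203.0.113.7 GET /index.html 200")` yields a dictionary with the keys `client`, `method`, `path`, and `status`, while a line that does not fit the expression yields `None`. Building a pattern up one token at a time and checking where it stops matching is the usual way to debug it.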
12
Article
Grafana Labs·1y
How to use OpenTelemetry and Grafana Alloy to convert delta to cumulative at scale
Migrating to a Prometheus-based ecosystem with Grafana Alloy, which integrates OpenTelemetry, makes handling metrics easier, especially when converting delta metrics to cumulative ones. This post details the algorithm used for this conversion and outlines the load balancing and container setup necessary for scalable deployment. It also mentions recent updates to the OpenTelemetry Collector, making the process more efficient and scalable.
13
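The core idea of a delta-to-cumulative conversion can be sketched as a running total kept per series. This is a simplified illustration and not the algorithm as implemented in the OpenTelemetry Collector or Grafana Alloy: the class name `DeltaToCumulative` and its `add` method are hypothetical, and real implementations also have to deal with timestamps, out-of-order samples, and stale series.

```python
from typing import Dict, Tuple

# A series is identified by its metric name plus its sorted label pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


class DeltaToCumulative:
    """Accumulate delta samples into cumulative values, one total per series."""

    def __init__(self) -> None:
        self._totals: Dict[SeriesKey, float] = {}

    def add(self, name: str, labels: Dict[str, str], delta: float) -> float:
        """Record one delta sample and return the new cumulative value."""
        key = (name, tuple(sorted(labels.items())))
        total = self._totals.get(key, 0.0) + delta
        self._totals[key] = total
        return total
```

Because the running total is state held in memory, every sample for a given series has to reach the same instance. That is one reason a scaled-out deployment needs load balancing that routes consistently by series, as the post's summary notes.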
13
Article
OpenTelemetry·2y
Prometheus and OpenTelemetry - Better Together
Prometheus and OpenTelemetry (OTel) are key tools for monitoring complex distributed systems. While OTel focuses on instrumentation without offering backend storage, Prometheus provides a time-series data store and a web interface for visualizing metrics. The integration of OTel with Prometheus, particularly in Kubernetes environments, is explored. The OTel Collector’s Prometheus Receiver ingests Prometheus metrics, and the Target Allocator aids in Prometheus service discovery and ensures even distribution of targets among collectors. This setup removes the need for maintaining Prometheus as a data store and provides flexibility in monitoring Kubernetes systems using OTel tools.
10
