Dynolog is an open-source system monitoring daemon designed for heterogeneous CPU-GPU systems. It supports both always-on performance monitoring and on-demand deep-dive profiling, integrating with the PyTorch Profiler and the Kineto CUDA profiling library. It collects hardware and kernel metrics, including CPU, GPU, and network usage, to help optimize AI model training distributed across multiple nodes. Dynolog aims to provide a holistic view of system performance without significant overhead. It is actively developed, targets Linux platforms, and plans to use Rust for future components.

7 min read time. From developers.facebook.com
Table of contents
Key features
Continuous monitoring
Profiling AI applications on demand
Logging data
Conclusion
Additional Notes
