NVIDIA has launched Fleet Intelligence, a generally available agent-based managed service for continuous monitoring of NVIDIA data center GPUs. The service provides real-time telemetry on power, temperature, performance, health, and configuration across large GPU fleets. A low-footprint host agent streams data to a managed cloud service, surfacing anomalies, alerts, and health check results via a dashboard on NVIDIA NGC. The agent is open source for auditability and integrates with DCGM and GPUd. A key feature is cryptographic GPU integrity verification using the NVIDIA Attestation SDK and Remote Attestation Service, currently supported on Blackwell and Vera Rubin architectures. The service is free for NVIDIA data center GPU owners and operators.

7 min read. From developer.nvidia.com
Table of contents
What are the key focus areas of GPU monitoring?
What is NVIDIA Fleet Intelligence?
Get started with NVIDIA Fleet Intelligence
