Building automatic failover for multiple WireGuard VPN endpoints using systemd services and Prometheus-based health monitoring.

/dev/io

A walkthrough of building automatic WireGuard VPN failover on NixOS using systemd services and Prometheus metrics. The solution has three components: wg-select picks the initial server on boot, wg-failover switches to the healthiest server, and wg-health-check monitors the tunnel continuously. Two smokeping probers supply the health data: an external one on the hypervisor measures reachability to all VPN endpoints, and an internal one inside the VPN namespace measures actual tunnel quality. The health check runs every minute and triggers failover after three consecutive checks show more than 15% packet loss. Prometheus is the authoritative source of server health data throughout.
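The failover rule described above (three consecutive checks over 15% packet loss, then a switch to the healthiest server) can be sketched in a few lines of Python. This is an illustrative sketch only, not the post's actual wg-health-check or wg-failover code. The names FailoverState and pick_healthiest are hypothetical. The sketch assumes packet loss is already available as a fraction between 0 and 1 (for example, read from Prometheus), and that the streak resets once failover fires.

```python
from dataclasses import dataclass

LOSS_THRESHOLD = 0.15    # from the text: more than 15% loss is a bad check
BAD_CHECKS_REQUIRED = 3  # from the text: three consecutive bad checks


@dataclass
class FailoverState:
    """Hypothetical helper: counts consecutive bad checks for the active server."""

    bad_streak: int = 0

    def record(self, loss: float) -> bool:
        """Record one check result; return True when failover should trigger."""
        if loss > LOSS_THRESHOLD:
            self.bad_streak += 1
        else:
            # Any healthy check breaks the streak.
            self.bad_streak = 0
        if self.bad_streak >= BAD_CHECKS_REQUIRED:
            # Assumption: start counting afresh after a failover fires.
            self.bad_streak = 0
            return True
        return False


def pick_healthiest(loss_by_server: dict[str, float], current: str) -> str:
    """Return the server with the lowest loss, keeping the current one on ties."""
    return min(loss_by_server, key=lambda s: (loss_by_server[s], s != current))
```

Requiring a streak rather than a single bad sample keeps one noisy minute from causing a switch, and preferring the current server on ties avoids flapping between equally healthy endpoints.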

Automatic WireGuard Failover with NixOS and Prometheus

Monitoring All VPN Endpoints (External Prober)

Measuring Tunnel Quality (Internal Prober)