A walkthrough of building automatic WireGuard VPN failover on NixOS using systemd services and Prometheus metrics. The solution has three components: wg-select picks the initial server on boot, wg-failover switches to the healthiest server, and wg-health-check monitors the tunnel continuously. Two smokeping probers provide the health data: an external one (on the hypervisor) measures reachability of all VPN endpoints, and an internal one (inside the VPN namespace) measures actual tunnel quality. The health check runs every minute and triggers failover after three consecutive checks show more than 15% packet loss, with Prometheus serving as the authoritative source of server health data.
Table of contents
Architecture Overview
Server Selection on Boot
Intelligent Failover
Health Check Timer
Triggering Failover on Service Failure
Monitoring All VPN Endpoints (External Prober)
Measuring Tunnel Quality (Internal Prober)
Monitoring the Current Server
Results
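
The failover rule from the summary (a check every minute, failover after three consecutive checks above 15% packet loss) can be illustrated with a minimal sketch. This is not the article's implementation: the `FailoverTracker` class and its `record` method are hypothetical names, and resetting the counter after a trigger is an assumption. Only the two thresholds come from the text.

```python
from dataclasses import dataclass

LOSS_THRESHOLD = 0.15   # from the summary: more than 15% packet loss
REQUIRED_FAILURES = 3   # from the summary: three consecutive checks


@dataclass
class FailoverTracker:
    """Hypothetical helper that counts consecutive unhealthy checks."""

    consecutive_failures: int = 0

    def record(self, packet_loss: float) -> bool:
        """Record one check result; return True when failover should trigger."""
        if packet_loss > LOSS_THRESHOLD:
            self.consecutive_failures += 1
        else:
            # A single healthy check breaks the streak.
            self.consecutive_failures = 0
        if self.consecutive_failures >= REQUIRED_FAILURES:
            # Assumption: start counting afresh once failover has fired.
            self.consecutive_failures = 0
            return True
        return False


tracker = FailoverTracker()
# Packet loss ratios from six successive one-minute checks (made-up values).
samples = [0.20, 0.30, 0.05, 0.16, 0.50, 0.90]
decisions = [tracker.record(loss) for loss in samples]
# decisions == [False, False, False, False, False, True]
```

The healthy third sample resets the streak, so failover only fires on the sixth check, after three bad checks in a row. Requiring consecutive failures keeps a single lossy minute from causing an unnecessary server switch.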