Kubernetes clusters often add nodes unexpectedly even when CPU and memory dashboards show available headroom. The root cause is typically stale resource requests: values set with safety buffers that were never revisited as workloads evolved. Because the scheduler places pods based on declared requests rather than actual usage, inflated requests make nodes appear full long before real utilization approaches capacity, triggering the autoscaler unnecessarily. The fix involves comparing requests to observed usage over time, identifying the few namespaces that dominate capacity, and gradually bringing requests back in line with reality. Inference workloads amplify this problem because their replica counts scale quickly, and every new replica carries the same inflated request. Practical steps include checking for persistent gaps between requests and usage, using cost allocation views to identify the worst offenders, and rolling out changes incrementally to maintain trust and avoid pager noise.
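The comparison of requests to observed usage described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive tool: the `PodSample` type, the `request_gap_by_namespace` helper, and every number below are hypothetical, and the sketch assumes usage has already been exported from whatever metrics system the cluster uses (for example, as a high percentile over several days).

```python
from collections import defaultdict
from typing import NamedTuple


class PodSample(NamedTuple):
    """One pod's declared CPU request next to its observed usage (hypothetical shape)."""
    namespace: str
    cpu_request_cores: float  # declared request: what the scheduler reserves
    cpu_usage_cores: float    # observed usage, assumed to be pre-aggregated over time


def request_gap_by_namespace(samples):
    """Return (namespace, requested, used, gap) tuples, largest gap first."""
    requested = defaultdict(float)
    used = defaultdict(float)
    for s in samples:
        requested[s.namespace] += s.cpu_request_cores
        used[s.namespace] += s.cpu_usage_cores
    rows = [(ns, requested[ns], used[ns], requested[ns] - used[ns]) for ns in requested]
    # Sort so the namespaces reserving the most unused capacity come first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Made-up numbers for illustration only.
samples = [
    PodSample("inference", 4.0, 1.0),
    PodSample("inference", 4.0, 1.5),
    PodSample("batch", 2.0, 1.75),
    PodSample("web", 1.0, 0.5),
]

report = request_gap_by_namespace(samples)
for ns, req, use, gap in report:
    print(f"{ns:<10} requested={req:.2f} used={use:.2f} gap={gap:.2f}")
```

Sorting by the absolute gap rather than by a ratio is a deliberate choice here: a small namespace that requests ten times what it uses matters less to node count than a large one that requests twice what it uses.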
Table of contents
The confusing part is that metrics look fine
This shows up in well-run clusters too
The first question to ask when scaling feels wrong
How to confirm drift without turning it into a project
Where cost allocation helps without turning this into a billing conversation
Getting unstuck is usually smaller than it feels
What changes when drift is under control