2021-09-03
CP crashes
Looking into Control Panel OOM issues in Kubernetes.
Average memory usage is ~138 MB per request.
ps aufx | grep '[a]pache' | awk '{print "cat /proc/" $2 "/statm"}' | sh 2>/dev/null
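The one-liner above prints raw statm lines, which are counted in pages rather than bytes. A minimal sketch of turning those into an average figure in MiB follows. The helper names (statm_rss_mib, avg_rss_mib) are hypothetical, and the 4096-byte default page size is an assumption. Field 2 of statm is the resident set size, which includes pages shared between workers, so the result is likely an overestimate of per-worker private memory.

```shell
#!/bin/sh
# Convert one /proc/<pid>/statm line to resident memory in MiB.
# statm fields are page counts; field 2 is the resident set size.
# $1: statm line, $2: page size in bytes (defaults to 4096, an assumption).
statm_rss_mib() {
    echo "$1" | awk -v ps="${2:-4096}" '{ printf "%d\n", $2 * ps / 1048576 }'
}

# Average resident memory (MiB) over all processes matching a name.
# Hypothetical helper; prints 0 when no matching process is readable.
avg_rss_mib() {
    name="$1"
    page_size=$(getconf PAGESIZE)
    total=0
    count=0
    for pid in $(pgrep "$name"); do
        # A process may exit between pgrep and cat; skip it if so.
        line=$(cat "/proc/$pid/statm" 2>/dev/null) || continue
        [ -n "$line" ] || continue
        total=$((total + $(statm_rss_mib "$line" "$page_size")))
        count=$((count + 1))
    done
    if [ "$count" -gt 0 ]; then
        echo $((total / count))
    else
        echo 0
    fi
}
```

Usage, assuming the workers run under the process name apache2: avg_rss_mib apache2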
The pod memory limit should also account for the 256 MB APC cache.
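As a rough sketch of that sizing rule: the limit is the worker count times the per-worker footprint, plus the APC cache counted once because it is shared memory. The function name is hypothetical, and the per-worker figure is assumed to be private (non-shared) memory. Feeding it a raw RSS average would count shared pages once per worker and inflate the result.

```shell
#!/bin/sh
# Estimate a pod memory limit in MiB.
# $1: workers per pod
# $2: private memory per worker in MiB (assumed to exclude shared pages)
# $3: APC cache size in MiB (defaults to 256, the figure from these notes)
pod_memory_limit_mib() {
    workers="$1"
    per_worker_mib="$2"
    apc_cache_mib="${3:-256}"
    echo $((workers * per_worker_mib + apc_cache_mib))
}
```

Usage: pod_memory_limit_mib "$WORKERS" "$PER_WORKER_MIB" 256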
Actions taken:
- Made per-pod worker count configurable via consul (easier to change if needed)
- Tuned pod worker counts and memory allocations based on memory usage metrics (helps avoid OOMKills)
- Removed the k8s liveness probe (avoids k8s-kills when loaded)
- Revised the k8s readiness probe to just check TCP socket availability (avoids removing pod from rotation during load)
- Load-tested these changes in staging: confirmed that pods stay up under load, and scaling more readily mitigates pod
- Added monitoring to draw attention to load-induced symptoms in control-panel pods.
- Auto-scaling will be investigated next week as an additional load mitigation tool.
(details: 32 workers per pod, 1Gi RAM per pod (for now), 8 pods)
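The probe and memory changes listed above could look roughly like the container-spec fragment printed by the sketch below. This is not the actual manifest: the function name is hypothetical, port 80 and the periodSeconds and failureThreshold values are assumptions, and only the 1Gi limit, the TCP-socket readiness check, and the absence of a liveness probe come from the notes.

```shell
#!/bin/sh
# Print an illustrative Kubernetes container-spec fragment.
# $1: container port to probe (defaults to 80, an assumption)
# $2: per-pod memory limit (defaults to 1Gi, from the notes)
render_container_fragment() {
    port="${1:-80}"
    memory="${2:-1Gi}"
    # No liveness probe block is emitted, matching the removal noted above.
    cat <<EOF
resources:
  limits:
    memory: "${memory}"
readinessProbe:
  tcpSocket:
    port: ${port}
  periodSeconds: 10
  failureThreshold: 3
EOF
}
```

A TCP-socket readiness probe only checks that the port accepts connections, so a busy worker pool that is slow to answer HTTP does not by itself take the pod out of rotation.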