We have an app that automatically sets up Kubernetes HPA for our customers.
We are now using metrics from Prometheus as targets for the HPA. These metrics are exported from our Java applications using the JMX exporter. We mostly use JVM internal memory usage metrics, such as prometheus.googleapis.com/jvm_memory_used_bytes/gauge.
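For context, this is roughly the kind of HorizontalPodAutoscaler our app generates (a minimal sketch only; the object names, the External metric type, and the exact metric name format are assumptions and depend on the metrics adapter in use):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa                # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                  # placeholder name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          # exact name format depends on the metrics adapter
          # (some adapters require '|' instead of '/')
          name: prometheus.googleapis.com|jvm_memory_used_bytes|gauge
        target:
          type: AverageValue
          averageValue: "9172000000"   # desiredMetricValue from the example below
```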
We’re experiencing undesired auto-scaling events triggered by the Horizontal Pod Autoscaler (HPA) during deployments in Kubernetes.
The Problem
We believe this is what is happening:
Using the RollingUpdate deployment strategy, Kubernetes creates a new pod before terminating an old one. This temporarily increases currentReplicas, affecting the HPA formula:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
It takes some time for this new pod to export its own metrics, though. Thus, the ratio currentMetricValue / desiredMetricValue stays the same, and the desired number of replicas increases.
Example
Suppose I have two pods that are serving the current demand well. I update the image version in their deployment, so Kubernetes starts a new pod with the new image. In this case, the values in the formula will be:
- currentReplicas = 3 (includes the new pod)
- currentMetricValue = 6430859672 (same as before, since no metrics from the new pod have been imported into Prometheus yet)
- desiredMetricValue = 9172000000
This gives desiredReplicas = ceil[3 * (6430859672 / 9172000000)] = ceil[2.10] = 3. But I do not need a third pod for the current demand, so this desiredReplicas = 3 is undesired.
Considered solutions
We tried or considered these solutions:
- Switching to the Recreate strategy avoids this issue, but we cannot afford the associated downtime.
- The --horizontal-pod-autoscaler-initial-readiness-delay flag didn't help; from the Kubernetes documentation we understood it only affects CPU resource metrics collection.
- We are thinking about automatically disabling auto-scaling during deployment (see the sketch after this list), but that is a last resort since it is more complex.
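For reference, that last-resort idea would amount to something like the following patch, applied to the HPA right before a rollout and reverted once the new pods are Ready and exporting metrics (a sketch only; the replica count and object name are placeholders):

```yaml
# Partial HPA spec, applied e.g. with `kubectl patch hpa my-app-hpa --patch-file pin.yaml`.
# Pinning minReplicas and maxReplicas to the pre-deployment replica count
# effectively freezes the autoscaler while the new pods are still missing
# from Prometheus; the original values are restored after the rollout.
spec:
  minReplicas: 2
  maxReplicas: 2
```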
Is there a way, maybe some option in the HorizontalPodAutoscaler object, to prevent it from happening?