Lesson 3.5: Horizontal Pod Autoscaler

Autoscaling in Kubernetes dynamically adjusts the resources allocated to workloads based on demand. It gives applications enough capacity to handle traffic spikes while reducing resource usage during periods of low demand. Kubernetes provides several autoscaling mechanisms; this lesson covers the Horizontal Pod Autoscaler.

The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed CPU or memory utilization, or custom metrics.

How HPA Works:
- HPA runs as a control loop that periodically (every 15 seconds by default) reads the resource usage (e.g., CPU, memory) or custom metrics of the Pods.
- If the average usage is above the target, HPA increases the number of Pod replicas, up to the configured maximum.
- If the average usage is below the target, HPA decreases the number of Pod replicas, down to the configured minimum.

Key Features:
- Metrics: HPA can scale based on CPU, memory, or custom metrics (e.g., requests per second).
- Target Utilization: You define a target utilization percentage (e.g., 80% CPU usage). For CPU and memory, the percentage is measured against the resource request of the Pod's containers, so the Pods must declare requests.
- Min/Max Replicas: You specify the minimum and maximum number of replicas to control the scaling range.

Use Cases:
- Scaling stateless applications (e.g., web servers, APIs) to handle varying traffic loads.
- Ensuring high availability and performance during traffic spikes.
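
The scaling rule described above can be sketched in a few lines of Python. This is a simplified illustration of the formula given in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), not the controller's actual code. The function name and parameters are made up for this example, and the sketch ignores details such as unready Pods, missing metrics, multiple metrics, and scaling policies.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the replica count the HPA would request for one metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured Min/Max Replicas range.
    return max(min_replicas, min(max_replicas, desired))

# 4 replicas at 120% utilization with an 80% target: scale up.
print(desired_replicas(4, 120, 80, min_replicas=2, max_replicas=10))  # 6
# 84% against an 80% target is within the 10% tolerance: no change.
print(desired_replicas(4, 84, 80, min_replicas=2, max_replicas=10))   # 4
# 20% against an 80% target gives ceil(1.0) = 1, raised to the minimum of 2.
print(desired_replicas(4, 20, 80, min_replicas=2, max_replicas=10))   # 2
```

Because the result is rounded up, the HPA tends to err on the side of slightly more capacity than the target strictly requires.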

[root@master ~]# cd hpa/
[root@master hpa]# ls
loaddocker  php-apache.yml
[root@master hpa]# cat php-apache.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: treehouses/php-apache:202109232218
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

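Apply the manifest (kubectl apply -f php-apache.yml), then confirm that the Deployment and its Pod are running:

[root@master hpa]# kubectl get deployments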
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   1/1     1            1           19s
[root@master hpa]# kubectl get pods 
NAME                         READY   STATUS    RESTARTS   AGE
php-apache-559849875-9b7jw   1/1     Running   0          33s

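Create an HPA for the Deployment that targets 10% average CPU utilization and keeps between 1 and 10 replicas. Right after creation, TARGETS can show <unknown> and REPLICAS can show 0 until the HPA has read its first metrics, which requires a metrics source such as metrics-server in the cluster.

[root@master hpa]# kubectl autoscale deployment php-apache --cpu-percent=10 --min=1 --max=10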
horizontalpodautoscaler.autoscaling/php-apache autoscaled
[root@master hpa]# kubectl get hpa
NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   <unknown>/10%   1         10        0          4s
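
To make the 10% target concrete, the sketch below shows how a CPU percentage relates to the CPU request in php-apache.yml. It assumes every Pod has the same request (200m, as in the manifest) and uses a hypothetical usage sample of 44m; the function name is illustrative, and the real controller works from the metrics API rather than from a list like this.

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """CPU utilization across Pods as a percentage of their total CPU request."""
    total_usage = sum(usage_millicores)
    total_request = request_millicores * len(usage_millicores)
    return 100 * total_usage / total_request

REQUEST_M = 200      # requests.cpu from php-apache.yml
TARGET_PERCENT = 10  # --cpu-percent from the kubectl autoscale command

# Per-Pod CPU usage that corresponds to the target: 10% of 200m.
target_millicores = REQUEST_M * TARGET_PERCENT / 100
print(target_millicores)  # 20.0

# Hypothetical sample: a single Pod using 44m of CPU.
print(cpu_utilization_percent([44], REQUEST_M))  # 22.0
```

The limit of 500m does not enter this calculation; it only caps how much CPU a container may use.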

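Generating load

Run a temporary busybox Pod that requests the php-apache Service in a loop. The command stays attached to the terminal, so run the kubectl commands that follow from a second session.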

[root@master hpa]# kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.001; do wget -q -O- http://php-apache; done"

Once average CPU utilization rises above the 10% target, the HPA adds replicas. In the output below, utilization reaches 22% with 1 replica, so the HPA computes ceil(1 × 22 / 10) = 3 and scales the Deployment to 3 replicas.

[root@master hpa]# kubectl get hpa -w 
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/10%    1         10        1          8m59s
php-apache   Deployment/php-apache   3%/10%    1         10        1          9m8s
php-apache   Deployment/php-apache   22%/10%   1         10        1          9m23s
php-apache   Deployment/php-apache   21%/10%   1         10        3          9m39s
 
[root@master hpa]# kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
load-generator               1/1     Running   0          52s
php-apache-559849875-9b7jw   1/1     Running   0          13m
php-apache-559849875-dqf4n   1/1     Running   0          18s
php-apache-559849875-kgs5t   1/1     Running   0          19s

Key Factors Affecting Scaling Behavior

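Stabilization Window:
- The HPA uses a stabilization window to keep the replica count from flapping up and down (thrashing).
- By default, the stabilization window for scaling down is 5 minutes: the HPA acts on the highest replica recommendation from the last 5 minutes, so it removes Pods only after utilization has stayed low for that long. Scaling up has no stabilization window by default.
Cool-Down Period:
- Current Kubernetes versions do not apply a separate fixed cool-down timer after each scaling action. The stabilization window plays that role for scale-down, and the rate of change in either direction can be limited with scaling policies under spec.behavior of the HPA.
Metrics Averaging:
- The HPA compares the average CPU utilization across all Pods of the target workload with the configured target. The controller re-evaluates this every 15 seconds by default, and it skips scaling when the ratio of current to target utilization is within a tolerance (10% by default), so small fluctuations do not trigger unnecessary scaling.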
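
The effect of the scale-down stabilization window can be illustrated with a small Python sketch. It assumes the behavior described in the Kubernetes documentation, where the highest recommendation inside the window wins. The class and method names are invented for this example, and timestamps are plain seconds rather than real clock values.

```python
from collections import deque

class ScaleDownStabilizer:
    """Keep recent replica recommendations and damp scale-down decisions."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now, recommendation):
        self.history.append((now, recommendation))
        # Drop recommendations that are older than the window.
        while self.history and self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        # Act on the highest recommendation still inside the window.
        return max(replicas for _, replicas in self.history)

stabilizer = ScaleDownStabilizer(window_seconds=300)
print(stabilizer.stabilize(0, 3))    # 3: load is high
print(stabilizer.stabilize(60, 1))   # 3: load dropped, but the window still holds 3
print(stabilizer.stabilize(301, 1))  # 1: the recommendation of 3 has aged out
```

Taking the maximum means a higher recommendation takes effect immediately, while a lower one has to persist for the whole window.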

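After the load generator stops and about 5 minutes pass, the HPA scales the Deployment back down to 1 replica. The output below appears to come from a separate run of the exercise (MAXPODS is 3 and the Pod name hash differs), but it shows the same scale-down behavior.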
[root@master hpa]# kubectl get hpa 
NAME         REFERENCE               TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   cpu: 0%/10%   1         3         1          8m10s
 
[root@master hpa]# kubectl get pods 
NAME                          READY   STATUS    RESTARTS   AGE
load-generator                0/1     Error     0          7m54s
php-apache-5df7f4868f-rhxsz   1/1     Running   0          7m12s