Eran Kampf

Zero Downtime Django (gunicorn) Deployments on GKE


We recently switched Twingate's GKE load balancer to Google's new Container-native load balancing. The premise was good: the load balancer talks directly to pods, which saves an extra network hop. (With the classic load balancer, traffic goes from the load balancer to a GKE node, which then routes it to the pod based on iptables rules configured by kube-proxy.) It should also perform better and support more features, and in general we'd rather be on the side Google actively maintains than on legacy tech.

However, immediately after making the switch, we started noticing short bursts of 502 errors whenever we’d deploy a new release of our services to the cluster. We tracked it down to the following behavior described in the Container-native load balancing through Ingress docs:

502 errors and rejected connections can also be caused by a container that doesn’t handle SIGTERM.
If a container doesn’t explicitly handle SIGTERM, it immediately terminates and stops handling requests. The load balancer continues to send incoming traffic to the terminated container, leading to errors.

Why do we get 502s on pod restarts?

The legacy load balancer relied on Kubernetes's kube-proxy to do the routing.
kube-proxy configures iptables on every node in the cluster with rules for distributing traffic to pods.
When the load balancer receives a request, it sends it to a random node in the cluster, which then routes it to the pod (which might be on a different node).
kube-proxy tracks pod state, so when a pod changes to Terminating it immediately updates the routing rules.

With Container-native load balancing, traffic is routed directly to pods.
This eliminates the extra network hop, but at a cost: the load balancer is not aware of the pods' state and relies on health checks to learn when a pod is terminating.

We were getting these bursts of 502s because once we deployed a new version, the old pods were terminated, and on receiving SIGTERM they stopped processing new requests. The load balancer, however, kept sending them requests until their health checks failed (about 10s with our settings) and it removed them from rotation.

To solve this, we need to terminate our pods gracefully. That requires some kind of switch that tells the pod to start failing its health check while it keeps processing other requests normally, for long enough that the load balancer marks the pod as unhealthy and stops sending traffic its way.

To understand how to do this, let's first take a step back and look at how Kubernetes terminates pods…

What's the termination process for a Kubernetes Pod?

1. Pod is set to “Terminating” state

The pod is removed from the endpoints list of all Services, and kube-proxy updates the routing rules on all nodes so that traffic is no longer sent to it.

2. preStop Hook is called

The preStop hook, if defined, is a command executed in the pod's containers. Kubernetes waits for it to finish before moving on to the next step.

3. SIGTERM signal is sent to pod

Kubernetes sends a SIGTERM to the containers in the pod to let them know they need to shut down soon.

4. Kubernetes waits for containers to gracefully terminate

Kubernetes waits for a specified time, called the termination grace period, for containers to terminate gracefully. By default this period is 30 seconds, but it can be customized by setting the terminationGracePeriodSeconds value in the pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: busybox

5. SIGKILL signal is sent to pod and it’s removed

If containers are still running after the grace period, they are sent a SIGKILL signal and forcibly removed. Kubernetes then deletes the Pod object from its object store.
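The grace period is a single budget shared by both halves of shutdown: Kubernetes starts the countdown before it runs the preStop hook, so whatever the hook spends is no longer available to the container after SIGTERM. The helpers below are a minimal sketch of that arithmetic. The function names and parameters are hypothetical, and they ignore edge cases such as a hook that overruns the grace period.

```python
def seconds_left_for_app(grace_period: int, prestop_seconds: int) -> int:
    """Seconds the container has to exit after SIGTERM before SIGKILL arrives.

    The grace period countdown starts when the pod becomes Terminating,
    before the preStop hook runs, so the hook's duration is subtracted.
    """
    return max(0, grace_period - prestop_seconds)


def exits_before_sigkill(grace_period: int, prestop_seconds: int,
                         app_shutdown_seconds: int) -> bool:
    """True if the app's worst-case graceful shutdown fits before SIGKILL."""
    return app_shutdown_seconds <= seconds_left_for_app(grace_period, prestop_seconds)


if __name__ == "__main__":
    # Default 30s grace period, a 25s preStop sleep, and gunicorn's
    # default 30s graceful timeout as the app's worst case.
    print(seconds_left_for_app(30, 25))      # 5
    print(exits_before_sigkill(30, 25, 30))  # False
    # Raising terminationGracePeriodSeconds to 60 leaves room for both.
    print(exits_before_sigkill(60, 25, 30))  # True
```

The check only bites for slow requests: most in-flight requests finish well within a few seconds, but anything still running when the budget runs out is cut off by SIGKILL.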

Gracefully Terminating Django (gunicorn)

Gunicorn has its own notion of a graceful timeout: when it receives a SIGTERM, it gives workers a grace period (30s by default) to finish the requests they're currently processing and exit. In our case, we need gunicorn to keep serving requests for some time before shutting its workers down:

  1. When the pod starts terminating, toggle the health check view (we're using /health) so it starts failing
  2. Wait for 25 seconds (we set the LB to health check every 5s and to consider a pod down after 2 consecutive failures, so 25s should give it plenty of time to fail)
  3. Send SIGTERM to gunicorn
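The 25-second wait follows from the health check settings. If the health check starts failing just after a passing check, the first failing check lands one interval later and the pod is dropped on the second consecutive failure, so roughly 2 × 5s = 10s in the worst case. A minimal sketch of that arithmetic, ignoring check timeouts and the time the load balancer needs to propagate the change (the function names are hypothetical):

```python
def worst_case_removal(check_interval: int, unhealthy_threshold: int) -> int:
    """Worst-case seconds until the LB drops a pod whose checks fail from t=0.

    The last passing check happened just before t=0, so failing checks land
    at t = interval, 2 * interval, ..., and the pod is removed on the
    unhealthy_threshold-th consecutive failure.
    """
    return check_interval * unhealthy_threshold


def error_window(stop_serving_at: int, check_interval: int,
                 unhealthy_threshold: int) -> int:
    """Seconds during which the LB may still route to a pod that stopped serving."""
    removed_at = worst_case_removal(check_interval, unhealthy_threshold)
    return max(0, removed_at - stop_serving_at)


if __name__ == "__main__":
    # 5s interval, 2 consecutive failures.
    print(worst_case_removal(5, 2))  # 10
    # No preStop delay: gunicorn stops on SIGTERM at t=0, up to 10s of 502s.
    print(error_window(0, 5, 2))     # 10
    # 25s preStop sleep: gunicorn is still serving when the pod is dropped.
    print(error_window(25, 5, 2))    # 0
```

Any sleep at or above the worst-case removal time closes the window; the extra 15s in the 25s figure is slack for timeouts and propagation delay that this sketch leaves out.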

The simplest way to signal Django to start failing the health check is with a file, /tmp/shutdown: if the file exists, the health check fails.
(We can't use an in-memory variable or an HTTP call, because gunicorn runs multiple worker processes: a variable set through an HTTP call would only change the one worker that handled it, and sharing memory across workers is too complex for this.)
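The file works as a toggle because each gunicorn worker is a separate process looking at the same filesystem, so no shared memory is needed. The sketch below demonstrates this with the standard library only: it starts fresh Python interpreters, standing in for workers, that check for a flag file before and after it is created. The temporary path and the helper name are illustrative assumptions; in the pod the path is /tmp/shutdown.

```python
import os
import subprocess
import sys
import tempfile

# Each check runs in a brand-new interpreter, like a separate gunicorn
# worker: it shares no memory with this process, only the filesystem.
CHECK = "import os, sys; sys.exit(0 if os.path.exists(sys.argv[1]) else 1)"


def worker_sees_shutdown(flag_path: str) -> bool:
    """Ask a separate process whether the shutdown flag file exists."""
    result = subprocess.run([sys.executable, "-c", CHECK, flag_path])
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        flag = os.path.join(tmp, "shutdown")
        print(worker_sees_shutdown(flag))  # False: no flag yet
        # What the preStop hook does: append a line, creating the file.
        with open(flag, "a") as f:
            f.write("shutting down\n")
        print(worker_sees_shutdown(flag))  # True: every process now sees it
```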

So the detailed graceful shutdown process is as follows:

  1. Kubernetes sets the pod to the "Terminating" state
  2. Kubernetes calls the preStop hook:
     2.1. Create a /tmp/shutdown file
     2.2. Sleep for 25s, enough time for the load balancer to refresh
  3. Kubernetes sends SIGTERM to the container and gunicorn shuts down its workers

Our preStop hook (set under the container's lifecycle section in the pod spec) is pretty simple. (Note that our LBs are configured to health check every 5s and remove a target after two consecutive failures, so we need to sleep for at least 10s to make sure the pod is removed. These settings may differ on your system…)

            preStop:
              exec:
                command:
                  - sh
                  - -c
                  - echo "shutting down - $(date +%s)" >> /tmp/shutdown && sleep 25

Our Django healthcheck view:

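import os

from django.http import HttpResponse

# Created by the preStop hook once Kubernetes starts terminating the pod
SHUTDOWN_FILE = "/tmp/shutdown"  # nosec


def is_shutting_down() -> bool:
    return os.path.exists(SHUTDOWN_FILE)


def health_check(_request):
    # Return 503 so the load balancer takes this pod out of rotation,
    # while gunicorn keeps serving other requests until SIGTERM arrives
    if is_shutting_down():
        return HttpResponse("Shutting Down...", status=503)

    # ... some extra healthcheck logic ...
    return HttpResponse("OK")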

References