Kubernetes Probes

This document describes how to use the Helidon health check API with Kubernetes.

About Kubernetes probes

Probe is the term Kubernetes uses for a container health check (see the Kubernetes documentation).

This document covers two types of probes:

  • liveness: Indicates whether the container is running

  • readiness: Indicates whether the container is ready to service requests

You can implement probes using the following mechanisms:

  1. Running a command inside a container
  2. Sending an HTTP request to a container
  3. Opening a TCP socket to a container

A microservice exposed to HTTP traffic will typically implement both the liveness probe and the readiness probe using HTTP requests.
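For HTTP probes, Kubernetes treats any response status code from 200 up to, but not including, 400 as a success, and anything else as a failure. The following minimal sketch expresses that rule in plain Java; the class and method names are hypothetical and only serve as illustration.

```java
/** Hypothetical helper that mirrors how an HTTP probe response is classified. */
final class HttpProbeResult {

    private HttpProbeResult() {
    }

    /** Returns true if the status code counts as a successful HTTP probe. */
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    public static void main(String[] args) {
        System.out.println(isSuccess(200)); // prints true
        System.out.println(isSuccess(503)); // prints false
    }
}
```

A health endpoint should therefore answer with a 2xx status when the check passes and with an error status such as 503 when it does not.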

You can configure several parameters for probes. The following are the most relevant parameters:

  • initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated.

  • periodSeconds: Probe interval in seconds. Defaults to 10. Minimum value is 1.

  • timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1. Minimum value is 1.

  • failureThreshold: Number of consecutive failures after which Kubernetes gives up on the probe. Defaults to 3. Minimum value is 1.
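To get a feel for how these parameters interact, the sketch below estimates how long it takes before Kubernetes gives up on a probe that never succeeds. It is a rough approximation under stated assumptions: the first attempt starts right after the initial delay, attempts are exactly one period apart, and each failure is a timeout. The kubelet's actual scheduling is not that exact. The class and method names are hypothetical.

```java
/** Hypothetical helper: rough timing estimate derived from the probe parameters. */
final class ProbeTiming {

    private ProbeTiming() {
    }

    /**
     * Approximate number of seconds from container start until the last of
     * failureThreshold consecutive attempts has timed out.
     */
    static int secondsUntilGiveUp(int initialDelaySeconds, int periodSeconds,
                                  int timeoutSeconds, int failureThreshold) {
        if (initialDelaySeconds < 0 || periodSeconds < 1
                || timeoutSeconds < 1 || failureThreshold < 1) {
            throw new IllegalArgumentException("parameter below its minimum value");
        }
        // The first attempt starts after the initial delay, the remaining
        // attempts follow one period apart, and the last one fails on timeout.
        return initialDelaySeconds + (failureThreshold - 1) * periodSeconds + timeoutSeconds;
    }

    public static void main(String[] args) {
        // Default values: no initial delay, period 10, timeout 1, threshold 3.
        System.out.println(secondsUntilGiveUp(0, 10, 1, 3)); // prints 21
    }
}
```

With the default values, an unresponsive container is detected after roughly 21 seconds; raising periodSeconds or failureThreshold lengthens that window.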

Liveness probe

The liveness probe is used to detect whether the container has become unresponsive. For example, it can be used to detect deadlocks or analyze heap usage. When Kubernetes gives up on a liveness probe, the corresponding container is restarted.
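As an illustration of the deadlock case, the following sketch uses the JDK's ThreadMXBean to decide whether the JVM should be reported as live. It is a standalone example, not Helidon's implementation; the class name DeadlockLiveness is hypothetical, and it assumes a JVM that supports deadlock detection through the management API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical liveness check: the JVM is live as long as no threads are deadlocked. */
final class DeadlockLiveness {

    private DeadlockLiveness() {
    }

    /** Returns true if the given thread bean reports no deadlocked threads. */
    static boolean isLive(ThreadMXBean threads) {
        // findDeadlockedThreads() returns null when no deadlock is found.
        long[] deadlocked = threads.findDeadlockedThreads();
        return deadlocked == null || deadlocked.length == 0;
    }

    /** Checks the current JVM. */
    static boolean isLive() {
        return isLive(ManagementFactory.getThreadMXBean());
    }

    public static void main(String[] args) {
        System.out.println(isLive() ? "UP" : "DOWN");
    }
}
```

The check inspects only the local JVM and calls no external service, so a temporary outage of a dependency cannot trigger a restart.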

The liveness probe can result in repeated restarts in certain cases. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. Repeated restarts can also occur if timeoutSeconds or periodSeconds is too low.

We recommend the following:

  • Avoid checking dependencies in a liveness probe.

  • Set timeoutSeconds high enough to avoid spurious probe failures.

  • Account for startup time with initialDelaySeconds.

Readiness probe

The readiness probe is used to avoid routing requests to the pod until it is ready to accept traffic. When Kubernetes gives up on a readiness probe, the pod is not restarted; instead, traffic is no longer routed to it.

In certain cases, the readiness probe can cause all the pods to be removed from service routing. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. This issue can also occur if timeoutSeconds or periodSeconds is too low.

We recommend the following:

  • Be conservative when checking shared dependencies.

  • Be aggressive when checking local dependencies.

  • Set failureThreshold according to periodSeconds in order to accommodate temporary errors.
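One possible reading of these recommendations is sketched below: a failing local dependency makes the check report "not ready" immediately, while a failing shared dependency is tolerated until it has failed several times in a row. This is an illustrative interpretation, not a Helidon API; the class name TolerantReadiness and the maxSharedFailures parameter are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical readiness check: strict for local dependencies, tolerant for shared ones. */
final class TolerantReadiness {

    private final BooleanSupplier localCheck;
    private final BooleanSupplier sharedCheck;
    private final int maxSharedFailures;
    private int consecutiveSharedFailures;

    TolerantReadiness(BooleanSupplier localCheck, BooleanSupplier sharedCheck,
                      int maxSharedFailures) {
        if (maxSharedFailures < 1) {
            throw new IllegalArgumentException("maxSharedFailures must be >= 1");
        }
        this.localCheck = localCheck;
        this.sharedCheck = sharedCheck;
        this.maxSharedFailures = maxSharedFailures;
    }

    /** Evaluates both checks once and returns whether the pod should receive traffic. */
    synchronized boolean isReady() {
        if (!localCheck.getAsBoolean()) {
            // Aggressive: a broken local dependency only affects this pod.
            return false;
        }
        if (sharedCheck.getAsBoolean()) {
            consecutiveSharedFailures = 0;
            return true;
        }
        // Conservative: every pod sees the same shared failure, so report
        // "not ready" only once it has persisted for several evaluations.
        consecutiveSharedFailures++;
        return consecutiveSharedFailures < maxSharedFailures;
    }

    public static void main(String[] args) {
        TolerantReadiness readiness = new TolerantReadiness(() -> true, () -> false, 3);
        System.out.println(readiness.isReady()); // prints true (first shared failure)
        System.out.println(readiness.isReady()); // prints true (second shared failure)
        System.out.println(readiness.isReady()); // prints false (third shared failure)
    }
}
```

The tolerance inside the check adds to the one configured in Kubernetes: with this sketch, traffic stops only after the check itself has reported "not ready" failureThreshold times in a row.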

Troubleshooting probes

Failed probes are recorded as events associated with their corresponding pods. The event message contains only the status code.

Get the events of a single pod:
POD_NAME=$(kubectl get pod -l app=acme -o jsonpath='{.items[0].metadata.name}') 
kubectl get event --field-selector involvedObject.name=${POD_NAME} 
  • Get the effective pod name by filtering pods with the label app=acme.
  • Filter the events for the pod.

Create log messages in your health check implementation when setting a DOWN status. This lets you correlate a failed probe with its cause.
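A minimal sketch of this advice, using only java.util.logging, is shown below. The class LoggedCheck, its status method, and the message format are hypothetical; in a Helidon application, the same log statement would go inside the health check just before it returns a DOWN response.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical health check helper that logs the reason whenever it reports DOWN. */
final class LoggedCheck {

    private static final Logger LOGGER = Logger.getLogger(LoggedCheck.class.getName());

    private LoggedCheck() {
    }

    /**
     * Returns "UP" when failureReason is null. Otherwise logs the reason
     * at WARNING level and returns "DOWN".
     */
    static String status(String checkName, String failureReason) {
        if (failureReason == null) {
            return "UP";
        }
        // The probe event only carries the status code, so the reason goes to the log.
        LOGGER.log(Level.WARNING, "Health check {0} is DOWN: {1}",
                new Object[] {checkName, failureReason});
        return "DOWN";
    }

    public static void main(String[] args) {
        System.out.println(status("database", null));                 // prints UP
        System.out.println(status("database", "connection refused")); // prints DOWN
    }
}
```

The timestamp of the log entry can then be matched against the timestamp of the probe failure event.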

Example

This example shows the usage of the Helidon health API in an application that implements health endpoints for the liveness and readiness probes. Note that the application code dissociates the health endpoints from the default routes, so that the health endpoints are not exposed by the service. An example YAML specification is also provided for the Kubernetes service and deployment.

Application code:
Routing healthRouting = Routing.builder()
        .register(JsonSupport.create())
        .register(HealthSupport.builder()
                .webContext("/live") 
                .addLiveness(HealthChecks.healthChecks()) 
                .build())
        .register(HealthSupport.builder()
                .webContext("/ready") 
                .addReadiness(() -> HealthCheckResponse.named("database").up().build()) 
                .build())
        .build();

Routing defaultRouting = Routing.builder()
        .any((req, res) -> res.send("It works!")) 
        .build();

WebServer server = WebServer.builder(defaultRouting)
        .config(ServerConfiguration.builder()
                .port(8080) 
                .addSocket("health", SocketConfiguration.builder() 
                        .port(8081)
                        .build())
                .build())
        .addNamedRouting("health", healthRouting) 
        .build();

server.start();
  • The health service for the liveness probe is exposed at /live.
  • Using the built-in health checks for the liveness probe.
  • The health service for the readiness probe is exposed at /ready.
  • Using a custom health check for a pseudo database that is always UP.
  • The default route: returns It works! for any request.
  • The server uses port 8080 for the default routes.
  • A socket configuration named health using port 8081.
  • Route the health services exclusively on the health socket.
Kubernetes descriptor:
kind: Service
apiVersion: v1
metadata:
  name: acme 
  labels:
    app: acme
spec:
  type: NodePort
  selector:
    app: acme
  ports:
  - port: 8080
    targetPort: 8080
    name: http
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: acme 
spec:
  replicas: 1
  selector:
    matchLabels:
      app: acme
  template:
    metadata:
      name: acme
      labels:
        # Must match the selector of the service and of the deployment.
        app: acme
    spec:
      containers:
      - name: acme
        image: acme
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /live 
            port: 8081
          initialDelaySeconds: 3 
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready 
            port: 8081
          initialDelaySeconds: 10 
          periodSeconds: 30
          timeoutSeconds: 10
---
  • A service of type NodePort that serves the default routes on port 8080.
  • A deployment with one replica of a pod.
  • The HTTP endpoint for the liveness probe.
  • The liveness probe configuration.
  • The HTTP endpoint for the readiness probe.
  • The readiness probe configuration.