Kubernetes Probes
This document describes how to use the Helidon health check API with Kubernetes.
About Kubernetes probes
Probes is the term used by Kubernetes to describe health checks for containers (Kubernetes documentation).
There are two types of probes:
- liveness: Indicates whether the container is running
- readiness: Indicates whether the container is ready to service requests
You can implement probes using the following mechanisms:
- Running a command inside a container
- Sending an HTTP request to a container
- Opening a TCP socket to a container
A microservice exposed to HTTP traffic will typically implement both the liveness probe and the readiness probe using HTTP requests.
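To make the HTTP mechanism concrete, here is a minimal sketch of what an HTTP probe amounts to from the caller's side. It is not Kubernetes or Helidon code: the class `HttpProbeSketch`, its methods, and the stub server are hypothetical names that use only the JDK (Java 11 or later). It relies on the Kubernetes rule that any status code from 200 up to, but not including, 400 counts as success, and it treats timeouts and I/O errors as failures.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class HttpProbeSketch {

    /** Any status from 200 up to, but not including, 400 counts as success. */
    static boolean isSuccess(int status) {
        return status >= 200 && status < 400;
    }

    /** Sends one GET request; timeouts and I/O errors are reported as failures. */
    static boolean probe(URI uri, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return isSuccess(response.statusCode());
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Hypothetical stub: /live answers 200 and /ready answers 503 on an ephemeral port. */
    static HttpServer startStubServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/live", exchange -> {
                exchange.sendResponseHeaders(200, -1); // -1 means no response body
                exchange.close();
            });
            server.createContext("/ready", exchange -> {
                exchange.sendResponseHeaders(503, -1);
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startStubServer();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            Duration timeout = Duration.ofSeconds(1);
            System.out.println("live:  " + probe(URI.create(base + "/live"), timeout));
            System.out.println("ready: " + probe(URI.create(base + "/ready"), timeout));
        } finally {
            server.stop(0);
        }
    }
}
```

The stub deliberately reports a healthy liveness endpoint and an unavailable readiness endpoint, which is the state of a container that is running but not yet ready to serve requests.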
You can configure several parameters for probes. The following are the most relevant parameters:
initialDelaySeconds | Number of seconds after the container has started before liveness or readiness probes are initiated. |
periodSeconds | Probe interval in seconds. Defaults to 10 seconds. Minimum value is 1. |
timeoutSeconds | Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. |
failureThreshold | Number of consecutive failures after which Kubernetes gives up on the probe. Defaults to 3. Minimum value is 1. |
Liveness probe
The liveness probe is used to detect whether the container has become unresponsive. For example, it can be used to detect deadlocks or analyze heap usage. When Kubernetes gives up on a liveness probe, it restarts the corresponding container.
The liveness probe can result in repeated restarts in certain cases. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. Repeated restarts can also occur if timeoutSeconds or periodSeconds is too low.
We recommend the following:
- Avoid checking dependencies in a liveness probe.
- Set timeoutSeconds to avoid excessive probe failures.
- Acknowledge startup times with initialDelaySeconds.
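As an illustration of a liveness check that looks only at the process itself and not at its dependencies, the following sketch tests for deadlocked threads and heap usage with plain JDK APIs. The class name `LocalLivenessSketch`, its method names, and the 98% heap limit are assumptions made for this example; this is not the implementation of Helidon's built-in checks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class LocalLivenessSketch {

    /** True when the JVM reports no deadlocked threads. */
    static boolean noDeadlock() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads(); // null when there is no deadlock
        return deadlocked == null || deadlocked.length == 0;
    }

    /** True when the used heap is below the given fraction of the maximum heap. */
    static boolean heapBelow(double maxUsedRatio) {
        Runtime runtime = Runtime.getRuntime();
        long used = runtime.totalMemory() - runtime.freeMemory();
        return (double) used / runtime.maxMemory() < maxUsedRatio;
    }

    /** Combined check; the 0.98 limit is an arbitrary value chosen for illustration. */
    static boolean isLive() {
        return noDeadlock() && heapBelow(0.98);
    }

    public static void main(String[] args) {
        System.out.println("live: " + isLive());
    }
}
```

Neither check performs network or disk I/O against another service, so a temporary outage elsewhere cannot trigger a restart.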
Readiness probe
The readiness probe is used to avoid routing requests to the pod until it is ready to accept traffic. When Kubernetes gives up on a readiness probe, the pod is not restarted, but traffic is no longer routed to it.
In certain cases, the readiness probe can cause all the pods to be removed from service routing. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. This issue can also occur if timeoutSeconds or periodSeconds is too low.
We recommend the following:
- Be conservative when checking shared dependencies.
- Be aggressive when checking local dependencies.
- Set failureThreshold according to periodSeconds in order to accommodate temporary errors.
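The relationship between failureThreshold and periodSeconds can be estimated with simple arithmetic. The sketch below uses a deliberately rough model, which is an assumption of this example and not a Kubernetes guarantee: probes start exactly every periodSeconds, each probe completes instantly, and timeoutSeconds and scheduling jitter are ignored. Under that model, an outage lasting D seconds can overlap at most floor(D / periodSeconds) + 1 probes, and the pod stays in service as long as that count is lower than failureThreshold. The class and method names are hypothetical.

```java
public class ProbeToleranceSketch {

    /** Upper bound on the number of probes that can start during an outage. */
    static long maxFailedProbes(long outageSeconds, long periodSeconds) {
        if (outageSeconds < 0 || periodSeconds < 1) {
            throw new IllegalArgumentException("outage must be >= 0 and period must be >= 1");
        }
        return outageSeconds / periodSeconds + 1;
    }

    /** True when the outage cannot produce failureThreshold consecutive failures. */
    static boolean staysReady(long outageSeconds, long periodSeconds, long failureThreshold) {
        return maxFailedProbes(outageSeconds, periodSeconds) < failureThreshold;
    }

    public static void main(String[] args) {
        // periodSeconds = 30 and failureThreshold = 3 (the default)
        System.out.println(staysReady(45, 30, 3)); // at most 2 failed probes
        System.out.println(staysReady(60, 30, 3)); // up to 3 failed probes
    }
}
```

With a 30 second period and the default threshold of 3, this model predicts that a 45 second dependency outage is absorbed, while a 60 second outage may remove the pod from service routing.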
Troubleshooting probes
Failed probes are recorded as events associated with their corresponding pods. The event message contains only the status code.
POD_NAME=$(kubectl get pod -l app=acme -o jsonpath='{.items[0].metadata.name}')
kubectl get event --field-selector involvedObject.name=${POD_NAME}
- Get the effective pod name by filtering pods with the label app=acme.
- Filter the events for the pod.
Create log messages in your health check implementation when setting a DOWN status. This allows you to correlate a failed probe with its cause.
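One way to follow this advice is to log at the point where the check decides to report DOWN. The sketch below uses java.util.logging and a hypothetical `Result` type as a stand-in for a real health check response; it does not use the Helidon or MicroProfile Health API, and all names in it are assumptions made for illustration.

```java
import java.util.function.BooleanSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingCheckSketch {

    private static final Logger LOGGER = Logger.getLogger(LoggingCheckSketch.class.getName());

    /** Hypothetical stand-in for a health check response. */
    static final class Result {
        final String name;
        final boolean up;
        final String detail;

        Result(String name, boolean up, String detail) {
            this.name = name;
            this.up = up;
            this.detail = detail;
        }
    }

    /** Runs the test and logs the reason whenever the outcome is DOWN. */
    static Result check(String name, BooleanSupplier test) {
        try {
            if (test.getAsBoolean()) {
                return new Result(name, true, "ok");
            }
            LOGGER.log(Level.WARNING, "Health check {0} is DOWN: test returned false", name);
            return new Result(name, false, "test returned false");
        } catch (RuntimeException e) {
            // Log the exception so the probe failure can be matched to its cause.
            LOGGER.log(Level.WARNING, "Health check " + name + " is DOWN", e);
            return new Result(name, false, String.valueOf(e.getMessage()));
        }
    }

    public static void main(String[] args) {
        Result result = check("database", () -> {
            throw new IllegalStateException("connection refused");
        });
        System.out.println(result.name + " up=" + result.up + " detail=" + result.detail);
    }
}
```

Because the probe event records only the status code, the log entry written here is what ties a failed probe to its underlying cause.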
Example
This example shows the usage of the Helidon health API in an application that implements health endpoints for the liveness and readiness probes. Note that the application code dissociates the health endpoints from the default routes, so that the health endpoints are not exposed by the service. An example YAML specification is also provided for the Kubernetes service and deployment.
Routing healthRouting = Routing.builder()
        .register(JsonSupport.create())
        .register(HealthSupport.builder()
                .webContext("/live")
                .addLiveness(HealthChecks.healthChecks())
                .build())
        .register(HealthSupport.builder()
                .webContext("/ready")
                .addReadiness(() -> HealthCheckResponse.named("database").up().build())
                .build())
        .build();
Routing defaultRouting = Routing.builder()
        .any((req, res) -> res.send("It works!"))
        .build();
WebServer server = WebServer.builder(defaultRouting)
        .config(ServerConfiguration.builder()
                .port(8080)
                .addSocket("health", SocketConfiguration.builder()
                        .port(8081)
                        .build())
                .build())
        .addNamedRouting("health", healthRouting)
        .build();
server.start();
- The health service for the liveness probe is exposed at /live.
- Using the built-in health checks for the liveness probe.
- The health service for the readiness probe is exposed at /ready.
- Using a custom health check for a pseudo database that is always UP.
- The default route: returns It works! for any request.
- The server uses port 8080 for the default routes.
- A socket configuration named health using port 8081.
- Route the health services exclusively on the health socket.
kind: Service
apiVersion: v1
metadata:
  name: acme
  labels:
    app: acme
spec:
  type: NodePort
  selector:
    app: acme
  ports:
  - port: 8080
    targetPort: 8080
    name: http
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: acme
spec:
  replicas: 1
  # apps/v1 requires a selector; it must match the pod template labels
  selector:
    matchLabels:
      app: acme
  template:
    metadata:
      name: acme
      labels:
        # must also match the Service selector above
        app: acme
    spec:
      containers:
      - name: acme
        image: acme
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /live
            port: 8081
          initialDelaySeconds: 3
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8081
          initialDelaySeconds: 10
          periodSeconds: 30
          timeoutSeconds: 10
- A service of type NodePort that serves the default routes on port 8080.
- A deployment with one replica of a pod.
- The HTTP endpoint for the liveness probe.
- The liveness probe configuration.
- The HTTP endpoint for the readiness probe.
- The readiness probe configuration.