Overview
It’s a good practice to monitor your microservice’s health, to ensure that it is available and performs correctly. Applications implement health checks to expose health status that is collected at regular intervals by external tooling, such as orchestrators like Kubernetes. The orchestrator may then take action, such as restarting your application if the health check fails.
A typical health check combines the statuses of all the dependencies that affect availability and the ability to perform correctly:
Network Latency
Storage
Database
Other Services (used by the application)
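The way a combined health check folds several dependency statuses into one can be sketched in plain Java. The types below (`HealthState`, `DependencyResult`, `HealthAggregation`) are hypothetical and use only the JDK; they illustrate the rule that the overall status is UP only when every dependency reports UP, and are not the Helidon implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical types for illustration only; not part of the Helidon API.
enum HealthState { UP, DOWN }

record DependencyResult(String name, HealthState state) { }

final class HealthAggregation {

    // The overall state is UP only if every individual result is UP.
    static HealthState overall(List<DependencyResult> results) {
        return results.stream().allMatch(r -> r.state() == HealthState.UP)
                ? HealthState.UP
                : HealthState.DOWN;
    }

    // Runs each named check in order; a check that throws is treated as DOWN.
    static List<DependencyResult> run(Map<String, Supplier<HealthState>> checks) {
        return checks.entrySet().stream()
                .map(e -> {
                    try {
                        return new DependencyResult(e.getKey(), e.getValue().get());
                    } catch (RuntimeException ex) {
                        return new DependencyResult(e.getKey(), HealthState.DOWN);
                    }
                })
                .toList();
    }

    public static void main(String[] args) {
        Map<String, Supplier<HealthState>> checks = new LinkedHashMap<>();
        checks.put("storage", () -> HealthState.UP);
        checks.put("database", () -> HealthState.DOWN); // simulated outage
        List<DependencyResult> results = run(checks);
        System.out.println(results + " -> " + overall(results));
    }
}
```

Treating an exception as DOWN is a design choice of this sketch; a real service may prefer to report such failures separately.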
Maven Coordinates
To enable Health Checks add the following dependency to your project’s pom.xml (see Managing Dependencies).
<dependency>
    <groupId>io.helidon.reactive.health</groupId>
    <artifactId>helidon-reactive-health</artifactId>
</dependency>

Optional dependency to use built-in health checks:

<dependency>
    <groupId>io.helidon.health</groupId>
    <artifactId>helidon-health-checks</artifactId>
</dependency>

API
A health check is a Java functional interface that returns a HealthCheckResponse instance. You can choose to implement a health check inline with a lambda expression or you can reference a method with the double colon operator ::.
HealthCheck hc = () -> HealthCheckResponse
        .named("exampleHealthCheck")
        .up()
        .build();

HealthCheckResponse exampleHealthCheck() {
    return HealthCheckResponse
            .named("exampleHealthCheck")
            .up()
            .build();
}
HealthCheck hc = this::exampleHealthCheck;

HealthSupport is a WebServer service that contains a collection of registered HealthCheck instances. When queried, it invokes the registered health checks and returns a response with a status code representing the overall status of the application.
| Status code | Description |
|---|---|
| 200 | The application is healthy (with health check details in the response). |
| 204 | The application is healthy (with no health check details in the response). |
| 503 | The application is not healthy. |
| 500 | An error occurred while reporting the health. |
HTTP GET responses include JSON content showing the detailed results of all the health checks which the server executed after receiving the request. HTTP HEAD requests return only the status with no payload.
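The status codes listed above can be read as a small decision function. The following is a JDK-only sketch with hypothetical names (`HealthStatusCodes`, `includeDetails`); it assumes that a check throwing an exception counts as an error while reporting, and it does not reproduce how HealthSupport is actually implemented.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the status-code mapping; not the HealthSupport implementation.
final class HealthStatusCodes {

    static final int OK = 200;
    static final int NO_CONTENT = 204;
    static final int INTERNAL_ERROR = 500;
    static final int SERVICE_UNAVAILABLE = 503;

    // Each supplier returns true for UP and false for DOWN.
    // includeDetails models whether the response body carries check details.
    static int statusCode(List<Supplier<Boolean>> checks, boolean includeDetails) {
        boolean allUp;
        try {
            allUp = checks.stream().allMatch(Supplier::get);
        } catch (RuntimeException e) {
            // Assumption: a failing check invocation is reported as an error.
            return INTERNAL_ERROR;
        }
        if (!allUp) {
            return SERVICE_UNAVAILABLE;
        }
        return includeDetails ? OK : NO_CONTENT;
    }

    public static void main(String[] args) {
        Supplier<Boolean> up = () -> true;
        Supplier<Boolean> down = () -> false;
        System.out.println(statusCode(List.of(up), true));       // 200
        System.out.println(statusCode(List.of(up), false));      // 204
        System.out.println(statusCode(List.of(up, down), true)); // 503
    }
}
```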
The following code snippets show how to register health checks while building an instance of HealthSupport:
HealthSupport health = HealthSupport.builder()
        .addLiveness(hc) // hc created above
        .build();

HealthSupport health = HealthSupport.builder()
        .addLiveness(() -> HealthCheckResponse.named("exampleHealthCheck")
                .up()
                .withData("time", System.currentTimeMillis())
                .build())
        .build();

The custom health check above returns a status of UP and the current time. After creating the HealthCheck and registering it in a HealthSupport, we must add the latter to the WebServer routes as follows:

Routing.builder()
        .register(health)
        .build();

Here is a sample response to the custom health check registered above:
{
    "status": "UP",
    "checks": [
        {
            "name": "exampleHealthCheck",
            "status": "UP",
            "data": {
                "time": 1546958376613
            }
        }
    ]
}

Balance collecting a lot of information with the need to avoid overloading the application and overwhelming users.
The following table provides a summary of the Health Check API classes.
| Class | Description |
|---|---|
| org.eclipse.microprofile.health.HealthCheck | Java functional interface representing the logic of a single health check |
| org.eclipse.microprofile.health.HealthCheckResponse | Result of a health check invocation that contains a status and a description. |
| org.eclipse.microprofile.health.HealthCheckResponseBuilder | Builder class to create HealthCheckResponse instances |
| io.helidon.reactive.health.HealthSupport | WebServer service that exposes /health and invokes the registered health checks |
| io.helidon.reactive.health.HealthSupport.Builder | Builder class to create HealthSupport instances |
Built-In Health Checks
You can use Helidon-provided health checks to report various common health check statuses:
| Built-in health check | Health check name | JavaDoc | Config properties | Default config value |
|---|---|---|---|---|
| deadlock detection † | deadlock | DeadlockHealthCheck | n/a | n/a |
| available disk space † | diskSpace | DiskSpaceHealthCheck | health.checks.diskSpace.thresholdPercent, health.checks.diskSpace.path | 99.999, / |
| available heap memory | heapMemory | HeapMemoryHealthCheck | health.checks.heapMemory.thresholdPercent | 98 |
† Helidon cannot support the indicated health checks in the GraalVM native image environment, so with native image those health checks do not appear in the health output.
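To give a feel for what these three checks measure, here is a JDK-only sketch. The class and method names are hypothetical, and the interpretation of the threshold (the check is UP while the used percentage stays below it) is an assumption of this sketch; consult the JavaDoc of each built-in check for the actual behavior.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical sketch of the underlying measurements; not Helidon's implementation.
final class BuiltInCheckSketch {

    // Deadlock detection: the JVM returns the IDs of deadlocked threads, or null if there are none.
    static boolean noDeadlock() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.findDeadlockedThreads() == null;
    }

    // Percentage in use, given the free and total amounts (0 when total is unknown).
    static double usedPercent(long free, long total) {
        return total <= 0 ? 0.0 : 100.0 * (total - free) / total;
    }

    // Disk space: UP while the used percentage of the file system at 'path' is below the threshold.
    static boolean diskSpaceUp(String path, double thresholdPercent) {
        File root = new File(path);
        return usedPercent(root.getUsableSpace(), root.getTotalSpace()) < thresholdPercent;
    }

    // Heap memory: UP while used heap, relative to the maximum heap, is below the threshold.
    static boolean heapMemoryUp(double thresholdPercent) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return 100.0 * used / rt.maxMemory() < thresholdPercent;
    }

    public static void main(String[] args) {
        // Thresholds below are the default config values from the table above.
        System.out.println("deadlock:   " + (noDeadlock() ? "UP" : "DOWN"));
        System.out.println("diskSpace:  " + (diskSpaceUp("/", 99.999) ? "UP" : "DOWN"));
        System.out.println("heapMemory: " + (heapMemoryUp(98) ? "UP" : "DOWN"));
    }
}
```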
The following code adds the default built-in health checks to your application:
HealthSupport health = HealthSupport.builder()
        .add(HealthChecks.healthChecks())
        .build();
Routing.builder()
        .register(health)
        .build();

- Add built-in health checks using defaults (requires the helidon-health-checks dependency).
- Register the created HealthSupport with web server routing (adds the /health endpoint).
You can control the thresholds for built-in health checks in either of two ways:
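- Create the health checks individually using their builders instead of using the HealthChecks convenience class. Follow the JavaDoc links in the table above.
- Configure the built-in health checks using the config property keys listed in the table above.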
Kubernetes Probes
Probes is the term used by Kubernetes to describe health checks for containers (Kubernetes documentation).
There are three types of probes:
liveness: Indicates whether the container is running
readiness: Indicates whether the container is ready to service requests
startup: Indicates whether the application in the container has started
You can implement probes using the following mechanisms:
- Running a command inside a container
- Sending an HTTP request to a container
- Opening a TCP socket to a container
A microservice exposed to HTTP traffic will typically implement both the liveness probe and the readiness probe using HTTP requests. If the microservice takes a significant time to initialize itself, you can also define a startup probe, in which case Kubernetes does not check liveness or readiness probes until the startup probe returns success.
You can configure several parameters for probes. The following are the most relevant parameters:
| Parameter | Description |
|---|---|
| initialDelaySeconds | Number of seconds after the container has started before liveness or readiness probes are initiated. |
| periodSeconds | Probe interval. Defaults to 10 seconds. Minimum value is 1. |
| timeoutSeconds | Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. |
| failureThreshold | Number of consecutive failures after which Kubernetes gives up on the probe. Defaults to 3. Minimum value is 1. |
Liveness Probe
The liveness probe is used to detect whether the container has become unresponsive. For example, it can be used to detect deadlocks or analyze heap usage. When Kubernetes gives up on a liveness probe, the corresponding pod is restarted.
The liveness probe can result in repeated restarts in certain cases. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. Repeated restarts can also occur if timeoutSeconds or periodSeconds is too low.
We recommend the following:
Avoid checking dependencies in a liveness probe.
Set timeoutSeconds high enough to avoid excessive probe failures.
Acknowledge startup times with initialDelaySeconds.
Readiness Probe
The readiness probe is used to avoid routing requests to the pod until it is ready to accept traffic. When Kubernetes gives up on a readiness probe, the pod is not restarted, but traffic is no longer routed to it.
In certain cases, the readiness probe can cause all the pods to be removed from service routing. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. This issue can also occur if timeoutSeconds or periodSeconds is too low.
We recommend the following:
Be conservative when checking shared dependencies.
Be aggressive when checking local dependencies.
Set failureThreshold according to periodSeconds in order to accommodate temporary errors.
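The interplay between failureThreshold and periodSeconds can be estimated with simple arithmetic. The sketch below uses a hypothetical helper, `secondsUntilGiveUp`, and a deliberately rough model: once the application becomes unhealthy, the threshold is reached after at most failureThreshold probe periods, plus one timeout for the last probe to be recorded as failed. Actual timing depends on how Kubernetes schedules probes, so treat the numbers as approximations.

```java
// Hypothetical helper for a rough estimate; not an exact model of Kubernetes probe scheduling.
final class ProbeTiming {

    // Approximate upper bound, in seconds, between the application becoming unhealthy
    // and Kubernetes recording failureThreshold consecutive failed probes.
    static int secondsUntilGiveUp(int periodSeconds, int timeoutSeconds, int failureThreshold) {
        return failureThreshold * periodSeconds + timeoutSeconds;
    }

    public static void main(String[] args) {
        // Liveness values from the deployment example in this document:
        // periodSeconds 10, timeoutSeconds 3, failureThreshold 3.
        System.out.println(secondsUntilGiveUp(10, 3, 3));  // 3 * 10 + 3 = 33

        // Readiness values from the same example, with the default failureThreshold of 3:
        // periodSeconds 30, timeoutSeconds 10.
        System.out.println(secondsUntilGiveUp(30, 10, 3)); // 3 * 30 + 10 = 100
    }
}
```

A temporary error shorter than this window does not necessarily take the pod out of service routing, which is why the two parameters are best chosen together.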
Startup Probe
The startup probe prevents Kubernetes from prematurely checking the other probes if the application takes a long time to start. Otherwise, Kubernetes might misinterpret a failed liveness or readiness probe and shut down the container when, in fact, the application is still coming up.
Troubleshooting Probes
Failed probes are recorded as events associated with their corresponding pods. The event message contains only the status code.
POD_NAME=$(kubectl get pod -l app=acme -o jsonpath='{.items[0].metadata.name}')
kubectl get event --field-selector involvedObject.name=${POD_NAME}

- Get the effective pod name by filtering pods with the label app=acme.
- Filter the events for the pod.
Create log messages in your health check implementation when setting a DOWN status. This will allow you to correlate the cause of a failed probe.
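One way to follow this advice is to log the cause at the point where the check decides to report DOWN. The sketch below is JDK-only and uses hypothetical names (`LoggingCheckSketch`, `check`); in a Helidon application the same idea would apply inside your own HealthCheck implementation before it builds a DOWN response.

```java
import java.util.logging.Logger;

// Hypothetical sketch; shows only where the log statement belongs.
final class LoggingCheckSketch {

    private static final Logger LOGGER = Logger.getLogger(LoggingCheckSketch.class.getName());

    // Runs the probe and returns "UP" or "DOWN"; the cause of a DOWN status is logged
    // so that it can be correlated with the failed-probe event, which carries only a status code.
    static String check(String name, Runnable probe) {
        try {
            probe.run();
            return "UP";
        } catch (RuntimeException e) {
            LOGGER.warning(() -> "Health check '" + name + "' is DOWN: " + e.getMessage());
            return "DOWN";
        }
    }

    public static void main(String[] args) {
        System.out.println(check("database", () -> { }));
        System.out.println(check("database", () -> {
            throw new IllegalStateException("connection refused"); // simulated failure
        }));
    }
}
```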
Configuration
Built-in health checks can be configured using the config property keys described in this table. Further, you can suppress one or more of the built-in health checks by setting the configuration item health.exclude to a comma-separated list of the health check names (from this table) you want to exclude.
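The effect of a comma-separated exclusion list can be illustrated without any Helidon types. The helper below is hypothetical (`HealthExcludeSketch` is not a Helidon class, and Helidon reads health.exclude through its own config API); it only shows how such a value selects which of the built-in check names remain active.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not how Helidon processes its configuration.
final class HealthExcludeSketch {

    // Splits a comma-separated value into trimmed, non-empty names.
    static Set<String> parseExclude(String value) {
        if (value == null || value.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Keeps only the check names that are not excluded, preserving their order.
    static List<String> enabled(List<String> checkNames, String excludeValue) {
        Set<String> excluded = parseExclude(excludeValue);
        return checkNames.stream()
                .filter(name -> !excluded.contains(name))
                .toList();
    }

    public static void main(String[] args) {
        List<String> builtIn = List.of("deadlock", "diskSpace", "heapMemory");
        // Models a health.exclude value of "deadlock, heapMemory".
        System.out.println(enabled(builtIn, "deadlock, heapMemory")); // prints [diskSpace]
    }
}
```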
Examples
JSON Response Example
Accessing the Helidon-provided /health endpoint reports the health of your application as shown below:
{
    "status": "UP",
    "checks": [
        {
            "name": "deadlock",
            "status": "UP"
        },
        {
            "name": "diskSpace",
            "status": "UP",
            "data": {
                "free": "211.00 GB",
                "freeBytes": 226563444736,
                "percentFree": "45.31%",
                "total": "465.72 GB",
                "totalBytes": 500068036608
            }
        },
        {
            "name": "heapMemory",
            "status": "UP",
            "data": {
                "free": "215.15 MB",
                "freeBytes": 225600496,
                "max": "3.56 GB",
                "maxBytes": 3817865216,
                "percentFree": "99.17%",
                "total": "245.50 MB",
                "totalBytes": 257425408
            }
        }
    ]
}

Kubernetes Example
This example shows the usage of the Helidon health API in an application that implements health endpoints for the liveness and readiness probes. Note that the application code dissociates the health endpoints from the default routes, so that the health endpoints are not exposed by the service. An example YAML specification is also provided for the Kubernetes service and deployment.
Routing healthRouting = Routing.builder()
        .register(JsonSupport.create())
        .register(HealthSupport.builder()
                .webContext("/live")
                .add(HealthChecks.healthChecks())
                .build())
        .register(HealthSupport.builder()
                .webContext("/ready")
                .addReadiness(() -> HealthCheckResponse.named("database").up().build())
                .build())
        .build();
Routing defaultRouting = Routing.builder()
        .any((req, res) -> res.send("It works!"))
        .build();
WebServer server = WebServer.builder(defaultRouting)
        .config(ServerConfiguration.builder()
                .port(8080)
                .addSocket("health", SocketConfiguration.builder()
                        .port(8081)
                        .build())
                .build())
        .addNamedRouting("health", healthRouting)
        .build();
server.start();

- The health service for the liveness probe is exposed at /live.
- Using the built-in health checks for the liveness probe.
- The health service for the readiness probe is exposed at /ready.
- Using a custom health check for a pseudo database that is always UP.
- The default route: returns It works! for any request.
- The server uses port 8080 for the default routes.
- A socket configuration named health using port 8081.
- Route the health services exclusively on the health socket.

kind: Service
apiVersion: v1
metadata:
  name: acme
  labels:
    app: acme
spec:
  type: NodePort
  selector:
    app: acme
  ports:
  - port: 8080
    targetPort: 8080
    name: http
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: acme
spec:
  replicas: 1
  selector:
    matchLabels:
      app: acme
  template:
    metadata:
      name: acme
      labels:
        app: acme # must match spec.selector.matchLabels and the Service selector
    spec:
      containers:
      - name: acme
        image: acme
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /live
            port: 8081
          initialDelaySeconds: 3
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8081
          initialDelaySeconds: 10
          periodSeconds: 30
          timeoutSeconds: 10
---

- A service of type NodePort that serves the default routes on port 8080.
- A deployment with one replica of a pod.
- The HTTP endpoint for the liveness probe.
- The liveness probe configuration.
- The HTTP endpoint for the readiness probe.
- The readiness probe configuration.