Health Checks

Overview

It’s a good practice to monitor your microservice’s health, to ensure that it is available and performs correctly. Applications implement health checks to expose health status that is collected at regular intervals by external tooling, such as orchestrators like Kubernetes. The orchestrator may then take action, such as restarting your application if the health check fails.

A typical health check combines the statuses of all the dependencies that affect availability and the ability to perform correctly:

Network Latency
Storage
Database
Other Services (used by the application)
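As a rough illustration of how such a combined check behaves, the sketch below reports an overall status of UP only when every dependency check passes. The `AggregateHealth` class, its `Status` enum, and the dependency names are hypothetical stand-ins using only the JDK; they are not part of the Helidon or MicroProfile API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in types for illustration; not the Helidon or MicroProfile API.
class AggregateHealth {
    enum Status { UP, DOWN }

    // The overall status is UP only when every dependency check reports healthy.
    static Status overall(Map<String, BooleanSupplier> checks) {
        for (BooleanSupplier check : checks.values()) {
            if (!check.getAsBoolean()) {
                return Status.DOWN;
            }
        }
        return Status.UP;
    }

    public static void main(String[] args) {
        Map<String, BooleanSupplier> checks = new LinkedHashMap<>();
        checks.put("storage", () -> true);
        checks.put("database", () -> true);
        checks.put("inventory-service", () -> false); // simulated failing dependency
        System.out.println(overall(checks));
    }
}
```

With the simulated failing dependency, the overall status is DOWN even though the other two checks pass.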

Maven Coordinates

To enable health checks, add the following dependency to your project’s pom.xml (see Managing Dependencies).

<dependency>
    <groupId>io.helidon.reactive.health</groupId>
    <artifactId>helidon-reactive-health</artifactId>
</dependency>

Optional dependency to use built-in health checks:

<dependency>
    <groupId>io.helidon.health</groupId>
    <artifactId>helidon-health-checks</artifactId>
</dependency>

API

A health check is a Java functional interface that returns a HealthCheckResponse instance. You can choose to implement a health check inline with a lambda expression or you can reference a method with the double colon operator ::.

Health check with a lambda expression:

HealthCheck hc = () -> HealthCheckResponse
        .named("exampleHealthCheck")
        .up()
        .build();

Health check with method reference:

HealthCheckResponse exampleHealthCheck() {
    return HealthCheckResponse
        .named("exampleHealthCheck")
        .up()
        .build();
}
HealthCheck hc = this::exampleHealthCheck;

HealthSupport is a WebServer service that contains a collection of registered HealthCheck instances. When queried, it invokes the registered health checks and returns a response whose status code represents the overall status of the application.

Health status codes

`200`	The application is healthy (with health check details in the response).
`204`	The application is healthy (with no health check details in the response).
`503`	The application is not healthy.
`500`	An error occurred while reporting the health.

HTTP GET responses include JSON content showing the detailed results of all the health checks which the server executed after receiving the request. HTTP HEAD requests return only the status with no payload.
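The status code table can be read as a simple decision rule. The sketch below restates it in plain Java; the `HealthStatusCodes` class and its boolean parameters are hypothetical and only mirror the table, they are not how HealthSupport is implemented.

```java
// Hypothetical helper that restates the status code table; not Helidon code.
class HealthStatusCodes {
    static int statusCode(boolean healthy, boolean detailsIncluded, boolean reportingFailed) {
        if (reportingFailed) {
            return 500; // an error occurred while reporting the health
        }
        if (!healthy) {
            return 503; // the application is not healthy
        }
        return detailsIncluded ? 200 : 204; // healthy, with or without details
    }
}
```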

The following code snippets show how to register health checks while building an instance of HealthSupport:

Create the health support service:

HealthSupport health = HealthSupport.builder()
    .addLiveness(hc)        // hc created above
    .build();

Create a custom health check:

HealthSupport health = HealthSupport.builder()
    .addLiveness(() -> HealthCheckResponse.named("exampleHealthCheck")
                 .up()
                 .withData("time", System.currentTimeMillis())
                 .build())
    .build();

The custom health check above returns a status of UP and the current time. After creating the HealthCheck and registering it in a HealthSupport, we must add the latter to the WebServer routes as follows:

Routing.builder()
        .register(health)
        .build();

Here is a sample response to the custom health check registered above:

JSON response:

{
    "status": "UP",
    "checks": [
        {
            "name": "exampleHealthCheck",
            "status": "UP",
            "data": {
                "time": 1546958376613
            }
        }
    ]
}

When designing health checks, balance the amount of information you collect against the risk of overloading the application and overwhelming users.

The following table provides a summary of the Health Check API classes.

Health check API classes

`org.eclipse.microprofile.health.HealthCheck`	Java functional interface representing the logic of a single health check
`org.eclipse.microprofile.health.HealthCheckResponse`	Result of a health check invocation that contains a status and a description.
`org.eclipse.microprofile.health.HealthCheckResponseBuilder`	Builder class to create `HealthCheckResponse` instances
`io.helidon.reactive.health.HealthSupport`	WebServer service that exposes `/health` and invokes the registered health checks
`io.helidon.reactive.health.HealthSupport.Builder`	Builder class to create `HealthSupport` instances

Built-In Health Checks

You can use Helidon-provided health checks to report various common health check statuses:

Built-in health check	Health check name	JavaDoc	Config properties	Default config value
deadlock detection †	`deadlock`	`DeadlockHealthCheck`	n/a	n/a
available disk space †	`diskSpace`	`DiskSpaceHealthCheck`	`health.checks.diskSpace.thresholdPercent` `health.checks.diskSpace.path`	`99.999` `/`
available heap memory	`heapMemory`	`HeapMemoryHealthCheck`	`health.checks.heapMemory.thresholdPercent`	`98`

† Helidon cannot support the indicated health checks in the GraalVM native image environment, so with native image those health checks do not appear in the health output.
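To make the role of a `thresholdPercent` setting concrete, here is a minimal JDK-only sketch of a heap check. It assumes the check stays healthy while used heap is at or below the threshold percentage of the maximum heap; that rule and the `HeapThresholdSketch` class are assumptions for illustration, not the HeapMemoryHealthCheck source.

```java
// Illustrative sketch only; not the HeapMemoryHealthCheck implementation.
class HeapThresholdSketch {
    // Assumption: healthy while used heap is at or below thresholdPercent of max heap.
    static boolean isHealthy(long usedBytes, long maxBytes, double thresholdPercent) {
        double usedPercent = 100.0 * usedBytes / maxBytes;
        return usedPercent <= thresholdPercent;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently in use
        System.out.println(isHealthy(used, rt.maxMemory(), 98.0) ? "UP" : "DOWN");
    }
}
```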

The following code adds the default built-in health checks to your application:

HealthSupport health = HealthSupport.builder()
    .add(HealthChecks.healthChecks())   // (1)
    .build();

Routing.builder()
       .register(health)   // (2)
       .build();

(1) Add built-in health checks using defaults (requires the helidon-health-checks dependency).
(2) Register the created HealthSupport with web server routing (adds the /health endpoint).

You can control the thresholds for built-in health checks in either of two ways:

Create the health checks individually using their builders instead of using the HealthChecks convenience class. Follow the JavaDoc links in the table above.
Use configuration, as explained in the Configuration section.

Kubernetes Probes

Probe is the term Kubernetes uses for a container health check (see the Kubernetes documentation).

There are three types of probes:

liveness: Indicates whether the container is running
readiness: Indicates whether the container is ready to service requests
startup: Indicates whether the application in the container has started

You can implement probes using the following mechanisms:

Running a command inside a container
Sending an HTTP request to a container
Opening a TCP socket to a container

A microservice exposed to HTTP traffic will typically implement both the liveness probe and the readiness probe using HTTP requests. If the microservice takes a significant time to initialize itself, you can also define a startup probe, in which case Kubernetes does not check liveness or readiness probes until the startup probe returns success.
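For HTTP probes, Kubernetes treats a response as a success when its status code is at least 200 and below 400. A health endpoint that answers 200 or 204 therefore passes the probe, while 503 and 500 fail it. A minimal sketch of that rule (the class and method names are illustrative):

```java
// Illustrative restatement of how Kubernetes interprets HTTP probe responses.
class HttpProbeResult {
    // Success is any status code from 200 up to, but not including, 400.
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }
}
```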

You can configure several parameters for probes. The following are the most relevant parameters:

`initialDelaySeconds`	Number of seconds after the container has started before liveness or readiness probes are initiated.
`periodSeconds`	Probe interval. Defaults to 10 seconds. Minimum value is 1.
`timeoutSeconds`	Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
`failureThreshold`	Number of consecutive failures after which Kubernetes gives up on the probe. Defaults to 3. Minimum value is 1.

Liveness Probe

The liveness probe is used to detect whether the container has become unresponsive. For example, it can be used to detect deadlocks or analyze heap usage. When Kubernetes gives up on a liveness probe, the corresponding pod is restarted.

The liveness probe can result in repeated restarts in certain cases. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. Repeated restarts can also occur if timeoutSeconds or periodSeconds is too low.

We recommend the following:

Avoid checking dependencies in a liveness probe.
Set timeoutSeconds high enough to avoid excessive probe failures.
Account for startup time with initialDelaySeconds.
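A liveness check that follows the first recommendation looks only at the process itself. As a hedged example, the sketch below uses the JDK's ThreadMXBean to detect deadlocked threads without touching any external dependency. The `DeadlockLivenessSketch` class is hypothetical and is not the built-in DeadlockHealthCheck.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical sketch of a dependency-free liveness check.
class DeadlockLivenessSketch {
    // Returns true when the JVM reports no deadlocked threads.
    static boolean isLive() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads(); // null when nothing is deadlocked
        return deadlocked == null;
    }
}
```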

Readiness Probe

The readiness probe is used to avoid routing requests to the pod until it is ready to accept traffic. When Kubernetes gives up on a readiness probe, the pod is not restarted, but traffic is no longer routed to it.

In certain cases, the readiness probe can cause all the pods to be removed from service routing. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. This issue can also occur if timeoutSeconds or periodSeconds is too low.

We recommend the following:

Be conservative when checking shared dependencies.
Be aggressive when checking local dependencies.
Set failureThreshold according to periodSeconds in order to accommodate temporary errors.
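The last recommendation can be turned into back-of-the-envelope arithmetic. The sketch below assumes probes run at a fixed interval of periodSeconds and ignores timeouts and scheduling jitter, so treat the result as an estimate. The `ProbeToleranceSketch` class and its method names are hypothetical.

```java
// Rough estimate only: fixed probe interval, timeouts and jitter ignored.
class ProbeToleranceSketch {
    // Most probes that can land inside an outage of the given length
    // when probes fire every periodSeconds (worst-case alignment).
    static int maxFailedProbes(int outageSeconds, int periodSeconds) {
        return outageSeconds / periodSeconds + 1;
    }

    // Smallest failureThreshold that still tolerates such an outage.
    static int minFailureThreshold(int outageSeconds, int periodSeconds) {
        return maxFailedProbes(outageSeconds, periodSeconds) + 1;
    }
}
```

For example, with periodSeconds set to 30, a 45-second outage can fail at most two consecutive probes under these assumptions, so the default failureThreshold of 3 is enough to ride it out.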

Startup Probe

The startup probe prevents Kubernetes from prematurely checking the other probes if the application takes a long time to start. Otherwise, Kubernetes might misinterpret a failed liveness or readiness probe and shut down the container when, in fact, the application is still coming up.

Troubleshooting Probes

Failed probes are recorded as events associated with their corresponding pods. The event message contains only the status code.

Get the events of a single pod:

POD_NAME=$(kubectl get pod -l app=acme -o jsonpath='{.items[0].metadata.name}') # (1)
kubectl get event --field-selector involvedObject.name="${POD_NAME}" # (2)

(1) Get the effective pod name by filtering pods with the label app=acme.
(2) Filter the events for the pod.

Log a message in your health check implementation whenever it sets a DOWN status, so that you can correlate a failed probe with its cause.
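One possible way to do this is sketched below with java.util.logging. The `LoggingCheckSketch` class and the convention that a check supplies an empty Optional when healthy and a failure reason otherwise are assumptions made for this example; a real check would build a HealthCheckResponse instead of returning a boolean.

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical sketch: log the cause whenever a check is about to report DOWN.
class LoggingCheckSketch {
    private static final Logger LOGGER = Logger.getLogger(LoggingCheckSketch.class.getName());

    // failureReason supplies an empty Optional when healthy, or the reason for DOWN.
    static boolean evaluate(String name, Supplier<Optional<String>> failureReason) {
        Optional<String> reason = failureReason.get();
        reason.ifPresent(r -> LOGGER.warning("Health check '" + name + "' is DOWN: " + r));
        return reason.isEmpty(); // true means UP
    }
}
```

Since the probe event carries only the status code, the logged reason is what links a failed probe to the check that caused it.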

Configuration

Built-in health checks can be configured using the config property keys listed in the table under Built-In Health Checks. You can also suppress one or more of the built-in health checks by setting the configuration item health.exclude to a comma-separated list of the health check names (from the same table) that you want to exclude.
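To illustrate what a comma-separated exclusion list means in practice, the JDK-only sketch below filters a list of check names against such a value. The `ExcludeSketch` class is hypothetical, and trimming whitespace around names is an assumption of this example rather than documented Helidon behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of exclusion semantics; not Helidon's config handling.
class ExcludeSketch {
    // Splits a value such as "diskSpace, heapMemory" into individual names.
    static Set<String> parseExclude(String value) {
        return Arrays.stream(value.split(","))
                .map(String::trim)            // assumption: surrounding whitespace is ignored
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Returns the check names that remain after exclusion, in their original order.
    static List<String> activeChecks(List<String> all, String excludeValue) {
        Set<String> excluded = parseExclude(excludeValue);
        return all.stream()
                .filter(name -> !excluded.contains(name))
                .collect(Collectors.toList());
    }
}
```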

Examples

JSON Response Example

Accessing the Helidon-provided /health endpoint reports the health of your application as shown below:

JSON response:

{
    "status": "UP",
    "checks": [
        {
            "name": "deadlock",
            "status": "UP"
        },
        {
            "name": "diskSpace",
            "status": "UP",
            "data": {
                "free": "211.00 GB",
                "freeBytes": 226563444736,
                "percentFree": "45.31%",
                "total": "465.72 GB",
                "totalBytes": 500068036608
            }
        },
        {
            "name": "heapMemory",
            "status": "UP",
            "data": {
                "free": "215.15 MB",
                "freeBytes": 225600496,
                "max": "3.56 GB",
                "maxBytes": 3817865216,
                "percentFree": "99.17%",
                "total": "245.50 MB",
                "totalBytes": 257425408
            }
        }
    ]
}

Kubernetes Example

This example shows the usage of the Helidon health API in an application that implements health endpoints for the liveness and readiness probes. Note that the application code dissociates the health endpoints from the default routes, so that the health endpoints are not exposed by the service. An example YAML specification is also provided for the Kubernetes service and deployment.

Application code:

Routing healthRouting = Routing.builder()
        .register(JsonSupport.create())
        .register(HealthSupport.builder()
                .webContext("/live") // (1)
                .add(HealthChecks.healthChecks()) // (2)
                .build())
        .register(HealthSupport.builder()
                .webContext("/ready") // (3)
                .addReadiness(() -> HealthCheckResponse.named("database").up().build()) // (4)
                .build())
        .build();

Routing defaultRouting = Routing.builder()
        .any((req, res) -> res.send("It works!")) // (5)
        .build();

WebServer server = WebServer.builder(defaultRouting)
        .config(ServerConfiguration.builder()
                .port(8080) // (6)
                .addSocket("health", SocketConfiguration.builder() // (7)
                        .port(8081)
                        .build())
                .build())
        .addNamedRouting("health", healthRouting) // (8)
        .build();

server.start();

(1) The health service for the liveness probe is exposed at /live.
(2) Using the built-in health checks for the liveness probe.
(3) The health service for the readiness probe is exposed at /ready.
(4) Using a custom health check for a pseudo database that is always UP.
(5) The default route: returns It works! for any request.
(6) The server uses port 8080 for the default routes.
(7) A socket configuration named health using port 8081.
(8) Route the health services exclusively on the health socket.

Kubernetes descriptor:

kind: Service
apiVersion: v1
metadata:
  name: acme # (1)
  labels:
    app: acme
spec:
  type: NodePort
  selector:
    app: acme
  ports:
  - port: 8080
    targetPort: 8080
    name: http
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: acme # (2)
spec:
  replicas: 1
  selector:
    matchLabels:
      app: acme
  template:
    metadata:
      name: acme
      labels:
        app: acme
    spec:
      containers:
      - name: acme
        image: acme
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /live # (3)
            port: 8081
          initialDelaySeconds: 3 # (4)
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready # (5)
            port: 8081
          initialDelaySeconds: 10 # (6)
          periodSeconds: 30
          timeoutSeconds: 10
---

(1) A service of type NodePort that serves the default routes on port 8080.
(2) A deployment with one replica of a pod.
(3) The HTTP endpoint for the liveness probe.
(4) The liveness probe configuration.
(5) The HTTP endpoint for the readiness probe.
(6) The readiness probe configuration.

Additional Information

Health Checks SE API JavaDocs.