
Overview

It’s a good practice to monitor your microservice’s health to ensure that it is available and performs correctly. Applications implement health checks to expose health status that is collected at regular intervals by external tooling, such as orchestrators like Kubernetes. The orchestrator may then take action, such as restarting your application if the health check fails.

A typical health check combines the statuses of all the dependencies that affect availability and the ability to perform correctly:

  • Network Latency

  • Storage

  • Database

  • Other Services (used by your application)
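As a rough mental model, the combined status is UP only when every individual check is UP. A single DOWN check makes the overall result DOWN, and the 503 sample response later on this page shows that case. The following plain-Java sketch is illustrative only and is not Helidon's implementation. The Status enum and CheckResult record are hypothetical stand-ins.

```java
import java.util.List;

/**
 * Illustrative model only (not Helidon code): the overall status is UP
 * only if every individual dependency check reports UP.
 */
class OverallHealth {

    enum Status { UP, DOWN }

    // One named result per dependency, for example "database" or "storage".
    record CheckResult(String name, Status status) { }

    static Status combine(List<CheckResult> results) {
        for (CheckResult result : results) {
            if (result.status() == Status.DOWN) {
                return Status.DOWN; // one failing dependency makes the whole application DOWN
            }
        }
        return Status.UP; // no failing checks (including the empty case)
    }

    public static void main(String[] args) {
        List<CheckResult> results = List.of(
                new CheckResult("database", Status.UP),
                new CheckResult("storage", Status.DOWN));
        System.out.println(combine(results)); // prints DOWN
    }
}
```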

Maven Coordinates

To enable Health Checks, add the following dependency to your project’s pom.xml (see Managing Dependencies).

<dependency>
    <groupId>io.helidon.webserver.observe</groupId>
    <artifactId>helidon-webserver-observe-health</artifactId>
</dependency>

Optional dependency to use built-in health checks:

<dependency>
    <groupId>io.helidon.health</groupId>
    <artifactId>helidon-health-checks</artifactId>
</dependency>

API

Enabling Health Support (and Built-in Health Checks) in Your Application

The health subsystem is part of the observability support. As a result, your application includes health support by default provided your project meets several conditions:

  • Your project depends on the helidon-webserver-observe-health component as described above.

  • (Optional) Your project depends on the helidon-health-checks component (if you want the built-in health checks).

  • Your code allows the webserver’s automatic feature discovery (enabled by default).

  • Your code allows the observe feature’s automatic observer discovery (also enabled by default).

If you disable either type of automatic discovery, you can add the observe feature to the webserver explicitly and add the health observer to the observe feature explicitly, customizing the behavior of each programmatically if you wish. You can also use configuration to tailor some of the behavior of the health component (such as changing the URI path from /observe/health to something else).

Writing Custom Health Checks

In many cases, the ability of your application to do its job depends on conditions known only to your application: for example, whether certain external resources such as databases are available. You can create custom health checks which reflect those conditions and add them to the overall health assessment of your application.

A health check is an implementation of a Java functional interface that returns a new HealthCheckResponse instance each time Helidon queries the health check. Each health check also has a fixed name and a fixed health check type (start-up, liveness, or readiness).

Your code registers a custom health check by invoking a method on Helidon-provided types in one of the following ways:

  • Pass the name and type of the health check and a Supplier of a HealthCheckResponse such as a method reference or a lambda expression.

  • Pass an instance of a class which implements the HealthCheck interface.

Within one application, different techniques might suit different custom health checks, depending on how complex the logic for computing each status is. The styles are functionally equivalent, so for each custom health check choose the one that makes your code clearest. The examples below, in no particular order, implement the same slow-start logic (DOWN until eight seconds after start-up, UP afterwards) in different ways.

Option 1: Using a HealthCheckResponse supplier method

If you gather the logic for computing the health check response into a method, then you can use a method reference to register the health check.

Declaring a health check response supplier method
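static HealthCheckResponse slowStartLivenessResponse() {
    // serverStartTime (declared elsewhere in Main) holds the server start-up time in milliseconds.
    long now = System.currentTimeMillis();
    return HealthCheckResponse.builder()
            .detail("time", now)
            .status(now - serverStartTime >= 8000) // UP once at least 8 seconds have elapsed
            .build();
}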
Registering a health check using a method reference
ObserveFeature observe = ObserveFeature.builder()
        .config(config.get("server.features.observe")) 
        .addObserver(HealthObserver.builder() 
                             .useSystemServices(true) 
                             .addCheck(Main::slowStartLivenessResponse, 
                                       HealthCheckType.LIVENESS, 
                                       "live-after-8-seconds") 
                             .build())
        .build();
  • Apply configuration to auto-discovered observers (e.g., health, metrics).
  • Augment the web server by adding the ObserveFeature containing the HealthObserver. This replaces the auto-discovered health observer.
  • Include the Helidon-supplied health checks.
  • Add the custom health check, passing a reference to the method which returns the health check responses.
  • Set the type of the custom health check.
  • Set the name of the custom health check.

Option 2: Using an in-line lambda expression

If the logic for computing the health check response is fairly simple, express it as an in-line lambda when you register the health check.

Registering a health check using an in-line lambda expression
ObserveFeature observe = ObserveFeature.builder()
        .config(config.get("server.features.observe"))
        .addObserver(HealthObserver.builder() 
                             .useSystemServices(true) // Include Helidon-provided health checks.
                             .addCheck(() -> HealthCheckResponse.builder() 
                                               .status(System.currentTimeMillis() - serverStartTime >= 8000) 
                                               .detail("time", System.currentTimeMillis()) 
                                               .build(), 
                                       HealthCheckType.READINESS, 
                                       "live-after-8-seconds") 
                             .build())
        .build();
  • Augment the web server by adding the ObserveFeature containing the HealthObserver.
  • Add the custom health check passing a lambda expression supplying the health check response.
  • In the lambda, set the health check response status.
  • Still in the lambda, set a detail associated with the health check response.
  • Still in the lambda, build the health check response.
  • Set the type of the custom health check.
  • Set the name of the custom health check.

Note that the logic in the lambda expression runs every time Helidon probes the added health check, so the values passed to status and detail are recomputed every time.
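Because the lambda body runs on every probe, it is also the natural place to read the clock exactly once. The Option 2 lambda calls System.currentTimeMillis() twice, so its status and its time detail can come from slightly different instants. The plain-Java sketch below uses a hypothetical Response record in place of HealthCheckResponse and injects the clock as a LongSupplier so the behavior can be tested; in a real registration you would pass System::currentTimeMillis.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Plain-Java sketch: a Supplier whose body is re-evaluated on every call and
 * reads the clock once, so status and detail always agree.
 * Response is a hypothetical stand-in for HealthCheckResponse.
 */
class SlowStartSupplier {

    record Response(boolean up, long time) { }

    static Supplier<Response> slowStart(long startMillis, long delayMillis, LongSupplier clock) {
        return () -> {
            long now = clock.getAsLong(); // runs on each get(), not once at registration time
            return new Response(now - startMillis >= delayMillis, now);
        };
    }

    public static void main(String[] args) {
        AtomicLong fakeClock = new AtomicLong(5_000);
        Supplier<Response> probe = slowStart(0, 8_000, fakeClock::get);

        System.out.println(probe.get()); // Response[up=false, time=5000]
        fakeClock.set(9_000);
        System.out.println(probe.get()); // Response[up=true, time=9000]
    }
}
```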

Option 3: Using a HealthCheck Instance

If a custom health check requires a lot of information to compute its health check response, it might be clearest to implement it as a class that implements the HealthCheck interface. When your code creates the instance, it passes in everything the check needs, including references to other data; the instance then uses that information to compute the response each time Helidon probes it.

This example is not complicated in that way, but it’s useful to illustrate this technique of writing a custom health check.

Declaring a concrete HealthCheck implementation
/**
 * A custom readiness health check that reports UP 8 seconds after server start-up.
 */
class SlowStartHealthCheck implements HealthCheck { 

    @Override
    public HealthCheckType type() {
        return HealthCheckType.READINESS; 
    }

    @Override
    public HealthCheckResponse call() {
        long now = System.currentTimeMillis();
        return HealthCheckResponse.builder()
                .detail("time", now) 
                .status(now - serverStartTime >= 8000) 
                .build();
    }
}
  • Implement the io.helidon.health.HealthCheck interface. The default health check name is the simple class name of the implementing class. Your code can override the name() method to return a different name. (Not shown in this example)
  • The default health check type is LIVENESS, so this implementation overrides type() to declare a READINESS check.
  • Sets a detail value named time, associated with the response, to the current time.
  • Reports DOWN until at least eight seconds have passed since server start-up, and UP thereafter.
Registering a HealthCheck instance
ObserveFeature observe = ObserveFeature.builder()
        .config(config.get("server.features.observe"))
        .addObserver(HealthObserver.builder() 
                             .addCheck(new SlowStartHealthCheck()) 
                             .build())
        .build();
  • Augment the web server by adding the ObserveFeature containing the HealthObserver.
  • Instantiate the custom health check class and add the instance to the HealthObserver.
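To show the kind of check that benefits from this style, here is a plain-Java sketch of a hypothetical readiness check that receives a reference to a shared work queue and a backlog limit when it is constructed. The queue, the limit, and the Result record are illustrative assumptions; in a Helidon application the class would implement io.helidon.health.HealthCheck and build a HealthCheckResponse instead of a Result.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Plain-Java sketch of the "class with injected references" style.
 * Result is a hypothetical stand-in for HealthCheckResponse.
 */
class BacklogReadinessCheck {

    record Result(boolean up, int backlog) { }

    private final Queue<?> workQueue; // shared with the code that enqueues work
    private final int maxBacklog;     // readiness limit chosen by the application

    BacklogReadinessCheck(Queue<?> workQueue, int maxBacklog) {
        this.workQueue = workQueue;
        this.maxBacklog = maxBacklog;
    }

    // Mirrors HealthCheck's default naming: the simple class name.
    String name() {
        return getClass().getSimpleName();
    }

    // Called on every probe; reads the live queue, not a copy taken at construction time.
    Result call() {
        int backlog = workQueue.size();
        return new Result(backlog <= maxBacklog, backlog);
    }

    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        BacklogReadinessCheck check = new BacklogReadinessCheck(queue, 2);
        System.out.println(check.call()); // Result[up=true, backlog=0]
        queue.add("a");
        queue.add("b");
        queue.add("c");
        System.out.println(check.call()); // Result[up=false, backlog=3]
    }
}
```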

Adding Observability (including the Custom Health Checks) to Helidon

The code examples above prepare the observe feature instance using the built-in and custom health checks. To activate the health subsystem and other auto-discovered observability subsystems, add that observe instance as a feature to the webserver and start the server.

Register the observe feature with the server and start it
WebServer server = WebServer.builder()
        .featuresDiscoverServices(false)
        .addFeature(observe) 
        .routing(Main::routing)
        .build()
        .start();
  • Add the previously-prepared observe feature, which contains the health observer, to the server.

Triggering and Interpreting Health Check Output

Health support in Helidon is part of the observability feature. HealthObserver is a Helidon-provided observability implementation that contains a collection of registered HealthCheck instances and, when queried, invokes the registered health checks and returns a response with a status code representing the overall status of the application.

Health status codes
| Status code | Meaning |
|---|---|
| 200 | The application is healthy (with health check details in the response). |
| 204 | The application is healthy (with no health check details in the response). |
| 503 | The application is not healthy. |
| 500 | An error occurred while reporting the health. |
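The status-code table can be read as a small decision procedure. The plain-Java sketch below encodes it; the class, method, and parameter names are hypothetical, and this is not Helidon's code.

```java
/**
 * Illustrative model of the documented mapping from overall health to HTTP status.
 */
final class HealthStatusCodes {

    private HealthStatusCodes() {
    }

    static int httpStatus(boolean allUp, boolean detailsEnabled, boolean errorWhileReporting) {
        if (errorWhileReporting) {
            return 500; // an error occurred while reporting the health
        }
        if (!allUp) {
            return 503; // at least one check is DOWN
        }
        return detailsEnabled ? 200 : 204; // healthy, with or without a JSON payload
    }

    public static void main(String[] args) {
        System.out.println(httpStatus(true, true, false));  // 200
        System.out.println(httpStatus(true, false, false)); // 204
        System.out.println(httpStatus(false, true, false)); // 503
    }
}
```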

You control, either through configuration or in your application code, whether the HTTP responses to GET requests contain detailed information about each health check. With details enabled, HTTP GET responses include JSON content showing the detailed results of all the health checks which the server executed after receiving the request. With details disabled, HTTP GET responses have no payload. HTTP HEAD requests always return only the status with no payload.
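To see the GET/HEAD difference from outside the application, you can probe the endpoint with the JDK's built-in HTTP client. This sketch assumes the server listens on http://localhost:8080 and uses the default /observe/health path; adjust both for your deployment. HealthProbeClient is a hypothetical name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Sketch of an external probe using only the JDK HTTP client.
 */
final class HealthProbeClient {

    private HealthProbeClient() {
    }

    // HEAD returns only the status code, which is all a simple probe needs.
    static HttpRequest headRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/observe/health"))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // 200 and 204 both mean healthy; 503 and 500 do not.
    static boolean isHealthy(int statusCode) {
        return statusCode == 200 || statusCode == 204;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<Void> response =
                client.send(headRequest("http://localhost:8080"), HttpResponse.BodyHandlers.discarding());
        System.out.println(response.statusCode() + " healthy=" + isHealthy(response.statusCode()));
    }
}
```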

If you add the Helidon health dependency to your pom.xml file, Helidon automatically registers the HealthObserver service and responds to the default /observe/health endpoint. Further, if you add the built-in health checks dependency, Helidon automatically finds those checks and adds them to the HealthObserver.

Below are parts of health responses which include the custom health check added in the earlier example code. This first response shows the health output within the first eight seconds after start-up. Recall that the custom health check will report DOWN during that time, so the overall health is DOWN and the HTTP response status is 503 Service Unavailable.

Response within 8 seconds: HTTP status 503 (not healthy)
{
  "status": "DOWN",
  "checks": [
    {
      "name": "live-after-8-seconds",
      "status": "DOWN",
      "data": {
        "time": 1701984253071
      }
    }
  ]
}

The next response shows the health output once the server has been running for at least eight seconds. The custom health check now reports UP, so the overall health status is also UP and the HTTP status is 200 OK.

Response after 8 seconds: HTTP status 200
{
  "status": "UP",
  "checks": [
    {
      "name": "live-after-8-seconds",
      "status": "UP",
      "data": {
        "time": 1701984258292
      }
    }
  ]
}

When deciding how much detail to include, balance collecting a lot of information against the need to avoid overloading the application and overwhelming users.

The following table provides a summary of the Health Check API classes.

Health check API classes
| Class | Description |
|---|---|
| io.helidon.health.HealthCheck | Java functional interface representing the logic of a single health check |
| io.helidon.health.HealthCheckResponse | Result of a health check invocation that contains a status |
| io.helidon.webserver.observe.health.HealthObserver | WebServer service that exposes /observe/health and invokes the registered health checks |

Built-In Health Checks

You can use Helidon-provided health checks to report various common health check statuses:

| Built-in health check | Health check name | JavaDoc | Config properties (within server.features.observe.observers.health) | Default config value |
|---|---|---|---|---|
| deadlock detection † | deadlock | DeadlockHealthCheck | n/a | n/a |
| available disk space † | diskSpace | DiskSpaceHealthCheck | helidon.health.diskSpace.thresholdPercent | 99.999 |
| | | | helidon.health.diskSpace.path | / |
| available heap memory | heapMemory | HeapMemoryHealthCheck | helidon.health.heapMemory.thresholdPercent | 98 |

† Helidon cannot support the indicated health checks in the GraalVM native image environment, so with native image those health checks do not appear in the health output.
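The thresholds are percentages. This plain-Java sketch shows one way to read them, ASSUMING thresholdPercent is the share of capacity (disk size for diskSpace, maximum heap for heapMemory) that may be in use before the check reports DOWN. Confirm the exact rule in the DiskSpaceHealthCheck and HeapMemoryHealthCheck JavaDoc; ThresholdModel is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Rough model, ASSUMING the check is UP while used space stays within
 * thresholdPercent of the capacity. Not Helidon's implementation.
 */
final class ThresholdModel {

    private ThresholdModel() {
    }

    static boolean isUp(long usedBytes, long capacityBytes, double thresholdPercent) {
        return usedBytes <= (thresholdPercent / 100.0) * capacityBytes;
    }

    public static void main(String[] args) throws IOException {
        // Disk: same default path ("/") and threshold (99.999) as the table above.
        FileStore store = Files.getFileStore(Path.of("/"));
        long diskUsed = store.getTotalSpace() - store.getUsableSpace();
        System.out.println("disk UP: " + isUp(diskUsed, store.getTotalSpace(), 99.999));

        // Heap: used = total - free, measured against the maximum heap, default threshold 98.
        Runtime rt = Runtime.getRuntime();
        long heapUsed = rt.totalMemory() - rt.freeMemory();
        System.out.println("heap UP: " + isUp(heapUsed, rt.maxMemory(), 98));
    }
}
```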

Simply adding the built-in health check dependency is sufficient to register all the built-in health checks automatically. If you want to use only some of the built-in checks in your application, you can disable automatic discovery of the built-in health checks and register only the ones you want.

The following code adds only selected built-in health checks to your application:

Adding selected built-in health checks
WebServer server = WebServer.builder()
        .config(config.get("server"))
        .addFeature(ObserveFeature.create(HealthObserver.builder()
                                                  .useSystemServices(false) 
                                                  .addCheck(HealthChecks.deadlockCheck()) 
                                                  .addCheck(hc) 
                                                  .details(true)
                                                  .build()))
        .routing(Main::routing)
        .build()
        .start();
  • Disables automatic registration of the built-in health checks.
  • Adds the specific built-in check(s) you want.
  • Adds a custom check (in a previously-prepared variable hc).

You can control the thresholds for built-in health checks in either of two ways:

  • Create the health checks individually using their builders instead of using the HealthChecks convenience class. Follow the JavaDoc links in the table above.

  • Use configuration, as explained in Configuration.

Kubernetes Probes

Probes is the term used by Kubernetes to describe health checks for containers (Kubernetes documentation).

There are three types of probes:

  • liveness: Indicates whether the container is running

  • readiness: Indicates whether the container is ready to service requests

  • startup: Indicates whether the application in the container has started

You can implement probes using the following mechanisms:

  1. Running a command inside a container
  2. Sending an HTTP request to a container
  3. Opening a TCP socket to a container

A microservice exposed to HTTP traffic will typically implement both the liveness probe and the readiness probe using HTTP requests. If the microservice takes a significant time to initialize itself, you can also define a startup probe, in which case Kubernetes does not check liveness or readiness probes until the startup probe returns success.

You can configure several parameters for probes. The following are the most relevant parameters:

| Parameter | Description |
|---|---|
| initialDelaySeconds | Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0. |
| periodSeconds | Probe interval. Defaults to 10 seconds. Minimum value is 1. |
| timeoutSeconds | Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. |
| failureThreshold | Number of consecutive failures after which Kubernetes considers the probe to have failed. Defaults to 3. Minimum value is 1. |
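To reason about these parameters together, it helps to estimate how long a probe can keep failing before Kubernetes reacts. The sketch below is a back-of-the-envelope estimate under stated assumptions (no scheduling jitter, and every failed attempt may take the full timeoutSeconds); it is not how the kubelet is implemented, and ProbeTiming is a hypothetical name.

```java
/**
 * Rough upper bound on the time between a problem starting and Kubernetes acting on it.
 */
final class ProbeTiming {

    private ProbeTiming() {
    }

    // The first failing attempt can come up to one period after the problem starts;
    // failureThreshold consecutive failures, spaced periodSeconds apart, are then needed,
    // and the last attempt may run for the full timeout.
    static int worstCaseReactionSeconds(int periodSeconds, int timeoutSeconds, int failureThreshold) {
        return failureThreshold * periodSeconds + timeoutSeconds;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseReactionSeconds(10, 1, 3)); // Kubernetes defaults: 31
        System.out.println(worstCaseReactionSeconds(10, 3, 3)); // liveness settings in the Kubernetes example: 33
    }
}
```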

Liveness Probe

The liveness probe is used to detect whether the container has become unresponsive. For example, it can be used to detect deadlocks or analyze heap usage. When Kubernetes gives up on a liveness probe, it restarts the container.

The liveness probe can result in repeated restarts in certain cases. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. Repeated restarts can also occur if timeoutSeconds or periodSeconds is too low.

We recommend the following:

  • Avoid checking dependencies in a liveness probe.

  • Set timeoutSeconds to avoid excessive probe failures.

  • Acknowledge startup times with initialDelaySeconds.

Readiness Probe

The readiness probe is used to avoid routing requests to the pod until it is ready to accept traffic. When Kubernetes gives up on a readiness probe, the pod is not restarted; instead, traffic is no longer routed to it.

In certain cases, the readiness probe can cause all the pods to be removed from service routing. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. This issue can also occur if timeoutSeconds or periodSeconds is too low.

We recommend the following:

  • Be conservative when checking shared dependencies.

  • Be aggressive when checking local dependencies.

  • Set failureThreshold according to periodSeconds in order to accommodate temporary errors.
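One way to apply the last recommendation is to decide how long a transient outage should be tolerated and derive failureThreshold from periodSeconds. The rule of thumb below is an assumption of this sketch, not a Kubernetes formula (it ignores probe timeouts and jitter): at most ceil(outage / periodSeconds) probes can land inside an outage window, so the threshold needs to be one higher. ProbeThresholds is a hypothetical name.

```java
/**
 * Rule-of-thumb helper for choosing failureThreshold.
 */
final class ProbeThresholds {

    private ProbeThresholds() {
    }

    static int failureThresholdFor(int toleratedOutageSeconds, int periodSeconds) {
        // Integer ceiling of outage / period: the most probes that can fail during the outage.
        int probesDuringOutage = (toleratedOutageSeconds + periodSeconds - 1) / periodSeconds;
        // One more consecutive failure than that is needed before Kubernetes reacts (minimum 1).
        return Math.max(1, probesDuringOutage + 1);
    }

    public static void main(String[] args) {
        // Tolerate a 60-second blip with the readiness periodSeconds (30) from the Kubernetes example.
        System.out.println(failureThresholdFor(60, 30)); // 3
    }
}
```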

Startup Probe

The startup probe prevents Kubernetes from prematurely checking the other probes if the application takes a long time to start. Without it, Kubernetes might act on a failed liveness probe and restart the container when, in fact, the application is still coming up.

Troubleshooting Probes

Failed probes are recorded as events associated with their corresponding pods. The event message contains only the status code.

Get the events of a single pod:
POD_NAME=$(kubectl get pod -l app=acme -o jsonpath='{.items[0].metadata.name}') 
kubectl get event --field-selector involvedObject.name=${POD_NAME} 
  • Get the effective pod name by filtering pods with the label app=acme.
  • Filter the events for the pod.

Create log messages in your health check implementation when setting a DOWN status. This will allow you to correlate the cause of a failed probe.
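A minimal sketch of that advice using java.util.logging: the check logs the failing dependency and the cause at WARNING level before reporting DOWN. The dependency name and the ping callable are hypothetical placeholders for whatever your check actually tests; in a Helidon health check the returned boolean would become the HealthCheckResponse status.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Sketch: log why a check is DOWN so a failed probe event can be correlated with a log line.
 */
final class LoggingCheck {

    private static final Logger LOGGER = Logger.getLogger(LoggingCheck.class.getName());

    private LoggingCheck() {
    }

    // Returns true for UP and false for DOWN; "ping" is a hypothetical call that touches the dependency.
    static boolean evaluate(String dependency, Callable<?> ping) {
        try {
            ping.call();
            return true;
        } catch (Exception e) {
            // The pod event only carries the status code, so record which dependency failed and why.
            LOGGER.log(Level.WARNING, "Health check DOWN: " + dependency + " is unavailable", e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("database", () -> "pong")); // true
        System.out.println(evaluate("database", () -> {
            throw new IOException("connection refused");
        })); // false, after a WARNING log record
    }
}
```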

Configuration

Built-in health checks can be configured using the config property keys listed in the Built-In Health Checks table.

Further, you can suppress one or more health checks by setting the configuration item server.features.observe.observers.health.exclude to a comma-separated list of the names of the health checks you want to exclude. The same table lists the names of the built-in health checks.
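The effect of exclude can be modeled as a filter over check names: for example, excluding deadlock and diskSpace leaves only heapMemory among the built-in checks. This plain-Java sketch is illustrative only, not Helidon's parsing code, and it ASSUMES whitespace around the commas is ignored; verify that against your Helidon version. ExcludeModel is a hypothetical name.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Illustrative model of the exclude setting: listed names are dropped from the checks that run.
 */
final class ExcludeModel {

    private ExcludeModel() {
    }

    static Set<String> parseExclude(String value) {
        return Arrays.stream(value.split(","))
                .map(String::trim)              // assumption: tolerate "deadlock, diskSpace"
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    static List<String> remainingChecks(List<String> allChecks, String excludeValue) {
        Set<String> excluded = parseExclude(excludeValue);
        return allChecks.stream().filter(name -> !excluded.contains(name)).toList();
    }

    public static void main(String[] args) {
        List<String> builtIns = List.of("deadlock", "diskSpace", "heapMemory");
        System.out.println(remainingChecks(builtIns, "deadlock,diskSpace")); // [heapMemory]
    }
}
```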

Examples

JSON Response Example

Accessing the Helidon-provided /observe/health endpoint reports the health of your application as shown below:

JSON response:
{
    "status": "UP",
    "checks": [
        {
            "name": "deadlock",
            "status": "UP"
        },
        {
            "name": "diskSpace",
            "status": "UP",
            "data": {
                "free": "211.00 GB",
                "freeBytes": 226563444736,
                "percentFree": "45.31%",
                "total": "465.72 GB",
                "totalBytes": 500068036608
            }
        },
        {
            "name": "heapMemory",
            "status": "UP",
            "data": {
                "free": "215.15 MB",
                "freeBytes": 225600496,
                "max": "3.56 GB",
                "maxBytes": 3817865216,
                "percentFree": "99.17%",
                "total": "245.50 MB",
                "totalBytes": 257425408
            }
        }
    ]
}
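The derived fields in this sample can be recomputed from the raw byte counts it contains, which helps when you read or alert on these values. The formulas below are inferred from the sample numbers (binary units, and the heap percentage measured against the maximum heap); they are not taken from Helidon's source, so treat them as an approximation. SampleHealthData is a hypothetical name.

```java
import java.util.Locale;

/**
 * Recomputes the derived fields of the sample JSON from its raw byte counts.
 */
final class SampleHealthData {

    private static final double KIB = 1024.0;

    private SampleHealthData() {
    }

    static String percent(double fraction) {
        return String.format(Locale.ROOT, "%.2f%%", fraction * 100);
    }

    // Disk: free space as a share of the total size.
    static String diskPercentFree(long freeBytes, long totalBytes) {
        return percent((double) freeBytes / totalBytes);
    }

    // Heap: used = total - free; percentFree is the unused share of the maximum heap.
    static String heapPercentFree(long freeBytes, long totalBytes, long maxBytes) {
        long usedBytes = totalBytes - freeBytes;
        return percent((double) (maxBytes - usedBytes) / maxBytes);
    }

    // Binary units (1 GB = 1024^3 bytes here), matching values such as "211.00 GB".
    static String humanReadable(long bytes) {
        double gb = bytes / (KIB * KIB * KIB);
        if (gb >= 1) {
            return String.format(Locale.ROOT, "%.2f GB", gb);
        }
        return String.format(Locale.ROOT, "%.2f MB", bytes / (KIB * KIB));
    }

    public static void main(String[] args) {
        System.out.println(diskPercentFree(226563444736L, 500068036608L));        // 45.31%
        System.out.println(heapPercentFree(225600496L, 257425408L, 3817865216L)); // 99.17%
        System.out.println(humanReadable(226563444736L));                         // 211.00 GB
        System.out.println(humanReadable(257425408L));                            // 245.50 MB
    }
}
```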

Kubernetes Example

This example shows the usage of the Helidon health API in an application that implements health endpoints for the liveness and readiness probes. Note that the application code dissociates the health endpoints from the default routes, so that the health endpoints are not exposed by the service. An example YAML specification is also provided for the Kubernetes service and deployment.

Application code:
ObserveFeature observeFeature = ObserveFeature.builder()
        .addObserver(HealthObserver.builder()
                             .useSystemServices(false)
                             .endpoint("/health/live") 
                             .addChecks(HealthChecks.healthChecks()) 
                             .build())
        .addObserver(HealthObserver.builder()
                             .useSystemServices(false)
                             .endpoint("/health/ready") 
                             .addCheck(() -> HealthCheckResponse.builder() 
                                               .status(true)
                                               .build(),
                                       HealthCheckType.READINESS,
                                       "database")
                             .build())
        .sockets(List.of("observe")) 
        .build();
WebServer server = WebServer.builder()
        .putSocket("@default", socket -> socket
                .port(8080) 
                .routing(r -> r.any((req, res) -> res.send("It works!")))) 
        .addFeature(observeFeature)
        .putSocket("observe", socket -> socket
                .port(8081)) 
        .build()
        .start();
  • The health service for the liveness probe is exposed at /health/live.
  • Using the built-in health checks for the liveness probe.
  • The health service for the readiness probe is exposed at /health/ready.
  • Using a custom health check for a pseudo database that is always UP.
  • Route the observe feature exclusively on the observe socket.
  • The default socket uses port 8080 for the default routes.
  • The default route: returns It works! for any request.
  • The observe socket uses port 8081 for the "/observe" routes.
Kubernetes descriptor:
kind: Service
apiVersion: v1
metadata:
  name: acme 
  labels:
    app: acme
spec:
  type: NodePort
  selector:
    app: acme
  ports:
  - port: 8080
    targetPort: 8080
    name: http
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: acme 
spec:
  replicas: 1
  selector:
    matchLabels:
      app: acme
  template:
    metadata:
      name: acme
      labels:
        app: acme
    spec:
      containers:
      - name: acme
        image: acme
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /observe/health/live 
            port: 8081
          initialDelaySeconds: 3 
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /observe/health/ready 
            port: 8081
          initialDelaySeconds: 10 
          periodSeconds: 30
          timeoutSeconds: 10
---
  • A service of type NodePort that serves the default routes on port 8080.
  • A deployment with one replica of a pod.
  • The HTTP endpoint for the liveness probe.
  • The liveness probe configuration.
  • The HTTP endpoint for the readiness probe.
  • The readiness probe configuration.

Additional Information