Overview
It’s a good practice to monitor your microservice’s health to ensure that it is available and performs correctly. Applications implement health checks to expose health status that is collected at regular intervals by external tooling, such as orchestrators like Kubernetes. The orchestrator may then take action, such as restarting your application if the health check fails.
A typical health check combines the statuses of all the dependencies that affect availability and the ability to perform correctly:
Network Latency
Storage
Database
Other Services (used by your application)
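As a rough illustration of how the statuses of several dependencies might be combined, the sketch below reports UP only when every dependency is healthy. It uses only the JDK and does not use the Helidon API; the class name `AggregateHealthSketch`, the method `overallStatus`, and the dependency names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: combines individual dependency statuses into one overall status. */
final class AggregateHealthSketch {

    /** Returns "UP" only if every dependency reports healthy; otherwise "DOWN". */
    static String overallStatus(Map<String, Boolean> dependencyStatuses) {
        boolean allUp = dependencyStatuses.values().stream().allMatch(Boolean::booleanValue);
        return allUp ? "UP" : "DOWN";
    }

    public static void main(String[] args) {
        Map<String, Boolean> statuses = new LinkedHashMap<>();
        statuses.put("storage", true);
        statuses.put("database", false); // hypothetical failing dependency
        statuses.put("other-service", true);
        System.out.println(overallStatus(statuses)); // prints DOWN
    }
}
```

A single failing dependency is enough to make the overall status DOWN, which is why it matters which dependencies a given check includes.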
Maven Coordinates
To enable Health Checks, add the following dependency to your project’s pom.xml (see Managing Dependencies).
<dependency>
    <groupId>io.helidon.webserver.observe</groupId>
    <artifactId>helidon-webserver-observe-health</artifactId>
</dependency>
Optional dependency to use built-in health checks:
<dependency>
    <groupId>io.helidon.health</groupId>
    <artifactId>helidon-health-checks</artifactId>
</dependency>
API
Enabling Health Support (and Built-in Health Checks) in Your Application
The health subsystem is part of the observability support. As a result, your application includes health support by default provided your project meets several conditions:
Your project depends on the helidon-webserver-observe-health component as described above.
(Optional) Your project depends on the helidon-health-checks component (if you want the built-in health checks).
Your code allows the webserver’s automatic feature discovery (enabled by default).
Your code allows the observe feature’s automatic observer discovery (also enabled by default).
If you disable either type of automatic discovery, you can add the observe feature to the webserver explicitly and add the health observer to the observe feature explicitly, customizing the behavior of each programmatically if you wish. You can also use configuration to tailor some of the behavior of the health component (such as changing the URI path from /observe/health to something else).
Writing Custom Health Checks
In many cases, the ability of your application to do its job depends on conditions known only to your application: for example, whether certain external resources such as databases are available. You can create custom health checks which reflect those conditions and add them to the overall health assessment of your application.
A health check is a Java functional interface that returns a new HealthCheckResponse instance each time Helidon queries the health check. Each health check also has a fixed name and a fixed health check type (start-up, liveness, or readiness).
Your code registers a custom health check by invoking a method on Helidon-provided types in one of the following ways:
Pass the name and type of the health check and a Supplier of a HealthCheckResponse, such as a method reference or a lambda expression.
Pass an instance of a class which implements the HealthCheck interface.
Within an application different techniques might make sense for different custom health checks, depending on the complexity of the logic for computing the status for each check. The various styles are functionally equivalent; for a given custom health check choose the style which enhances the readability and clarity of your code. The examples below, in no particular order, implement the same custom health check functionality in different ways to illustrate.
Option 1: Using a HealthCheckResponse supplier method
If you gather the logic for computing the health check response into a method, then you can use a method reference to register the health check.
static HealthCheckResponse slowStartLivenessResponse() {
    long now = System.currentTimeMillis();
    return HealthCheckResponse.builder()
            .detail("time", now)
            .status(now - serverStartTime >= 8000)
            .build();
}

ObserveFeature observe = ObserveFeature.builder()
        .config(config.get("server.features.observe"))
        .addObserver(HealthObserver.builder()
                .useSystemServices(true)
                .addCheck(Main::slowStartLivenessResponse,
                          HealthCheckType.LIVENESS,
                          "live-after-8-seconds")
                .build())
        .build();

- Apply configuration to auto-discovered observers (e.g., health, metrics).
- Augment the web server by adding the ObserveFeature containing the HealthObserver. This replaces the auto-discovered health observer.
- Include the Helidon-supplied health checks.
- Add the custom health check, passing a reference to the method which returns the health check responses.
- Set the type of the custom health check.
- Set the name of the custom health check.
Option 2: Using an in-line lambda expression
If the logic for computing the health check response is fairly simple, express it as an in-line lambda when you register the health check.
ObserveFeature observe = ObserveFeature.builder()
        .config(config.get("server.features.observe"))
        .addObserver(HealthObserver.builder()
                .useSystemServices(true) // Include Helidon-provided health checks.
                .addCheck(() -> HealthCheckResponse.builder()
                                  .status(System.currentTimeMillis() - serverStartTime >= 8000)
                                  .detail("time", System.currentTimeMillis())
                                  .build(),
                          HealthCheckType.READINESS,
                          "live-after-8-seconds")
                .build())
        .build();

- Augment the web server by adding the ObserveFeature containing the HealthObserver.
- Add the custom health check, passing a lambda expression supplying the health check response.
- In the lambda, set the health check response status.
- Still in the lambda, set a detail associated with the health check response.
- Still in the lambda, build the health check response.
- Set the type of the custom health check.
- Set the name of the custom health check.
Note that the logic in the lambda expression runs every time Helidon probes the added health check, so the values passed to status and detail are recomputed every time.
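To make the re-evaluation concrete, here is a small JDK-only sketch (not Helidon code) in which a lambda reads a clock each time it is called. The class `SlowStartSketch`, the method `upAfter`, and the injected fake clock are hypothetical names introduced for illustration; the injected clock stands in for System.currentTimeMillis() so the behavior can be observed without waiting.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Illustrative only: a check whose result is recomputed on every invocation. */
final class SlowStartSketch {

    /** Builds a check that is healthy once delayMillis have elapsed since startMillis. */
    static BooleanSupplier upAfter(long startMillis, long delayMillis, LongSupplier clock) {
        // The lambda body runs on every call, so the clock is read anew each time.
        return () -> clock.getAsLong() - startMillis >= delayMillis;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock value, advanced manually below
        BooleanSupplier check = upAfter(0L, 8000L, () -> now[0]);

        System.out.println(check.getAsBoolean()); // prints false (0 ms elapsed)
        now[0] = 8000L;
        System.out.println(check.getAsBoolean()); // prints true (8000 ms elapsed)
    }
}
```

The same supplier instance yields different results as time passes because nothing is cached between calls.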
Option 3: Using a HealthCheck Instance
If a custom health check requires a lot of information to compute its health check response, it might be clearest to implement it as a class that implements the HealthCheck interface. Your code instantiates the class with all the information, including references to other data, it might need to compute the response each time Helidon probes it.
This example is not complicated in that way, but it’s useful to illustrate this technique of writing a custom health check.
HealthCheck implementation
/**
 * A custom readiness health check that reports UP 8 seconds after server start-up.
 */
class SlowStartHealthCheck implements HealthCheck {
    @Override
    public HealthCheckType type() {
        return HealthCheckType.READINESS;
    }

    @Override
    public HealthCheckResponse call() {
        long now = System.currentTimeMillis();
        return HealthCheckResponse.builder()
                .detail("time", now)
                .status(now - serverStartTime >= 8000)
                .build();
    }
}

- Implement the io.helidon.health.HealthCheck interface. The default health check name is the simple class name of the implementing class. Your code can override the name() method to return a different name (not shown in this example).
- The default health check type is LIVENESS, so this implementation overrides type() to declare a READINESS check.
- Sets a detail value time associated with the response to the current time.
- Reports DOWN until at least eight seconds have passed since the server start-up, then reports UP thereafter.

HealthCheck instance
ObserveFeature observe = ObserveFeature.builder()
        .config(config.get("server.features.observe"))
        .addObserver(HealthObserver.builder()
                .addCheck(new SlowStartHealthCheck())
                .build())
        .build();

- Augment the web server by adding the ObserveFeature containing the HealthObserver.
- Instantiate the custom health check class and add the instance to the HealthObserver.
Adding Observability (including the Custom Health Checks) to Helidon
The code examples above prepare the observe feature instance using the built-in and custom health checks. To activate the health subsystem and other auto-discovered observability subsystems, add that observe instance as a feature to the webserver and start the server.
WebServer server = WebServer.builder()
        .featuresDiscoverServices(false)
        .addFeature(observe)
        .routing(Main::routing)
        .build()
        .start();

- Add the previously-prepared health observer to the server as a feature.
Triggering and Interpreting Health Check Output
Health support in Helidon is part of the observability feature. HealthObserver is a Helidon-provided observability implementation that contains a collection of registered HealthCheck instances and, when queried, invokes the registered health checks and returns a response with a status code representing the overall status of the application.
| Status code | Meaning |
|---|---|
| 200 | The application is healthy (with health check details in the response). |
| 204 | The application is healthy (with no health check details in the response). |
| 503 | The application is not healthy. |
| 500 | An error occurred while reporting the health. |
You control, either using configuration or adding code to your application, whether the HTTP responses to GET requests contain detailed information about each health check. With details enabled, HTTP GET responses include JSON content showing the detailed results of all the health checks which the server executed after receiving the request. With details disabled, HTTP GET responses have no payload. HTTP HEAD requests always return only the status with no payload.
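A client or monitoring script could interpret these status codes along the following lines. This is a JDK-only sketch that covers just the four codes listed above; the class `HealthStatusCodes` and its methods are hypothetical names and are not part of Helidon.

```java
/** Illustrative only: interprets the HTTP status codes returned by the health endpoint. */
final class HealthStatusCodes {

    /** Maps a health endpoint status code to a short description. */
    static String interpret(int statusCode) {
        switch (statusCode) {
            case 200: return "healthy (details in response)";
            case 204: return "healthy (no details in response)";
            case 503: return "not healthy";
            case 500: return "error while reporting health";
            default:  return "unexpected status";
        }
    }

    /** Both 200 and 204 indicate a healthy application; only the payload differs. */
    static boolean isHealthy(int statusCode) {
        return statusCode == 200 || statusCode == 204;
    }

    public static void main(String[] args) {
        System.out.println(interpret(204));
        System.out.println(isHealthy(503));
    }
}
```

Treating 200 and 204 alike means the caller behaves the same whether or not details are enabled on the server.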
If you add the Helidon health dependency to your pom.xml file, Helidon automatically registers the HealthObserver service and responds to the default /observe/health endpoint. Further, if you add the built-in health checks dependency, Helidon automatically finds them and adds those checks to the HealthObserver.
Below are parts of health responses which include the custom health check added in the earlier example code. This first response shows the health output within the first eight seconds after start-up. Recall that the custom health check will report DOWN during that time, so the overall health is DOWN and the HTTP response status is 503 Service Unavailable.
{
  "status": "DOWN",
  "checks": [
    {
      "name": "live-after-8-seconds",
      "status": "DOWN",
      "data": {
        "time": 1701984253071
      }
    }
  ]
}
The next response shows the health output once the server has been running for at least eight seconds. The custom health check now reports UP so the overall health status is also UP now and the HTTP status is 200.
{
  "status": "UP",
  "checks": [
    {
      "name": "live-after-8-seconds",
      "status": "UP",
      "data": {
        "time": 1701984258292
      }
    }
  ]
}
Balance collecting a lot of information with the need to avoid overloading the application and overwhelming users.
The following table provides a summary of the Health Check API classes.
| Class | Description |
|---|---|
| io.helidon.health.HealthCheck | Java functional interface representing the logic of a single health check |
| io.helidon.health.HealthCheckResponse | Result of a health check invocation that contains a status |
| io.helidon.webserver.observe.health.HealthObserver | WebServer service that exposes /observe/health and invokes the registered health checks |
Built-In Health Checks
You can use Helidon-provided health checks to report various common health check statuses:
| Built-in health check | Health check name | JavaDoc | Config properties (within server.features.observe.observers.health) | Default config value |
|---|---|---|---|---|
| deadlock detection † | deadlock | DeadlockHealthCheck | n/a | n/a |
| available disk space † | diskSpace | DiskSpaceHealthCheck | helidon.health.diskSpace.thresholdPercent | 99.999 |
| | | | helidon.health.diskSpace.path | / |
| available heap memory | heapMemory | HeapMemoryHealthCheck | helidon.health.heapMemory.thresholdPercent | 98 |
† Helidon cannot support the indicated health checks in the GraalVM native image environment, so with native image those health checks do not appear in the health output.
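The thresholdPercent settings can be pictured with the following JDK-only sketch. It assumes the threshold expresses the maximum percentage of the resource that may be in use before the check reports DOWN; that reading is an assumption made for illustration and is not taken from the Helidon implementation. The class `ThresholdSketch` and the method `withinThreshold` are hypothetical names.

```java
import java.io.File;

/** Illustrative only: a percent-based threshold check (assumed semantics, not Helidon code). */
final class ThresholdSketch {

    /**
     * Returns true while usage stays at or below thresholdPercent of the total.
     * Assumption: the threshold is a maximum "percent used" value.
     */
    static boolean withinThreshold(long usedBytes, long totalBytes, double thresholdPercent) {
        long thresholdBytes = (long) ((thresholdPercent / 100.0) * totalBytes);
        return usedBytes <= thresholdBytes;
    }

    public static void main(String[] args) {
        File root = new File("/"); // same path as the default shown in the table
        long total = root.getTotalSpace();
        long used = total - root.getUsableSpace();
        // The result depends on the machine the sketch runs on.
        System.out.println(withinThreshold(used, total, 99.999) ? "UP" : "DOWN");
    }
}
```

Under this reading, the default of 99.999 for disk space only reports DOWN when the disk is essentially full, while the default of 98 for heap memory leaves a little more headroom.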
Simply adding the built-in health check dependency is sufficient to register all the built-in health checks automatically. If you want to use only some of the built-in checks in your application, you can disable automatic discovery of the built-in health checks and register only the ones you want.
The following code adds only selected built-in health checks to your application:
WebServer server = WebServer.builder()
        .config(config.get("server"))
        .addFeature(ObserveFeature.create(HealthObserver.builder()
                .useSystemServices(false)
                .addCheck(HealthChecks.deadlockCheck())
                .addCheck(hc)
                .details(true)
                .build()))
        .routing(Main::routing)
        .build()
        .start();

- Disables automatic registration of the built-in health checks.
- Adds the specific built-in check(s) you want.
- Adds a custom check (in a previously-prepared variable hc).
You can control the thresholds for built-in health checks in either of two ways:
Create the health checks individually using their builders instead of using the HealthChecks convenience class. Follow the JavaDoc links in the table above.
Use configuration as explained in Configuration.
Kubernetes Probes
Probes is the term used by Kubernetes to describe health checks for containers (Kubernetes documentation).
There are three types of probes:
liveness: Indicates whether the container is running
readiness: Indicates whether the container is ready to service requests
startup: Indicates whether the application in the container has started
You can implement probes using the following mechanisms:
- Running a command inside a container
- Sending an HTTP request to a container
- Opening a TCP socket to a container
A microservice exposed to HTTP traffic will typically implement both the liveness probe and the readiness probe using HTTP requests. If the microservice takes a significant time to initialize itself, you can also define a startup probe, in which case Kubernetes does not check liveness or readiness probes until the startup probe returns success.
You can configure several parameters for probes. The following are the most relevant parameters:
| Parameter | Description |
|---|---|
| initialDelaySeconds | Number of seconds after the container has started before liveness or readiness probes are initiated. |
| periodSeconds | Probe interval. Defaults to 10 seconds. Minimum value is 1. |
| timeoutSeconds | Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. |
| failureThreshold | Number of consecutive failures after which the probe should stop. Defaults to 3. Minimum value is 1. |
Liveness Probe
The liveness probe is used to detect whether the container has become unresponsive. For example, it can be used to detect deadlocks or analyze heap usage. When Kubernetes gives up on a liveness probe, the corresponding pod is restarted.
The liveness probe can result in repeated restarts in certain cases. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. Repeated restarts can also occur if timeoutSeconds or periodSeconds is too low.
We recommend the following:
Avoid checking dependencies in a liveness probe.
Set timeoutSeconds to avoid excessive probe failures.
Acknowledge startup times with initialDelaySeconds.
Readiness Probe
The readiness probe is used to avoid routing requests to the pod until it is ready to accept traffic. When Kubernetes gives up on a readiness probe, the pod is not restarted, but traffic is no longer routed to it.
In certain cases, the readiness probe can cause all the pods to be removed from service routing. For example, if the probe is implemented to check all the dependencies strictly, then it can fail repeatedly for temporary issues. This issue can also occur if timeoutSeconds or periodSeconds is too low.
We recommend the following:
Be conservative when checking shared dependencies.
Be aggressive when checking local dependencies.
Set failureThreshold according to periodSeconds in order to accommodate temporary errors.
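The relationship between failureThreshold and periodSeconds can be estimated with simple arithmetic. The JDK-only sketch below approximates how long a dependency must keep failing before Kubernetes gives up on the probe; it deliberately ignores timeoutSeconds and scheduling jitter, so treat the result as a rough figure. The class `ProbeToleranceSketch` and the method `secondsUntilGiveUp` are hypothetical names.

```java
/** Illustrative only: rough estimate of how long a probe tolerates continuous failure. */
final class ProbeToleranceSketch {

    /**
     * Approximate seconds of continuous failure before the probe is considered failed:
     * one probe every periodSeconds, failureThreshold consecutive failures required.
     */
    static int secondsUntilGiveUp(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    public static void main(String[] args) {
        // Kubernetes defaults: periodSeconds = 10, failureThreshold = 3.
        System.out.println(secondsUntilGiveUp(10, 3)); // prints 30
    }
}
```

If temporary errors in a shared dependency typically last longer than this estimate, raising failureThreshold or periodSeconds keeps pods in service during such errors.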
Startup Probe
The startup probe prevents Kubernetes from prematurely checking the other probes if the application takes a long time to start. Otherwise, Kubernetes might misinterpret a failed liveness or readiness probe and shut down the container when, in fact, the application is still coming up.
Troubleshooting Probes
Failed probes are recorded as events associated with their corresponding pods. The event message contains only the status code.
POD_NAME=$(kubectl get pod -l app=acme -o jsonpath='{.items[0].metadata.name}')
kubectl get event --field-selector involvedObject.name=${POD_NAME}

- Get the effective pod name by filtering pods with the label app=acme.
- Filter the events for the pod.
Create log messages in your health check implementation when setting a DOWN status. Because the event message contains only the status code, these log messages let you correlate a failed probe with its cause.
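One way to follow this advice is to log at the point where the check decides to report DOWN. The sketch below uses only java.util.logging and a plain boolean result instead of the Helidon HealthCheckResponse type; the class `LoggingCheckSketch`, the method `checkAndLog`, and the check name are hypothetical.

```java
import java.util.function.BooleanSupplier;
import java.util.logging.Logger;

/** Illustrative only: logs the reason whenever a check is about to report DOWN. */
final class LoggingCheckSketch {

    private static final Logger LOGGER = Logger.getLogger(LoggingCheckSketch.class.getName());

    /** Evaluates the dependency and logs a warning when the result is DOWN. */
    static boolean checkAndLog(String checkName, BooleanSupplier dependencyAvailable) {
        boolean up = dependencyAvailable.getAsBoolean();
        if (!up) {
            // The log entry carries the context that the Kubernetes event lacks.
            LOGGER.warning("Health check '" + checkName + "' reporting DOWN: dependency unavailable");
        }
        return up;
    }

    public static void main(String[] args) {
        boolean up = checkAndLog("database", () -> false); // simulated outage
        System.out.println(up ? "UP" : "DOWN"); // prints DOWN
    }
}
```

In a real health check the same logging call would sit next to the code that sets the response status, so the log timestamp lines up with the probe failure event.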
Configuration
Built-in health checks can be configured using the config property keys described in this table.
Further, you can suppress one or more health checks by setting the configuration item server.features.observe.observers.health.exclude to a comma-separated list of the health check names you want to exclude. The same table lists the names of the built-in health checks.
Examples
JSON Response Example
Accessing the Helidon-provided /observe/health endpoint reports the health of your application as shown below:
{
  "status": "UP",
  "checks": [
    {
      "name": "deadlock",
      "status": "UP"
    },
    {
      "name": "diskSpace",
      "status": "UP",
      "data": {
        "free": "211.00 GB",
        "freeBytes": 226563444736,
        "percentFree": "45.31%",
        "total": "465.72 GB",
        "totalBytes": 500068036608
      }
    },
    {
      "name": "heapMemory",
      "status": "UP",
      "data": {
        "free": "215.15 MB",
        "freeBytes": 225600496,
        "max": "3.56 GB",
        "maxBytes": 3817865216,
        "percentFree": "99.17%",
        "total": "245.50 MB",
        "totalBytes": 257425408
      }
    }
  ]
}
Kubernetes Example
This example shows the usage of the Helidon health API in an application that implements health endpoints for the liveness and readiness probes. Note that the application code dissociates the health endpoints from the default routes, so that the health endpoints are not exposed by the service. An example YAML specification is also provided for the Kubernetes service and deployment.
ObserveFeature observeFeature = ObserveFeature.builder()
        .addObserver(HealthObserver.builder()
                .useSystemServices(false)
                .endpoint("/health/live")
                .addChecks(HealthChecks.healthChecks())
                .build())
        .addObserver(HealthObserver.builder()
                .useSystemServices(false)
                .endpoint("/health/ready")
                .addCheck(() -> HealthCheckResponse.builder()
                                  .status(true)
                                  .build(),
                          HealthCheckType.READINESS,
                          "database")
                .build())
        .sockets(List.of("observe"))
        .build();

WebServer server = WebServer.builder()
        .putSocket("@default", socket -> socket
                .port(8080)
                .routing(r -> r.any((req, res) -> res.send("It works!"))))
        .addFeature(observeFeature)
        .putSocket("observe", socket -> socket
                .port(8081))
        .build()
        .start();

- The health service for the liveness probe is exposed at /health/live.
- Using the built-in health checks for the liveness probe.
- The health service for the readiness probe is exposed at /health/ready.
- Using a custom health check for a pseudo database that is always UP.
- Route the observe feature exclusively on the observe socket.
- The default socket uses port 8080 for the default routes.
- The default route: returns It works! for any request.
- The observe socket uses port 8081 for the "/observe" routes.
kind: Service
apiVersion: v1
metadata:
  name: acme
  labels:
    app: acme
spec:
  type: NodePort
  selector:
    app: acme
  ports:
    - port: 8080
      targetPort: 8080
      name: http
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: acme
spec:
  replicas: 1
  selector:
    matchLabels:
      app: acme
  template:
    metadata:
      name: acme
      labels:
        app: acme
    spec:
      containers:
        - name: acme
          image: acme
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /observe/health/live
              port: 8081
            initialDelaySeconds: 3
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /observe/health/ready
              port: 8081
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 10
---

- A service of type NodePort that serves the default routes on port 8080.
- A deployment with one replica of a pod.
- The HTTP endpoint for the liveness probe.
- The liveness probe configuration.
- The HTTP endpoint for the readiness probe.
- The readiness probe configuration.