feat(health): add runtime‑agnostic Health SPI and Run /health endpoint#1582
feat(health): add runtime‑agnostic Health SPI and Run /health endpoint#1582Tomnm1 wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a runtime-agnostic health check system for Operaton by adding a new Health SPI and implementing a /health endpoint across deployment models.
- Creates core health abstractions (
HealthService,HealthResult,FrontendHealthContributor) in the engine module - Implements
DefaultHealthServiceto check database connectivity, job executor status, and frontend availability - Integrates health checks into Spring Boot via
OperatonHealthIndicatorfor Actuator compatibility - Exposes a standalone
/healthendpoint in Operaton Run viaHealthController
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
engine/src/main/java/org/operaton/bpm/health/HealthService.java |
New SPI interface for health check operations |
engine/src/main/java/org/operaton/bpm/health/HealthResult.java |
Immutable record holding health check results |
engine/src/main/java/org/operaton/bpm/health/FrontendHealthContributor.java |
SPI interface for frontend health contributions |
engine/src/main/java/org/operaton/bpm/health/DefaultHealthService.java |
Default implementation checking database, job executor, and frontend |
spring-boot-starter/starter/src/main/java/org/operaton/bpm/spring/boot/starter/OperatonBpmHealthServiceConfiguration.java |
Auto-configuration registering DefaultHealthService bean |
spring-boot-starter/starter/src/main/java/org/operaton/bpm/spring/boot/starter/actuator/OperatonHealthIndicator.java |
Actuator adapter delegating to HealthService |
spring-boot-starter/starter/src/main/java/org/operaton/bpm/spring/boot/starter/OperatonBpmActuatorConfiguration.java |
Registers OperatonHealthIndicator bean |
spring-boot-starter/starter/src/main/java/org/operaton/bpm/spring/boot/starter/OperatonBpmAutoConfiguration.java |
Imports health service configuration |
spring-boot-starter/starter-webapp-core/src/main/java/org/operaton/bpm/spring/boot/starter/webapp/SpringWebappFrontendHealthContributor.java |
Implementation checking webapp resource availability |
spring-boot-starter/starter-webapp-core/src/main/java/org/operaton/bpm/spring/boot/starter/webapp/OperatonBpmWebappAutoConfiguration.java |
Registers webapp frontend health contributor |
distro/run/core/src/main/java/org/operaton/bpm/run/health/HealthController.java |
REST controller exposing /health endpoint |
distro/run/core/src/test/java/org/operaton/bpm/run/test/health/HealthEndpointTest.java |
Integration test for /health endpoint |
| frontend = new LinkedHashMap<>(frontendHealthContributor.frontendDetails()); | ||
| } else { | ||
| frontend = new LinkedHashMap<>(); | ||
| frontend.put("operational", "unknown"); |
There was a problem hiding this comment.
The "operational" field is documented to be a boolean in the FrontendHealthContributor interface (line 31 comment), but here it's set to the string "unknown". This inconsistency can confuse API consumers. Consider using a boolean with a separate field for "status" or consistently using strings.
| frontend.put("operational", "unknown"); | |
| frontend.put("operational", false); |
There was a problem hiding this comment.
seems legit. please rethink how this case should be handled. Either do not set the value, set it to a boolean, or introduce a tri-state.
There was a problem hiding this comment.
After reconsideration, I’d say keeping it as a boolean is a reasonable choice. Since we first check whether we have a frontendHealthContributor—meaning, colloquially, “do I even have a frontend?”—we can assume that if there is no frontend, then it is not operational. On the other hand, using a tri-state value would provide clearer information about whether the frontend actually exists
kthoms
left a comment
There was a problem hiding this comment.
In general, a very valuable addition. Let's make this now ready. There are some nitpicks from Copilot and a general discussion related the package structure.
In general for such an addition extensive tests are required. There should be at least one integration test in spring-boot-starter/starter/src/test/java. And unit tests for each added component.
Please press "Re-request Review" when you are ready for the next review.
There was a problem hiding this comment.
The REST API is usually defined in the engine-rest API. This would make the health endpoint just available for the Operaton distribution.
I think that is fine, as installations on Tomcat or Wildfly would have their own endpoints. Am I correct?
There was a problem hiding this comment.
That's correct — only the Operation Run distro keeps the HTTP health. The only thing I'm considering now is whether it should be made ConditionalOnProperty, since that would give users the option to disable it if they want. On the other hand, I think we should assume that the health check should be enabled by default.
There was a problem hiding this comment.
Sounds good. Having it ConditionalOnProperty where the property is assumed to be true by default seems good.
There was a problem hiding this comment.
IMO this should be merged into ProcessEngineHealthIndicator.
The JobExecutorHealthIndicator could be made obsolete, but its check goes beyond what the HealthService is adding. Maybe the HealthService should be extended?
There was a problem hiding this comment.
The package structure should be changed. The current package is too high-level and details are not hidden.
I think the both interfaces and the result record should go to org.operaton.bpm.engine.health and the DefaultHealthService to org.operaton.bpm.engine.impl.health.
Also mark the public interfaces/record with "since 1.1" in Javadoc
javahippie
left a comment
There was a problem hiding this comment.
Hi, thanks for your contribution!
I'm wondering if the way this health check is structured is grouping too many things together. A database health check and the engine health check might be better treated as two different things. The engine can only be healthy if the database is healthy.
As an example: If I were to write my own Spring Boot application using the Operaton Starter, I already have an actuator in place which does readiness and liveness checks on the database. Duplicating this database health check into the DefaultHealthService would then not be necessary, because another subsystem is already taking care of this. I could reuse this in my health check, stating "Operaton can only be healthy if the database is healthy".
For the service checks themselves, I'm worndering how the check for the JobExecutor and the check for the Webapps could ever be negative. Have you checked if it is even possible to start an Operaton based application if the JobExecutor cannot be created? This is a central part of the system and my guess would be that the whole engine initialization failed in this case – including the HealthService.
For the webapps, the test checks if the WebJars are on the classpath. From my understanding this would only be possible if a user failed to include the dependencies during build time, so this would not really be a check for correct configuration and runtime behavior.
There was a problem hiding this comment.
The run distribution is based on Spring Boot, I'm wondering what the benefit is of providing a custom health endpoint over the Spring Boot actuator? The OperatonHealthIndicator was added in the code below, this would be a good fit in my opinion
There was a problem hiding this comment.
Prefer Spring Boot Actuator for Spring Boot apps. The custom endpoint is only useful as a lightweight /health for Run (e.g., load balancer pings) or in environments where Actuator isn’t enabled and could be a baseline for user-specific implementation of health. The OperatonHealthIndicator integrates Operaton health into Actuator and is the recommended approach in Spring Boot apps.
| details.put("frontend", frontend); | ||
|
|
||
| boolean dbOk = (dataSource == null) || dbConnected; | ||
| boolean jobOk = (jobExecutor == null) || jobExecutorActive; |
There was a problem hiding this comment.
Having an inactive Job Executor might be a valid choice, I would not expect the health check to fail in this case
There was a problem hiding this comment.
I was considering the use of Operation in a HA environment and assuming that the JobExecutor would be turned off to avoid split-brain scenarios, but that wouldn’t necessarily mean the application is unhealthy. Under that assumption, it could give a completely incorrect result, so right that shall be removed.
|
I am still not sure about the details being hidden enough and if it's still not trying to group too many things together, but would like to check out with you and exchange thoughts |
|
The build error will disappear after rebase. There was a problem in restructuring the integration build. |
@kthoms Sorted out |
|
I've fix the mess I've caused with the rebase, PR looks clear, howver there are some issues with tests - also on when building from clear main. |
|
@kthoms if you could please take a look at this one :) |
|
This PR has become obsolete by #2730 |
Introduce a lightweight, runtime-agnostic health surface and integrate it across modules so health checks work consistently in different deployment models.
related to #1245, closes #1245