Bug
(This issue was first observed by @goynam - many thanks for raising it during our offline discussions!)
The webui container's Node.js process can hit the V8 heap limit ("FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory") but remain running in a degraded state (GC-thrashing, high CPU, unresponsive to HTTP requests) instead of exiting. Because the health checks in both Docker Compose and Helm only test TCP port connectivity rather than HTTP responsiveness, the container is never marked unhealthy and is never restarted.
Observed behavior:
- The Node.js process prints V8 OOM errors to stderr but does not exit (PID 1 stays alive).
- The process enters a GC-thrashing loop (~50% CPU, ~5 GB RSS), unable to serve any HTTP requests.
- docker inspect reports the container as running and healthy.
- HTTP requests to the webui time out, making the UI completely unresponsive.
- The container is never restarted because the restart policy (on-failure) only triggers on process exit, and the healthcheck (< /dev/tcp/webui/4000) passes as long as the TCP port is open.
Expected behavior:
- The webui process should either exit on OOM (so that the container restarts), or the health check should detect the unresponsive state and trigger a restart.
Affected configurations:
- Docker Compose (tools/deployment/package/docker-compose-all.yaml, lines 403-410):
    healthcheck:
      test: ["CMD", "bash", "-c", "< /dev/tcp/webui/4000"]
  TCP-only check; does not verify the application can serve requests.
- Helm (tools/deployment/package-helm/templates/webui-deployment.yaml, lines 96-102):
    readinessProbe:
      tcpSocket:
        port: "webui"
    livenessProbe:
      tcpSocket:
        port: "webui"
  Same TCP-only approach. Kubernetes will not restart the pod since the liveness probe passes even when the application is unresponsive.
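For contrast, here is a rough sketch of what HTTP-based versions of the two checks could look like. It is not taken from the repository: it assumes curl is available in the webui image, that a GET on / returns a non-error status when the server is healthy, and that the named port webui exists on the container. All timing values are illustrative placeholders.

```yaml
# Compose fragment (sketch): replace the TCP check with an HTTP request.
# Assumption: curl is installed in the webui image.
services:
  webui:
    healthcheck:
      # curl -f exits non-zero on HTTP 400+; --max-time bounds a hung server.
      test: ["CMD", "curl", "-f", "--max-time", "2", "http://webui:4000/"]
      interval: 30s   # illustrative values, not from the repository
      timeout: 5s
      retries: 3
---
# Helm fragment (sketch): the same idea for the container's probes.
# Assumption: "/" is a cheap route that answers when the server is healthy.
readinessProbe:
  httpGet:
    path: /
    port: webui
  timeoutSeconds: 2
livenessProbe:
  httpGet:
    path: /
    port: webui
  timeoutSeconds: 2
  periodSeconds: 10
  failureThreshold: 3
```

One caveat worth verifying for the Compose side: plain Docker Engine reports an unhealthy status but does not restart a container because of it, so the HTTP check alone would surface the problem in docker inspect without recovering from it. In Kubernetes, a failing liveness probe does cause the kubelet to restart the container.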
Possible fixes (non-exhaustive):
- Change healthchecks/probes to use HTTP (e.g., httpGet on a known route in Helm, or curl -f --max-time 2 http://webui:4000/ in Compose).
- Set --max-old-space-size on the Node.js command to make V8 abort on OOM rather than GC-thrash indefinitely.
- Set a container memory limit (deploy.resources.limits.memory in Compose, or resources.limits.memory in Helm) so the kernel OOM-killer terminates the process.
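The two memory-related fixes could be combined roughly as follows. This is a sketch under assumptions, not a tested change. The 1536 MiB and 2 GiB figures are placeholders, not measured values. Passing the flag through the NODE_OPTIONS environment variable is an alternative to editing the Node.js command line, and it assumes the image's entrypoint does not override that variable.

```yaml
# Compose fragment (sketch).
services:
  webui:
    environment:
      # V8 old-space cap in MiB; placeholder value.
      NODE_OPTIONS: "--max-old-space-size=1536"
    deploy:
      resources:
        limits:
          # cgroup memory limit; the kernel OOM-killer acts above this.
          memory: 2g
---
# Helm fragment (sketch) for the webui container spec.
env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=1536"
resources:
  limits:
    memory: 2Gi
```

Keeping the V8 cap below the container limit gives the process a chance to hit its own heap limit first, with the container limit as a backstop. A process killed by the OOM-killer exits with a non-zero status, which is the condition the on-failure restart policy acts on.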
CLP version
3b4d13f
Environment
- Docker Compose (tools/deployment/package/) and Helm (tools/deployment/package-helm/)
- Observed on Linux 6.8.0-106-generic with Docker
Reproduction steps
- Deploy the CLP package using Docker Compose or Helm with default configuration.
- Use the webui under a workload that causes memory pressure on the Node.js server process (e.g., large result sets, many concurrent socket connections, or repeated searches that accumulate in-memory state).
- Wait until the Node.js process hits the V8 heap limit and prints "FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory" to stderr.
- Observe that the container/pod remains in a running/healthy state despite the webui being completely unresponsive to HTTP requests.
- Confirm with docker exec <container> ps aux that the Node.js process is still alive, consuming high CPU (GC-thrashing) and ~5 GB RSS.
- Confirm with curl --max-time 5 http://<webui-host>:<port>/ that the HTTP endpoint times out.