fix(engine): prevent GraalJS memory leak from polyglot Context accumulation #2768
Merged
Pull request overview
This PR addresses sustained-load memory growth/OOMs caused by GraalJS polyglot Context accumulation during script evaluation, and adds a dedicated QA load-test module to reproduce and validate the fix.
Changes:
- Adjust GraalJS bindings creation and close non-cached script engines after evaluation (engine + DMN).
- Improve connector/HTTP response resource cleanup to reduce retained buffers/resources.
- Add `qa/load-test` module with processes, WireMock stubs, and a memory-trend load test reproducing #2761.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| qa/pom.xml | Adds the new qa/load-test module to the QA reactor. |
| qa/load-test/pom.xml | Defines the load-test module dependencies and failsafe profile execution. |
| qa/load-test/README.md | Documents how to run/configure the new load test module. |
| qa/load-test/src/test/java/org/operaton/bpm/qa/loadtest/LoadTestApplication.java | Minimal Spring Boot test application to host embedded Operaton for load tests. |
| qa/load-test/src/test/java/org/operaton/bpm/qa/loadtest/MemoryLeakLoadTest.java | Concurrent start-process load test with heap sampling assertions to detect leaks/regressions. |
| qa/load-test/src/test/resources/application.yml | Test app configuration (H2, plugins, history, thread limits). |
| qa/load-test/src/test/resources/processes/simple-process.bpmn | Minimal BPMN used as a baseline scenario. |
| qa/load-test/src/test/resources/processes/pure-js-process.bpmn | Script-only BPMN scenario exercising pure JS execution. |
| qa/load-test/src/test/resources/processes/script-only-process.bpmn | Script-only BPMN scenario using Spin JSON access patterns. |
| qa/load-test/src/test/resources/processes/http-only-process.bpmn | HTTP-connector-only BPMN scenario to isolate connector memory behavior. |
| qa/load-test/src/test/resources/processes/vale-antecipado-elegibility.bpmn | Full reproduction BPMN resembling the reported real-world process (scripts + HTTP + DMN). |
| qa/load-test/src/test/resources/processes/dmn_ValeAntecipado_Policy_Age.dmn | DMN decision used by the reproduction process. |
| qa/load-test/src/test/resources/processes/dmn_ValeAntecipado_Policy_Fee.dmn | DMN decision resource for the reproduction set. |
| qa/load-test/src/test/resources/processes/dmn_ValeAntecipado_Policy_TOJ.dmn | DMN decision used by the reproduction process. |
| engine/src/main/java/org/operaton/bpm/engine/impl/scripting/engine/ScriptingEngines.java | Special-case binding creation for non-cachable engines (GraalJS) to avoid extra polyglot contexts. |
| engine/src/main/java/org/operaton/bpm/engine/impl/scripting/env/ScriptingEnvironment.java | Closes non-cached AutoCloseable script engines after execution. |
| engine/src/main/java/org/operaton/bpm/engine/impl/scripting/ScriptLogger.java | Adds a log message for failures when closing a script engine. |
| engine-dmn/engine/src/main/java/org/operaton/bpm/dmn/engine/impl/evaluation/ExpressionEvaluationHandler.java | Closes non-cached engines after eval and guards compilation to cachable engines. |
| engine-dmn/engine/src/main/java/org/operaton/bpm/dmn/engine/impl/DmnEngineLogger.java | Adds a log message for failures when closing a DMN script engine. |
| engine-dmn/engine/src/test/java/org/operaton/bpm/dmn/engine/el/ExpressionCachingTest.java | Updates test stubbing for the new cachability check via THREADING. |
| connect/http-client/src/main/java/org/operaton/connect/httpclient/impl/HttpResponseImpl.java | Ensures the underlying HTTP response is closed and dereferenced after parameters are collected. |
| connect/http-client/src/test/java/org/operaton/connect/httpclient/HttpResponseTest.java | Adds tests ensuring the response remains closable in both “read” and “unread” cases. |
| engine-plugins/connect-plugin/src/main/java/org/operaton/connect/plugin/impl/ServiceTaskConnectorActivityBehavior.java | Closes connector responses after output parameter mapping to prevent resource retention. |
Comment on lines 59 to +65

      ConnectorResponse response = request.execute();
    - applyOutputParameters(execution, response);
    + try {
    +   applyOutputParameters(execution, response);
    + } finally {
    +   if (response instanceof CloseableConnectorResponse closeableResponse) {
    +     closeableResponse.close();
    +   }
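The close-after-mapping pattern in the diff above can be tried in isolation. This is a minimal sketch under stated assumptions: `Response`, `CloseableResponse`, `RecordingResponse`, and `mapThenClose` are hypothetical names standing in for the connector types, not the actual Operaton API. It shows that the response is closed whether or not output mapping throws.

```java
import java.util.function.Consumer;

final class CloseAfterMappingSketch {

    /** Hypothetical stand-in for a connector response. */
    interface Response {
        String body();
    }

    /** Hypothetical stand-in for a response that holds releasable resources. */
    interface CloseableResponse extends Response, AutoCloseable {
        @Override
        void close();
    }

    /** Test double that records whether close() was called. */
    static final class RecordingResponse implements CloseableResponse {
        boolean closed;

        @Override
        public String body() {
            return "{}";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Runs the mapper, then closes the response even if the mapper throws. */
    static void mapThenClose(Response response, Consumer<Response> mapper) {
        try {
            mapper.accept(response);
        } finally {
            if (response instanceof CloseableResponse) {
                ((CloseableResponse) response).close();
            }
        }
    }

    public static void main(String[] args) {
        RecordingResponse response = new RecordingResponse();
        mapThenClose(response, r -> r.body());
        System.out.println("closed after mapping: " + response.closed);
    }
}
```

The `finally` block is what matters here: a plain call followed by `close()` would skip the cleanup whenever mapping fails.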
1419479 to 11652f4
48baff2 to e3c437b
…lation

Under sustained load with JavaScript script tasks, each script evaluation created two new GraalJS polyglot Contexts that were never closed, causing progressive memory growth until OutOfMemoryError.

Root cause: ScriptingEngines.createBindings() called scriptEngine.createBindings() which returns GraalJSBindings with a lazy polyglot Context, then wrapped it in ScriptBindings. During eval(), GraalJS checks if ENGINE_SCOPE bindings are instanceof GraalJSBindings. Since ScriptBindings is not GraalJSBindings, a second polyglot Context was created. Neither Context was ever closed.

Fix: For non-cachable engines (THREADING=null, e.g., GraalJS), use the engine's default ENGINE_SCOPE bindings directly (which ARE GraalJSBindings), populated with resolver values. This ensures GraalJS reuses the existing polyglot Context instead of creating a new one.

Additional fixes:
- Close non-cached script engines after evaluation in both the engine scripting environment and the DMN expression evaluation handler
- Release HTTP response reference after parameter collection in HttpResponseImpl to allow GC of buffered response bodies
- Close connector response after output parameter mapping in ServiceTaskConnectorActivityBehavior

Load test results (before/after fix):
- Pure JS process: 893 MB growth -> 38 MB growth (50 users, 90s)
- Full process: OOM crash -> stable at ~166 MB (30 users, 60s)

closes #2761

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
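The double-Context mechanism described in this commit message can be illustrated without GraalJS on the classpath. The following is a minimal sketch, not the GraalJS implementation: `NativeBindings`, `WrapperBindings`, and `getOrCreateNative` are hypothetical stand-ins for `GraalJSBindings`, `ScriptBindings`, and the engine-side `instanceof` check, and the simulated context is counted eagerly at construction rather than created lazily.

```java
import javax.script.Bindings;
import javax.script.SimpleBindings;
import java.util.concurrent.atomic.AtomicInteger;

final class ContextLeakSketch {

    /** Counts how many simulated polyglot contexts have been created. */
    static final AtomicInteger CONTEXTS_CREATED = new AtomicInteger();

    /** Hypothetical stand-in for GraalJSBindings: each instance owns one context. */
    static final class NativeBindings extends SimpleBindings {
        NativeBindings() {
            CONTEXTS_CREATED.incrementAndGet();
        }
    }

    /** Hypothetical stand-in for ScriptBindings: wraps other bindings, is NOT NativeBindings. */
    static final class WrapperBindings extends SimpleBindings {
        WrapperBindings(Bindings wrapped) {
            super(wrapped); // SimpleBindings uses the given map as its backing store
        }
    }

    /** Mimics the engine-side check: reuse native bindings, otherwise create fresh ones. */
    static NativeBindings getOrCreateNative(Bindings bindings) {
        if (bindings instanceof NativeBindings) {
            return (NativeBindings) bindings;
        }
        NativeBindings fresh = new NativeBindings(); // a second context appears here
        fresh.putAll(bindings);
        return fresh;
    }

    /** Old behaviour: native bindings are wrapped before evaluation. */
    static int contextsForWrappedEval() {
        CONTEXTS_CREATED.set(0);
        Bindings wrapped = new WrapperBindings(new NativeBindings()); // context A
        getOrCreateNative(wrapped);                                    // context B
        return CONTEXTS_CREATED.get();
    }

    /** Fixed behaviour: native bindings are passed through unwrapped. */
    static int contextsForDirectEval() {
        CONTEXTS_CREATED.set(0);
        Bindings direct = new NativeBindings(); // context A, reused by the check
        getOrCreateNative(direct);
        return CONTEXTS_CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("wrapped: " + contextsForWrappedEval());
        System.out.println("direct:  " + contextsForDirectEval());
    }
}
```

In this simulation the wrapped path creates two contexts per evaluation and the direct path creates one, which mirrors the before/after behaviour the commit message describes.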
e3c437b to bc3286a
kthoms added a commit that referenced this pull request Apr 22, 2026
…lation (#2768)

(cherry picked from commit 58ff693)
Summary
Fixes #2761 — progressive memory growth under sustained load with JavaScript script tasks (e.g. Spin/JSON transformations), leading to OutOfMemoryError and container restart.
The issue was introduced with Operaton's switch from Nashorn to GraalJS. Camunda 7.24 (Nashorn) did not have this problem.
Root Cause
Every script evaluation created two new GraalJS polyglot Contexts that were never closed:
1. `ScriptingEngines.createBindings()` called `scriptEngine.createBindings()` → returns `GraalJSBindings` with a lazy polyglot Context A
2. The result was wrapped in `ScriptBindings` (which is not `instanceof GraalJSBindings`)
3. During `engine.eval(script, scriptBindings)`, GraalJS internally calls `getOrCreateGraalJSBindings()`, which checks `bindings instanceof GraalJSBindings`
4. `ScriptBindings` fails the check → GraalJS creates another new polyglot Context B

GraalJS reports `THREADING=null`, meaning it is not cachable and a fresh engine is created per evaluation. Nashorn reported `THREADING=MULTITHREADED` and was cached: single engine, single context, no leak.
Fix
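Primary fix in `ScriptingEngines.createBindings()` for non-cachable engines (GraalJS):
- Use the engine's default ENGINE_SCOPE bindings directly (these are `GraalJSBindings`)
- Populate them with resolver values instead of wrapping them in `ScriptBindings`

This ensures the `instanceof GraalJSBindings` check in `getOrCreateGraalJSBindings()` passes → GraalJS reuses the existing polyglot Context instead of creating a new one.
Supplementary fixes: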
- `ScriptingEnvironment`: close non-cached engines (`AutoCloseable`) after evaluation
- `ExpressionEvaluationHandler` (DMN): same close-after-eval pattern, plus guard compilation to cachable engines only
- `HttpResponseImpl`: null out the HTTP response reference after reading to allow GC of buffered bodies
- `ServiceTaskConnectorActivityBehavior`: close connector response in try-finally after output mapping

Load Test Results
A new `qa/load-test` module reproduces the issue with the exact process and conditions from the report.
Memory trend between first and last quarter of sustained load (figures as given in the commit message):
- Pure JS process: 893 MB growth before the fix, 38 MB growth after (50 users, 90s)
- Full process: OOM crash before the fix, stable at ~166 MB after (30 users, 60s)
Changes
Engine Core
- `ScriptingEngines`: `createBindingsForNonCachableEngine()` uses engine's default GraalJSBindings; `isNonCachableEngine()` helper checks THREADING
- `ScriptingEnvironment`: `closeNonCachedEngine()` closes non-cached AutoCloseable engines after eval
- `ScriptLogger`: added `logClosingScriptEngineFailed()` log message

DMN Engine
- `ExpressionEvaluationHandler`: close non-cached engines after eval; guard script compilation to cachable engines
- `DmnEngineLogger`: added `logClosingScriptEngineFailed()` log message
- `ExpressionCachingTest`: fixed mock to stub ScriptEngineFactory with THREADING=MULTITHREADED (required by the new cachability check)

HTTP Connector
- `HttpResponseImpl`: null out `httpResponse` after `collectResponseParameters()` so the Apache HttpClient response body buffer is eligible for GC
- `HttpResponseTest`: added tests for the null-out behaviour and the no-op Closeable fallback

Connect Plugin
- `ServiceTaskConnectorActivityBehavior`: added try-finally to close `CloseableConnectorResponse` after output parameter mapping
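
The cachability check and the close-after-eval step listed under Engine Core and DMN Engine can be sketched with the standard `javax.script` API alone. This is an illustration under assumptions, not the code merged in this PR: `isNonCachable`, `closeIfNonCached`, `stubFactory`, and `stubEngine` are hypothetical names, and the dynamic proxies exist only so the sketch runs without a real script engine. The one real API relied on is `ScriptEngineFactory.getParameter("THREADING")`, which returns `null` for engines that are not thread-safe.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicBoolean;

final class NonCachedEngineSketch {

    /** An engine whose factory reports THREADING=null is treated as non-cachable. */
    static boolean isNonCachable(ScriptEngineFactory factory) {
        return factory.getParameter("THREADING") == null;
    }

    /** Closes the engine if it is non-cachable and AutoCloseable; returns true if closed. */
    static boolean closeIfNonCached(ScriptEngine engine) {
        if (isNonCachable(engine.getFactory()) && engine instanceof AutoCloseable) {
            try {
                ((AutoCloseable) engine).close();
                return true;
            } catch (Exception e) {
                // A real implementation would log the failure and continue.
                return false;
            }
        }
        return false;
    }

    /** Test helper (hypothetical): a factory that only answers getParameter("THREADING"). */
    static ScriptEngineFactory stubFactory(Object threading) {
        return (ScriptEngineFactory) Proxy.newProxyInstance(
            NonCachedEngineSketch.class.getClassLoader(),
            new Class<?>[] { ScriptEngineFactory.class },
            (proxy, method, args) ->
                "getParameter".equals(method.getName()) && "THREADING".equals(args[0])
                    ? threading
                    : null);
    }

    /** Test helper (hypothetical): an AutoCloseable engine that records close(). */
    static ScriptEngine stubEngine(ScriptEngineFactory factory, AtomicBoolean closed) {
        return (ScriptEngine) Proxy.newProxyInstance(
            NonCachedEngineSketch.class.getClassLoader(),
            new Class<?>[] { ScriptEngine.class, AutoCloseable.class },
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "getFactory":
                        return factory;
                    case "close":
                        closed.set(true);
                        return null;
                    default:
                        return null;
                }
            });
    }

    public static void main(String[] args) {
        AtomicBoolean closed = new AtomicBoolean();
        closeIfNonCached(stubEngine(stubFactory(null), closed));
        System.out.println("non-cached engine closed: " + closed.get());
    }
}
```

Keying the decision on the THREADING parameter keeps cached engines (for example those reporting `MULTITHREADED`) open for reuse, while per-evaluation engines release their resources as soon as the script has run.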