fix: resolve open CodeQL alerts (workflow permissions, ReDoS, path-injection) by robfrank · Pull Request #4388 · ArcadeData/arcadedb

robfrank · 2026-05-28T12:01:14Z

Summary

Resolves the 18 open CodeQL alerts on main (filtered to tool:CodeQL):

actions/missing-workflow-permissions (15, medium) - added least-privilege permissions blocks to mvn-test.yml, mvn-deploy.yml, meterian.yml and license-compliance.yml. Top-level contents: read, with per-job elevations only where actions require them:
- checks: write for the dorny/test-reporter jobs (unit/slow-unit/integration/ha-integration/java-e2e, and the already-restricted studio-e2e so its reporter keeps posting),
- security-events: write for the Meterian SARIF upload,
- pull-requests: write for the license-compliance PR comment.
py/redos (2, high) - the docker-compose sanitizing regex in bindings/python/examples/11_vector_index_build.py and 12_vector_search.py used (?ms)...(?:.*\n)*?..., where DOTALL lets . consume the \n delimiter and the loop becomes ambiguous (exponential backtracking). Rewrote so every newline is explicit and the loop body is [^\n]*\n. Output is byte-identical on valid input; pathological input now runs in sub-millisecond time.
java/path-injection (1, high) - Neo4jImporter derives on-disk bucket file names from untrusted import labels. The blocklist guard added in fix: resolve CodeQL Java code-scanning alerts at their true sources #4383 did not clear the alert because URLEncoder.encode encodes / and \ but leaves .. intact, and CodeQL does not accept that guard shape as a sanitizer. Replaced it with an ASCII allow-list Pattern.compile("[A-Za-z0-9_ -]+") validated via matcher(label).matches(), which CodeQL's DirectoryCharactersGuard recognizes as a path-traversal sanitizer.

Note / trade-off

The path-injection fix is intentionally ASCII-only for labels - this is what makes CodeQL recognize the sanitizer (predefined classes like \p{L} contain a backslash that the query's heuristic reads as permitting \). Existing Neo4j exports use ASCII identifier labels, so behavior is unchanged; a non-ASCII label is now rejected with a clear ImportException.

Test plan

mvn -pl integration test -Dtest=Neo4jImporterIT - all 8 tests pass, including rejectNodeLabelWithPathTraversal, rejectEdgeLabelWithPathTraversal, and the large/multi-type import tests.
Python examples py_compile clean; verified the new regex produces identical output to the old on a representative compose file and is ReDoS-resistant on adversarial input.
All four workflow YAML files validated; permissions verified against each job's actual actions.
Confirm CodeQL re-scan closes all 18 alerts after merge.

…jection) Address the 18 open CodeQL alerts on main: - actions/missing-workflow-permissions (15): add least-privilege permissions blocks to mvn-test, mvn-deploy, meterian and license-compliance workflows. Top-level contents:read plus per-job checks:write for dorny/test-reporter jobs, security-events:write for the Meterian SARIF upload and pull-requests:write for the license PR comment. - py/redos (2): rewrite the docker-compose sanitizing regex in the vector index/search Python examples so newlines are explicit and the loop body cannot consume the delimiter, eliminating exponential backtracking. Output is unchanged. - java/path-injection (1): replace the Neo4jImporter label blocklist with an ASCII allow-list (Pattern [A-Za-z0-9_ -]+), which CodeQL recognises as a path-traversal sanitizer. The previous blocklist did not clear the alert because URLEncoder.encode leaves '..' intact.

codacy-production · 2026-05-28T12:02:24Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics -4 complexity

Metric Results

Complexity -4

View in Codacy

🟢 Coverage 100.00% diff coverage · -7.37% coverage variation

Metric Results

Coverage variation ✅ -7.37% coverage variation

Diff coverage ✅ 100.00% diff coverage

View coverage diff in Codacy

Coverage variation details

Coverable lines Covered lines Coverage

Common ancestor commit (3f6c078) 127835 93820 73.39%

Head commit (5e9b2ce) 159514 (+31679) 105316 (+11496) 66.02% (-7.37%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

Coverable lines Covered lines Diff coverage

Pull request (#4388) 2 2 100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

gemini-code-assist

Code Review

This pull request updates regular expressions in Python examples to prevent ReDoS and tightens label validation in the Java Neo4j importer using a strict allow-list pattern to prevent path traversal. The review feedback highlights that the updated Python regexes will fail on Windows-style CRLF line endings and suggests a robust pattern to handle both LF and CRLF. Additionally, for the Java importer, the reviewer recommends restricting the label validation pattern to disallow leading, trailing, or consecutive spaces, which could otherwise cause filesystem issues on Windows.

gemini-code-assist · 2026-05-28T12:02:53Z

+    sanitized = re.sub(r"(?m)^[ \t]*container_name:[ \t]*[^\n]*\n", "", raw)
    sanitized = re.sub(
-        r"(?ms)(^\s*networks:\s*\n(?:.*\n)*?^\s*milvus:\s*\n)(\s*name:\s*milvus\s*\n)",
+        r"(?m)(^[ \t]*networks:[ \t]*\n(?:[^\n]*\n)*?^[ \t]*milvus:[ \t]*\n)([ \t]*name:[ \t]*milvus[ \t]*\n)",
        r"\1",
        sanitized,
    )


The updated regular expressions use [ \t]* and explicit \n instead of \s* and \s*\n. While this successfully prevents ReDoS, it will fail to match on files with Windows-style line endings (\r\n) because [ \t]* does not match \r. To ensure robustness across different platforms, use \r?\n and [^\r\n]* to correctly handle both LF and CRLF line endings.

Suggested change

sanitized = re.sub(r"(?m)^[ \t]*container_name:[ \t]*[^\n]*\n", "", raw)

sanitized = re.sub(

r"(?ms)(^\s*networks:\s*\n(?:.*\n)*?^\s*milvus:\s*\n)(\s*name:\s*milvus\s*\n)",

r"(?m)(^[ \t]*networks:[ \t]*\n(?:[^\n]*\n)*?^[ \t]*milvus:[ \t]*\n)([ \t]*name:[ \t]*milvus[ \t]*\n)",

r"\1",

sanitized,

)

sanitized = re.sub(r"(?m)^[ \t]*container_name:[ \t]*[^\r\n]*\r?\n", "", raw)

sanitized = re.sub(

r"(?m)(^[ \t]*networks:[ \t]*\r?\n(?:[^\r\n]*\r?\n)*?^[ \t]*milvus:[ \t]*\r?\n)([ \t]*name:[ \t]*milvus[ \t]*\r?\n)",

r"\1",

sanitized,

)

gemini-code-assist · 2026-05-28T12:02:53Z

+    sanitized = re.sub(r"(?m)^[ \t]*version[ \t]*:[ \t]*[^\n]*\n", "", raw)
+    sanitized = re.sub(r"(?m)^[ \t]*container_name:[ \t]*[^\n]*\n", "", sanitized)
    sanitized = re.sub(
-        r"(?ms)(^\s*networks:\s*\n(?:.*\n)*?^\s*milvus:\s*\n)(\s*name:\s*milvus\s*\n)",
+        r"(?m)(^[ \t]*networks:[ \t]*\n(?:[^\n]*\n)*?^[ \t]*milvus:[ \t]*\n)([ \t]*name:[ \t]*milvus[ \t]*\n)",
        r"\1",
        sanitized,
    )


The updated regular expressions use [ \t]* and explicit \n instead of \s* and \s*\n. While this successfully prevents ReDoS, it will fail to match on files with Windows-style line endings (\r\n) because [ \t]* does not match \r. To ensure robustness across different platforms, use \r?\n and [^\r\n]* to correctly handle both LF and CRLF line endings.

Suggested change

sanitized = re.sub(r"(?m)^[ \t]*version[ \t]*:[ \t]*[^\n]*\n", "", raw)

sanitized = re.sub(r"(?m)^[ \t]*container_name:[ \t]*[^\n]*\n", "", sanitized)

sanitized = re.sub(

r"(?ms)(^\s*networks:\s*\n(?:.*\n)*?^\s*milvus:\s*\n)(\s*name:\s*milvus\s*\n)",

r"(?m)(^[ \t]*networks:[ \t]*\n(?:[^\n]*\n)*?^[ \t]*milvus:[ \t]*\n)([ \t]*name:[ \t]*milvus[ \t]*\n)",

r"\1",

sanitized,

)

sanitized = re.sub(r"(?m)^[ \t]*version[ \t]*:[ \t]*[^\r\n]*\r?\n", "", raw)

sanitized = re.sub(r"(?m)^[ \t]*container_name:[ \t]*[^\r\n]*\r?\n", "", sanitized)

sanitized = re.sub(

r"(?m)(^[ \t]*networks:[ \t]*\r?\n(?:[^\r\n]*\r?\n)*?^[ \t]*milvus:[ \t]*\r?\n)([ \t]*name:[ \t]*milvus[ \t]*\r?\n)",

r"\1",

sanitized,

)

gemini-code-assist · 2026-05-28T12:02:54Z

  private final static SimpleDateFormat               dateTimeISO8601Format    = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
+  // Allow-list for imported labels: ASCII letters, digits, underscore, hyphen and space only. Excluding '.', '/' and '\'
+  // makes path-traversal sequences structurally impossible in the on-disk bucket file names derived from these labels.
+  private final static Pattern                        SAFE_LABEL               = Pattern.compile("[A-Za-z0-9_ -]+");


The SAFE_LABEL pattern [A-Za-z0-9_ -]+ allows leading, trailing, and consecutive spaces. Since these labels are used to derive on-disk bucket filenames, labels with leading/trailing spaces or only spaces can cause filesystem errors or compatibility issues (especially on Windows, which strips trailing spaces and prohibits certain space-only names). Restricting the pattern to disallow leading/trailing/consecutive spaces improves robustness.

Suggested change

private final static Pattern SAFE_LABEL = Pattern.compile("[A-Za-z0-9_ -]+");

private final static Pattern SAFE_LABEL = Pattern.compile("[A-Za-z0-9_-]+( [A-Za-z0-9_-]+)*");

claude · 2026-05-28T12:04:51Z

Code Review

Overview

This PR resolves 18 open CodeQL alerts across three categories: GitHub Actions workflow over-permissions (15 alerts), Python ReDoS vulnerabilities (2 alerts), and a Java path-injection issue in Neo4jImporter (1 alert). The changes are focused, well-scoped, and the PR description clearly explains each trade-off.

GitHub Actions Workflow Permissions

Positive. All four workflow files correctly adopt the pattern of a restrictive top-level default (contents: read) with per-job elevations only where the job's actions require them. The job-level checks: write additions for dorny/test-reporter jobs follow least-privilege correctly, and mvn-deploy.yml only needs contents: read because deploy credentials travel via secrets, not GITHUB_TOKEN.

Python ReDoS Fixes

Positive. The fix is technically correct. Removing the s (DOTALL) flag and replacing .* with [^\n]* eliminates the ambiguity that caused catastrophic backtracking: with [^\n]*\n, the newline delimiter cannot be consumed by the body alternation, so the NFA has no exponential branch to explore. Output is byte-identical on valid YAML input.

Minor note. The original \s* matched any Unicode whitespace (including \r, \f, etc.); the replacement [ \t]* matches only horizontal ASCII whitespace. For YAML compose files fetched from a well-known URL this is fine in practice, but worth noting in case the tool ever processes CRLF content from Windows hosts.

Java Path Injection Fix (Neo4jImporter)

Positive overall. Replacing the blocklist with a strict allow-list is the right direction - blocklists require enumerating every bad character, allow-lists are safe by construction. CodeQL rightly recognizes matcher(label).matches() as a DirectoryCharactersGuard, so the alert clears cleanly.

Bug: error message does not match the new behaviour (medium)

The exception message at line 692 still says "Invalid label: must not contain path separators or '..'", which describes the old blocklist semantics. The new allow-list rejects far more than just path separators - it also rejects labels containing Unicode letters, accented characters, dots, colons, brackets, etc. A user with a label like Nodo or MyType.v2 will see an error blaming "path separators" when neither is present, which is confusing.

Suggested replacement:

throw new ImportException("Invalid label '" + label + "': only ASCII letters, digits, underscore, hyphen and space are allowed");

The test assertions at lines 333 and 352 in Neo4jImporterIT use hasMessageContaining("path separators") and will need updating to match the new message text.

Minor: regex character class readability

In Pattern.compile("[A-Za-z0-9_ -]+"), the hyphen at the end of the class (before ]) is treated as a literal, which is correct. However, _ - can look like a range from space to something, and some tooling flags trailing hyphens inconsistently. The comment above mentions hyphen explicitly, which helps. An unambiguous alternative: [-A-Za-z0-9_ ]+ or [A-Za-z0-9_\\- ]+.

Missing test: the stricter allow-list behaviour is not covered

The two existing tests (rejectNodeLabelWithPathTraversal, rejectEdgeLabelWithPathTraversal) use labels like ../../etc/evil, which both the old blocklist and new allow-list reject. There is no test for a label that the old code accepted but the new allow-list rejects - for example a label with a Unicode character or a dot (MyType.v2). Adding one regression test for this case would verify the allow-list tightening and lock in the expected exception message.

Summary

Area	Status	Blocking?
Workflow permissions	Clean, least-privilege, correct	No
Python ReDoS fix	Correct, linear-time on adversarial input	No
Error message mismatch	Message describes old blocklist, not new allow-list	Yes - misleads users; test assertion passes only by coincidence
SAFE_LABEL regex readability	Hyphen placement is correct but non-obvious	No (nit)
Missing allow-list regression test	No test for labels rejected by allow-list but not blocklist	Recommended

The path-injection fix and workflow permissions are the right approach. The main item to address is aligning the exception message (and test assertion) with what the allow-list actually enforces.

codecov · 2026-05-28T12:42:40Z

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 64.35%. Comparing base (3f6c078) to head (5e9b2ce).

Files with missing lines	Patch %	Lines
...m/arcadedb/integration/importer/Neo4jImporter.java	50.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #4388      +/-   ##
============================================
+ Coverage     64.32%   64.35%   +0.02%     
- Complexity      424      431       +7     
============================================
  Files          1647     1647              
  Lines        127835   127836       +1     
  Branches      27403    27403              
============================================
+ Hits          82233    82263      +30     
+ Misses        34046    34020      -26     
+ Partials      11556    11553       -3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Since engine PR #4343, findNeighborsFromVector exposes a squared Euclidean distance (lower is better) for the EUCLIDEAN metric, matching COSINE and DOT_PRODUCT which already return distances. The Python binding still treated EUCLIDEAN as a similarity: it sorted EUCLIDEAN results in reverse (returning the worst matches first) and test_lsm_euclidean_distance asserted a 1/(1+d²) similarity. Drop the EUCLIDEAN reverse-sort special case so every metric sorts ascending by distance, update the class docstring, and assert the squared-distance values (origin -> 0.0, point -> 25.0) in the test. Verified against freshly built 26.6.1 engine jars: the full tests/test_vector.py suite passes (38 passed, 4 skipped).

claude · 2026-05-28T15:37:22Z

Code Review

Overview

This PR resolves 18 open CodeQL alerts across three categories: workflow permissions (15), Python ReDoS (2), and Java path-injection (1). It also bundles an undocumented behavioral change to the Python vector embedding API. The security fixes are well-motivated and technically sound; one issue requires attention before merging.

Workflow Permissions (`.github/workflows/*.yml`) - Approved

The layered permission model is correct:

Top-level contents: read establishes a restrictive default that caps every job.
Per-job elevations (checks: write, security-events: write, pull-requests: write) are scoped precisely to the jobs that need them.
The pre-existing studio-e2e job that already had contents: read simply gains checks: write — no regression there.

Python ReDoS Fix (`11_vector_index_build.py`, `12_vector_search.py`) - Approved

The root cause is correctly identified: (?ms) lets . swallow \n, making (?:.*\n)*? ambiguous because the \n delimiter can also be consumed by ., forcing the engine into exponential backtracking on adversarial input.

The fix is the right approach:

\s* → [ \t]* (whitespace without newlines)
.* → [^\n]* (match characters up to, but not including, newline)
Drop the (?s) (DOTALL) flag entirely

Each [^\n]*\n token is now unambiguous — there is exactly one way to match it — eliminating the backtracking. The PR description's claim that output is byte-identical on valid input is correct.

Java Path-Injection Fix (`Neo4jImporter.java`) - Minor issue

The allow-list approach ([A-Za-z0-9_ -]+) is strictly more secure than the deny-list it replaces and is the correct fix for CodeQL's DirectoryCharactersGuard recognizer. Compiling the pattern as a static final constant is good practice.

Issue: error message no longer matches the validation constraint.

The existing error message — "Invalid label: must not contain path separators or '..'" — described the old deny-list. The new allow-list rejects far more characters: any period (.), any non-ASCII character, @, #, etc. A user whose label is "Person.Employee" or "Ärzte" will get a message that doesn't explain why their label was rejected.

Suggested fix:

throw new ImportException("Invalid label '" + label + "': only ASCII letters, digits, underscore, hyphen and space are allowed");

Minor: empty string behaviour changed. The old code accepted empty strings (condition required at least one forbidden character); [A-Za-z0-9_ -]+ rejects them (+ requires ≥1 character). This is almost certainly the right behaviour, but worth noting.

Note on the character class: [A-Za-z0-9_ -] — in Java regex, - immediately before ] is a literal hyphen, so the class is A-Z | a-z | 0-9 | _ | space | -. This is correct.

Python Vector API change (`vector.py`, `test_vector.py`) - Needs documentation

This change is not mentioned in the PR description but is a breaking behavioral change:

	Old behavior	New behavior
EUCLIDEAN score	Similarity: `1 / (1 + d²)` — higher is better	Squared distance: `d²` — lower is better
Sort order	Descending by similarity	Ascending by distance

The _uses_reverse_score_sort() helper is removed entirely; _sort_results now always sorts ascending. The nearest-neighbor ranking is unchanged (nearest item is always first), but the numeric score values returned to callers change completely for EUCLIDEAN queries. Any user code comparing or thresholding these scores will break silently.

This belongs in the PR description with an explicit "breaking change" callout and should probably appear in the project's changelog / release notes. If this is intentional, please add a note to the description explaining the motivation (the code comment says the engine already returns distances natively, so the old conversion was superfluous — that rationale should be visible at the PR level).

Summary

Area	Status
Workflow permissions	Approved
Python ReDoS	Approved
Java path-injection logic	Approved
Java error message	Needs update
Vector API breaking change	Needs PR description update

gemini-code-assist Bot reviewed May 28, 2026

View reviewed changes

robfrank merged commit 949aae1 into main May 28, 2026
131 of 142 checks passed

robfrank deleted the fix/codeql-workflow-perms-redos-pathinjection branch May 28, 2026 18:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: resolve open CodeQL alerts (workflow permissions, ReDoS, path-injection)#4388

fix: resolve open CodeQL alerts (workflow permissions, ReDoS, path-injection)#4388
robfrank merged 2 commits into
mainfrom
fix/codeql-workflow-perms-redos-pathinjection

robfrank commented May 28, 2026

Uh oh!

codacy-production Bot commented May 28, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 28, 2026

Uh oh!

gemini-code-assist Bot May 28, 2026

Uh oh!

gemini-code-assist Bot May 28, 2026

Uh oh!

claude Bot commented May 28, 2026

Uh oh!

codecov Bot commented May 28, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	private final static Pattern SAFE_LABEL = Pattern.compile("[A-Za-z0-9_ -]+");
	private final static Pattern SAFE_LABEL = Pattern.compile("[A-Za-z0-9_-]+( [A-Za-z0-9_-]+)*");

Uh oh!

Conversation

robfrank commented May 28, 2026

Summary

Note / trade-off

Test plan

Uh oh!

codacy-production Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Up to standards ✅

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 28, 2026

Code Review

Overview

GitHub Actions Workflow Permissions

Python ReDoS Fixes

Java Path Injection Fix (Neo4jImporter)

Summary

Uh oh!

codecov Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

claude Bot commented May 28, 2026

Code Review

Overview

Workflow Permissions (.github/workflows/*.yml) - Approved

Python ReDoS Fix (11_vector_index_build.py, 12_vector_search.py) - Approved

Java Path-Injection Fix (Neo4jImporter.java) - Minor issue

Python Vector API change (vector.py, test_vector.py) - Needs documentation

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codacy-production Bot commented May 28, 2026 •

edited

Loading

codecov Bot commented May 28, 2026 •

edited

Loading

Workflow Permissions (`.github/workflows/*.yml`) - Approved

Python ReDoS Fix (`11_vector_index_build.py`, `12_vector_search.py`) - Approved

Java Path-Injection Fix (`Neo4jImporter.java`) - Minor issue

Python Vector API change (`vector.py`, `test_vector.py`) - Needs documentation