Skip to content

ci(l1): loop HTML comment removal to prevent incomplete sanitization#6322

Merged
iovoid merged 1 commit into
mainfrom
fix/codeql-119-incomplete-sanitization
Mar 5, 2026
Merged

ci(l1): loop HTML comment removal to prevent incomplete sanitization#6322
iovoid merged 1 commit into
mainfrom
fix/codeql-119-incomplete-sanitization

Conversation

@klaus993

@klaus993 klaus993 commented Mar 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes CodeQL alert #119 (js/incomplete-multi-character-sanitization).

Problem

In .github/scripts/set-pr-status.js, extractLinkedIssueNumbers strips HTML comments from PR bodies before extracting "closes #N" keywords. The regex used a single-pass replacement:

const withoutComments = body.replace(/<!--[\s\S]*?-->/g, "");

A single-pass replacement of <!--...--> can leave behind a valid HTML comment if the input is crafted so that removing one match creates a new one. For example, input <!-<!-- foo -->-> would produce <!-- foo --> after one pass.

This means a malicious PR author could hide closing keywords inside crafted comment nesting to manipulate project board status. The practical risk is low (CI context, trusted-ish input), but the fix is trivial.

Fix

Replace the single-pass .replace() with a loop that repeats until stable:

let withoutComments = body;
let previous;
do {
    previous = withoutComments;
    withoutComments = withoutComments.replace(/<!--[\s\S]*?-->/g, "");
} while (withoutComments !== previous);

This is the approach recommended by CodeQL's own documentation for this exact alert type.

Test plan

  • No behavioral change for normal PR bodies — the single pass already removes all comments; the loop just adds a no-op second iteration confirming stability
  • Edge case: nested/overlapping comment markers like <!-<!-- foo -->-> are now fully stripped across multiple iterations
  • Resolves CodeQL alert js/incomplete-multi-character-sanitization

Fixes CodeQL alert #119 (js/incomplete-multi-character-sanitization).
@klaus993 klaus993 requested a review from a team as a code owner March 5, 2026 16:04
@github-actions

github-actions Bot commented Mar 5, 2026

Copy link
Copy Markdown

🤖 Kimi Code Review

Review of .github/scripts/set-pr-status.js

The change addresses a potential edge case where nested HTML comments might not be fully removed from PR descriptions. However, there's a more efficient and safer approach.

Issues:

  1. Performance concern (lines 154-159): The do-while loop could potentially become an infinite loop if the regex replacement doesn't converge. While unlikely with the current regex, it's a theoretical risk.

  2. Inefficient approach: The iterative replacement is unnecessary. A single regex replacement with a more comprehensive pattern can handle nested comments in one pass.

Suggested improvement:

Replace the do-while loop with a single, more robust regex that handles nested comments:

// Instead of the do-while loop
const withoutComments = body.replace(/<!--(?:[^-]|-(?!->))*-->/g, "");

This regex uses a negative lookahead to properly handle nested comments without the need for iteration. It's more efficient and eliminates the infinite loop risk.

The rest of the function logic appears correct for extracting linked issue numbers from PR descriptions.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

github-actions Bot commented Mar 5, 2026

Copy link
Copy Markdown

🤖 Claude Code Review

The diff is small and self-contained. I have enough context to write the review.


PR Review: fix(ci): loop HTML comment removal to prevent incomplete sanitization

Scope: .github/scripts/set-pr-status.js — CI scripting only, no Rust/EVM code changed.


Summary

The fix correctly addresses the CodeQL js/incomplete-multi-character-sanitization alert. The loop-until-stable approach is the canonical recommendation for this class of issue.


Correctness

The loop is sound. A crafted input like <!-<!-- foo -->-> would produce <!-- foo --> after the first pass, and the loop catches that in the second iteration. The do...while structure guarantees at least one replacement pass, and the termination condition (withoutComments !== previous) correctly detects a fixed point.

One minor observation: the regex uses a non-greedy [\s\S]*? match, which means on each iteration it matches the innermost comment first. This is the correct behavior for iteratively unwrapping nested constructs — no concern here.


ReDoS / performance

The regex <!--[\s\S]*?--> has no catastrophic backtracking risk. With non-greedy matching and a clear terminating delimiter -->, the engine won't exhibit exponential behavior. For a PR body (bounded, human-authored text), performance is not a concern.


Termination

The loop always terminates: each iteration either removes at least one <!--...--> match (strictly shortening the string) or leaves the string unchanged (and the loop exits). The string length is non-increasing and bounded below by 0, so the invariant holds.


Minor suggestions

  • The previous variable is declared with let but only used inside the loop for comparison. This is fine stylistically, but declaring it as let previous = body before the loop (matching the initial value of withoutComments) would make the intent slightly more explicit — though the do...while makes the logic clear enough as-is.

  • Consider whether a maximum iteration guard (e.g. capping at 100 iterations) is warranted for defense-in-depth against a hypothetical deeply-nested adversarial input. Given that each iteration strictly removes content, even a deeply nested 10-level structure would only need 10 passes — this is unlikely to matter in practice, but it could be a minor hardening note.


Verdict

Approve. The fix is minimal, correct, and directly addresses the reported vulnerability class. The logic is easy to follow and introduces no regressions for normal PR bodies.


Automated review by Claude (Anthropic) · custom prompt

@github-actions

github-actions Bot commented Mar 5, 2026

Copy link
Copy Markdown

🤖 Codex Code Review

No blocking findings in this PR.

  • The change in set-pr-status.js is behaviorally safe: repeated stripping is idempotent and avoids some malformed/nested-comment edge cases where a single pass may miss hidden segments.
  • I did not identify correctness, security, or performance regressions from this diff in practical CI conditions.

Optional improvement:

  1. Add a small regression test around extractLinkedIssueNumbers to lock in expected behavior for comment-heavy PR bodies (especially malformed/nested comment patterns).

Automated review by OpenAI Codex · custom prompt

@greptile-apps

greptile-apps Bot commented Mar 5, 2026

Copy link
Copy Markdown

Greptile Summary

This PR hardens the extractLinkedIssueNumbers helper in .github/scripts/set-pr-status.js by replacing a single-pass HTML comment strip with a do-while loop that repeats until the string stabilises, addressing CodeQL alert js/incomplete-multi-character-sanitization.

  • Change is correct and safe: Each iteration of the loop can only shorten the string (by removing one or more <!--…--> matches); the loop terminates as soon as no match is found, so there is no risk of an infinite loop.
  • No behavioural change for normal inputs: On typical PR bodies the loop performs exactly two iterations — one removal pass followed by one no-op confirmation pass.
  • PR description example is slightly misleading: The description states that <!-<!-- foo -->-> produces <!-- foo --> after one pass, but tracing the regex shows it actually produces <!--> (five characters), which is neither a valid HTML comment nor a closes #N keyword. The practical risk was therefore even lower than described, though the fix is still the correct approach recommended by CodeQL.
  • No ReDoS risk: The lazy [\s\S]*? quantifier is not susceptible to catastrophic backtracking; combined with GitHub's PR body size limit, performance is not a concern.

Confidence Score: 5/5

  • This PR is safe to merge — it is a minimal, well-reasoned security hardening change with no functional risk.
  • The change is a four-line, self-contained loop that is provably terminating and produces identical results for all normal inputs. It correctly resolves the CodeQL alert without introducing any new complexity or side-effects.
  • No files require special attention.

Important Files Changed

Filename Overview
.github/scripts/set-pr-status.js Replaces single-pass HTML comment removal with a stable do-while loop to prevent incomplete multi-character sanitization; the rest of the file is unchanged.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([Start: prBody]) --> B[withoutComments = body]
    B --> C[previous = withoutComments]
    C --> D["withoutComments = withoutComments\n.replace(/<!--[\s\S]*?-->/g, '')"]
    D --> E{withoutComments\n=== previous?}
    E -- No, matches found --> C
    E -- Yes, stable --> F["matchAll closes/fixes/resolves #N"]
    F --> G([Return issue numbers])
Loading

Last reviewed commit: f737859

Comment thread .github/scripts/set-pr-status.js
@klaus993

klaus993 commented Mar 5, 2026

Copy link
Copy Markdown
Contributor Author

The do-while loop could potentially become an infinite loop if the regex replacement doesn't converge.

The loop always terminates: each iteration either removes at least one <!--...--> match (strictly shortening the string) or leaves the string unchanged (and the loop exits). Since the string length is non-increasing and bounded below by 0, convergence is guaranteed — there's no infinite loop risk.

A single regex replacement with a more comprehensive pattern can handle nested comments in one pass.

The loop-until-stable approach is the one recommended by CodeQL's own documentation for this exact alert type. It's also easier to reason about correctness compared to a more complex single-pass regex.

@ilitteri ilitteri requested a review from iovoid March 5, 2026 16:23
@iovoid iovoid changed the title fix(ci): loop HTML comment removal to prevent incomplete sanitization ci(l1): loop HTML comment removal to prevent incomplete sanitization Mar 5, 2026
@iovoid iovoid enabled auto-merge March 5, 2026 17:24
@iovoid iovoid added this pull request to the merge queue Mar 5, 2026
@github-actions github-actions Bot added the L1 Ethereum client label Mar 5, 2026
@github-project-automation github-project-automation Bot moved this to In Review in ethrex_l1 Mar 5, 2026
@github-actions github-actions Bot removed the L1 Ethereum client label Mar 5, 2026
Merged via the queue into main with commit 4b47358 Mar 5, 2026
84 of 88 checks passed
@iovoid iovoid deleted the fix/codeql-119-incomplete-sanitization branch March 5, 2026 17:55
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 Mar 5, 2026
lakshya-sky pushed a commit to lakshya-sky/ethrex that referenced this pull request Mar 10, 2026
…ambdaclass#6322)

## Summary

Fixes [CodeQL alert
lambdaclass#119](https://github.com/lambdaclass/ethrex/security/code-scanning/119)
(`js/incomplete-multi-character-sanitization`).

### Problem

In `.github/scripts/set-pr-status.js`, `extractLinkedIssueNumbers`
strips HTML comments from PR bodies before extracting "closes #N"
keywords. The regex used a single-pass replacement:

```js
const withoutComments = body.replace(/<!--[\s\S]*?-->/g, "");
```

A single-pass replacement of `<!--...-->` can leave behind a valid HTML
comment if the input is crafted so that removing one match creates a new
one. For example, input `<!-<!-- foo -->->` would produce `<!-- foo -->`
after one pass.

This means a malicious PR author could hide closing keywords inside
crafted comment nesting to manipulate project board status. The
practical risk is low (CI context, trusted-ish input), but the fix is
trivial.

### Fix

Replace the single-pass `.replace()` with a loop that repeats until
stable:

```js
let withoutComments = body;
let previous;
do {
    previous = withoutComments;
    withoutComments = withoutComments.replace(/<!--[\s\S]*?-->/g, "");
} while (withoutComments !== previous);
```

This is the approach recommended by [CodeQL's own
documentation](https://codeql.github.com/codeql-query-help/javascript/js-incomplete-multi-character-sanitization/)
for this exact alert type.

## Test plan

- No behavioral change for normal PR bodies — the single pass already
removes all comments; the loop just adds a no-op second iteration
confirming stability
- Edge case: nested/overlapping comment markers like `<!-<!-- foo -->->`
are now fully stripped across multiple iterations
- Resolves CodeQL alert `js/incomplete-multi-character-sanitization`
iovoid pushed a commit that referenced this pull request Mar 16, 2026
…6322)

## Summary

Fixes [CodeQL alert
#119](https://github.com/lambdaclass/ethrex/security/code-scanning/119)
(`js/incomplete-multi-character-sanitization`).

### Problem

In `.github/scripts/set-pr-status.js`, `extractLinkedIssueNumbers`
strips HTML comments from PR bodies before extracting "closes #N"
keywords. The regex used a single-pass replacement:

```js
const withoutComments = body.replace(/<!--[\s\S]*?-->/g, "");
```

A single-pass replacement of `<!--...-->` can leave behind a valid HTML
comment if the input is crafted so that removing one match creates a new
one. For example, input `<!-<!-- foo -->->` would produce `<!-- foo -->`
after one pass.

This means a malicious PR author could hide closing keywords inside
crafted comment nesting to manipulate project board status. The
practical risk is low (CI context, trusted-ish input), but the fix is
trivial.

### Fix

Replace the single-pass `.replace()` with a loop that repeats until
stable:

```js
let withoutComments = body;
let previous;
do {
    previous = withoutComments;
    withoutComments = withoutComments.replace(/<!--[\s\S]*?-->/g, "");
} while (withoutComments !== previous);
```

This is the approach recommended by [CodeQL's own
documentation](https://codeql.github.com/codeql-query-help/javascript/js-incomplete-multi-character-sanitization/)
for this exact alert type.

## Test plan

- No behavioral change for normal PR bodies — the single pass already
removes all comments; the loop just adds a no-op second iteration
confirming stability
- Edge case: nested/overlapping comment markers like `<!-<!-- foo -->->`
are now fully stripped across multiple iterations
- Resolves CodeQL alert `js/incomplete-multi-character-sanitization`
Muzry pushed a commit to Muzry/ethrex that referenced this pull request Mar 17, 2026
…ambdaclass#6322)

## Summary

Fixes [CodeQL alert
lambdaclass#119](https://github.com/lambdaclass/ethrex/security/code-scanning/119)
(`js/incomplete-multi-character-sanitization`).

### Problem

In `.github/scripts/set-pr-status.js`, `extractLinkedIssueNumbers`
strips HTML comments from PR bodies before extracting "closes #N"
keywords. The regex used a single-pass replacement:

```js
const withoutComments = body.replace(/<!--[\s\S]*?-->/g, "");
```

A single-pass replacement of `<!--...-->` can leave behind a valid HTML
comment if the input is crafted so that removing one match creates a new
one. For example, input `<!-<!-- foo -->->` would produce `<!-- foo -->`
after one pass.

This means a malicious PR author could hide closing keywords inside
crafted comment nesting to manipulate project board status. The
practical risk is low (CI context, trusted-ish input), but the fix is
trivial.

### Fix

Replace the single-pass `.replace()` with a loop that repeats until
stable:

```js
let withoutComments = body;
let previous;
do {
    previous = withoutComments;
    withoutComments = withoutComments.replace(/<!--[\s\S]*?-->/g, "");
} while (withoutComments !== previous);
```

This is the approach recommended by [CodeQL's own
documentation](https://codeql.github.com/codeql-query-help/javascript/js-incomplete-multi-character-sanitization/)
for this exact alert type.

## Test plan

- No behavioral change for normal PR bodies — the single pass already
removes all comments; the loop just adds a no-op second iteration
confirming stability
- Edge case: nested/overlapping comment markers like `<!-<!-- foo -->->`
are now fully stripped across multiple iterations
- Resolves CodeQL alert `js/incomplete-multi-character-sanitization`
edg-l added a commit that referenced this pull request May 11, 2026
Prior commit (2496c2b) trimmed `test/tests/levm/eip7708_tests.rs` to
a single constants check, on the assumption the file was fully new in
this PR. It wasn't — 25 of the 28 tests had been on `main` for a while
(landed in PR #6322, well before this branch diverged). Only 3 tests
were genuinely added on this PR:
  - test_burn_logs_emitted_in_lex_ascending_order_three_accounts
  - test_coinbase_priority_fee_does_not_emit_transfer_log
  - test_multi_selfdestruct_dest_emits_single_burn_log_with_combined_balance

Those three are covered by the EELS burn/transfer-logs ef-tests we now
run via the blockchain runner, so dropping them is correct. The other
25 belong to main and should not have been touched here. Restored to
main verbatim and added the new `Environment::is_system_call` field
introduced by this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants