ci(l1): loop HTML comment removal to prevent incomplete sanitization by klaus993 · Pull Request #6322 · lambdaclass/ethrex

klaus993 · 2026-03-05T16:04:56Z

Summary

Fixes CodeQL alert #119 (js/incomplete-multi-character-sanitization).

Problem

In .github/scripts/set-pr-status.js, extractLinkedIssueNumbers strips HTML comments from PR bodies before extracting "closes #N" keywords. The regex used a single-pass replacement:

const withoutComments = body.replace(/<!--[\s\S]*?-->/g, "");

A single-pass replacement of  can leave behind a valid HTML comment if the input is crafted so that removing one match creates a new one. For example, input <!--> would produce  after one pass.

This means a malicious PR author could hide closing keywords inside crafted comment nesting to manipulate project board status. The practical risk is low (CI context, trusted-ish input), but the fix is trivial.

Fix

Replace the single-pass .replace() with a loop that repeats until stable:

let withoutComments = body;
let previous;
do {
    previous = withoutComments;
    withoutComments = withoutComments.replace(/<!--[\s\S]*?-->/g, "");
} while (withoutComments !== previous);

This is the approach recommended by CodeQL's own documentation for this exact alert type.

Test plan

No behavioral change for normal PR bodies — the single pass already removes all comments; the loop just adds a no-op second iteration confirming stability
Edge case: nested/overlapping comment markers like <!--> are now fully stripped across multiple iterations
Resolves CodeQL alert js/incomplete-multi-character-sanitization

Fixes CodeQL alert #119 (js/incomplete-multi-character-sanitization).

github-actions · 2026-03-05T16:05:24Z

🤖 Kimi Code Review

Review of `.github/scripts/set-pr-status.js`

The change addresses a potential edge case where nested HTML comments might not be fully removed from PR descriptions. However, there's a more efficient and safer approach.

Issues:

Performance concern (lines 154-159): The do-while loop could potentially become an infinite loop if the regex replacement doesn't converge. While unlikely with the current regex, it's a theoretical risk.
Inefficient approach: The iterative replacement is unnecessary. A single regex replacement with a more comprehensive pattern can handle nested comments in one pass.

Suggested improvement:

Replace the do-while loop with a single, more robust regex that handles nested comments:

// Instead of the do-while loop
const withoutComments = body.replace(/<!--(?:[^-]|-(?!->))*-->/g, "");

This regex uses a negative lookahead to properly handle nested comments without the need for iteration. It's more efficient and eliminates the infinite loop risk.

The rest of the function logic appears correct for extracting linked issue numbers from PR descriptions.

Automated review by Kimi (Moonshot AI) · custom prompt

github-actions · 2026-03-05T16:05:58Z

🤖 Claude Code Review

The diff is small and self-contained. I have enough context to write the review.

PR Review: `fix(ci): loop HTML comment removal to prevent incomplete sanitization`

Scope: .github/scripts/set-pr-status.js — CI scripting only, no Rust/EVM code changed.

Summary

The fix correctly addresses the CodeQL js/incomplete-multi-character-sanitization alert. The loop-until-stable approach is the canonical recommendation for this class of issue.

Correctness

The loop is sound. A crafted input like <!--> would produce  after the first pass, and the loop catches that in the second iteration. The do...while structure guarantees at least one replacement pass, and the termination condition (withoutComments !== previous) correctly detects a fixed point.

One minor observation: the regex uses a non-greedy [\s\S]*? match, which means on each iteration it matches the innermost comment first. This is the correct behavior for iteratively unwrapping nested constructs — no concern here.

ReDoS / performance

The regex  has no catastrophic backtracking risk. With non-greedy matching and a clear terminating delimiter -->, the engine won't exhibit exponential behavior. For a PR body (bounded, human-authored text), performance is not a concern.

Termination

The loop always terminates: each iteration either removes at least one  match (strictly shortening the string) or leaves the string unchanged (and the loop exits). The string length is non-increasing and bounded below by 0, so the invariant holds.

Minor suggestions

The previous variable is declared with let but only used inside the loop for comparison. This is fine stylistically, but declaring it as let previous = body before the loop (matching the initial value of withoutComments) would make the intent slightly more explicit — though the do...while makes the logic clear enough as-is.
Consider whether a maximum iteration guard (e.g. capping at 100 iterations) is warranted for defense-in-depth against a hypothetical deeply-nested adversarial input. Given that each iteration strictly removes content, even a deeply nested 10-level structure would only need 10 passes — this is unlikely to matter in practice, but it could be a minor hardening note.

Verdict

Approve. The fix is minimal, correct, and directly addresses the reported vulnerability class. The logic is easy to follow and introduces no regressions for normal PR bodies.

Automated review by Claude (Anthropic) · custom prompt

github-actions · 2026-03-05T16:06:41Z

🤖 Codex Code Review

No blocking findings in this PR.

The change in set-pr-status.js is behaviorally safe: repeated stripping is idempotent and avoids some malformed/nested-comment edge cases where a single pass may miss hidden segments.
I did not identify correctness, security, or performance regressions from this diff in practical CI conditions.

Optional improvement:

Add a small regression test around extractLinkedIssueNumbers to lock in expected behavior for comment-heavy PR bodies (especially malformed/nested comment patterns).

Automated review by OpenAI Codex · custom prompt

greptile-apps · 2026-03-05T16:07:46Z

Greptile Summary

This PR hardens the extractLinkedIssueNumbers helper in .github/scripts/set-pr-status.js by replacing a single-pass HTML comment strip with a do-while loop that repeats until the string stabilises, addressing CodeQL alert js/incomplete-multi-character-sanitization.

Change is correct and safe: Each iteration of the loop can only shorten the string (by removing one or more  matches); the loop terminates as soon as no match is found, so there is no risk of an infinite loop.
No behavioural change for normal inputs: On typical PR bodies the loop performs exactly two iterations — one removal pass followed by one no-op confirmation pass.
PR description example is slightly misleading: The description states that <!--> produces  after one pass, but tracing the regex shows it actually produces <!--> (five characters), which is neither a valid HTML comment nor a closes #N keyword. The practical risk was therefore even lower than described, though the fix is still the correct approach recommended by CodeQL.
No ReDoS risk: The lazy [\s\S]*? quantifier is not susceptible to catastrophic backtracking; combined with GitHub's PR body size limit, performance is not a concern.

Confidence Score: 5/5

This PR is safe to merge — it is a minimal, well-reasoned security hardening change with no functional risk.
The change is a four-line, self-contained loop that is provably terminating and produces identical results for all normal inputs. It correctly resolves the CodeQL alert without introducing any new complexity or side-effects.
No files require special attention.

Important Files Changed

Filename	Overview
.github/scripts/set-pr-status.js	Replaces single-pass HTML comment removal with a stable do-while loop to prevent incomplete multi-character sanitization; the rest of the file is unchanged.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([Start: prBody]) --> B[withoutComments = body]
    B --> C[previous = withoutComments]
    C --> D["withoutComments = withoutComments\n.replace(/<!--[\s\S]*?-->/g, '')"]
    D --> E{withoutComments\n=== previous?}
    E -- No, matches found --> C
    E -- Yes, stable --> F["matchAll closes/fixes/resolves #N"]
    F --> G([Return issue numbers])

_{Last reviewed commit: f737859}

klaus993 · 2026-03-05T16:12:14Z

The do-while loop could potentially become an infinite loop if the regex replacement doesn't converge.

The loop always terminates: each iteration either removes at least one  match (strictly shortening the string) or leaves the string unchanged (and the loop exits). Since the string length is non-increasing and bounded below by 0, convergence is guaranteed — there's no infinite loop risk.

A single regex replacement with a more comprehensive pattern can handle nested comments in one pass.

The loop-until-stable approach is the one recommended by CodeQL's own documentation for this exact alert type. It's also easier to reason about correctness compared to a more complex single-pass regex.

…ambdaclass#6322) ## Summary Fixes [CodeQL alert lambdaclass#119](https://github.com/lambdaclass/ethrex/security/code-scanning/119) (`js/incomplete-multi-character-sanitization`). ### Problem In `.github/scripts/set-pr-status.js`, `extractLinkedIssueNumbers` strips HTML comments from PR bodies before extracting "closes #N" keywords. The regex used a single-pass replacement: ```js const withoutComments = body.replace(//g, ""); ``` A single-pass replacement of `` can leave behind a valid HTML comment if the input is crafted so that removing one match creates a new one. For example, input `<!-->` would produce `` after one pass. This means a malicious PR author could hide closing keywords inside crafted comment nesting to manipulate project board status. The practical risk is low (CI context, trusted-ish input), but the fix is trivial. ### Fix Replace the single-pass `.replace()` with a loop that repeats until stable: ```js let withoutComments = body; let previous; do { previous = withoutComments; withoutComments = withoutComments.replace(//g, ""); } while (withoutComments !== previous); ``` This is the approach recommended by [CodeQL's own documentation](https://codeql.github.com/codeql-query-help/javascript/js-incomplete-multi-character-sanitization/) for this exact alert type. ## Test plan - No behavioral change for normal PR bodies — the single pass already removes all comments; the loop just adds a no-op second iteration confirming stability - Edge case: nested/overlapping comment markers like `<!-->` are now fully stripped across multiple iterations - Resolves CodeQL alert `js/incomplete-multi-character-sanitization`

…6322) ## Summary Fixes [CodeQL alert #119](https://github.com/lambdaclass/ethrex/security/code-scanning/119) (`js/incomplete-multi-character-sanitization`). ### Problem In `.github/scripts/set-pr-status.js`, `extractLinkedIssueNumbers` strips HTML comments from PR bodies before extracting "closes #N" keywords. The regex used a single-pass replacement: ```js const withoutComments = body.replace(//g, ""); ``` A single-pass replacement of `` can leave behind a valid HTML comment if the input is crafted so that removing one match creates a new one. For example, input `<!-->` would produce `` after one pass. This means a malicious PR author could hide closing keywords inside crafted comment nesting to manipulate project board status. The practical risk is low (CI context, trusted-ish input), but the fix is trivial. ### Fix Replace the single-pass `.replace()` with a loop that repeats until stable: ```js let withoutComments = body; let previous; do { previous = withoutComments; withoutComments = withoutComments.replace(//g, ""); } while (withoutComments !== previous); ``` This is the approach recommended by [CodeQL's own documentation](https://codeql.github.com/codeql-query-help/javascript/js-incomplete-multi-character-sanitization/) for this exact alert type. ## Test plan - No behavioral change for normal PR bodies — the single pass already removes all comments; the loop just adds a no-op second iteration confirming stability - Edge case: nested/overlapping comment markers like `<!-->` are now fully stripped across multiple iterations - Resolves CodeQL alert `js/incomplete-multi-character-sanitization`

…ambdaclass#6322) ## Summary Fixes [CodeQL alert lambdaclass#119](https://github.com/lambdaclass/ethrex/security/code-scanning/119) (`js/incomplete-multi-character-sanitization`). ### Problem In `.github/scripts/set-pr-status.js`, `extractLinkedIssueNumbers` strips HTML comments from PR bodies before extracting "closes #N" keywords. The regex used a single-pass replacement: ```js const withoutComments = body.replace(//g, ""); ``` A single-pass replacement of `` can leave behind a valid HTML comment if the input is crafted so that removing one match creates a new one. For example, input `<!-->` would produce `` after one pass. This means a malicious PR author could hide closing keywords inside crafted comment nesting to manipulate project board status. The practical risk is low (CI context, trusted-ish input), but the fix is trivial. ### Fix Replace the single-pass `.replace()` with a loop that repeats until stable: ```js let withoutComments = body; let previous; do { previous = withoutComments; withoutComments = withoutComments.replace(//g, ""); } while (withoutComments !== previous); ``` This is the approach recommended by [CodeQL's own documentation](https://codeql.github.com/codeql-query-help/javascript/js-incomplete-multi-character-sanitization/) for this exact alert type. ## Test plan - No behavioral change for normal PR bodies — the single pass already removes all comments; the loop just adds a no-op second iteration confirming stability - Edge case: nested/overlapping comment markers like `<!-->` are now fully stripped across multiple iterations - Resolves CodeQL alert `js/incomplete-multi-character-sanitization`

Prior commit (2496c2b) trimmed `test/tests/levm/eip7708_tests.rs` to a single constants check, on the assumption the file was fully new in this PR. It wasn't — 25 of the 28 tests had been on `main` for a while (landed in PR #6322, well before this branch diverged). Only 3 tests were genuinely added on this PR: - test_burn_logs_emitted_in_lex_ascending_order_three_accounts - test_coinbase_priority_fee_does_not_emit_transfer_log - test_multi_selfdestruct_dest_emits_single_burn_log_with_combined_balance Those three are covered by the EELS burn/transfer-logs ef-tests we now run via the blockchain runner, so dropping them is correct. The other 25 belong to main and should not have been touched here. Restored to main verbatim and added the new `Environment::is_system_call` field introduced by this PR.

fix(ci): loop HTML comment removal to prevent incomplete sanitization

f737859

Fixes CodeQL alert #119 (js/incomplete-multi-character-sanitization).

klaus993 requested a review from a team as a code owner March 5, 2026 16:04

github-actions Bot assigned klaus993 Mar 5, 2026

ilitteri approved these changes Mar 5, 2026

View reviewed changes

edg-l approved these changes Mar 5, 2026

View reviewed changes

iovoid reviewed Mar 5, 2026

View reviewed changes

Comment thread .github/scripts/set-pr-status.js

ilitteri requested a review from iovoid March 5, 2026 16:23

iovoid approved these changes Mar 5, 2026

View reviewed changes

iovoid changed the title ~~fix(ci): loop HTML comment removal to prevent incomplete sanitization~~ ci(l1): loop HTML comment removal to prevent incomplete sanitization Mar 5, 2026

iovoid enabled auto-merge March 5, 2026 17:24

iovoid added this pull request to the merge queue Mar 5, 2026

github-actions Bot added the L1 Ethereum client label Mar 5, 2026

github-project-automation Bot added this to ethrex_l1 Mar 5, 2026

github-project-automation Bot moved this to In Review in ethrex_l1 Mar 5, 2026

github-actions Bot removed the L1 Ethereum client label Mar 5, 2026

Merged via the queue into main with commit 4b47358 Mar 5, 2026
84 of 88 checks passed

iovoid deleted the fix/codeql-119-incomplete-sanitization branch March 5, 2026 17:55

github-project-automation Bot moved this from In Review to Done in ethrex_l1 Mar 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci(l1): loop HTML comment removal to prevent incomplete sanitization#6322

ci(l1): loop HTML comment removal to prevent incomplete sanitization#6322
iovoid merged 1 commit into
mainfrom
fix/codeql-119-incomplete-sanitization

klaus993 commented Mar 5, 2026

Uh oh!

github-actions Bot commented Mar 5, 2026

Uh oh!

github-actions Bot commented Mar 5, 2026

Uh oh!

github-actions Bot commented Mar 5, 2026

Uh oh!

greptile-apps Bot commented Mar 5, 2026

Important Files Changed

Uh oh!

Uh oh!

klaus993 commented Mar 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

klaus993 commented Mar 5, 2026

Summary

Problem

Fix

Test plan

Uh oh!

github-actions Bot commented Mar 5, 2026

🤖 Kimi Code Review

Review of .github/scripts/set-pr-status.js

Issues:

Suggested improvement:

Uh oh!

github-actions Bot commented Mar 5, 2026

🤖 Claude Code Review

PR Review: fix(ci): loop HTML comment removal to prevent incomplete sanitization

Summary

Correctness

ReDoS / performance

Termination

Minor suggestions

Verdict

Uh oh!

github-actions Bot commented Mar 5, 2026

🤖 Codex Code Review

Uh oh!

greptile-apps Bot commented Mar 5, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

klaus993 commented Mar 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Review of `.github/scripts/set-pr-status.js`

PR Review: `fix(ci): loop HTML comment removal to prevent incomplete sanitization`