Skip to content

fix: skip bash line comments at word boundary (#25)#26

Merged
Aaronontheweb merged 2 commits into
devfrom
fix/bash-line-comments-25
May 12, 2026
Merged

fix: skip bash line comments at word boundary (#25)#26
Aaronontheweb merged 2 commits into
devfrom
fix/bash-line-comments-25

Conversation

@Aaronontheweb

Copy link
Copy Markdown
Owner

Summary

  • Closes Bash line comments not recognized — verb chain extracted from comment text #25. BashLexer now recognizes # at a word boundary (start of input, or after whitespace, a newline, or any operator) and skips to EOL via a new internal BashTokenKind.Comment. The parser drops Comment tokens in FilterSignificant alongside Whitespace / Continuation, so comments contribute nothing to verb chains, args, redirects, or flags.
  • Honors bash semantics for non-comment #: literal inside single or double quotes, literal in the interior of an unquoted word (abc#def), literal when backslash-escaped (\#).
  • Ships as 0.1.3-alpha (Directory.Build.props + new RELEASE_NOTES.md section).

Why this matters

Before the fix, # Extract worktree branches\ngit worktree list parsed to one clause with verb [#, Extract]. Two consumer-side failure modes surfaced in Netclaw integration:

  1. Approval prompts rendered comment text as the verb ("Approve # Extract in ..."), confusing operators about what they were approving.
  2. Approval-state cache desync — two-pass verb extraction (coarse persistence-time vs. per-pipeline-segment retry-authorization) saw different verb sets for the same input, causing ToolApprovalRequiredException after the user had already clicked Approve.

What's in the change

Layer Change
Lexer BashTokenKind.Comment (internal); ConsumeLineComment helper + # dispatch in main scan loop
Parser FilterSignificant drops Comment (one OR clause)
SPEC §5 new "Comment handling" subsection; §4 BNF note that comments are whitespace-equivalent
Corpus 9 new entries (123–131) covering every case from the issue + both Netclaw repros (paths sanitized per §14)
Tests 10 lexer + 8 parser unit tests
Release VersionPrefix 0.1.2 → 0.1.3, new RELEASE_NOTES.md 0.1.3-alpha section

Two commits on the branch:

  1. fix: skip bash line comments at word boundary (#25) — main implementation
  2. chore: simplify Comment token emission — post-review cleanup: drops a per-comment ToString() allocation by matching the Value = "" pattern that Whitespace / Continuation already use; trims a verbose dispatch comment; removes two redundant test assertions.

Scope note

v0.1 does NOT treat top-level newlines as statement separators (parser splits only on && / || / ; / |). The corpus entry needing two clauses with a comment between them uses an explicit ;. The newline-as-separator gap is tracked separately in IMPLEMENTATION_PLAN.md NEXT.

Test plan

  • dotnet build -c Release — clean, 0 warnings
  • dotnet test -c Release — 387 / 387 passing (was 360; +27 new: 10 lexer, 8 parser, 9 corpus)
  • PII audit [Fact] passes on the sanitized Netclaw repros
  • PublicApiSnapshotTests passes — public API surface unchanged
  • pwsh ./scripts/Add-FileHeaders.ps1 -Verify exits 0
  • dotnet pack -c Release produces a clean ShellSyntaxTree.0.1.3-alpha.nupkg
  • CI green on this PR

`BashLexer` had no handling for `#`-line comments, so leading or trailing
explanatory text leaked into clause verb chains — `# Extract\ngit
worktree list` parsed as verb `[#, Extract]`. Downstream consumers doing
asymmetric verb-chain extraction (persistence-time vs. retry-auth) saw
different verb sets for the same input, causing approval-state cache
misses and tool failures after the user had already clicked Approve.

Lexer now recognizes `#` at a word boundary (start-of-input, or after
whitespace, a newline, or any operator) and skips to EOL, emitting a new
internal `BashTokenKind.Comment` token for source fidelity. Parser drops
Comment tokens in `FilterSignificant` alongside Whitespace/Continuation.
Quoting and escape rules are honored: `#` inside single or double
quotes is literal, mid-word `#` is literal, and `\#` is literal.

SPEC.md §4 + §5 document the rule; corpus entries 123–131 pin every
case from the issue plus both Netclaw repros (paths sanitized per §14).
Ships as 0.1.3-alpha.
Drops the per-comment `src.Slice(...).ToString()` allocation in
`ConsumeLineComment` so the Comment token uses `Value = ""` to match
the existing Whitespace / Continuation pattern — callers that need
the literal text can slice the original input via SourceStart /
SourceLength. Also tightens the dispatch comment in BashLexer's
outer scan loop and removes redundant `Assert.DoesNotContain` checks
in two lexer tests where the existing token-count assertion already
proves no Comment token is present.
@Aaronontheweb Aaronontheweb merged commit ab03596 into dev May 12, 2026
2 checks passed
@Aaronontheweb Aaronontheweb deleted the fix/bash-line-comments-25 branch May 12, 2026 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bash line comments not recognized — verb chain extracted from comment text

1 participant