fix: skip bash line comments at word boundary (#25)#26
Merged
Conversation
`BashLexer` had no handling for `#`-line comments, so leading or trailing explanatory text leaked into clause verb chains — `# Extract\ngit worktree list` parsed as verb `[#, Extract]`. Downstream consumers doing asymmetric verb-chain extraction (persistence-time vs. retry-auth) saw different verb sets for the same input, causing approval-state cache misses and tool failures after the user had already clicked Approve. Lexer now recognizes `#` at a word boundary (start-of-input, or after whitespace, a newline, or any operator) and skips to EOL, emitting a new internal `BashTokenKind.Comment` token for source fidelity. Parser drops Comment tokens in `FilterSignificant` alongside Whitespace/Continuation. Quoting and escape rules are honored: `#` inside single or double quotes is literal, mid-word `#` is literal, and `\#` is literal. SPEC.md §4 + §5 document the rule; corpus entries 123–131 pin every case from the issue plus both Netclaw repros (paths sanitized per §14). Ships as 0.1.3-alpha.
Drops the per-comment `src.Slice(...).ToString()` allocation in `ConsumeLineComment` so the Comment token uses `Value = ""` to match the existing Whitespace / Continuation pattern — callers that need the literal text can slice the original input via SourceStart / SourceLength. Also tightens the dispatch comment in BashLexer's outer scan loop and removes redundant `Assert.DoesNotContain` checks in two lexer tests where the existing token-count assertion already proves no Comment token is present.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
BashLexernow recognizes#at a word boundary (start of input, or after whitespace, a newline, or any operator) and skips to EOL via a new internalBashTokenKind.Comment. The parser drops Comment tokens inFilterSignificantalongsideWhitespace/Continuation, so comments contribute nothing to verb chains, args, redirects, or flags.#: literal inside single or double quotes, literal in the interior of an unquoted word (abc#def), literal when backslash-escaped (\#).0.1.3-alpha(Directory.Build.props+ newRELEASE_NOTES.mdsection).Why this matters
Before the fix,
# Extract worktree branches\ngit worktree listparsed to one clause with verb[#, Extract]. Two consumer-side failure modes surfaced in Netclaw integration:# Extractin ..."), confusing operators about what they were approving.ToolApprovalRequiredExceptionafter the user had already clicked Approve.What's in the change
BashTokenKind.Comment(internal);ConsumeLineCommenthelper +#dispatch in main scan loopFilterSignificantdropsComment(one OR clause)VersionPrefix0.1.2 → 0.1.3, newRELEASE_NOTES.md0.1.3-alpha sectionTwo commits on the branch:
fix: skip bash line comments at word boundary (#25)— main implementationchore: simplify Comment token emission— post-review cleanup: drops a per-commentToString()allocation by matching theValue = ""pattern thatWhitespace/Continuationalready use; trims a verbose dispatch comment; removes two redundant test assertions.Scope note
v0.1 does NOT treat top-level newlines as statement separators (parser splits only on
&&/||/;/|). The corpus entry needing two clauses with a comment between them uses an explicit;. The newline-as-separator gap is tracked separately inIMPLEMENTATION_PLAN.mdNEXT.Test plan
dotnet build -c Release— clean, 0 warningsdotnet test -c Release— 387 / 387 passing (was 360; +27 new: 10 lexer, 8 parser, 9 corpus)[Fact]passes on the sanitized Netclaw reprosPublicApiSnapshotTestspasses — public API surface unchangedpwsh ./scripts/Add-FileHeaders.ps1 -Verifyexits 0dotnet pack -c Releaseproduces a cleanShellSyntaxTree.0.1.3-alpha.nupkg