Observed
When the input contains a bash line comment (# starting a token, comment runs to end-of-line per POSIX), the parser appears to treat the comment text as regular tokens. The verb chain extracted from a multi-line script with a leading comment is the comment text rather than the actual command.
Repro from production (Netclaw consumer)
The agent issued a shell_execute call with this command body (from a real Slack session, 2026-05-12):
# Extract all unique branch names from worktrees
git -C ~/repositories/stannardlabs/netclaw worktree list | awk '{print $NF}' | tr -d '[]' | sort -u
The downstream approval prompt rendered as:
Approve # Extract in ~/repositories/stannardlabs/netclaw?
The displayed verb is # Extract — the first two whitespace-separated tokens of the comment line. The expected verb is git worktree list (BashArity collapses git to a 2-token verb, plus the actual worktree subcommand following the -C <path> flag-with-value).
Note: this specific report was traced through Netclaw's v2 ShellTokenizer (the legacy parser still consulted by the prompt builder), but ShellSyntaxTree's BashLexer would have the same gap if comments aren't handled — and consumers migrating to ShellSyntaxTree as the primary parser need this fixed.
Expected behavior
Per POSIX (and bash) shell grammar:
- A # that begins a word (that is, one not inside single quotes, not inside double quotes, and not in the middle of a word) starts a comment that runs to the next newline.
- Comments produce no tokens — the lexer should skip them entirely.
- A multi-line input where the first non-comment statement is git -C <path> worktree list | awk ... should parse as that statement (with the pipe-chained awk/tr/sort clauses), with no influence from the preceding comment.
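The rule above can be sketched as a small word-level lexer. This is a minimal illustration in Python, not ShellSyntaxTree's BashLexer: the name lex_words is hypothetical, operators such as | are only recognized when whitespace-separated, and expansions, here-documents, and line continuations inside double quotes are not modeled. It only shows where the comment check has to sit relative to quote and word state.

```python
def lex_words(source: str) -> list[list[str]]:
    """Split shell source into per-line word lists, dropping comments.

    Hypothetical helper for illustration only; it is not the BashLexer API.
    """
    statements: list[list[str]] = []
    words: list[str] = []
    current: list[str] = []   # characters of the word being built
    in_word = False           # True once a word has started (even an empty '')
    quote = None              # active quote character, or None
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if quote == "'":
            # Single quotes: everything is literal until the closing quote.
            if ch == "'":
                quote = None
            else:
                current.append(ch)
        elif quote == '"':
            # Double quotes: '#' is literal; backslash escapes a few characters.
            if ch == '"':
                quote = None
            elif ch == "\\" and i + 1 < n and source[i + 1] in '"\\$`':
                current.append(source[i + 1])
                i += 1
            else:
                current.append(ch)
        elif ch == "\\" and i + 1 < n:
            # Unquoted backslash: escape the next character, or join lines.
            if source[i + 1] != "\n":
                current.append(source[i + 1])
                in_word = True
            i += 1
        elif ch in "'\"":
            quote = ch
            in_word = True
        elif ch == "#" and not in_word:
            # '#' at the start of a word: skip to, but not past, the newline.
            while i < n and source[i] != "\n":
                i += 1
            continue
        elif ch in " \t\n":
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
            if ch == "\n" and words:
                statements.append(words)
                words = []
        else:
            # Ordinary character, including a '#' in the middle of a word.
            current.append(ch)
            in_word = True
        i += 1
    if in_word:
        words.append("".join(current))
    if words:
        statements.append(words)
    return statements


# The production repro: the first word of the first statement is the command.
script = (
    "# Extract all unique branch names from worktrees\n"
    "git -C ~/repositories/stannardlabs/netclaw worktree list"
    " | awk '{print $NF}' | tr -d '[]' | sort -u"
)
print(lex_words(script)[0][0])  # git
```

The key design point is that the comment check depends on in_word rather than on the previous character alone, so abc#def stays a single word while abc #def ends at abc.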
What this affects
- BashLexer token output: comment text should not produce WORD tokens.
- BashParser: should not see comment content as part of any clause.
- ParsedCommand.Source should retain the original input verbatim per the existing contract; it's only the structured AST that excludes comment content.
- Consumers that walk ParsedCommand.Clauses to extract verb chains for security gates or audit logs get the wrong verb chain when the input has a leading or interleaved comment.
Suggested test cases for the corpus
# Leading line comment, single command
input: "# fetch the latest\ngit pull"
expected: one clause with verb [git, pull]
# Inline mid-line comment (after whitespace, before newline)
input: "git pull # update local"
expected: one clause with verb [git, pull]; comment after the command is dropped
# Comment-only input
input: "# just a note"
expected: zero clauses (or IsUnparseable=true with reason "no executable statements")
# Comment between commands in a script
input: "git pull\n# now build\ndotnet build"
expected: two clauses with verbs [git, pull] and [dotnet, build]
# `#` inside double quotes — NOT a comment
input: "echo \"hash is #1234\""
expected: one clause with verb [echo], one literal arg \"hash is #1234\"
# `#` inside single quotes — NOT a comment
input: "echo 'use #foo'"
expected: one clause with verb [echo], one literal arg 'use #foo'
# `#` mid-word vs. `#` at the start of a word
# bash treats `#` as a comment only when it is the first character of a word,
# i.e. when it is preceded by whitespace, a control operator, or start-of-input.
# e.g. "echo abc#def" → echo gets one arg "abc#def" (no comment)
# e.g. "echo abc #def" → echo gets one arg "abc"; "#def" starts a comment
SPEC.md gap
SPEC.md §5 (Tokenization Rules) doesn't currently mention comments. The fix should:
- Add a "Comment handling" subsection to §5 documenting the rule (# starting a word and not inside quotes begins a comment to end-of-line; comments produce no tokens).
- Update the BNF in §4 to be explicit that comments are whitespace-equivalent at the lexer level (don't appear in the grammar).
- Add the test cases above to tests/Corpus/bash/.
Severity
Low for parser correctness (comments are degenerate input), but medium for downstream consumers — agents naturally include explanatory comments in scripts they author for human review (especially scripts they propose to run via a single approval per the scripts-as-units-of-approval pattern), and surfacing comment text in approval prompts confuses the user about what they're actually approving.