Bash line comments not recognized — verb chain extracted from comment text

## Observed

When the input contains a bash line comment (`#` starting a token, comment runs to end-of-line per POSIX), the parser appears to treat the comment text as regular tokens. The verb chain extracted from a multi-line script with a leading comment is the comment text rather than the actual command.

## Repro from production (Netclaw consumer)

The agent issued a `shell_execute` call with this command body (from a real Slack session, 2026-05-12):

```bash
# Extract all unique branch names from worktrees
git -C ~/repositories/stannardlabs/netclaw worktree list | awk '{print $NF}' | tr -d '[]' | sort -u
```

The downstream approval prompt rendered as:

> Approve `# Extract` in `~/repositories/stannardlabs/netclaw`?

The displayed verb is `# Extract` — the first two whitespace-separated tokens of the comment line. The expected verb is `git worktree list` (BashArity collapses `git` to a 2-token verb, plus the actual `worktree` subcommand following the `-C <path>` flag-with-value).
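The failure mode can be reproduced with a deliberately comment-unaware extractor. This is a minimal sketch for illustration only, not the actual ShellTokenizer code: the `naive_verb` helper and its `arity` parameter are hypothetical, and the only assumption is that the verb is taken from the first whitespace-separated tokens of the raw input.

```python
# The command body from the report, reproduced verbatim.
command_body = (
    "# Extract all unique branch names from worktrees\n"
    "git -C ~/repositories/stannardlabs/netclaw worktree list"
    " | awk '{print $NF}' | tr -d '[]' | sort -u"
)


def naive_verb(source: str, arity: int = 2) -> str:
    """Hypothetical extractor: join the first `arity` whitespace tokens.

    It has no notion of comments, so a leading `#` line is treated as
    ordinary words.
    """
    return " ".join(source.split()[:arity])


print(naive_verb(command_body))  # prints "# Extract"
```

Any extractor that looks at raw tokens before comments are discarded will show the same symptom, whatever its arity rules are.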

Note: this specific report was traced through Netclaw's v2 ShellTokenizer (the legacy parser still consulted by the prompt builder), but ShellSyntaxTree's `BashLexer` would have the same gap if comments aren't handled — and consumers migrating to ShellSyntaxTree as the primary parser need this fixed.

## Expected behavior

Per POSIX (and bash) shell grammar:

- A `#` that begins a word (at start-of-input or after whitespace) and is not inside single or double quotes starts a comment that runs up to, but not including, the next newline.
- Comments produce no tokens — the lexer should skip them entirely.
- A multi-line input where the first non-comment statement is `git -C <path> worktree list | awk ...` should parse as that statement (with the pipe-chained `awk`/`tr`/`sort` clauses), with no influence from the preceding comment.
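The rules above can be sketched as a small word-level tokenizer. This is an illustrative sketch under stated assumptions, not `BashLexer` itself: the `tokenize` function is hypothetical, it only understands whitespace, single and double quotes, and `#` comments, and it deliberately ignores operators, backslash escapes, expansions, and clause boundaries (a `|` surrounded by spaces comes out as an ordinary word).

```python
def tokenize(source: str) -> list[str]:
    """Split shell-like input into words, skipping `#` comments.

    Hypothetical sketch: quotes are removed and their content kept
    literally; no escapes, operators, or expansions are handled.
    """
    tokens: list[str] = []
    word: list[str] = []   # characters of the word being built
    in_word = False        # True once the current word has started
    quote = None           # the active quote character, if any
    i = 0
    while i < len(source):
        ch = source[i]
        if quote:
            # Inside quotes every character is literal, including `#`.
            if ch == quote:
                quote = None
            else:
                word.append(ch)
        elif ch in "'\"":
            quote = ch
            in_word = True  # an empty quoted string is still a word
        elif ch in " \t\n":
            if in_word:
                tokens.append("".join(word))
                word, in_word = [], False
        elif ch == "#" and not in_word:
            # `#` begins a word: discard up to, not including, the newline.
            while i < len(source) and source[i] != "\n":
                i += 1
            continue
        else:
            # Ordinary character, including a `#` in the middle of a word.
            word.append(ch)
            in_word = True
        i += 1
    if in_word:
        tokens.append("".join(word))
    return tokens


print(tokenize("# fetch the latest\ngit pull"))  # prints ['git', 'pull']
print(tokenize("echo abc#def"))                  # prints ['echo', 'abc#def']
print(tokenize("echo abc #def"))                 # prints ['echo', 'abc']
```

The key design point is the `in_word` flag: `#` is only special when no word is in progress, which is what distinguishes `abc#def` from `abc #def`. Because the comment is dropped before any token is emitted, later stages never see it.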

## What this affects

- `BashLexer` token output: comment text should not produce WORD tokens.
- `BashParser`: should not see comment content as part of any clause.
- `ParsedCommand.Source` should retain the original input verbatim per the existing contract; it's only the structured AST that excludes comment content.
- Consumers that walk `ParsedCommand.Clauses` to extract verb chains for security gates or audit logs get the wrong verb chain when the input has a leading or interleaved comment.

## Suggested test cases for the corpus

```
# Leading line comment, single command
input:    "# fetch the latest\ngit pull"
expected: one clause with verb [git, pull]

# Inline mid-line comment (after whitespace, before newline)
input:    "git pull   # update local"
expected: one clause with verb [git, pull]; comment after the command is dropped

# Comment-only input
input:    "# just a note"
expected: zero clauses (or IsUnparseable=true with reason "no executable statements")

# Comment between commands in a script
input:    "git pull\n# now build\ndotnet build"
expected: two clauses with verbs [git, pull] and [dotnet, build]

# `#` inside double quotes — NOT a comment
input:    "echo \"hash is #1234\""
expected: one clause with verb [echo], one literal arg \"hash is #1234\"

# `#` inside single quotes — NOT a comment
input:    "echo 'use #foo'"
expected: one clause with verb [echo], one literal arg 'use #foo'

# `#` in the middle of a word: NOT a comment. Bash starts a comment only when
# `#` is the first character of a word, i.e. at start-of-input or after whitespace.
input:    "echo abc#def"
expected: one clause with verb [echo], one literal arg abc#def

# `#` starting a new word mid-command: comment
input:    "echo abc #def"
expected: one clause with verb [echo], one literal arg abc; "#def" is dropped as a comment
```

## SPEC.md gap

`SPEC.md` §5 (Tokenization Rules) doesn't currently mention comments. The fix should:

1. Add a "Comment handling" subsection to §5 documenting the rule (`#` starting a word and not inside quotes begins a comment to end-of-line; comments produce no tokens).
2. Update the BNF in §4 to be explicit that comments are whitespace-equivalent at the lexer level (don't appear in the grammar).
3. Add the test cases above to `tests/Corpus/bash/`.

## Severity

Low for parser correctness (comments are degenerate input), but medium for downstream consumers. Agents naturally include explanatory comments in scripts they author for human review, especially scripts they propose to run under a single approval (the scripts-as-units-of-approval pattern). Surfacing comment text in approval prompts misleads the user about what they are actually approving.
