Steer agents to MCP and load user PATH in setup drop-through#205
Conversation
Two fixes for an awkward agent UX where the agent took ~10 steps through the `aimx` CLI to send a single email — including triggering errors to discover the configured domain — instead of using the MCP tools. Skill docs: - mcp-tools.md: document the `address` field that `mailbox_list` already returns. The substring after `@` is the configured primary domain; one MCP call gives both mailboxes and domain. Update the example JSON. Add a comment on the `email_send` example clarifying that `from_mailbox` is the local part only and the daemon derives the full address. - troubleshooting.md: rewrite the `DOMAIN` row and the matching prose entry to note the error is unreachable via MCP and point at `mailbox_list().address` instead of root-only `/etc/aimx/config.toml`. - aimx-primer.md: add a top "Use MCP, not the CLI" section, hoist "`from_mailbox` is the local part only" to lead the Send workflow, and add a "Do not invoke the `aimx` CLI" bullet under "What you must not do" with the registration command for the no-MCP case. - claude-code/SKILL.md.header: tighten the frontmatter description to name the `mcp__aimx__*` tools and explicitly say "Use the MCP tools, not the `aimx` CLI". Setup drop-through (root cause for the failed install): - setup.rs: switch `runuser -u USER -- argv` to `runuser -l USER -c "<quoted>"`. The login flag (`-l`) initialises HOME, SHELL, USER, LOGNAME, and PATH from the user's profile chain per runuser(1). Without it, runuser inherits sudo's scrubbed PATH and `claude` (typically at ~/.npm-global/bin or under ~/.nvm/...) is unreachable, so MCP auto-registration silently CliMissings. Reuses the existing `posix_single_quote` helper for shell-safe argv joining; promote it from private to `pub(crate)`.
uzyn
left a comment
There was a problem hiding this comment.
Changes Overview
Two-part PR. Part A (docs): Steers agents toward mcp__aimx__* tools and away from the aimx CLI — documents the address field returned by mailbox_list so agents never need to discover the configured domain via CLI errors, hoists "Use MCP, not the CLI" to the top of aimx-primer.md, rewrites the DOMAIN troubleshooting entry to note it's unreachable via MCP, and tightens the Claude Code skill frontmatter description to name the MCP tools explicitly. Part B (drop-through): Switches the aimx setup → aimx agents setup re-exec from runuser -u USER -- argv to runuser -l USER -c "<shell-quoted argv>", with the stated goal of inheriting the invoking user's PATH so claude is found and MCP auto-registration succeeds.
Scope Alignment
Part A is fully delivered and internally consistent. Verified that the address field exists in MailboxListRow (src/mailbox_list_handler.rs:54), that email_send over MCP routes through lookup_mailbox_row(...).address so DOMAIN truly is unreachable from the MCP path (src/mcp.rs:361-373), and that the catchall (*@domain) is rejected before it can break the "substring after @ = configured domain" claim.
Part B's intent is delivered as code, but its effectiveness on the canonical npm-global install path is unverified and likely incomplete — see the Blocker below. The corresponding end-to-end verification box in the PR description is left unchecked, which I read as the author flagging the same gap.
Potential Bugs
Blocker — runuser -l does not reliably populate ~/.npm-global/bin on Ubuntu defaults
src/setup.rs:2836-2855 and the doc comment at src/setup.rs:2837-2844 claim the login flag "initialises HOME, SHELL, USER, LOGNAME, and PATH from the user's profile chain (~/.bash_profile → ~/.profile → ~/.bashrc on Ubuntu defaults)." That chain is correct only if .bashrc actually runs. On stock Ubuntu, the default ~/.bashrc (seeded from /etc/skel/.bashrc) starts with:
# If not running interactively, don't do anything
case $- in
*i*) ;;
*) return;;
esacrunuser -l USER -c CMD invokes the user's shell non-interactively. $- does not contain i, so this guard returns immediately and any PATH-setting code below it never runs. That includes:
export PATH=~/.npm-global/bin:$PATH— the canonical placement npm itself recommends afternpm config set prefix '~/.npm-global', which is the documented workaround for "no sudo for npm global". This is the exact case the PR description names: "claude(typically at~/.npm-global/bin…) is unreachable".- The NVM loader (
[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"), which the official NVM installer puts in.bashrc. Same case the PR description names: "or under~/.nvm/...".
Net effect: for the two install paths the PR description explicitly calls out, the fix has no effect — claude is still off PATH inside the dropped-through process and MCP auto-registration still falls into CliMissing silently. The unchecked end-to-end test plan item ("install Claude Code via npm i -g @anthropic-ai/claude-code") is exactly the scenario this gap manifests in.
The fix does work for users whose claude lives in a directory .profile adds unconditionally (e.g. ~/.local/bin via the official Anthropic curl installer's default), and for hosts where the operator manually moved their npm-global / nvm export into .profile. But for the most common npm path it doesn't.
Suggested investigations (pick whichever survives end-to-end on a fresh Ubuntu VM with npm i -g @anthropic-ai/claude-code):
- Run an interactive shell explicitly:
runuser -l USER -s /bin/bash -- -ic "<quoted-argv>", or itsrunuser -l USER -c "bash -ic '<quoted-argv>'"equivalent.-imakes$-containiso the early-return guard falls through and.bashrcruns in full. Cost: prints PS1-related noise to stderr on some setups, may complain about job control without a tty. - Source
.bashrcexplicitly:runuser -l USER -c "source ~/.bashrc 2>/dev/null; <quoted-argv>". More targeted than-i, but couples to bash specifically (breaks for users on zsh/fish login shells). - Probe
claudevia the user'scommand -vbefore re-exec, and if missing, surface the explicit registration command instead of silently dropping through. This is a fallback for the cases (1) and (2) still miss.
I'd want option 1 or 2 verified with an actual npm config set prefix '~/.npm-global' install on a fresh Ubuntu VM before merging — the test plan box for that scenario is the right one to require.
Non-blocker — runuser -l changes cwd to the user's HOME, but a relative --data-dir is passed through verbatim
src/setup.rs:2410 stores explicit_data_dir from clap as a PathBuf with no canonicalization, and build_agents_setup_argv (src/setup.rs:2755-2758) appends it verbatim. Under the previous runuser -u USER -- argv form, the dropped-through process inherited the parent's cwd, so a relative --data-dir resolved against the same cwd. With -l, the dropped-through process starts in the invoking user's HOME, so aimx setup --data-dir ./mydata would resolve ./mydata against $HOME instead of root's cwd.
In practice operators pass absolute paths, and aimx setup is run with sudo so the difference is rarely observable. Still, the assumption is now load-bearing. Trivial hardening: let explicit_data_dir = data_dir.map(|p| p.canonicalize().unwrap_or_else(|_| p.to_path_buf())); (or fail-fast on a relative path that doesn't exist yet).
Test Coverage
posix_single_quote_escapes_embedded_quoteand thebuild_agents_setup_argv*tests cover the helpers, but they exercise the inputs to the newrunuserinvocation, not the choice of-lover-uor the shape of the spawned command.- The previous regression — passing the literal string
/proc/self/exetorunuser— is locked in by a source-grep test (build_agents_setup_argv_uses_resolved_exe_pathatsrc/setup.rs:6278-6314). The-lswitch deserves an analogous guard so a future refactor can't silently revert to-u --(which would reproduce the original bug). A simple body-grep that thedrop_through_to_agents_setupbody containsarg("-l")and does NOT contain the pairarg("-u").arg(&sudo_user).arg("--")is sufficient. Non-blocker. - The Blocker scenario above (
.bashrcearly-returns on non-interactive shell) cannot easily be unit-tested, but it's exactly what an end-to-end test on a fresh Ubuntu VM would catch — and that's the unchecked item in the test plan. Don't tick that box without actually running it.
Code Quality
- Stale doc comments. The function-level doc on
drop_through_to_agents_setupstill describes the old shape:(/// 1. If `$SUDO_USER` is set and non-empty → `runuser -u $SUDO_USER -- /// /proc/self/exe agents setup [--data-dir …]` via `Command::status()`src/setup.rs:2762-2768). Note this also contains/proc/self/exe, which the regression-guard test forbids in the function body — the doc comment should not perpetuate the outdated example either. The header onbuild_agents_setup_argv(src/setup.rs:2726) similarly says "Build the argv we'd hand torunuser -u <sudo_user> --". Both should be updated to describe therunuser -l USER -c "<shell-quoted>"shape and explain why-lis required (PATH, profile chain, the.bashrcguard caveat from the Blocker if you choose to keep-lafter addressing it). - Promotion of
posix_single_quotetopub(crate)and reuse fromsetup.rsis the right call. to_string_lossy()on theOsStringargv elements is a small lossiness on paths with non-UTF-8 bytes; forcurrent_exe()plus--data-dirthis is fine in practice on Linux installs. Not worth changing.
Summary and Recommended Actions
Overall verdict: Needs minor fixes — the docs half is solid, but the setup drop-through fix likely doesn't address the install path it's named after.
Blockers:
- Verify the
runuser -lchange actually fixes MCP auto-registration on a fresh Ubuntu VM where Claude Code was installed vianpm i -g @anthropic-ai/claude-codewith thenpm config set prefix '~/.npm-global'pattern. If it doesn't (which is what stock-Ubuntu.bashrc's non-interactive early-return suggests), switch to an interactive shell invocation (bash -icorrunuser -l ... -s /bin/bash -- -ic) or otherwise force.bashrcto run, and re-verify. Until that VM run produces the success line for MCP registration (not the silent fallback hint), the PR has not actually shipped its fix.
Non-blockers:
- Update the stale doc comments on
drop_through_to_agents_setup(src/setup.rs:2762-2768) andbuild_agents_setup_argv(src/setup.rs:2726) to describe the newrunuser -l USER -c "<shell-quoted>"shape; drop the outdated/proc/self/exeexample in the doc. - Add a regression guard test analogous to
build_agents_setup_argv_uses_resolved_exe_paththat source-greps thedrop_through_to_agents_setupbody forarg("-l")and asserts the old-u … --shape is absent. - Canonicalize
explicit_data_dirat the top ofrun_setupso the cwd switch under-ldoesn't change relative-path resolution semantics.
Nice-to-haves:
- None.
…oads
The previous fix used `runuser -l USER -c CMD`, which the doc comments
claimed runs the user's full profile chain. On stock Ubuntu this is
only true for `~/.bash_profile` / `~/.profile` — `~/.bashrc` (seeded
from `/etc/skel/.bashrc`) opens with a non-interactive guard:
case $- in *i*) ;; *) return;; esac
`runuser -l ... -c ...` runs the user's login shell non-interactively,
so `$-` does not contain `i`, the guard returns immediately, and any
PATH-setting code below it never runs. That includes the canonical
`export PATH=~/.npm-global/bin:$PATH` placement after
`npm config set prefix '~/.npm-global'` (the path Claude Code's
official install lands `claude` at) and the NVM loader. Net effect:
for the install paths this PR is named after, the previous fix had
no effect — MCP auto-registration still silently fell into
`CliMissing`.
Switch the invocation to:
runuser -l USER -s /bin/bash -- -ic "<shell-quoted argv>"
The `-i` flag forces an interactive bash, so `$-` contains `i`, the
non-interactive guard falls through, and `~/.bashrc` runs in full.
Pin the shell to `/bin/bash` via `-s` rather than letting `runuser`
pick the user's login shell from `/etc/passwd` — a user on zsh or
fish wouldn't source `~/.bashrc` and would re-introduce the gap.
The existing `aimx agents setup` `CliMissing` branch already serves
as the option-3 safety net the reviewer suggested: if `claude` is
still unreachable after the bash-interactive load, the operator sees
the explicit `claude mcp add --scope user aimx -- ...` command rather
than a silent fallback.
Other fixes in this commit:
- Update both stale doc comments on `drop_through_to_agents_setup`
and `build_agents_setup_argv` to describe the new shape and
document the `~/.bashrc` non-interactive guard caveat that
motivated the change. The previous comments still referenced
`/proc/self/exe` (forbidden by an existing regression-guard test)
and the older `runuser -u ... -- argv` form.
- Add `drop_through_uses_interactive_bash_login_shell` regression
guard test that source-greps the function body, asserts the new
shape (`-l`, `-s /bin/bash`, `-ic`) is present, and forbids the
old shape (`-u USER -- argv` and the literal `/proc/self/exe`
string). Locks both this fix and the original cycle-5 fix in.
- Canonicalize `explicit_data_dir` so the dropped-through child
resolves it identically regardless of cwd. Under `runuser -l` the
child's cwd switches to the user's HOME, so a relative
`--data-dir ./foo` would resolve against `$HOME` instead of root's
cwd. Falls back to the original `PathBuf` when the path doesn't
exist yet (fresh-install case) so the operator's intended absolute
path still passes through.
|
Cycle 2: addressing the four review items. Blocker —
|
uzyn
left a comment
There was a problem hiding this comment.
Re-Review (after b9c1fa1 — addressing prior feedback on c853a57)
Resolution of previously-flagged issues
Blocker (runuser -l does not reliably populate ~/.npm-global/bin) — Resolved
The new shape is:
runuser -l <sudo_user> -s /bin/bash -- -ic "<shell-quoted argv>"
(src/setup.rs:2884-2892).
The -i forces an interactive bash, so $- contains i, the stock-Ubuntu ~/.bashrc non-interactive guard (case $- in *i*) ;; *) return;; esac) falls through, and the canonical PATH edits below it (notably export PATH=~/.npm-global/bin:$PATH and the NVM loader) actually run. The -s /bin/bash pin is the right call — without it, runuser would consult /etc/passwd's shell field and a user on zsh/fish would skip ~/.bashrc entirely, re-opening the gap. I walked the quoting chain mentally (outer runuser argv → bash -ic <cmd> argument → bash parses single-quoted tokens) and it's bulletproof for paths containing spaces, double quotes, and embedded single quotes — posix_single_quote correctly emits the '\'' close-escape-reopen pattern, and bash's single-quote parser turns it back into the literal byte sequence.
The implementer's decision to skip option 3 (command -v claude pre-flight probe) is defensible: I traced the agents_setup::run_with_env flow and confirmed that when bash -ic succeeds but claude is still off PATH (e.g., npm-global never set up at all), try_auto_register_mcp for claude-code returns McpRegistration::CliMissing (src/agents_setup.rs:1045-1050), which prints the explicit claude mcp add --scope user aimx -- /usr/local/bin/aimx mcp registration command. Duplicating the probe in the parent would print the same hint twice. Accept.
NB1 (stale doc comments) — Resolved
- Function-level doc on
drop_through_to_agents_setup(src/setup.rs:2773-2802) now describes the new shape and explains the-irequirement and the~/.bashrcnon-interactive guard caveat. - Header on
build_agents_setup_argv(src/setup.rs:2736-2740) now readsrunuser -l <sudo_user> -s /bin/bash -- -ic "<...>". - Remaining
/proc/self/exementions are all (a) inside thecurrent_exe()call-site comment that explains why we resolve viacurrent_exe()instead of passing the literal string, or (b) in test bodies that explicitly forbid the literal — both legitimate.
NB2 (regression guard test) — Resolved
drop_through_uses_interactive_bash_login_shell (src/setup.rs:6367-6453) source-greps the function body and asserts:
- Present:
.arg("-l"),.arg("-s")+/bin/bash,.arg("-ic") - Absent: the combined old shape (
.arg("-u")+.arg(&sudo_user)+.arg("--")+.args(&argv)) - Absent in code (not comments):
.arg("/proc/self/exe"),Path::new("/proc/self/exe"),PathBuf::from("/proc/self/exe")
I mentally ran the test against (a) a stub revert to runuser -u USER -- argv — fails the absent-old-shape assertion, fails the present--l assertion, fails the present--ic assertion. Catches the revert. (b) A refactor that splits the runuser invocation into a helper function called from drop_through_to_agents_setup — the body grep would no longer find the .arg(...) calls and the test would falsely fail. This is a known tradeoff of source-grep regression guards but acceptable for this case. Test passes locally (cargo test drop_through_uses_interactive_bash_login_shell).
NB3 (canonicalize explicit_data_dir) — Partially resolved
run_setup (src/setup.rs:2419-2420) now canonicalizes with a fallback:
let explicit_data_dir: Option<PathBuf> =
data_dir.map(|p| p.canonicalize().unwrap_or_else(|_| p.to_path_buf()));This fixes the relative-path-that-exists case (e.g., --data-dir ./existing-dir against root's cwd → resolved to the absolute path before drop-through). The fresh-install case (--data-dir ./new-dir where the path doesn't exist yet) still falls back to the original relative PathBuf and resolves against the user's $HOME after the runuser -l cwd switch. The implementer documented this tradeoff in the comment. In practice operators pass absolute paths to sudo aimx setup, so this edge case is minor. Acceptable.
Build/test verification
cargo build --bin aimx: clean.cargo clippy -- -D warnings: clean.cargo fmt -- --check: clean.cargo test setup::: 318 passed, 0 failed.cargo test agents_setup::: 118 passed, 0 failed.cargo test drop_through_uses_interactive_bash_login_shell: passes.- Banned-token regex on PR HEAD
src/setup.rs: zero hits.
Outstanding
The only un-checked test plan box is the Ubuntu VM end-to-end verification (npm i -g @anthropic-ai/claude-code → aimx setup → confirm MCP registration succeeds end-to-end). The implementer correctly notes this is environmentally infeasible from the impl box (macOS dev) and that AIMX targets ubuntu-latest only per CI policy. Recommend the maintainer run that smoke test on a fresh Ubuntu VM after merge — the failure mode (silent CliMissing fallback) is observable from the wizard output (operator would see the manual hint instead of the green success line), so the post-merge verification is low-cost.
New issues found
None. The diff is tightly scoped to the three fix areas and introduces no incidental changes.
Summary and Recommended Actions
Overall verdict: Ready to merge. The Blocker is genuinely addressed (interactive bash forces ~/.bashrc to run, which is the only way the canonical npm-global / NVM PATH edits get sourced on stock Ubuntu), all three non-blockers are resolved, the regression guard locks the new shape in, and the build is green. The remaining un-checked Ubuntu VM smoke test is environmentally appropriate to defer to post-merge.
Recommended merge commit message:
Steer agents to MCP and load user PATH in setup drop-through (#205)
Two fixes for an awkward agent UX where the agent took ~10 CLI steps
to send a single email — including triggering errors to discover the
configured domain — instead of using the MCP tools.
Skill docs:
- mcp-tools.md: document the `address` field returned by
`mailbox_list` (one MCP call yields both mailboxes and the
configured domain).
- troubleshooting.md: rewrite the `DOMAIN` row to point at
`mailbox_list().address` instead of root-only
`/etc/aimx/config.toml`.
- aimx-primer.md: hoist "Use MCP, not the CLI" to the top, lead the
Send workflow with "`from_mailbox` is the local part only", forbid
invoking the `aimx` CLI under "What you must not do".
- claude-code/SKILL.md.header: name the `mcp__aimx__*` tools in the
frontmatter description and explicitly direct agents to MCP, not
the CLI.
Setup drop-through:
- setup.rs: replace `runuser -u USER -- argv` with
`runuser -l USER -s /bin/bash -- -ic "<shell-quoted argv>"`.
The login flag (`-l`) initialises HOME/SHELL/USER/LOGNAME and
the interactive flag (`-i`) forces `~/.bashrc` to run in full —
bypassing stock Ubuntu's non-interactive early-return guard so the
canonical PATH edits (`~/.npm-global/bin`, NVM loader) reach the
child. Pin the shell to `/bin/bash` via `-s` so users on zsh/fish
don't reintroduce the gap. Promote `posix_single_quote` to
`pub(crate)` for shell-safe argv joining.
- setup.rs: canonicalize `explicit_data_dir` so the dropped-through
child resolves it identically regardless of cwd (with a fallback
to the original PathBuf for fresh-install paths that don't exist
yet).
- Regression guard test `drop_through_uses_interactive_bash_login_shell`
source-greps the function body for `-l`, `-s /bin/bash`, `-ic`
and forbids the old `-u USER -- argv` shape plus the literal
`/proc/self/exe` string.
Pulls in main's worktree-skill-improv changes (PR #205) and resolves the conflict in agents/common/aimx-primer.md where Sprint 4's Markdown-by-default rewrite of the "Send an email" section overlapped with main's update describing `from_mailbox` as the local part and the `sudo aimx mailboxes create --owner` cross-user provisioning path. Resolution keeps both substantive changes: the Markdown semantics explanation from the html-email track, the cross-user mailbox provisioning note from main, and the existing primer-wide statement about `from_mailbox` being the local part (already documented in the "Use MCP, not the CLI" section, so the duplicate paragraph from main's section 2 is dropped). Tightened the email_send / email_reply tool bullets and the worked example's escape-hatches paragraph to keep the primer at exactly 500 lines (the soft cap enforced by agents_setup::tests::primer_line_count_within_target_range).
Summary
Two fixes for an awkward agent UX where the agent took ~10 CLI steps to send a single email — including triggering errors to discover the configured domain — instead of using the MCP tools.
Part A — Skill docs make MCP the obvious path:
mcp-tools.md: document theaddressfield thatmailbox_listalready returns. One MCP call gives both mailboxes and the configured domain (the substring after@). Update the example JSON. Add a comment on theemail_sendexample clarifying thatfrom_mailboxis the local part only and the daemon derives the full address.troubleshooting.md: rewrite theDOMAINrow and the matching prose entry to note the error is unreachable via MCP and point atmailbox_list().addressinstead of root-only/etc/aimx/config.toml.aimx-primer.md: add a top "Use MCP, not the CLI" section, hoist "from_mailboxis the local part only" to lead the Send workflow, and add a "Do not invoke theaimxCLI" bullet under "What you must not do" with the registration command for the no-MCP case.claude-code/SKILL.md.header: tighten the frontmatter description to name themcp__aimx__*tools and explicitly say to use the MCP tools, not theaimxCLI.Part B — Setup drop-through (root cause: when
install.shdrops back to the user,claudewas missing from PATH so MCP auto-registration silently fell into theCliMissingbranch):setup.rs: switchrunuser -u USER -- argvtorunuser -l USER -c "<quoted>". The login flag (-l) initialises HOME, SHELL, USER, LOGNAME, and PATH from the user's profile chain perrunuser(1). Without it, runuser inherits sudo's scrubbedsecure_pathandclaude(typically at~/.npm-global/binor under~/.nvm/...) is unreachable. Reuses the existingposix_single_quotehelper (promoted topub(crate)) for shell-safe argv joining.Test plan
cargo buildcleancargo clippy -- -D warningscleancargo fmt -- --checkcleanposix_single_quote_escapes_embedded_quote, allbuild_agents_setup_argv*testsagents/aimx agents setup claude-code --printagainst a tempdir HOME shows: new frontmatter description leads, "Use MCP, not the CLI" section appears,addressrow inmailbox_listtable, rewrittenDOMAINtroubleshooting entries, "Do not invoke theaimxCLI" bulletnpm i -g @anthropic-ai/claude-code(soclaudelives at~/.npm-global/bin/), then run the install script. Expect the success line for MCP registration (not the silent fallback hint), andmcp__aimx__*tools visible in a fresh Claude Code session asubuntuwith no manual intervention.mcp__aimx__mailbox_create, second ismcp__aimx__email_sendwithfrom_mailbox: "daily.finance", total Bash calls related to aimx: zero.Note: 3 pre-existing
uds_authztest failures on macOS (uid 4294967294 resolves tonobody) are unrelated and per CI policy AIMX targets ubuntu-latest only.