[Sprint 5] Adjacent cleanups + end-to-end CI smoke#198
Conversation
Renames the test-env escape hatch AIMX_TEST_SKIP_ROOT_CHECK to
AIMX_TEST_SKIP_AUTHZ_CHECK so the name matches what it actually does
(bypass the authorize() call gating Action::HookCrud, not just a
root check). Clean break — no deprecation alias.
Finishes the format_auth_error consolidation. Deletes the
format_hook_auth_error shim in src/hooks.rs and routes both inline
authz errors and the socket-missing + non-root hook CRUD path through
auth::format_auth_error directly. The single canonical renderer in
auth.rs is now the only authz-error rendering surface.
Adds three sudo-lane integration tests that collectively exercise all
12 MCP tools against a chmod 0600 root:root config + running aimx
serve, with the MCP process spawned under runuser -u
aimx-test-alice so the bug class ("non-root MCP cannot read
root-owned config") is in scope:
- mailbox_list_full_cycle_against_root_owned_config covers
mailbox_list / mailbox_create / mailbox_delete
- email_list_full_cycle_against_root_owned_config covers
email_list / email_read / email_mark_read / email_mark_unread /
email_send / email_reply
- hook_list_full_cycle_against_root_owned_config covers
hook_list / hook_create / hook_delete
Tests are gated behind both #[ignore] and AIMX_INTEGRATION_SUDO=1.
The existing mailbox-dir-perms-isolation CI lane is extended with a
new step that builds the integration test binary and runs the three
new tests under sudo. No new CI lanes or infrastructure introduced.
uzyn
left a comment
There was a problem hiding this comment.
Sprint Goal Assessment
The sprint goal is met. The env-var rename (S5-1) is a clean break with no leftover references; format_auth_error is now the single rendering surface for every authz-error path in src/hooks.rs (S5-2); and three sudo-lane integration tests collectively exercise all 12 MCP tools against a chmod 0600 root:root config + running daemon, wired into the existing mailbox-dir-perms-isolation CI lane (S5-3). No new CI infrastructure introduced.
Acceptance Criteria Checklist
S5-1 — Rename AIMX_TEST_SKIP_ROOT_CHECK → AIMX_TEST_SKIP_AUTHZ_CHECK
- All read sites in
src/hooks.rsuse the new name (lines 163, 293). - All test fixtures use the new name (
tests/integration.rs:4785, 5117, 5179). -
CLAUDE.md"Test environment escape hatches" updated to describe what it actually skips (theauthorize()call gatingAction::HookCrud). -
grep -rn 'AIMX_TEST_SKIP_ROOT_CHECK'returns zero matches across the repo. -
cargo testgreen (1034 passed, 8 ignored on default lane).
S5-2 — Finish format_auth_error consolidation
-
src/hooks.rs::format_hook_auth_errordeleted; both pre-flight call sites (createline 167,deleteline 301) now callauth::format_auth_errordirectly. - The "socket missing + non-root" error path in both
create(line 220) anddelete(line 325) routes through the canonical renderer withAuthError::NotRoot+AuthErrorContext { surface: "aimx hooks", verb: "create"|"delete", resource: "resource" }. -
grep -nE 'fn format_.*auth_error' src/returns only the canonical renderer plus test functions. - User-visible wording for the pre-flight authz paths is unchanged (the existing unit tests
hook_auth_error_*still assert on the rendered substrings and pass via the new inline rendering helperrender_for_hooks). - [PARTIAL] Wording for the socket-missing + non-root path changed from
"daemon not running, non-root hook CRUD requires daemon"to"not authorized: aimx hooks create requires root (run with sudo)". The AC says "unchanged where reasonable" — there is no test snapshot pinning the old wording, so this is permissible, but the new message loses the "daemon is down, restart it" hint that was in the old one. Implementer flagged this for a Sprint 6 docs callout; agreed — see non-blockers. -
cargo testgreen.
S5-3 — End-to-end CI smoke against production-perm config
- Three new tests in
tests/integration.rsexercise every MCP tool againstchmod 0600 root:rootconfig + runningaimx servewith the MCP process spawned underrunuser -u aimx-test-alice:mailbox_list_full_cycle_against_root_owned_configcoversmailbox_list,mailbox_create,mailbox_delete(3)email_list_full_cycle_against_root_owned_configcoversemail_list,email_read,email_mark_read,email_mark_unread,email_send,email_reply(6)hook_list_full_cycle_against_root_owned_configcovershook_list,hook_create,hook_delete(3)- Total: 12 tools — matches PRD R44 exactly.
- Tests gated behind
AIMX_INTEGRATION_SUDO=1+geteuid() == 0+aimx-test-aliceuser existence +#[ignore](defense-in-depth quadruple gate). - CI step "Run end-to-end production-perm MCP smoke (under sudo)" added to
.github/workflows/ci.ymllines 112–125, runs only the three named tests with--test-threads=1, reusing the existingmailbox-dir-perms-isolationlane andaimx-test-aliceuser. -
prodperm_assert_no_eaccesasserts neither"Permission denied"nor"os error 13"appears in any tool response — that's the regression guard for the bug class. - No new CI lanes or infrastructure introduced.
Sprint 5 exit gate
-
cargo fmt -- --checkclean (verified locally). -
cargo clippy --all-targets -- -D warningsclean on both crates (verified locally). -
cargo testgreen on default lane (1034 passed, 8 ignored, verified locally). - Sudo lane CI green — implementer flagged this as not yet exercised locally (no root + provisioned
aimx-test-aliceon dev host); first real exercise will be CI. Acceptable risk: the binary compiles, tests are recognized, and the skip-when-not-sudo path runs cleanly under non-root.
Test Coverage
Coverage is appropriate. Three observations on the sudo-lane tests:
hook_list_full_cycle_against_root_owned_configasserts the empty list asassert_eq!(text, "[]", ...). This is a tighter assertion than the others and assumes the formatter returns[]exactly (no whitespace, no JSON pretty-print). That matchesmcp.rs::hook_list's currentserde_json::to_string(&rows)output, but is more brittle than necessary. If a future change pretty-printshook_listoutput the test will fail loudly — non-blocker, and arguably a good thing.email_list_full_cycle_against_root_owned_configseeds an.mdfile with hand-rolled frontmatter (prodperm_seed_email). Several optional fields (in_reply_to,references) are written as empty strings rather than omitted; serde will deserialize these asSome("")rather thanNone. Not a bug, but a minor inconsistency with how the ingest path writes inbound mail (which omits empty optional fields). The test should still pass.- No assertion that the daemon's sent-copy persistence actually wrote a file after
email_send. The test only asserts no EACCES surfaces; it does not look for the resulting file insent/alice/. This is fine for the bug class being guarded (config-load EACCES on the MCP side), but a future regression where the daemon's sent-copy write silently no-ops would not be caught. Non-blocker.
The unit test rename in src/hooks.rs::tests (hook_auth_error_* from format_hook_auth_error_*) keeps the rendering invariants asserted via the local render_for_hooks helper. Good — the wording-regression guard for the pre-flight paths survives the consolidation.
Potential Bugs
None of crash/correctness concern. The implementation is mechanical and the new code paths are straightforward delegation to existing surfaces.
Two minor observations:
prodperm_setup_envuseradd aimx-catchalldiscards the result.let _ = StdCommand::new("useradd")...status();runs and ignores any failure. If the system lacksuseraddor the user already exists with conflicting attributes, the test will silently proceed and may fail later in a less-obvious place. The pattern mirrors the rest of the file (CI-only sudo lane), so this is acceptable, but is worth noting if the test ever needs to run on a non-Debian-family host.prodperm_spawn_mcpdoes not setAIMX_TEST_MAIL_DROPon the spawned MCP child. It does not need to — the env var is read by the daemon, not the MCP process — but the parameter shape (prodperm_setup_envseparate fromstart_serve_with_env) means a future test author might add a transport-affecting env var to the MCP spawn instead of the daemon spawn. Minor maintainability concern only.
Security Issues
None. The PR makes no privilege-related changes:
- Env var rename is a cosmetic name change with no semantic shift in what's gated.
format_auth_errorconsolidation moves rendering, not authz logic. Theauthorize()predicate and theis_root()gate are untouched.- New tests run only under
AIMX_INTEGRATION_SUDO=1+geteuid() == 0; the quadruple gate (cfg(unix)+#[ignore]+ env var + euid check +id <user>check) is appropriately defensive.
Code Quality
- The four inline
AuthErrorContextconstructions insrc/hooks.rs::create/deleteare identical across the create/delete pre-flight paths (modulo the verb string). The previousformat_hook_auth_errorshim was specifically rejected by the PRD ("delete the helper, route through canonical directly"), so the duplication is intentional. If a fifth call site appears, a tiny private helper at the module level would be reasonable — but as-is, four copies is fine. - The
render_for_hookstest helper duplicates what the production code does inline. Pulling the production paths through it would defeat the AC, so this is the right tradeoff. prodperm_*helpers are well-scoped to the new tests; naming is consistent.
Alignment with PRD
The PR addresses MP2-3 (adjacent cleanups: env-var rename + format_auth_error consolidation finish) and MP2-5's R44 leg (end-to-end CI smoke). The remaining MP2 work (R38–R42 doc drift, R45 release notes) is correctly deferred to Sprint 6 per the sprint plan dependency chain.
PRD NFR6 (clean break — no deprecation alias for the env-var rename) is honored. PRD §P2.10 PQ3 (extend the existing sudo lane vs. sibling job) — implementer chose to extend, matching the bias.
The flagged wording change on the socket-missing + non-root hook CRUD path is a minor PRD-format_auth_error-consolidation tradeoff: AC R37 says "User-visible wording stays unchanged where reasonable" and explicitly contemplates the consolidation may produce different text. The new wording is consistent with every other NotRoot surface (mailbox_create / mailbox_delete / mailbox_list_force etc.), which is the consolidation's whole point. Routing operators to "start the daemon" is a hint that could move to docs in Sprint 6 (R42 release-notes entry seems a natural home), as the implementer suggested.
Summary and Recommended Actions
Overall verdict: Ready to merge.
Blockers: None.
Non-blockers:
- Sprint 6 docs (R42 release-notes entry, or
book/hooks.mdoperator-troubleshooting section) should restore the "daemon is down — start it withsudo systemctl start aimx" hint that the previous error message carried, since the new canonical wording ("not authorized: aimx hooks create requires root (run with sudo)") tells the operator to use sudo but no longer mentions the underlying cause. Implementer already flagged this for Sprint 6.
Nice-to-haves:
email_list_full_cycle_against_root_owned_configcould assert that a sent-copy file exists insent/alice/afteremail_send, to catch a future silent no-op regression.prodperm_seed_emailcould omitin_reply_to/references(rather than emitting them as empty strings) to match how the ingest path writes inbound mail.- The four duplicated
AuthErrorContextconstructions insrc/hooks.rs::create/deleteare tolerable but could collapse to a tinyhook_auth_ctx(verb)helper if a fifth call site ever appears.
Recommended merge commit message:
[Sprint 5] Adjacent cleanups + end-to-end CI smoke — env var rename, format_auth_error consolidation, 12-tool prod-perm smoke
Renames `AIMX_TEST_SKIP_ROOT_CHECK` → `AIMX_TEST_SKIP_AUTHZ_CHECK` (clean break, no alias)
to match what the env var actually skips (the `authorize()` call gating
`Action::HookCrud`). Finishes the `format_auth_error` consolidation Sprint 3
started: deletes the `format_hook_auth_error` shim and routes the four
remaining authz-error rendering call sites in `src/hooks.rs` (two pre-flight,
two socket-missing) through the canonical renderer in `auth.rs`. Adds three
sudo-lane integration tests that collectively exercise all 12 MCP tools
against a `chmod 0600 root:root` config + running `aimx serve` (with the MCP
process spawned under `runuser -u aimx-test-alice`), closing the bug class
permanently — wired into the existing `mailbox-dir-perms-isolation` CI lane.
Address PR review feedback for the adjacent-cleanups branch: - src/hooks.rs: on the socket-missing + non-root hook CRUD path, append a one-line "hint: if the daemon is running, hook CRUD over UDS would handle this without sudo." after the canonical `format_auth_error` rendering. Restores the daemon-down operator context that the pre-consolidation wording carried, without widening the canonical renderer's surface. New unit test pins the suffix wording. - tests/integration.rs: `email_list_full_cycle_against_root_owned_config` now asserts that `email_send` produced at least one sent-copy `.md` under `sent/alice/`, so a future silent no-op in sent-copy persistence is caught even when EACCES does not fire. - tests/integration.rs: `prodperm_seed_email` now omits the optional `in_reply_to` / `references` frontmatter fields rather than emitting empty strings, matching how the production ingest writer renders inbound mail.
Review feedback addressedPushed b9554b4 in response to the latest review. Fixed
Intentionally left as-is
Verification
|
uzyn
left a comment
There was a problem hiding this comment.
Re-review — cycle 2 (commit b9554b4)
All three previously flagged items are resolved cleanly. No new issues introduced.
Resolved
- Non-blocker — Daemon-down wording lost on socket-missing path — RESOLVED. Implementer applied option (b) verbatim: kept the canonical
format_auth_errorrendering as the first line, then appendedhint: if the daemon is running, hook CRUD over UDS would handle this without sudo.via a tiny private helperappend_daemon_down_hint(String) -> Stringat the call site insrc/hooks.rs::create(line 223) anddelete(line 328).auth::format_auth_errorandAuthErrorContextstay untouched — the canonical renderer's surface is not widened. Verified bygrep -rn 'append_daemon_down_hint' src/ tests/: only the two intended call sites, the definition, and the new unit test. New unit testappend_daemon_down_hint_preserves_canonical_message_and_appends_hintpins both the canonical-prefix invariant (with_hint.starts_with(&canonical)) and the suffix wording (\nhint:,daemon,UDSsubstrings) so future drift is caught. - Nice-to-have —
email_list_full_cycle_against_root_owned_configsent-copy assertion — RESOLVED. Afteremail_sendthe test now readssent/alice/(line 6179–6195) and asserts at least one.mdfile lands there. A future silent no-op in sent-copy persistence is now caught even when the EACCES guard would not fire. - Nice-to-have —
prodperm_seed_emailempty optional fields — RESOLVED. Thein_reply_to = ""/references = ""lines are dropped from the seeded frontmatter (line 6019–6020), matching how the production ingest writer renders inbound mail (omit when empty).
Intentionally left as-is
- Nice-to-have —
AuthErrorContexthelper consolidation. The cycle-1 review explicitly said "skip unless naturally touching all four sites" and the daemon-down-hint fix only touches two of the four. Implementer's reasoning is correct: the helper wraps the renderedString, not theAuthErrorContext, so the four duplicateAuthErrorContextliterals remain at the call sites — fine until a fifth one appears.
Still unresolved
None.
New issues found
None. The diff is mechanical (two call-site wrappings + one helper + one unit test in src/hooks.rs; one assertion block + two-line frontmatter trim in tests/integration.rs). No regressions, no new code paths.
Verification
cargo fmt -- --check: clean.cargo clippy --all-targets -- -D warnings: clean.cargo testdefault lane: 1143 passed, 0 failed, ignored counts match expectation (sudo-lane tests gated behindAIMX_INTEGRATION_SUDO=1+geteuid()==0+aimx-test-aliceuser existence +#[ignore]).hooks::testsmodule: 7 passed (was 6 in cycle 1; the new test brings the count to 7).- Banned-token sweep on cycle-2 additions outside
docs/: zero hits. append_daemon_down_hintcall-site audit: exactly the two intended branches, plus the definition and the unit test. No leakage to other surfaces.
Summary
The cycle-2 fixes are surgical and on-target. The daemon-down hint is restored without widening auth::format_auth_error, the sent-copy assertion closes the silent-no-op gap, and the seeded frontmatter now matches production. Clippy/fmt/test all green. Ready to merge.
Recommended merge commit message
[Sprint 5] Adjacent cleanups + end-to-end CI smoke — env var rename, format_auth_error consolidation, 12-tool prod-perm smoke
Renames `AIMX_TEST_SKIP_ROOT_CHECK` → `AIMX_TEST_SKIP_AUTHZ_CHECK` (clean break, no alias)
to match what the env var actually skips (the `authorize()` call gating
`Action::HookCrud`). Finishes the `format_auth_error` consolidation: deletes
the `format_hook_auth_error` shim and routes the four remaining authz-error
rendering call sites in `src/hooks.rs` (two pre-flight, two socket-missing)
through the canonical renderer in `auth.rs`. The socket-missing + non-root
branch appends a one-line daemon-down hint via a tiny private helper so
operators see why sudo is required without widening the canonical renderer's
surface. Adds three sudo-lane integration tests that collectively exercise
all 12 MCP tools against a `chmod 0600 root:root` config + running `aimx
serve` (with the MCP process spawned under `runuser -u aimx-test-alice`),
closing the bug class permanently — wired into the existing
`mailbox-dir-perms-isolation` CI lane. The email smoke now asserts a
sent-copy `.md` lands under `sent/alice/` after `email_send` so a future
silent no-op is caught even when EACCES does not fire.
… $HOME
The new sudo-lane CI step ran `runuser -u aimx-test-alice -- <target/debug/aimx>`
to spawn the MCP child as a non-root uid, but `target/debug/` on most CI
and dev hosts lives beneath a `0o750` `$HOME`. The alice uid (not in the
home owner's group) cannot traverse it, so `runuser` exits with
"Permission denied" before the MCP child ever runs — closing stdout
immediately and surfacing as the cryptic
`Error("EOF while parsing a value", line: 1, column: 0)` panic in
`McpClient::send_request`.
Fix the test setup by:
- Copying the `aimx` binary into the prodperm tempdir (already chmod'd
`0o755` and rooted under `/tmp`, so always reachable from any uid)
and pointing `prodperm_spawn_mcp` at that copy.
- Switching `AIMX_TEST_MAIL_DROP` from a directory back to a file path
— `FileDropTransport::send` opens it with `OpenOptions::append(true)`
and surfaces `Is a directory (os error 21)` otherwise. This was
silent in the original failure because the EOF panic blocked the
test before `email_send` ever ran.
Also wire a stderr drain into `McpClient` so future MCP-subprocess
deaths surface with the captured stderr + child exit status instead of
the opaque `EOF while parsing a value` panic. Every existing McpClient
construction site (six in total) now spawns a background stderr-drain
thread; `send_request` panics with the captured stderr block whenever
the read returns 0 bytes or the JSON parse fails.
Verified locally:
- `cargo test` — 1035 passed, 8 ignored on the default lane.
- `cargo clippy --all-targets -- -D warnings` clean.
- `cargo fmt -- --check` clean.
- `sudo AIMX_INTEGRATION_SUDO=1 <integration-bin> --ignored
--test-threads=1 mailbox_list_full_cycle_against_root_owned_config
email_list_full_cycle_against_root_owned_config
hook_list_full_cycle_against_root_owned_config` — 3 passed.
uzyn
left a comment
There was a problem hiding this comment.
Re-review — cycle 3 (commit 4a34bba)
The sudo-lane CI failure flagged after cycle 2's approval is fixed. The fix is test-only (zero src/ changes), surgical, and grounds the original failure mode with a real diagnostic so future regressions surface clearly.
Resolved
-
Sudo-lane CI failure —
EOF while parsing a valuepanic in three new prodperm MCP tests — RESOLVED. Root cause wasrunuser -u aimx-test-alice -- <target/debug/aimx>:target/debug/lives under a0o750$HOMEon CI/dev hosts, so the alice uid (not in the home owner's group) hitsPermission deniedon path traversal beforeaimx mcpever execs. The MCP child stdin/stdout/stderr were already piped, so the child never being spawned looked exactly like an MCP child that crashed mid-handshake — the JSON-RPC reader saw an immediate EOF and panicked withserde_json::Error("EOF while parsing a value", line: 1, column: 0).Fix is a copy-the-binary-into-the-tempdir setup change (
prodperm_copy_aimx_binary): the tempdir is0o755and lives under/tmp(always world-traversable from any uid).prodperm_spawn_mcpnow pointsrunuserat<tmp>/aimxinstead oftarget/debug/aimx. Production code untouched. -
Latent
AIMX_TEST_MAIL_DROPdirectory bug inemail_list_full_cycle_against_root_owned_config— RESOLVED. The cycle-2 test pointedAIMX_TEST_MAIL_DROPat a directory (tmp.path().join("mail_drop")+create_dir_all), butFileDropTransport::sendopens the path withOpenOptions::create(true).append(true)(src/transport.rs:338-342), which surfacesIs a directory (os error 21). The cycle-2 EOF panic blockedemail_sendfrom ever running, so this was hidden. Reverted to a file path (tmp.path().join("outbound.log")), matching the convention instart_serve_with_mail_drop(line 2617) andtests/uds_authz.rs:417.
CI status (commit 4a34bba)
core-tests pass 3m14s
mailbox-dir-perms-isolation pass 2m39s ← previously failing; now green
verifier-tests pass 1m5s
docs-build pass 40s
Analyze (rust) pass 6m26s
Analyze (actions) pass 44s
Analyze (javascript-typescript) pass 56s
CodeQL pass 2s
All eight checks green.
Diagnostic-quality improvements (bonus, also part of 4a34bba)
The implementer wired a background stderr drain into every McpClient construction site (six in total: McpClient::spawn, prodperm_spawn_mcp, plus four inline struct literals at lines 894, 973, 1282, 1360, 3072, 4565). When a future MCP subprocess dies before responding to a JSON-RPC request, send_request now panics with the captured stderr block + child exit status instead of the opaque Error("EOF while parsing a value", line: 1, column: 0). Specifically:
- Read-line returns
Ok(0)→panic!("MCP subprocess closed stdout (EOF) before responding to {method:?}{}", self.format_dead_child_diag()). serde_json::from_strparse fails →panic!("MCP returned non-JSON for {method:?}: {e}; raw line {raw:?}{diag}").format_dead_child_diagwaits up to 1s fortry_wait()to returnSome, joins the drain thread, then returns the captured stderr + exit status.shutdown()joins the drain thread afterchild.wait()so no thread leaks.
This is a real maintainability improvement: the original failure took a careful read of runuser semantics and $HOME mode bits to diagnose. With this in place, the next regression of this class will land with a self-explanatory failure message.
A few small quality notes (all Non-blocker, leaving them as-is):
spawn_stderr_drainreturns the drain thread handle butreaderrors are silently swallowed (Ok(0) | Err(_) => break). Acceptable for a test diagnostic — losing one error mid-stream is preferable to drain-thread panics polluting unrelated tests.- The drain reads into an unbounded
Vec<u8>. No backpressure concern since real MCP children write little stderr before either succeeding or dying. Worst case is a few MB if a subprocess goes haywire; well within test memory budgets. format_dead_child_diagmutates&mut self(takes the drain handle). If a test'ssend_requestpanics on EOF, then the panic is caught and the test continues, the diag will not be available a second time. The current call path immediately panics so this is fine; flagging only because the API would be confusing if anyone tried to reuse the McpClient post-failure.
Soundness assessment of the fix
- Scope: test setup only. No
src/changes, no behavior changes to production code. Zero risk of regressing the cycle-2 approved surface. - Correctness of the binary-copy approach: the tempdir is created via
TempDir::new()(mkdtemp under$TMPDIRor/tmp) and explicitly chmod'd0o755byprodperm_setup_env. Copying a0o755binary into it makes it reachable by any uid — confirmed correct. runuserenv passthrough: the implementer's comment claimsAIMX_CONFIG_DIR/AIMX_RUNTIME_DIRsurviverunuser's env reset, and the MCP child finds the daemon socket viaAIMX_RUNTIME_DIR. This is consistent with util-linuxrunuserbehavior (onlyHOME/USER/MAIL/LOGNAMEare rewritten withrunuser -u; other env passes through). The CI green run on the same lane confirms it empirically.- Mail-drop file vs. directory: matches the existing convention in
start_serve_with_mail_drop(line 2617).FileDropTransportopens withappend(true)which fails on a directory — the fix is correct. - No regressions: the new stderr-drain plumbing is additive — existing tests get strictly better diagnostics, no new failure modes. The 1142+1035-passing default-lane runs across the cycle confirm no regression.
Still unresolved
None. The cycle-2 issues remain resolved (verified earlier), and the sudo-lane failure is now fixed.
New issues found
None.
Verification
- Default-lane CI:
core-testspass (3m14s). - Sudo-lane CI:
mailbox-dir-perms-isolationpass (2m39s) — previously the failing job. - Banned-token sweep on cycle-3 additions only:
git diff b9554b4..4a34bba | grep '^+' | rg '\bSprint\b|\bS[0-9]+[-.][0-9]+\b|\bFR-[0-9]+[a-z]?\b|User Story|Acceptance criteria'→ zero hits. - Diff scope:
tests/integration.rsonly, +185/-9 net. Nosrc/changes. - All 6
McpClient { ... }inline struct literals +spawn()+prodperm_spawn_mcpupdated consistently with the newstderr_buf/stderr_drainfields.
Summary
Sound, surgical, test-only fix. The implementer correctly diagnosed a tricky CI-host-specific failure (alice uid + $HOME mode bits + opaque MCP EOF panic), fixed the immediate cause by copying the binary into a world-traversable tempdir, fixed a latent mail-drop directory bug that cycle-2's panic was masking, and improved the test harness so future regressions of this class surface with a real diagnostic instead of an opaque serde panic. CI is green on both lanes. Ready to merge.
Recommended merge commit message
[Sprint 5] Adjacent cleanups + end-to-end CI smoke — env var rename, format_auth_error consolidation, 12-tool prod-perm smoke
Renames `AIMX_TEST_SKIP_ROOT_CHECK` → `AIMX_TEST_SKIP_AUTHZ_CHECK` (clean break, no alias)
to match what the env var actually skips (the `authorize()` call gating
`Action::HookCrud`). Finishes the `format_auth_error` consolidation: deletes
the `format_hook_auth_error` shim and routes the four remaining authz-error
rendering call sites in `src/hooks.rs` (two pre-flight, two socket-missing)
through the canonical renderer in `auth.rs`. The socket-missing + non-root
branch appends a one-line daemon-down hint via a tiny private helper so
operators see why sudo is required without widening the canonical renderer's
surface. Adds three sudo-lane integration tests that collectively exercise
all 12 MCP tools against a `chmod 0600 root:root` config + running `aimx
serve` (with the MCP process spawned under `runuser -u aimx-test-alice`),
closing the bug class permanently — wired into the existing
`mailbox-dir-perms-isolation` CI lane. The email smoke now asserts a
sent-copy `.md` lands under `sent/alice/` after `email_send` so a future
silent no-op is caught even when EACCES does not fire. The MCP test client
gained a stderr-drain so MCP-subprocess deaths surface with the captured
stderr + exit status instead of an opaque `EOF while parsing a value`
panic.
Summary
AIMX_TEST_SKIP_ROOT_CHECKtoAIMX_TEST_SKIP_AUTHZ_CHECKso the name describes what the env var actually skips: theauthorize()call gatingAction::HookCrud, not a generic root check. Clean break — no deprecation alias. CLAUDE.md test-env section updated atomically.format_auth_errorconsolidation. Delete theformat_hook_auth_errorshim insrc/hooks.rs; route the two inline pre-flight call sites and the socket-missing + non-root hook CRUD path throughauth::format_auth_errordirectly with the appropriateAuthErrorContext. The canonical renderer inauth.rsis now the only authz-error rendering surface (grep -nE 'fn format_.*auth_error' src/returns only the canonical renderer plus its own test fns).chmod 0600 root:rootconfig + runningaimx serve, with the MCP process spawned underrunuser -u aimx-test-aliceso the bug class (non-root MCP failing to read root-owned config) is in scope. The existingmailbox-dir-perms-isolationCI lane is extended with a new step that runs the three tests under sudo; no new CI lanes or infrastructure introduced.Test plan
cargo fmt -- --checkcleancargo clippy --all-targets -- -D warningscleancargo testgreen on the default lane (1034 passed, 8 ignored)cargo fmt -- --check+cargo clippy --all-targets -- -D warningsclean#[ignore]behindAIMX_INTEGRATION_SUDO=1