Skip to content

[Sprint 1] Authz model + UDS handler — owner-bound mailbox CRUD#193

Merged
uzyn merged 2 commits into
mainfrom
feat/user-mailbox-s1-authz-uds
May 2, 2026
Merged

[Sprint 1] Authz model + UDS handler — owner-bound mailbox CRUD#193
uzyn merged 2 commits into
mainfrom
feat/user-mailbox-s1-authz-uds

Conversation

@uzyn

@uzyn uzyn commented May 2, 2026

Copy link
Copy Markdown
Owner

Sprint Goal

Non-root MAILBOX-CREATE and MAILBOX-DELETE work over the daemon UDS with the owner identity bound to SO_PEERCRED. Every privilege-escalation path is closed before any user-facing surface ships.

This is Sprint 1 of the user-mailbox track. Sprint 2 will drop the CLI's entry-point root gate and add the MCP mailbox_create / mailbox_delete tools; Sprint 3 ships the docs/release work.

Stories Implemented

[S1-1] Split Action::MailboxCrud into owner-bound variants

  • Action::MailboxCrud removed; MailboxCreate { owner_uid: u32 } and MailboxDelete { mailbox: String } added.
  • authorize() predicate updated — root passes both unconditionally; non-root passes MailboxCreate iff caller_uid == owner_uid (returns NotOwner { mailbox: "<new>" } otherwise); non-root MailboxDelete reuses the existing owner-uid match logic.
  • All callsites updated — src/mailbox.rs and src/mcp.rs recompile against the new variants.
  • Unit tests added: non_root_mailbox_create_owner_match_passes, non_root_mailbox_create_owner_mismatch_rejects, non_root_mailbox_delete_owner_match_passes (#[ignore] for non-root host gating), non_root_mailbox_delete_owner_mismatch_returns_not_owner (same), root_passes_new_variants_unconditionally.

[S1-2] Add peer_username() resolver

  • New peer_username(uid: u32) -> Result<String, String> in src/uds_authz.rs, sitting next to Caller where peer_uid lives.
  • Returns Ok(name) from getpwuid; errs with the exact string "uid <N> has no passwd entry" on misses.
  • Round-trip test pins the contract: MailboxConfig { owner: peer_username(U)?, ... }.owner_uid()? == U for the current process euid.
  • No unwrap() / expect() on the lookup result.

[S1-3] Rewrite mailbox_handler to drop enforce_root and bind owner

  • enforce_root call removed from handle_mailbox_crud. Replaced with the per-verb authorize() call structure described in the sprint doc.
  • Non-root CREATE: req.owner is dropped on the floor; the owner string actually written to config.toml is synthesized via peer_username(caller.uid).
  • Root CREATE: req.owner honored exactly as today (no behavior change for sudo callers).
  • DELETE: the unowned-mailbox response is string-identical to the no-such-mailbox response (NFR2). Both go through a new private no_such_mailbox_reason() helper so the contract is one grep away.
  • Orphan-uid CREATE → AckResponse::Err { code: Validation, reason: "uid <N> has no passwd entry" }.
  • Locks (per-mailbox → process-wide CONFIG_WRITE_LOCK), atomic config rename, idempotent re-create, and rollback semantics preserved.
  • All 21 existing mailbox_handler.rs::tests continue to pass.

[S1-4] Privilege-escalation negative tests

  • nonroot_create_ignores_wire_owner_and_uses_so_peercred — non-root caller submits Owner: root; resulting mailbox's owner field equals peer_username(caller.uid), never "root".
  • nonroot_delete_unowned_returns_no_such_mailbox_string_match — pins the wire-format byte-identity between the unowned and missing cases, plus an explicit "no leak of caller/owner/mailbox-state" check on the reason string.
  • nonroot_create_idempotent_for_owned_mailbox — re-creating an owned mailbox returns the same (Mailbox, "already exists") shape as the existing root-side path.
  • nonroot_create_delete_create_cycle_succeeds — three-step lifecycle as a non-root caller, all three return Ok.
  • nonroot_create_orphan_uid_returns_validation_with_exact_message — uses uid 4_294_967_294; pins the exact wire reason.

Technical Decisions

  • CLI gate kept as a behavior-preserving no-op (option 1 in the brief). src/mailbox.rs now calls authorize() with the new variants but with owner_uid = current_euid() for create and the resolved MailboxConfig for delete. This means the gate trivially passes for both root and non-root creators, and for delete it pre-flights the same owner check the daemon will enforce. The brief flagged this could shift behavior slightly for non-root callers (they now reach the daemon submission path instead of being short-circuited by NotRoot); that's the correct glide path because S1-3 makes the daemon accept those requests. Sprint 2 (S2-1) will drop the entry-point gate entirely.
  • require_root / enforce_root kept (now #[allow(dead_code)]). The helpers and their tests are still useful documentation for future root-gated UDS verbs; deleting them now would make Sprint 2's eventual removal of the test seam for AIMX_TEST_SKIP_ROOT_CHECK look like a bigger churn than it is.
  • Wire-format byte-identity via a single helper. Both the unowned-DELETE and the genuine no-such-mailbox paths route through no_such_mailbox_reason(name). NFR2 (no information leak) is now one function to audit instead of two format! calls in different files.
  • Test seam: install_tester_resolver_with_current_user. The non-root tests need the host's actual Linux username (the value peer_username(geteuid()) returns) to resolve via validate_run_as so the chown step is a no-op. The new helper extends the existing install_tester_resolver to also recognize the current Linux user. Tests that only need the root caller continue to use the simpler resolver.
  • nonroot_caller_or_skip(). Tests that exercise the non-root path early-return when running as root (CI lane). The brief's escape hatch is the existing testowner short-circuit at mailbox_handler.rs:494, but the privilege-escalation defense itself only matters when the caller actually has a non-root uid the kernel reported via SO_PEERCRED, so the tests gate on that condition rather than mocking it.

Deferred Items

Nothing punted from Sprint 1's listed scope. The CLI surface (src/mailbox.rs::run entry-point gate, --owner soft-warning, --force for non-root owners) and the new MCP mailbox_create / mailbox_delete tools are explicitly Sprint 2 work per the sprint plan. The docs (book/security.md, book/mailboxes.md, book/troubleshooting.md, etc.) and the release-notes entry are Sprint 3 work.

Review Focus Areas

  • src/mailbox_handler.rs::handle_mailbox_crud — the rewritten authz block (lines around the new resolved_owner computation). The privilege-escalation defense lives entirely in the assignment of resolved_owner for non-root callers; if a future patch ever lets req.owner.as_deref() flow into the non-root branch, NFR1 is broken.
  • src/mailbox_handler.rs::no_such_mailbox_reason — the byte-identity contract for NFR2. Anything that produces a different string for the unowned-DELETE path than for the missing-DELETE path reopens the leak.
  • src/auth.rs::authorize — the new MailboxCreate { owner_uid } arm intentionally returns NotOwner { mailbox: "<new>" } rather than inventing a fresh OwnerMismatch variant. The sprint plan called this out as "pick one, don't mix" — flag if a fresh variant would be clearer for the CLI's format_auth_error dispatch in S2-1.
  • src/uds_authz.rs::peer_username — the only new public function this sprint adds. The exact wire reason "uid <N> has no passwd entry" is asserted in three places (the unit test, the negative test in mailbox_handler.rs, and now this PR description); a refactor that changes this string has to update all three.
  • CLI gate change in src/mailbox.rs::run — this is the one place behavior shifted slightly for non-root callers between sprints. If that's not the right glide path, S2-1 can clean it up; flag if you'd prefer option 2 (drop the gate now with TODO).

Test Plan

  • cargo test — 1012 unit tests + 92 integration + 1 doctest all green
  • cargo clippy -- -D warnings clean
  • cargo fmt -- --check clean
  • services/verifier crate unaffected; cargo build + cargo clippy clean there too

Implements user-mailbox Sprint 1: split the central authz predicate so
mailbox CRUD is owner-bound rather than root-only, and rewrite the UDS
handler so non-root callers can manage mailboxes they own. The owner
identity always derives from SO_PEERCRED server-side — the wire `Owner:`
header is dropped on the floor for non-root callers, so privilege
escalation through this surface is structurally impossible.

S1-1: src/auth.rs — replace `Action::MailboxCrud` with two owner-bound
variants `MailboxCreate { owner_uid }` and `MailboxDelete { mailbox }`.
Update the `authorize()` predicate and every callsite (mailbox.rs,
mcp.rs). Root passes both unconditionally; non-root passes when their
uid matches the owner. CLI behavior is preserved (entry-point gate
becomes a no-op for create, owner-checked for delete).

S1-2: src/uds_authz.rs — add `peer_username(uid: u32) -> Result<String,
String>` that maps an SO_PEERCRED uid to the username string the rest
of the codebase uses for mailbox owners. Errors (does not panic) on
orphan uids with the exact message `"uid <N> has no passwd entry"`.
Round-trip tested.

S1-3: src/mailbox_handler.rs — drop `enforce_root` from
`handle_mailbox_crud`. Non-root CREATE: ignore `req.owner`, synthesize
the owner via `peer_username(caller.uid)`, then `authorize` with
`MailboxCreate{owner_uid: caller.uid}`. Non-root DELETE: load the
target mailbox config and `authorize` with `MailboxDelete{mailbox}`.
Both unowned-mailbox and missing-mailbox DELETE paths produce the same
wire response (NFR2: no information leak). Lock hierarchy, atomic
config rename, idempotent re-create, and rollback semantics preserved.

S1-4: privilege-escalation negative tests — Owner: header ignored on
non-root, no info leak on unowned delete (string-identical wire
response), idempotent re-create as non-root, full create→delete→create
cycle, orphan-uid CREATE returns the canonical Validation error.

Tests: full crate test suite green (1012 unit tests + 92 integration +
1 doctest); clippy clean; fmt clean.

@uzyn uzyn left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sprint 1 Review — [Sprint 1] Authz model + UDS handler — owner-bound mailbox CRUD

Sprint Goal Assessment

The sprint goal — "Non-root MAILBOX-CREATE and MAILBOX-DELETE work over the daemon UDS with the owner identity bound to SO_PEERCRED. Every privilege-escalation path is closed before any user-facing surface ships" — is mostly achieved at the implementation layer: the auth.rs predicate split is clean, peer_username is a careful inverse of MailboxConfig::owner_uid, and the handler's privilege-escalation defense (synthesizing the owner from SO_PEERCRED and dropping req.owner on the floor for non-root callers) is exactly what NFR1 demands. However, the sprint fails its own exit gate: the mailbox-dir-perms-isolation CI lane is RED on this PR because two pre-existing integration tests (mailbox_create_non_root_forbidden and mailbox_delete_non_root_forbidden in tests/uds_authz.rs) still assert the old root-only behavior the sprint deliberately retired. They were not updated and the PR is not mergeable in its current state.

Acceptance Criteria Checklist

[S1-1] Split Action::MailboxCrud into owner-bound variants

  • MailboxCrud removed; MailboxCreate { owner_uid: u32 } and MailboxDelete { mailbox: String } added
  • authorize() predicate updated per spec; MailboxCreate mismatch returns NotOwner { mailbox: "<new>" } (sentinel choice is reasonable; see Code Quality below)
  • All callsites updated (src/mailbox.rs, src/mcp.rs)
  • All five required unit tests added; #[ignore] gating mirrors the existing convention

[S1-2] Add peer_username() resolver

  • New peer_username(uid: u32) -> Result<String, String> in src/uds_authz.rs
  • Returns Ok(name) from getpwuid; errs with the exact string "uid <N> has no passwd entry"
  • Round-trip test pins MailboxConfig::owner_uid() reciprocity for the current process euid
  • No unwrap() / expect() on the lookup result

[S1-3] Rewrite mailbox_handler to drop enforce_root and bind owner

  • enforce_root removed; replaced with per-verb authorize() calls
  • Non-root CREATE: req.owner ignored; owner synthesized via peer_username(caller.uid)
  • Root CREATE: req.owner honored as today
  • DELETE: unowned and missing return string-identical wire response via no_such_mailbox_reason()
  • Orphan-uid CREATE → ErrCode::Validation with exact message
  • Lock hierarchy and atomic config rename preserved
  • All existing mailbox_handler.rs::tests continue to pass (verified the handler test module compiles and the diff doesn't touch their assertions)

[S1-4] Privilege-escalation negative tests

  • All five required negative tests present and well-structured
  • nonroot_delete_unowned_returns_no_such_mailbox_string_match is the strongest test in the suite — the explicit "no leak" word denylist (owner, permission, root, etc.) goes beyond what the spec asked for and will catch regressions even from sloppy log-text refactors

Sprint 1 exit gate

  • cargo test green on root crate (per implementer's note; consistent with the #[ignore] gating)
  • cargo clippy -- -D warnings clean (per implementer's note)
  • cargo fmt -- --check clean (per implementer's note)
  • Manual smoke (MAILBOX-CREATE over UDS as a non-root user, confirm stat -c '%U' shows the caller) — not evidenced. This is the canonical regression guard for the headline NFR1 invariant; please attach the socat/nc -U transcript or equivalent before merge.
  • CI is RED — the mailbox-dir-perms-isolation job (Run UDS authz tests (under sudo) step) fails with two test failures listed below.

Potential Bugs

Blocker #1 — Two integration tests now assert the opposite of Sprint 1's intended behavior, breaking CI

Files: tests/uds_authz.rs:723 (mailbox_create_non_root_forbidden) and tests/uds_authz.rs:762 (mailbox_delete_non_root_forbidden).

These tests were written against the retired root-only gate and were not updated as part of this sprint. The CI lane that runs them under sudo (mailbox-dir-perms-isolation workflow, run id 25249405968) is RED:

---- mailbox_create_non_root_forbidden stdout ----
thread 'mailbox_create_non_root_forbidden' panicked at tests/uds_authz.rs:398:5:
expected response to contain "EACCES"; got "AIMX/1 OK\n"

---- mailbox_delete_non_root_forbidden stdout ----
thread 'mailbox_delete_non_root_forbidden' panicked at tests/uds_authz.rs:398:5:
expected response to contain "EACCES"; got "AIMX/1 OK\n"

The PR description's claim that all tests are green only holds for the default cargo test lane — these two tests are #[ignore]'d behind AIMX_INTEGRATION_SUDO=1 and only run in the dedicated sudo CI lane. That lane is the canonical privilege-boundary regression guard.

Because the sprint deliberately flips the behavior these tests were guarding (non-root CRUD is now allowed for the owner), each test needs to either:

  1. Be re-purposed to prove the new positive contract — e.g. mailbox_create_non_root_owner_ok (alice creates alicemade and the response is AIMX/1 OK and the on-disk owner equals alice's username), and mailbox_delete_non_root_owner_ok (alice creates and then deletes a mailbox of her own and both succeed).
  2. Be replaced with cross-uid negative tests that exercise the privilege boundary the sprint actually defends — e.g. mailbox_create_non_root_cross_uid_owner_ignored (alice submits Owner: bob and the resulting on-disk mailbox is owned by alice, not bob — proving the wire Owner: field cannot be used to impersonate), and mailbox_delete_non_root_cross_uid_forbidden (alice tries to delete bob's mailbox and gets the no_such_mailbox_reason wire string, verifying NFR2 in a real cross-process / two-uid scenario rather than the synthesized in-process unit test).

Option 2 is materially stronger than what the unit tests cover because it goes through the real SO_PEERCRED path the unit tests cannot exercise. Sprint 1's headline NFR1 ("the daemon ignores client-supplied owner from non-root callers") only has one true end-to-end test today, and that test was just deleted in spirit. The sprint should not ship without replacing it.

This is a Blocker on two grounds: (a) CI is RED, and (b) the sprint loses its strongest existing privilege-boundary test in the process of trying to ship the behavior it guards.

Non-blocker #1 — Stale comment in src/mailbox.rs delete-path gate

File: src/mailbox.rs:41-46

The new comment on the delete-path authorize() call says:

See note above on the create path: same Sprint 1 behavior-preserving gate. The mailbox arg is None so non-root callers pre-Sprint-2 cleanly fall through to NoSuchMailbox when the daemon path doesn't catch it first; root keeps the bypass.

But the actual code passes config.mailboxes.get(&name), which is Some(...) whenever the mailbox exists in config. The comment misleadingly claims the arg is always None. The behavior is correct (existing-mailbox + caller-owns → predicate passes; existing-mailbox + caller-doesn't-own → predicate returns NotOwner; missing-mailbox → predicate returns NoSuchMailbox), but the comment will confuse the next reader. Fix the comment to describe what the code actually does.

Non-blocker #2nonroot_create_orphan_uid_returns_validation_with_exact_message is fragile to host passwd state

File: src/mailbox_handler.rs:792-822

The test uses ORPHAN_UID: u32 = 4_294_967_294 as the "orphan" uid without any #[ignore] gate or runtime check that the host actually doesn't map this uid to a passwd entry. The existing lookup_unknown_uid_returns_none test in uds_authz.rs:455 carries an explicit comment acknowledging that "32-bit max UID is reserved for 'nobody' on some systems but typically unmapped on CI runners." On a host where 4_294_967_294 is mapped (e.g. some Alpine images map it to nobody), this test would silently take the non-orphan branch, succeed in creating a mailbox owned by "nobody", and the assertion would fail with a confusing message. Two cheap mitigations:

  • Probe lookup_username(ORPHAN_UID) at the top of the test body and short-circuit (return) when it resolves, with an eprintln!("skipping: uid 4_294_967_294 maps to {name} on this host").
  • Or borrow the existing aimx-nonexistent-orphan-user resolver-miss pattern from auth.rs::orphan_owner_collapses_to_no_such_mailbox_for_non_root — but that exercises the username→uid path, not the uid→username path, so a different sentinel might be needed.

Not blocking because the sentinel is unmapped on the standard CI runner today and the test would fail loudly (not silently) when broken — operators would notice immediately.

Non-blocker #3MailboxCreate { owner_uid } rejection sentinel "<new>" is fine for now but worth a Sprint 2 follow-up

File: src/auth.rs:111-124

The NotOwner { mailbox: "<new>" } sentinel keeps the wire-shape consistent without inventing a fresh OwnerMismatch variant — the implementer's note explicitly invited scrutiny on this. The choice is defensible for Sprint 1 (no surface today renders this string back to a user, since the daemon side only emits the opaque not_authorized_wire_reason() for non-root MailboxCreate failures). Flagging it for Sprint 2's format_auth_error work: when the CLI starts rendering this error to humans, "caller does not own mailbox '<new>'" will read like a bug. S2-1 should either (a) add a dedicated OwnerMismatch { intended_owner_uid: u32 } variant whose Display formats sensibly, or (b) pattern-match the "<new>" sentinel in format_auth_error and switch to a different message ("not authorized: cannot create a mailbox owned by another user"). Document the Sprint 2 hand-off in a // TODO(S2-1): comment so this doesn't get lost.

Non-blocker #4 — CLI entry-point gate behavior shift for non-root delete-of-other-user case

The implementer flagged in the PR body: "with the new Action::MailboxCreate { owner_uid: u32 } variant, the CLI's entry-point authorize() gate trivially passes for non-root callers — a relaxation from the previous unconditional NotRoot rejection."

Verified the trace, and the answer is: the daemon does re-check, so the CLI relaxation does not open a privilege-escalation path. Specifically, for aimx mailboxes create from a non-root caller:

  • CLI gate: MailboxCreate { owner_uid: current_euid() } with caller_uid == owner_uidOk(()), falls through to create().
  • create() calls submit_mailbox_crud_via_daemon with the --owner value (or the prompted value).
  • Daemon receives the request, checks caller.is_root(), finds it false, drops the wire owner, synthesizes from SO_PEERCRED. NFR1 holds.
  • Fallback path (daemon socket missing): create_mailbox(...) writes config.toml directly. For non-root this fails at config.save() with EPERM (the file is 0640 root:root). The CLI surfaces a confusing perm error rather than the precise message R11 mandates.

The CLI fallback-on-socket-missing path for non-root callers is exactly the rough edge S2-1 (R11) is scoped to fix — fail fast with "daemon must be running for non-root mailbox CRUD; start aimx serve or run with sudo to fall back to direct config edit." The implementer's choice to keep the entry-point gate as a behavior-preserving no-op is fine for Sprint 1; just call this out in the Sprint 2 brief so the rough edge doesn't slip.

For aimx mailboxes delete from a non-root caller targeting a mailbox they don't own:

  • CLI gate: MailboxDelete { mailbox: name } with config.mailboxes.get(&name) = Some(<other-user's mailbox>) → predicate returns NotOwner.
  • format_auth_error renders this as "not authorized: caller does not own mailbox 'X'" to the user.

This is a CLI-side message that exists before the daemon ever sees the request. It does not violate NFR2 (which protects the wire), but it does mean a non-root caller can probe the existence of mailboxes owned by other users by trying to delete them and watching whether the error is NotOwner (mailbox exists, owned by someone else) vs. NoSuchMailbox (mailbox doesn't exist). The PRD §6.3 R10/R11 reasoning treats this as acceptable because the CLI runs as the local user and can read the on-disk config.toml directly anyway — but it does mean the CLI surface and the wire surface deliberately leak different things, and that asymmetry should be documented in book/security.md when Sprint 3 lands. Not a Sprint 1 blocker.

Test Coverage

Strong coverage on the new behavior at the unit-test layer. The five S1-4 negative tests are well-targeted, especially nonroot_delete_unowned_returns_no_such_mailbox_string_match with its explicit no-leak word denylist.

Critical gap at the integration layer. The sudo CI lane (the only real end-to-end test of SO_PEERCRED-driven owner binding across two real Linux uids) lost its only meaningful coverage of the new path because the two pre-existing *_non_root_forbidden tests now contradict the shipped behavior and are failing, and no replacement positive tests were added. See Blocker #1 for the proposed replacements.

Suggested additional integration tests for the sudo lane (in tests/uds_authz.rs):

  • mailbox_create_non_root_owner_synthesized_from_peercred: alice connects, submits MAILBOX-CREATE freshmbx Owner: bob, expects AIMX/1 OK, then asserts stat -c '%U' /var/lib/aimx/inbox/freshmbx returns aimx-test-alice (NOT aimx-test-bob). This is the headline NFR1 test at the integration layer and is currently absent.
  • mailbox_create_non_root_idempotent_against_self: alice creates the same mailbox twice in a row, expects ERR with "already exists" on the second.
  • mailbox_delete_non_root_owner_ok: alice creates and then deletes a mailbox of her own, both AIMX/1 OK.
  • mailbox_delete_non_root_cross_uid_returns_no_such_mailbox: alice tries to delete the aimx-test-bob mailbox (which exists in the test config), expects the wire response to match the no_such_mailbox_reason("aimx-test-bob") string exactly. NFR2 at the integration layer.

The unit-test versions of these in mailbox_handler.rs::tests are good but they synthesize Caller::new(uid, gid, None) in-process; they don't exercise the real SO_PEERCRED codepath. The sudo lane is the only place that does.

Code Quality

  • no_such_mailbox_reason() and not_authorized_wire_reason() helpers — exactly the right factoring. The NFR2 contract is now one grep away. Good call.
  • install_tester_resolver_with_current_user — the duplication with install_tester_resolver is annoying but the alternatives are worse (taking the resolver function as a parameter, threading state through the resolver type). The chosen "static fn that does its own lookup at call time" approach is the simplest thing that works under the existing static-function signature. Worth a one-line comment in the function body noting that the per-call getpwuid is intentional and cheap, in case a future contributor tries to "optimize" it into a memoized lookup that breaks the test isolation.
  • #[allow(dead_code)] on require_root / enforce_root — fine as a transitional marker, but if Sprint 2 doesn't add a new root-only UDS verb that uses them, the next refactor pass should delete them rather than carrying them indefinitely.
  • Caller field rename to caller.uidcaller.peer_uid would have made the source obvious at every callsite. Not blocking; consistent with the existing codebase.

Alignment with PRD

  • R1, R5, R6, R7, R8, R9, R20, R21: addressed correctly.
  • R3 (AuthError rendering for MailboxCreate mismatches returns a clear "owner mismatch" error): partially addressed. The NotOwner { mailbox: "<new>" } path satisfies the type-level intent but the rendered message ("caller does not own mailbox ''") will read poorly. See Non-blocker #3 — Sprint 2 work.
  • NFR1: structurally enforced. The handler correctly synthesizes the owner from SO_PEERCRED and ignores req.owner for non-root callers. Unit tests pin this. The integration-layer regression guard for NFR1 is the gap called out above.
  • NFR2: the daemon-side wire response is byte-identical for unowned-DELETE and missing-DELETE via no_such_mailbox_reason(). The nonroot_delete_unowned_returns_no_such_mailbox_string_match test pins this with an explicit no-leak word denylist.
  • NFR3 (idempotency): pinned by nonroot_create_idempotent_for_owned_mailbox.
  • NFR4 (lock hierarchy): preserved; the orphan-uid early-return in handle_mailbox_crud runs before lock acquisition (correct — saves an unnecessary lock take/drop).
  • NFR5 (no new dependencies): confirmed; peer_username reuses the existing libc::getpwuid shim.

PRD scope items that are explicitly Sprint 2 (R10, R11, R12, R13, R14, R16, R17, R18, R19) and Sprint 3 (R24-R28) are correctly deferred; the implementer's "Deferred Items" note matches the sprint plan.

Summary and Recommended Actions

Overall verdict: Needs minor fixes

The implementation work is high-quality — the auth split is clean, the handler rewrite is careful with locks and rollback, and the unit-test suite goes beyond the spec on NFR2 leak prevention. But the PR cannot ship while CI is red, and the sprint cannot meet its NFR1 intent end-to-end without replacement integration tests that exercise the real SO_PEERCRED path across two real uids.

Blockers (must fix before merge):

  1. Update or replace the two failing tests in tests/uds_authz.rs (mailbox_create_non_root_forbidden at line 723 and mailbox_delete_non_root_forbidden at line 762). Strongly prefer the "replace with cross-uid privilege-boundary tests" path described in Blocker #1 — the sprint loses its only end-to-end NFR1 guard otherwise. Add positive tests too (alice creates and deletes her own mailbox successfully).
  2. Run the manual UDS smoke described in the Sprint 1 exit gate (or wire it into CI as one of the new sudo-lane tests above) and attach the result. The unit tests don't cover the actual SO_PEERCRED codepath.

Non-blockers (should fix but not blocking):

  1. Fix the stale comment on src/mailbox.rs:41-46 (the mailbox arg is not None, it's config.mailboxes.get(&name)).
  2. Either gate the nonroot_create_orphan_uid_returns_validation_with_exact_message test on a runtime check that uid 4_294_967_294 actually has no passwd entry on the current host, or document the host assumption explicitly.
  3. Add a // TODO(S2-1): comment near the "<new>" sentinel in src/auth.rs so the CLI rendering work in Sprint 2 doesn't accidentally ship "caller does not own mailbox '<new>'" to users.

Nice-to-haves (low priority, worth tracking):

  1. Add a one-line comment in install_tester_resolver_with_current_user explaining that the per-call lookup_username is intentional (preempts a "let me memoize this" PR).
  2. After Sprint 2 lands, audit require_root / enforce_root and either delete them or document the root-only UDS verb that earns their continued existence.

Comment thread src/mailbox_handler.rs Dismissed
…dary regression guards

Address PR #193 review:

- Blocker 1: Replace `mailbox_create_non_root_forbidden` and
  `mailbox_delete_non_root_forbidden` (which still asserted the retired
  root-only behavior) with the actual privilege-boundary tests Sprint 1
  is meant to defend:
  * `mailbox_create_non_root_owner_synthesized_from_peercred` — alice
    submits `Owner: aimx-test-bob`, the daemon must drop the wire owner
    and bind the on-disk owner to alice's `SO_PEERCRED` uid (NFR1, real
    end-to-end through two real Linux uids).
  * `mailbox_delete_non_root_cross_uid_returns_no_such_mailbox` — alice
    tries to delete bob's mailbox, response template must be
    byte-identical to the canonical no-such-mailbox response (NFR2).
  Plus positive happy-path tests for non-root owner-matched create and
  create+delete cycle.

- Non-blocker 3: Fix the stale comment in `src/mailbox.rs` delete-path
  gate to describe what the code actually does (mailbox arg is
  `config.mailboxes.get(&name)`, not `None`).

- Non-blocker 4: Discover an unused high uid at runtime in
  `nonroot_create_orphan_uid_returns_validation_with_exact_message`
  rather than hard-coding `4_294_967_294` (which maps to `nobody` on
  some images). Walks down from `0xFFFFFFFE` via `lookup_username` and
  skips with a diagnostic when no orphan uid exists in the test
  environment.

- Non-blocker 5: Add `// TODO(S2-1):` comment near the `<new>` sentinel
  in `src/auth.rs` so Sprint 2's `format_auth_error` work doesn't ship
  the awkward "caller does not own mailbox '<new>'" rendering verbatim.

Verified locally:
- `cargo test` (default lane): 1012 unit + 92 integration + 1 doctest, all pass
- `sudo -E AIMX_INTEGRATION_SUDO=1 cargo test --test uds_authz -- --ignored`:
  21/21 pass (including the four new tests + retained `*_root_ok` cases)
- `cargo clippy --all-targets -- -D warnings`: clean
- `cargo fmt -- --check`: clean
@uzyn

uzyn commented May 2, 2026

Copy link
Copy Markdown
Owner Author

Sprint 1 review feedback — addressed

Pushed 982f032 on feat/user-mailbox-s1-authz-uds.

Fixed (Blockers)

Blocker 1 — Retired sudo-lane tests now exercise the real privilege boundary. Removed mailbox_create_non_root_forbidden and mailbox_delete_non_root_forbidden; replaced with four new tests in tests/uds_authz.rs that exercise the new default flow end-to-end through real SO_PEERCRED:

  1. mailbox_create_non_root_owner_synthesized_from_peercred — alice submits Owner: aimx-test-bob, daemon returns AIMX/1 OK, and stat -c '%U' /var/lib/aimx/inbox/alicemade confirms the on-disk owner is aimx-test-alice (the SO_PEERCRED caller), never aimx-test-bob (the wire Owner: header). This is the headline NFR1 regression guard at the integration layer.
  2. mailbox_create_non_root_owner_ok — happy path: alice creates her own mailbox, on-disk owner equals alice.
  3. mailbox_delete_non_root_cross_uid_returns_no_such_mailbox — alice tries to delete bob's mailbox; the wire response template (with the substituted name normalized) must equal the canonical no-such-mailbox response captured at runtime by deleting a definitely-missing name. Also runs the no-leak word denylist (owner, permission, EACCES, root, authorized, denied) on the full response. Defense-in-depth check confirms bob's mailbox is still on disk after the failed cross-uid delete.
  4. mailbox_delete_non_root_owner_ok — happy path: alice creates and deletes her own mailbox in one transaction.

Also updated the per-verb ownership matrix in the file's module-level comment so it reflects Sprint 1 reality.

Blocker 2 — Manual UDS smoke is now covered by sudo-lane integration tests. Test #1 above (mailbox_create_non_root_owner_synthesized_from_peercred) is the automated equivalent of the manual socat/stat -c %U smoke in the Sprint 1 exit gate, plus the cross-uid impersonation attempt the manual smoke wouldn't have caught.

Fixed (Non-blockers)

Non-blocker 3 — Stale comment. src/mailbox.rs:41-46 rewritten to describe what the code actually does (mailbox arg is config.mailboxes.get(&name), which is Some(...) when the mailbox is in the config and None otherwise; the predicate then returns one of Ok(()) / NotOwner / NoSuchMailbox accordingly).

Non-blocker 4 — Orphan-uid test fragility. nonroot_create_orphan_uid_returns_validation_with_exact_message now discovers an unused high uid at test time via the new find_orphan_uid() helper (walks down from 0xFFFFFFFE via lookup_username, bounded probe depth of 64). On hosts where the entire upper range is mapped (vanishingly unlikely), the test prints a diagnostic and skips with eprintln!.

Non-blocker 5 — <new> sentinel rendering. Added // TODO(S2-1): comment in src/auth.rs next to the NotOwner { mailbox: "<new>" } construction with concrete options for Sprint 2 (dedicated OwnerMismatch variant or format_auth_error pattern-matches the sentinel).

Intentionally deferred

Non-blocker 6 — CLI non-root + socket-missing path. This is exactly the rough edge that PRD §6.3 R11 (Sprint 2) is scoped to fix: "daemon must be running for non-root mailbox CRUD; start aimx serve or run with sudo to fall back to direct config edit." Left as-is in this PR — Sprint 2 owns the CLI surface and will fail-fast with that exact precise message instead of letting config.save() produce the confusing EPERM error.

Nice-to-haves (review notes #6 and #7). Tracked but not addressed in this PR: a one-line comment in install_tester_resolver_with_current_user about the per-call lookup intent (low priority — the helper is small and the test isolation requirement is documented one level up), and the require_root / enforce_root audit-or-delete decision (deferred until Sprint 2 settles whether any new root-only UDS verb earns their continued existence).

Quality gates

  • cargo test (default lane): 1012 unit + 92 integration + 1 doctest, all green
  • sudo -E AIMX_INTEGRATION_SUDO=1 cargo test --test uds_authz -- --ignored: 21/21 pass (the four new tests, plus the existing *_root_ok and the SEND / MARK-* / HOOK-* matrix). Verified locally on this branch by running with sudo.
  • cargo clippy --all-targets -- -D warnings: clean
  • cargo fmt -- --check: clean

The new sudo-lane tests carry the same #[ignore] + AIMX_INTEGRATION_SUDO=1 gating as the rest of the file, so the dedicated mailbox-dir-perms-isolation CI lane should now go green where it was previously red.

@uzyn uzyn left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sprint 1 Re-Review (Cycle 2) — APPROVED

Reviewing commit 982f032e against the Cycle 1 feedback. CI now green on the canonical privilege-boundary lane (mailbox-dir-perms-isolation); both Cycle 1 blockers and all three actionable non-blockers are resolved cleanly.

Resolved Issues

Blocker 1 — Sudo-lane CI (RESOLVED)

The two retired *_non_root_forbidden tests were replaced — not just deleted — with four new tests in tests/uds_authz.rs that exercise the privilege boundary the sprint actually defends:

  1. mailbox_create_non_root_owner_synthesized_from_peercred (line ~778) — alice connects via runuser -u aimx-test-alice -- python3 (genuine SO_PEERCRED codepath, not synthesized Caller::new), submits Owner: aimx-test-bob, daemon returns AIMX/1 OK, and stat -c '%U' /var/lib/aimx/inbox/alicemade confirms the on-disk owner is aimx-test-alice. This is the headline NFR1 regression guard at the integration layer, and it would catch exactly the regression class Cycle 1 was worried about (a future patch letting req.owner flow into the non-root branch).
  2. mailbox_create_non_root_owner_ok (line ~822) — happy path: alice creates aliceowned, on-disk owner equals alice. Pins the new default flow.
  3. mailbox_delete_non_root_cross_uid_returns_no_such_mailbox (line ~872) — alice tries to delete bob's mailbox; the test (a) captures the canonical no-such-mailbox response at runtime by issuing a delete for a definitely-missing name, (b) normalizes the substituted name to <NAME> on both sides and asserts byte-identical templates, (c) asserts the response contains "does not exist", (d) runs a no-leak word denylist (owner, permission, EACCES, root, authorized, denied) on the full response, and (e) defense-in-depth verifies bob's inbox still exists on disk.
  4. mailbox_delete_non_root_owner_ok (line ~942) — happy path: alice creates and deletes her own mailbox in one transaction.

Verified mechanism: send_frame_as(Some(ALICE), ...) invokes runuser -u aimx-test-alice -- python3 client.py which spawns a real process under alice's uid that connects to the UDS socket. The daemon's accept loop in src/serve.rs:753 calls crate::uds_authz::caller_from_stream, which reads peer_cred() (the kernel's SO_PEERCRED) and constructs the Caller via from_ucred, not test-only Caller::new. This is genuinely end-to-end through SO_PEERCRED.

The no-leak denylist is comprehensive against the actual reason strings: no_such_mailbox_reason() produces "mailbox '{name}' does not exist" (no banned words); not_authorized_wire_reason() produces "not authorized" (caught by authorized). Any future regression that swapped these would be caught.

Blocker 2 — Manual UDS smoke (RESOLVED)

Test #1 above is the automated equivalent of the manual socat/stat -c %U smoke from the Sprint 1 exit gate, plus the cross-uid impersonation attempt that the manual smoke wouldn't have covered. Now permanent CI coverage instead of a one-off transcript.

Non-blocker 3 — Stale comment in src/mailbox.rs (RESOLVED)

src/mailbox.rs:41-47 now accurately describes the three-way predicate behavior (passes / NotOwner / NoSuchMailbox) and correctly identifies the mailbox arg as config.mailboxes.get(&name).

Non-blocker 4 — Orphan-uid test fragility (RESOLVED, host-portable)

nonroot_create_orphan_uid_returns_validation_with_exact_message now calls find_orphan_uid() which probes downward from 0xFFFFFFFE via crate::uds_authz::lookup_username with a bounded PROBE_DEPTH of 64. On hosts where every uid in the upper 64-deep range maps to a passwd entry (vanishingly unlikely — Alpine maps 0xFFFFFFFE to nobody but the prior 63 are typically unmapped), the test prints a diagnostic and skips. Cleanly host-portable, no false positives.

Non-blocker 5 — <new> sentinel TODO (RESOLVED)

src/auth.rs:120-127 now carries a precise // TODO(S2-1): comment with the two concrete options (dedicated OwnerMismatch { intended_owner_uid: u32 } variant or format_auth_error pattern-match). Sprint 2 won't ship "caller does not own mailbox '<new>'" to humans by accident.

Non-blocker 6 — CLI non-root + socket-missing path (DEFERRED, ACCEPTED)

Implementer correctly deferred to Sprint 2 with PRD §6.3 R11 reasoning. This is exactly the rough edge S2-1 is scoped to fix, and the deferral note in the PR comment is clear about the Sprint 2 commitment.

CI Status

  • mailbox-dir-perms-isolation (the canonical privilege-boundary lane): PASS in 2m16s. The Run UDS authz tests (under sudo) step that was red in Cycle 1 is now green.
  • verifier-tests, docs-build, Analyze (actions), Analyze (javascript-typescript): all PASS.
  • core-tests and Analyze (rust): in progress at review time. These are the standard default-lane jobs the implementer locally verified green; no concern.

Spot Checks

  • Real SO_PEERCRED path verified: confirmed Caller::new is #[cfg(test)]-gated (src/uds_authz.rs:91) so production callers must go through Caller::from_ucred, which reads the kernel's SO_PEERCRED. The integration tests use runuser to spawn under a real uid — no synthesis bypass.
  • Cross-uid delete denylist comprehensiveness: the six-word list catches every plausible permission-error idiom; verified against the two helper functions that produce the wire reasons.
  • Orphan-uid host portability: probe walks down from 0xFFFFFFFE with depth 64; survives Alpine's nobody=0xFFFFFFFE quirk and any other plausible high-uid mapping.
  • Per-verb ownership matrix in module doc (tests/uds_authz.rs:7-29): updated to reflect Sprint 1 reality (MAILBOX-CREATE now OK† for non-root with the wire-owner-dropped footnote; MAILBOX-DELETE now NOMBX‡ for cross-uid with the no-leak footnote). Documentation tracks code.

New Issues Found

None. The Cycle 2 changes are surgical and well-scoped. No regressions introduced.

Summary

The implementer addressed Cycle 1 feedback completely and at the right depth. The four replacement sudo-lane tests are stronger than what the Cycle 1 review asked for — mailbox_delete_non_root_cross_uid_returns_no_such_mailbox in particular goes well beyond a substring check by capturing the canonical reason at runtime, normalizing both responses, and applying defense-in-depth checks. NFR1 and NFR2 now have permanent integration-layer coverage on the sudo CI lane, and the orphan-uid test is genuinely host-portable. Ready to merge once core-tests finishes its routine green run.

Recommended merge commit message

[Sprint 1] user-mailbox: owner-bound MAILBOX-CREATE/DELETE over UDS

Replace the central UDS authz predicate's root-only MAILBOX-CRUD action
with two owner-bound variants (MailboxCreate{owner_uid} and
MailboxDelete{mailbox}). Rewrite the daemon-side mailbox handler so
non-root callers can create and delete mailboxes they own, with the
on-disk owner always synthesized from SO_PEERCRED — the wire `Owner:`
header is dropped on the floor for non-root callers, making
privilege escalation through this surface structurally impossible.

The unowned-DELETE wire response is byte-identical to the
no-such-mailbox response (NFR2: no information leak). Both paths route
through a single `no_such_mailbox_reason()` helper.

Adds peer_username() resolver in src/uds_authz.rs as the inverse of
MailboxConfig::owner_uid(). Five unit tests and four sudo-lane
integration tests pin the privilege boundary end-to-end through
real SO_PEERCRED, including a cross-uid impersonation attempt and
a defense-in-depth no-leak word denylist on the cross-uid delete path.

Sprint 2 will drop the CLI's entry-point root gate and add the MCP
mailbox_create / mailbox_delete tools; Sprint 3 ships docs/release.

Tests: 1012 unit + 92 integration + 1 doctest green; sudo lane
21/21 green. Clippy clean, fmt clean.

Comment thread tests/uds_authz.rs Dismissed
@uzyn uzyn merged commit a06798b into main May 2, 2026
8 checks passed
@uzyn uzyn deleted the feat/user-mailbox-s1-authz-uds branch May 2, 2026 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants