Skip to content

[Sprint 2.5] Non-blocking Cleanup#3

Merged
uzyn merged 1 commit into
mainfrom
sprint-2.5-cleanup
Apr 9, 2026
Merged

[Sprint 2.5] Non-blocking Cleanup#3
uzyn merged 1 commit into
mainfrom
sprint-2.5-cleanup

Conversation

@uzyn

@uzyn uzyn commented Apr 9, 2026

Copy link
Copy Markdown
Owner

Summary

  • S2.5-1 Ingest Hardening: Added --data-dir / AIMX_DATA_DIR CLI option to override default data path. Added mailbox name validation to reject .., /, \, empty, and null byte names. Replaced hand-rolled yaml_escape with serde_yaml struct serialization for frontmatter generation, eliminating edge cases. Added full-pipeline integration tests exercising ingest_email() with all fixture .eml files via CLI subprocess.
  • S2.5-2 Send Hardening: Added escape_filename() to sanitize attachment filenames in MIME headers (quotes, newlines, carriage returns). Refactored duplicated header construction into write_common_headers(). Added end-to-end integration tests for aimx dkim-keygen CLI. Added test verifying custom dkim_selector config value is used in DKIM signing.

Changes

File Change
Cargo.toml Added clap env feature for AIMX_DATA_DIR env var support
src/cli.rs Added --data-dir global option with AIMX_DATA_DIR env
src/main.rs Thread data_dir option through to all subcommand handlers
src/ingest.rs Replace yaml_escape + manual frontmatter with serde_yaml struct serialization; accept data_dir override
src/mailbox.rs Add validate_mailbox_name() with path traversal checks; accept data_dir override
src/send.rs Add escape_filename(), extract write_common_headers(), accept data_dir override
tests/integration.rs 12 tests total: full-pipeline ingest for all 4 fixtures, routing, env var, dkim-keygen e2e

Test results

  • 64 unit tests passing (8 new)
  • 12 integration tests passing (7 new)
  • cargo clippy -- -D warnings clean
  • cargo fmt -- --check clean

Test plan

  • All existing tests pass (no regressions)
  • --data-dir flag overrides default data directory
  • AIMX_DATA_DIR env var overrides default data directory
  • Mailbox name validation rejects .., /, \, empty, null bytes
  • Frontmatter generated by serde_yaml is valid YAML with correct field values
  • Special characters in email fields (newlines, colons, hashes, @) are handled correctly by serde_yaml
  • Attachment filenames with quotes/newlines/CR are escaped in MIME headers
  • dkim-keygen CLI generates keys and outputs DNS record
  • Custom dkim_selector value appears in DKIM signature (s= parameter)
  • Full pipeline integration: fixture .eml -> CLI ingest -> .md file with correct frontmatter and body

S2.5-1 Ingest Hardening:
- Add --data-dir / AIMX_DATA_DIR CLI option to override /var/lib/aimx/
- Add mailbox name validation (reject .., /, \, empty, null bytes)
- Replace hand-rolled yaml_escape with serde_yaml struct serialization
- Add full-pipeline integration tests for all fixture .eml files

S2.5-2 Send Hardening:
- Escape attachment filenames in MIME headers (quotes, newlines, CR)
- Refactor duplicated header construction into write_common_headers()
- Add integration tests for dkim-keygen CLI end-to-end
- Add test verifying dkim_selector config value is used at runtime

@uzyn uzyn left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sprint 2.5 Review: Non-blocking Cleanup

Sprint Goal Assessment

Goal: Address accumulated non-blocking improvements from Sprint 1 and Sprint 2 reviews.

Verdict: Goal fully met. All 9 improvement items from the review backlog have been addressed with clean implementations and thorough test coverage.

Acceptance Criteria Checklist

S2.5-1: Ingest Hardening + Testing

Criterion Status
--data-dir / AIMX_DATA_DIR CLI option to override default data path Met -- cli.rs adds a global --data-dir arg with env = "AIMX_DATA_DIR". Threaded through main.rs to all subcommands via cli.data_dir.as_deref(). Integration test ingest_via_env_var confirms env var path.
Mailbox name validation to reject .., /, \, empty, null bytes Met -- validate_mailbox_name() in mailbox.rs checks all four cases. Unit tests cover empty, ../etc, foo/bar, foo\bar.
Replace yaml_escape with serde_yaml struct serialization Met -- EmailMetadata and AttachmentMeta now derive Serialize. format_markdown() uses serde_yaml::to_string(). The old yaml_escape function and its 59 lines of manual frontmatter building are removed. Tests updated to parse YAML structurally rather than matching string fragments.
\r in yaml_escape Met (superseded) -- yaml_escape removed entirely; serde_yaml handles all escaping.
Full-pipeline integration tests with fixture .eml files Met -- 4 fixture-based integration tests (plain, html, multipart, attachment) now exercise the actual CLI binary via --data-dir, then verify frontmatter fields structurally and body content. Additional routing tests (ingest_routes_to_named_mailbox, ingest_unknown_routes_to_catchall) also added.

S2.5-2: Send Hardening + Testing

Criterion Status
Escape attachment filenames in MIME headers Met -- escape_filename() handles \, ", \r, \n. Applied to both Content-Type and Content-Disposition name/filename parameters. Three unit tests cover quotes, newlines, and CR.
Integration test for aimx dkim-keygen CLI Met -- Three integration tests: dkim_keygen_end_to_end (generates keys, checks DNS output), dkim_keygen_no_overwrite (existing keys block), dkim_keygen_force_overwrite (--force regenerates).
Refactor duplicated header construction in compose_message() Met -- write_common_headers() extracted. From/To/Subject/Date/Message-ID/In-Reply-To/References/MIME-Version are now in one place. Both attachment and non-attachment paths call it. Net reduction of ~50 lines.
Test verifying dkim_selector config value is used at runtime Met -- dkim_selector_config_used test sends with "myselector" and asserts s=myselector appears in the DKIM-Signature header.

Test Coverage

  • 64 unit tests passing (8 new in this sprint)
  • 12 integration tests passing (7 new in this sprint)
  • cargo clippy -- -D warnings: clean
  • cargo fmt -- --check: clean

Test coverage is comprehensive. The integration tests are a significant improvement -- they now exercise the real binary end-to-end rather than just testing mail-parser parseability.

Potential Bugs

None found. The serde_yaml migration produces clean, structurally valid YAML. The read: false field is correctly serialized as a YAML boolean. The frontmatter format (---\n<yaml>\n---\n\n<body>) is correct -- serde_yaml::to_string() does not emit a leading ---, so there is no double-delimiter issue.

Security Issues

None found. The mailbox name validation covers the important path traversal vectors (.., /, \, null bytes). The attachment filename escaping handles quote injection and header-breaking characters. Both were the primary security-relevant items in this sprint.

Alignment with PRD

Fully aligned. This sprint was a cleanup sprint with no new features -- it hardens the ingest and send pipelines that were built in Sprints 1 and 2. The --data-dir option enables testability and non-standard deployments, which supports the PRD's goal of a flexible single-binary tool.

Non-blocker Observations

  1. serde_yaml 0.9 is unmaintained -- The serde_yaml crate (v0.9) has been officially deprecated in favor of alternatives. This is not a problem now, but worth noting for a future dependency audit. No action needed in this sprint.

  2. unwrap_or_default() on serde_yaml::to_string -- In format_markdown(), serde_yaml::to_string(meta).unwrap_or_default() silently produces empty frontmatter on serialization failure. In practice, EmailMetadata serialization should never fail (no maps with non-string keys, no custom serializers), but a future field change could make this fail silently. Consider expect() or error propagation in a future sprint.

Summary

Clean implementation of all 9 cleanup items. The serde_yaml migration is the most impactful change -- it eliminates an entire class of YAML escaping bugs and reduces code by ~60 lines. The integration tests are now genuinely end-to-end (binary subprocess with real fixture files), which is a major improvement over the previous tests that only checked mail-parser parseability. No bugs, no security issues, no regressions.

Verdict: APPROVED -- Ready to merge.


Recommended merge commit message:

Harden ingest and send pipelines with data-dir override, input validation, and end-to-end tests

- Add --data-dir / AIMX_DATA_DIR CLI option to override default /var/lib/aimx path
- Add mailbox name validation (reject .., /, \, empty, null bytes)
- Replace hand-rolled yaml_escape with serde_yaml struct serialization for frontmatter
- Escape attachment filenames in MIME headers (quotes, newlines, CR)
- Extract write_common_headers() to deduplicate header construction in compose_message()
- Add dkim_selector config usage test
- Add 7 integration tests exercising full CLI pipeline with fixture .eml files
- Add 3 dkim-keygen CLI end-to-end tests (generate, no-overwrite, force)

@uzyn uzyn merged commit 97b602c into main Apr 9, 2026
2 checks passed
@uzyn uzyn deleted the sprint-2.5-cleanup branch April 10, 2026 00:32
uzyn added a commit that referenced this pull request Apr 10, 2026
…robustness

- Reorder validate-before-sanitize in write_common_headers so
  compose_message() returns an error when From/To/Subject contain CRLF,
  with sanitization as a safety net (Bug #1)
- Sanitize In-Reply-To and References headers to prevent CRLF injection
  from malicious inbound Message-ID values (Bug #2)
- Add upper-bound guard (999,999) on counter in create_file_atomic to
  prevent theoretical u32 overflow (Suggestion #4)
- Replace duplicate setup_verify_step_available test with a compile-time
  signature check that is distinct from verify_runner_trait_mock_pass
  (Suggestion #3)
- Update CRLF injection tests to verify error return instead of silent
  sanitization, add new tests for reply_to/references injection
uzyn added a commit that referenced this pull request Apr 17, 2026
…lans

Code-review-backed fix plans for each of the 10 findings, with file:line
refs, effort estimates, and a priority order. No code changes yet —
this consolidates the investigation so fixes can be sequenced.

- #10 DKIM mismatch: not a code bug; DNS republish + optional startup check
- #9 shell injection: pass trigger vars via env, not string substitution
- #8 MCP writes: route state mutations through daemon UDS
- #7 claude-code hint: print `claude mcp add` command post-install
- #4 send config read: move mailbox resolution to daemon side
- #2 SPF: plumb envelope MAIL FROM from smtp session through ingest
- #5 wildcard send: remove wildcard branch from resolve_from_mailbox
- #1 mailbox create: add restart hint (or route via daemon)
- #3 plan wording: clarify "compose new" in docs/manual-test.md
uzyn added a commit that referenced this pull request Apr 17, 2026
* docs: add manual test results with findings

Full execution of docs/manual-test.md against agent.zeroshot.lol.
10 findings recorded with severity and fix direction, notably:

- P0 DKIM key on disk does not match DNS TXT (root cause of
  outbound dkim=fail at Gmail)
- P0 Shell injection in on_receive cmd template expansion
- P1 MCP write ops (email_mark_read, etc.) fail when MCP runs
  as non-root due to root:root 0644 mailbox files

* docs: add Recommended fixes section with per-finding implementation plans

Code-review-backed fix plans for each of the 10 findings, with file:line
refs, effort estimates, and a priority order. No code changes yet —
this consolidates the investigation so fixes can be sequenced.

- #10 DKIM mismatch: not a code bug; DNS republish + optional startup check
- #9 shell injection: pass trigger vars via env, not string substitution
- #8 MCP writes: route state mutations through daemon UDS
- #7 claude-code hint: print `claude mcp add` command post-install
- #4 send config read: move mailbox resolution to daemon side
- #2 SPF: plumb envelope MAIL FROM from smtp session through ingest
- #5 wildcard send: remove wildcard branch from resolve_from_mailbox
- #1 mailbox create: add restart hint (or route via daemon)
- #3 plan wording: clarify "compose new" in docs/manual-test.md
uzyn added a commit that referenced this pull request Apr 17, 2026
Sprint 44 (post-launch security + quick fixes) addresses findings #9,
#10, #7, #1-tier-1, #3. Sprint 45 (strict outbound + MCP writes via
daemon addresses #4, #5, #8 and introduces UDS MARK-READ/MARK-UNREAD
verbs. Sprint 46 (mailbox CRUD via UDS) addresses #1-tier-2 with
MAILBOX-CREATE/MAILBOX-DELETE verbs so daemon picks up changes live.

PRD FR-18d tightened: outbound send now requires a concrete non-wildcard
mailbox under config.domain; catchall is inbound-only. PRD FR-18e added
to cover the new state-mutation verbs on the UDS socket.

Finding #2 (SPF envelope MAIL FROM) excluded — already shipped in
cd22428.
EOF
)
uzyn added a commit that referenced this pull request Apr 21, 2026
Harden ingest and send pipelines with data-dir override, input validation, and end-to-end tests

- Add --data-dir / AIMX_DATA_DIR CLI option to override default /var/lib/aimx path
- Add mailbox name validation (reject .., /, \, empty, null bytes)
- Replace hand-rolled yaml_escape with serde_yaml struct serialization for frontmatter
- Escape attachment filenames in MIME headers (quotes, newlines, CR)
- Extract write_common_headers() to deduplicate header construction in compose_message()
- Add dkim_selector config usage test
- Add 7 integration tests exercising full CLI pipeline with fixture .eml files
- Add 3 dkim-keygen CLI end-to-end tests (generate, no-overwrite, force)
uzyn added a commit that referenced this pull request Apr 21, 2026
…lans

Code-review-backed fix plans for each of the 10 findings, with file:line
refs, effort estimates, and a priority order. No code changes yet —
this consolidates the investigation so fixes can be sequenced.

- #10 DKIM mismatch: not a code bug; DNS republish + optional startup check
- #9 shell injection: pass trigger vars via env, not string substitution
- #8 MCP writes: route state mutations through daemon UDS
- #7 claude-code hint: print `claude mcp add` command post-install
- #4 send config read: move mailbox resolution to daemon side
- #2 SPF: plumb envelope MAIL FROM from smtp session through ingest
- #5 wildcard send: remove wildcard branch from resolve_from_mailbox
- #1 mailbox create: add restart hint (or route via daemon)
- #3 plan wording: clarify "compose new" in docs/manual-test.md
uzyn added a commit that referenced this pull request Apr 21, 2026
* docs: add manual test results with findings

Full execution of docs/manual-test.md against agent.zeroshot.lol.
10 findings recorded with severity and fix direction, notably:

- P0 DKIM key on disk does not match DNS TXT (root cause of
  outbound dkim=fail at Gmail)
- P0 Shell injection in on_receive cmd template expansion
- P1 MCP write ops (email_mark_read, etc.) fail when MCP runs
  as non-root due to root:root 0644 mailbox files

* docs: add Recommended fixes section with per-finding implementation plans

Code-review-backed fix plans for each of the 10 findings, with file:line
refs, effort estimates, and a priority order. No code changes yet —
this consolidates the investigation so fixes can be sequenced.

- #10 DKIM mismatch: not a code bug; DNS republish + optional startup check
- #9 shell injection: pass trigger vars via env, not string substitution
- #8 MCP writes: route state mutations through daemon UDS
- #7 claude-code hint: print `claude mcp add` command post-install
- #4 send config read: move mailbox resolution to daemon side
- #2 SPF: plumb envelope MAIL FROM from smtp session through ingest
- #5 wildcard send: remove wildcard branch from resolve_from_mailbox
- #1 mailbox create: add restart hint (or route via daemon)
- #3 plan wording: clarify "compose new" in docs/manual-test.md
uzyn added a commit that referenced this pull request Apr 21, 2026
Sprint 44 (post-launch security + quick fixes) addresses findings #9,
#10, #7, #1-tier-1, #3. Sprint 45 (strict outbound + MCP writes via
daemon addresses #4, #5, #8 and introduces UDS MARK-READ/MARK-UNREAD
verbs. Sprint 46 (mailbox CRUD via UDS) addresses #1-tier-2 with
MAILBOX-CREATE/MAILBOX-DELETE verbs so daemon picks up changes live.

PRD FR-18d tightened: outbound send now requires a concrete non-wildcard
mailbox under config.domain; catchall is inbound-only. PRD FR-18e added
to cover the new state-mutation verbs on the UDS socket.

Finding #2 (SPF envelope MAIL FROM) excluded — already shipped in
f5cebd2.
EOF
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant