[Sprint 1] Core Pipeline + Idea Validation#1
Conversation
- S1.1: Project scaffolding with clap CLI, config.yaml schema, CI workflow - S1.2: Email ingest pipeline (EML -> Markdown with YAML frontmatter) - S1.3: Attachment extraction with deduplication - S1.4: Basic email sending via sendmail with RFC 5322 composition - S1.5: Mailbox create/list/delete with config persistence 35 tests (30 unit + 5 integration), clippy clean, fmt clean.
uzyn
left a comment
There was a problem hiding this comment.
Sprint 1 Review: Core Pipeline + Idea Validation
Sprint Goal Assessment
Sprint 1 aimed to get the inbound and outbound email pipeline working end-to-end for idea validation on a real VPS. The implementation delivers a well-structured Rust project with a working ingest pipeline, basic email sending, mailbox management, CI, and test fixtures. The overall architecture is clean, idiomatic Rust, and well-tested. Two bugs need to be fixed before merge.
Acceptance Criteria Checklist
S1.1 -- Project Scaffolding + CI
| # | Criterion | Status |
|---|---|---|
| 1 | cargo build produces a single aimx binary |
Met |
| 2 | aimx --help shows all planned subcommands (ingest, send, mcp, mailbox, setup, status, preflight, verify) |
Met -- verified via integration test |
| 3 | config.yaml schema defined and parseable with serde: domain, data directory, mailboxes with addresses and on_receive stubs |
Met -- includes forward-compatible on_receive with match filters |
| 4 | Default data directory is /var/lib/aimx/ |
Met |
| 5 | Tests pass for config parsing with sample config | Met -- 7 config tests |
| 6 | GitHub Actions CI workflow runs on push and PR: cargo test, cargo clippy -- -D warnings, cargo fmt -- --check |
Met |
| 7 | CI uses stable Rust toolchain | Met -- dtolnay/rust-toolchain@stable |
| 8 | Test fixtures directory with sample .eml files (plain text, HTML-only, multipart, with attachments) |
Met -- 4 fixture files in tests/fixtures/ |
S1.2 -- Email Ingest Pipeline
| # | Criterion | Status |
|---|---|---|
| 1 | cat test.eml | aimx ingest user@domain.com produces correctly formatted .md |
Met (via library function; CLI hardcoded to /var/lib/aimx) |
| 2 | Frontmatter includes all required fields | Met -- id, message_id, from, to, subject, date, in_reply_to, references, attachments, mailbox, read |
| 3 | read is set to false on ingest |
Met |
| 4 | File named YYYY-MM-DD-NNN.md with incrementing counter |
Met |
| 5 | Unrecognized local parts route to catchall |
Met |
| 6 | HTML-only emails converted to plaintext | Met -- uses html2text |
| 7 | Multipart emails extract text/plain correctly | Met |
| 8 | Unit tests for EML-to-Markdown conversion | Met -- 6 ingest unit tests |
| 9 | Unit tests for frontmatter generation and YAML validity | Met -- frontmatter_valid_yaml test |
| 10 | Unit tests for mailbox routing | Met -- ingest_routes_unknown_to_catchall |
| 11 | Integration test: pipe fixture .eml via stdin, verify .md output |
Partial -- integration tests only validate fixture parseability with mail-parser, not full pipeline output. The comment in the code acknowledges the config path limitation. |
S1.3 -- Attachment Extraction
| # | Criterion | Status |
|---|---|---|
| 1 | Attachments saved to <mailbox>/attachments/<filename> |
Met |
| 2 | Duplicate filenames deduplicated (append counter) | Met |
| 3 | Frontmatter lists each attachment with filename, content_type, size, path | Met |
| 4 | Unit tests: single/multiple/duplicate/none | Met -- 4 attachment tests |
| 5 | Integration test: ingest .eml with attachments, verify files on disk |
Partial -- unit tests cover this, but the integration test only checks attachment_count() |
S1.4 -- Basic Email Sending
| # | Criterion | Status |
|---|---|---|
| 1 | aimx send --from --to --subject --body composes and sends |
Met |
| 2 | Valid RFC 5322 headers (From, To, Subject, Date, Message-ID) | Met |
| 3 | Message handed to sendmail | Met -- via /usr/sbin/sendmail -t |
| 4 | Sending errors produce clear messages | Met |
| 5 | Unit tests for RFC 5322 message composition | Met -- 3 composition tests |
| 6 | Sendmail handoff abstracted behind trait | Met -- MailTransport trait with MockTransport in tests |
S1.5 -- Mailbox Management
| # | Criterion | Status |
|---|---|---|
| 1 | aimx mailbox create creates directory and registers in config |
Met |
| 2 | aimx mailbox list shows mailboxes with message counts |
Met |
| 3 | aimx mailbox delete removes directory and config entry (with confirmation) |
Met -- --yes flag to skip prompt |
| 4 | Duplicate create produces clear error | Met |
| 5 | catchall cannot be deleted |
Met |
| 6 | Unit tests for create/list/delete | Met -- 7 mailbox tests |
| 7 | Unit tests for error cases | Met |
Bugs (Blockers)
Bug 1: Path traversal in attachment filename extraction (Security)
File: src/ingest.rs, lines 148-174
The attachment filename from the email (attachment.attachment_name()) is used directly in file path construction without sanitization. A malicious email with an attachment named ../../../etc/cron.d/evil would write to arbitrary filesystem locations.
Fix: Sanitize the filename by stripping path separators and .. components. For example:
let filename = Path::new(name).file_name()
.and_then(|f| f.to_str())
.unwrap_or("attachment")
.to_string();Bug 2: YAML frontmatter injection via newline in email headers
File: src/ingest.rs, yaml_escape() function, lines 227-251
The yaml_escape function detects strings containing \n and wraps them in double quotes, but embeds the raw newline character inside the quoted string rather than escaping it as \\n. A crafted email subject like "test\n---\ninjected: true" would break the YAML frontmatter structure, producing:
subject: "test
---
injected: true"
This breaks any downstream YAML frontmatter parser (including the Sprint 3 MCP server that reads these files).
Fix: Replace literal newlines with \\n inside the double-quoted string:
format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n"))Non-Blocker Suggestions
1. No --data-dir or --config CLI option
The binary is hardcoded to /var/lib/aimx/. This makes integration testing impossible without root access (as the integration test comments note). Consider adding a --data-dir global option or AIMX_DATA_DIR environment variable. This would also enable the integration tests to exercise the full pipeline with fixture files.
2. Integration tests don't exercise the full ingest pipeline
The integration tests for fixture files (ingest_plain_fixture, etc.) only verify that mail-parser can parse the .eml files. They don't test the actual ingest_email() function with fixture data, which was called for in S1.2 acceptance criterion 11. The unit tests partially compensate (they test ingest_email() with inline EML), but using the fixture files through the full pipeline would be more thorough.
3. Mailbox name validation missing
create_mailbox accepts arbitrary strings including .., /, or empty strings as mailbox names. While this is operator-initiated (lower risk), basic validation would prevent accidental misconfiguration.
4. yaml_escape is fragile -- consider using serde_yaml for frontmatter
The hand-rolled yaml_escape function covers many special characters but could miss edge cases (e.g., strings that look like YAML booleans: true, false, yes, no, null; strings starting with -, ?, ,). Since serde_yaml is already a dependency, consider serializing the frontmatter as a proper YAML struct rather than string concatenation.
5. Minor: html2text::from_read result used with unwrap_or_else
At src/ingest.rs:134, the fallback on html2text failure returns the raw HTML, which would produce a Markdown file containing HTML tags. This is acceptable behavior (graceful degradation), but worth noting.
Test Coverage Summary
- 30 unit tests across config (7), ingest (13), send (5), mailbox (7) -- good coverage
- 5 integration tests -- CLI help validation + fixture parsing
- All tests pass, clippy clean, fmt clean
- Gap: No end-to-end integration test that pipes fixture EML through
ingest_email()and verifies the output.mdfile content
PRD Alignment
The implementation aligns well with PRD milestones M1 (Core Pipeline). Specifically:
- FR-11 (accept raw .eml from stdin): Met
- FR-12 (parse MIME, extract headers/body/attachments): Met
- FR-13 (Markdown with YAML frontmatter): Met (with the yaml_escape bug noted above)
- FR-14 (extract attachments): Met (with the path traversal bug noted above)
- FR-15 (route to mailbox or catchall): Met
- FR-17 (compose RFC 5322 email): Met
- FR-19 (hand to sendmail): Met
- FR-26 (create mailbox): Met
- FR-27 (list mailboxes): Met
- FR-28 (delete mailbox with confirmation): Met
Summary
Verdict: Needs minor fixes
The Sprint 1 implementation is solid overall -- clean architecture, good test coverage (35 tests), idiomatic Rust, proper error handling in production code, and a well-abstracted mail transport trait. Two bugs need to be fixed before merge:
- Path traversal in attachment filenames -- a security issue where malicious email attachments could write to arbitrary filesystem paths.
- YAML frontmatter injection via newlines -- crafted email headers can break the frontmatter structure that downstream Sprint 3 code depends on.
Both are straightforward fixes. Once addressed, this is ready to merge.
Sanitize attachment filenames with Path::file_name() to prevent directory traversal attacks. Escape literal newlines and carriage returns in yaml_escape() to prevent frontmatter injection.
Review Fixes AppliedBlockers (both fixed)1. Path traversal in attachment filenames — Fixed in 2. YAML frontmatter injection via newlines — Fixed in Non-blockers (intentionally left as-is)1. No 2. Integration tests only verify fixture parseability — Acknowledged. The unit tests exercise 3. Mailbox name validation — Low severity since it's operator-initiated via CLI. Will add validation in a follow-up alongside the 4. Hand-rolled 5. HTML-to-text fallback returns raw HTML — Acceptable graceful degradation. The alternative (empty body) would lose more information. Test resultsAll 37 tests pass (32 unit + 5 integration), clippy clean, fmt clean. |
uzyn
left a comment
There was a problem hiding this comment.
Re-review: All Blockers Resolved -- APPROVED
Blocker 1: Path traversal in attachment filenames -- FIXED
Verified in src/ingest.rs lines 153-161. Path::file_name() correctly strips all directory components from the raw attachment name. Edge cases (.., ., /, empty) all produce None or empty string and are skipped via the if filename.is_empty() { continue; } guard. The attachment_path_traversal_sanitized test confirms ../../../etc/cron.d/evil is reduced to evil and no etc directory is created.
Blocker 2: YAML frontmatter injection via newlines -- FIXED
Verified in src/ingest.rs lines 233-262. The yaml_escape() function now replaces \n with \\n and \r with \\r inside double-quoted YAML strings. Crucially, the escape order is correct: backslashes are replaced first (\\ -> \\\\), then double quotes, then newlines -- preventing double-escaping. The yaml_escape_newline_injection test verifies that a string containing \n---\ninjected: true round-trips correctly through serde_yaml.
Minor observation: \r is not in the condition that triggers quoting, so a string containing only \r (no \n) would pass through unquoted. This is not exploitable for frontmatter injection since bare \r does not create a YAML line break, but adding s.contains('\r') to the condition would be a trivial future hardening.
Non-blockers (5) -- Accepted as deferred
All five non-blockers were deferred with reasonable justification. The library layer already supports configurable data_dir via Config, making the --data-dir CLI flag a clean Sprint 2 addition.
Verification
All 37 tests pass (32 unit + 5 integration). Both new security tests (attachment_path_traversal_sanitized, yaml_escape_newline_injection) exercise the exact attack vectors that were flagged.
Recommended merge commit message
Implement Sprint 1: core pipeline, ingest, send, mailbox management
- Project scaffolding with clap CLI, config.yaml schema, CI workflow
- Email ingest pipeline: EML parsing, Markdown+YAML frontmatter output,
attachment extraction with deduplication and path traversal protection
- Basic email sending via sendmail with RFC 5322 composition
- Mailbox CRUD (create/list/delete) with config persistence
- 37 tests (32 unit + 5 integration), clippy clean, fmt clean
…robustness - Reorder validate-before-sanitize in write_common_headers so compose_message() returns an error when From/To/Subject contain CRLF, with sanitization as a safety net (Bug #1) - Sanitize In-Reply-To and References headers to prevent CRLF injection from malicious inbound Message-ID values (Bug #2) - Add upper-bound guard (999,999) on counter in create_file_atomic to prevent theoretical u32 overflow (Suggestion #4) - Replace duplicate setup_verify_step_available test with a compile-time signature check that is distinct from verify_runner_trait_mock_pass (Suggestion #3) - Update CRLF injection tests to verify error return instead of silent sanitization, add new tests for reply_to/references injection
…lans Code-review-backed fix plans for each of the 10 findings, with file:line refs, effort estimates, and a priority order. No code changes yet — this consolidates the investigation so fixes can be sequenced. - #10 DKIM mismatch: not a code bug; DNS republish + optional startup check - #9 shell injection: pass trigger vars via env, not string substitution - #8 MCP writes: route state mutations through daemon UDS - #7 claude-code hint: print `claude mcp add` command post-install - #4 send config read: move mailbox resolution to daemon side - #2 SPF: plumb envelope MAIL FROM from smtp session through ingest - #5 wildcard send: remove wildcard branch from resolve_from_mailbox - #1 mailbox create: add restart hint (or route via daemon) - #3 plan wording: clarify "compose new" in docs/manual-test.md
* docs: add manual test results with findings Full execution of docs/manual-test.md against agent.zeroshot.lol. 10 findings recorded with severity and fix direction, notably: - P0 DKIM key on disk does not match DNS TXT (root cause of outbound dkim=fail at Gmail) - P0 Shell injection in on_receive cmd template expansion - P1 MCP write ops (email_mark_read, etc.) fail when MCP runs as non-root due to root:root 0644 mailbox files * docs: add Recommended fixes section with per-finding implementation plans Code-review-backed fix plans for each of the 10 findings, with file:line refs, effort estimates, and a priority order. No code changes yet — this consolidates the investigation so fixes can be sequenced. - #10 DKIM mismatch: not a code bug; DNS republish + optional startup check - #9 shell injection: pass trigger vars via env, not string substitution - #8 MCP writes: route state mutations through daemon UDS - #7 claude-code hint: print `claude mcp add` command post-install - #4 send config read: move mailbox resolution to daemon side - #2 SPF: plumb envelope MAIL FROM from smtp session through ingest - #5 wildcard send: remove wildcard branch from resolve_from_mailbox - #1 mailbox create: add restart hint (or route via daemon) - #3 plan wording: clarify "compose new" in docs/manual-test.md
Sprint 44 (post-launch security + quick fixes) addresses findings #9, #10, #7, #1-tier-1, #3. Sprint 45 (strict outbound + MCP writes via daemon addresses #4, #5, #8 and introduces UDS MARK-READ/MARK-UNREAD verbs. Sprint 46 (mailbox CRUD via UDS) addresses #1-tier-2 with MAILBOX-CREATE/MAILBOX-DELETE verbs so daemon picks up changes live. PRD FR-18d tightened: outbound send now requires a concrete non-wildcard mailbox under config.domain; catchall is inbound-only. PRD FR-18e added to cover the new state-mutation verbs on the UDS socket. Finding #2 (SPF envelope MAIL FROM) excluded — already shipped in cd22428. EOF )
Implement Sprint 1: core pipeline, ingest, send, mailbox management - Project scaffolding with clap CLI, config.yaml schema, CI workflow - Email ingest pipeline: EML parsing, Markdown+YAML frontmatter output, attachment extraction with deduplication and path traversal protection - Basic email sending via sendmail with RFC 5322 composition - Mailbox CRUD (create/list/delete) with config persistence - 37 tests (32 unit + 5 integration), clippy clean, fmt clean
…lans Code-review-backed fix plans for each of the 10 findings, with file:line refs, effort estimates, and a priority order. No code changes yet — this consolidates the investigation so fixes can be sequenced. - #10 DKIM mismatch: not a code bug; DNS republish + optional startup check - #9 shell injection: pass trigger vars via env, not string substitution - #8 MCP writes: route state mutations through daemon UDS - #7 claude-code hint: print `claude mcp add` command post-install - #4 send config read: move mailbox resolution to daemon side - #2 SPF: plumb envelope MAIL FROM from smtp session through ingest - #5 wildcard send: remove wildcard branch from resolve_from_mailbox - #1 mailbox create: add restart hint (or route via daemon) - #3 plan wording: clarify "compose new" in docs/manual-test.md
* docs: add manual test results with findings Full execution of docs/manual-test.md against agent.zeroshot.lol. 10 findings recorded with severity and fix direction, notably: - P0 DKIM key on disk does not match DNS TXT (root cause of outbound dkim=fail at Gmail) - P0 Shell injection in on_receive cmd template expansion - P1 MCP write ops (email_mark_read, etc.) fail when MCP runs as non-root due to root:root 0644 mailbox files * docs: add Recommended fixes section with per-finding implementation plans Code-review-backed fix plans for each of the 10 findings, with file:line refs, effort estimates, and a priority order. No code changes yet — this consolidates the investigation so fixes can be sequenced. - #10 DKIM mismatch: not a code bug; DNS republish + optional startup check - #9 shell injection: pass trigger vars via env, not string substitution - #8 MCP writes: route state mutations through daemon UDS - #7 claude-code hint: print `claude mcp add` command post-install - #4 send config read: move mailbox resolution to daemon side - #2 SPF: plumb envelope MAIL FROM from smtp session through ingest - #5 wildcard send: remove wildcard branch from resolve_from_mailbox - #1 mailbox create: add restart hint (or route via daemon) - #3 plan wording: clarify "compose new" in docs/manual-test.md
Sprint 44 (post-launch security + quick fixes) addresses findings #9, #10, #7, #1-tier-1, #3. Sprint 45 (strict outbound + MCP writes via daemon addresses #4, #5, #8 and introduces UDS MARK-READ/MARK-UNREAD verbs. Sprint 46 (mailbox CRUD via UDS) addresses #1-tier-2 with MAILBOX-CREATE/MAILBOX-DELETE verbs so daemon picks up changes live. PRD FR-18d tightened: outbound send now requires a concrete non-wildcard mailbox under config.domain; catchall is inbound-only. PRD FR-18e added to cover the new state-mutation verbs on the UDS socket. Finding #2 (SPF envelope MAIL FROM) excluded — already shipped in f5cebd2. EOF )
Cycle 1B follow-ups on the slow-inbound dedup PR. No blocker fixes — the review verdict was "Ready to merge with non-blocker follow-ups." Code changes: - Bump read_frontmatter_id_pair scan window from 4 KiB to 16 KiB so a long References chain plus many cc recipients (common on mailing lists) cannot silently drop a rebuilt cache entry. - Extract the inline async-hook spawn block into spawn_on_receive_async() for readability and a shorter match arm. New tests: - src/smtp/tests.rs::test_smtp_data_returns_fast_with_slow_on_receive_hook: regression test for the primary fix. Registers a 3s on_receive hook, drives an SMTP DATA transaction, asserts 250 OK returns under 1.5s (well below the hook duration) and that the hook eventually fires. Verified to FAIL when HookMode is reverted to Sync. Doc-only changes (no behavior change): - DEDUP_TTL: spell out the RFC-retry-window tradeoff so future readers see the 24h value is a knob, not an absolute. - Dedup gate: inline note that an empty Message-Id bypasses dedup, with a pointer for the why-did-this-hook-fire-N-times debug path. - MaxItemsExceeded warn: explain the regression-to-original-bug shape and that DEDUP_MAX_ITEMS is the lever. - mailbox_locks: document that the dedup cache is per-mailbox by design (different RCPTs land in different caches even on the same Message-Id). - parse_toml_string_value / read_frontmatter_id_pair: flag the hand-rolled TOML-subset limits so a future schema change knows where to look. Deferred per review (each is its own non-blocker): - #1 (verify_auth still blocks 250 OK): out of scope for this PR; its own follow-up. Smaller-blast-radius failure mode than the multi-minute hook this PR fixed.
Summary
clapCLI showing all planned subcommands,config.yamlschema with serde, GitHub Actions CI (test + clippy + fmt), test fixtures for.emlfilesaimx ingest <rcpt>reads raw EML from stdin, parses withmail-parser, produces Markdown with YAML frontmatter (all required fields,read: false), routes to correct mailbox or catchall, names filesYYYY-MM-DD-NNN.md<mailbox>/attachments/, deduplicates filenames, lists attachments in frontmatter with filename/content_type/size/pathaimx send --from --to --subject --bodycomposes RFC 5322 message and hands to sendmail, with trait-abstracted transport for testingaimx mailbox create/list/deletewith directory creation, config persistence, message counts, confirmation prompt, catchall protectionTest plan
cargo clippy -- -D warningspasses cleancargo fmt -- --checkpasses clean