Skip to content

feat: add --port-check-only flag to install.sh#185

Merged
uzyn merged 2 commits into
mainfrom
feat/install-port-check-only
May 1, 2026
Merged

feat: add --port-check-only flag to install.sh#185
uzyn merged 2 commits into
mainfrom
feat/install-port-check-only

Conversation

@uzyn

@uzyn uzyn commented Apr 30, 2026

Copy link
Copy Markdown
Owner

Summary

Add --port-check-only flag to install.sh that runs the same outbound + inbound port-25 checks as aimx portcheck, then exits without installing. Lets evaluators verify a VPS can reach SMTP before committing to install/setup.

One-liner:

curl -fsSL https://aimx.email/install.sh | sh -s -- --port-check-only

Requirements

  • --port-check-only flag short-circuits before install path
  • --verify-host <URL> flag + AIMX_VERIFY_HOST env var override default https://check.aimx.email
  • --help text lists new flag, env var, and a curl-pipe example
  • Outbound EHLO via TCP+EHLO, mirroring RealNetworkOps::check_outbound_port25 (src/setup.rs:829-876)
  • Outbound implementation ladder: python3 → nc → bash /dev/tcp; exit 2 with hint when none available
  • Inbound /probe call via curl with loose "reachable":true substring match (src/setup.rs:774-787)
  • Non-root → inbound [skip] (does NOT fail exit code); root + occupied → call /probe directly; root + free → spawn temp Python listener; root + free + no python3 → [skip]
  • Banner switches to "aimx port 25 connectivity check (no install will be performed)" when --port-check-only set
  • Output styling reuses ui_info / ui_success / ui_warn / ui_error helpers
  • Exit codes: 0 pass-or-skip, 1 fail, 2 missing required tool
  • Cannot trigger any install side-effect (branches before ensure_sudo / network / fs writes)
  • POSIX sh, sh -n install.sh passes
  • tests/install_sh.sh extended with parse_args + verify-host validation cases
  • Existing AIMX_DRY_RUN=1 behavior unchanged

Out of scope

  • No new website asset (no chkport25.sh); folded into install.sh by explicit decision
  • No changes to the aimx portcheck Rust subcommand
  • No authenticated /probe calls or rate-limit handling
  • Best-effort dual-stack handling for IPv6-only environments

Technical approach

  • New top-level state vars PORT_CHECK_ONLY=0, VERIFY_HOST="", DEFAULT_VERIFY_HOST="https://check.aimx.email".
  • parse_args adds --port-check-only, --verify-host, and --verify-host=<URL> cases.
  • main() resolves env-var precedence (flag → AIMX_VERIFY_HOST → default), validates the scheme, then short-circuits to port_check_main before any privileged or networked work.
  • Outbound EHLO mirrors check_outbound_port25 byte-for-byte (220 banner, EHLO aimx, consume 250- continuations, accept on 250 SP, QUIT).
  • Temp listener is Python-only (matches with_temp_smtp_listener in src/portcheck.rs:69-129); nc -l semantics vary too widely between BSD/GNU/ncat to support reliably.
  • /probe parsing is loose "reachable":true substring match (same as src/setup.rs:783) plus a sed extraction for the ip field.

Test plan

  • sh -n install.sh (POSIX syntax)
  • sh tests/install_sh.sh (existing + new assertions all pass)
  • sh install.sh --port-check-only --help exits 0
  • sh install.sh --help | grep -q port-check-only exits 0
  • AIMX_DRY_RUN=1 sh install.sh still produces the existing dry-run output (no port check)
  • Manual smoke (operator): sh install.sh --port-check-only non-root → outbound runs, inbound [skip]; sudo sh install.sh --port-check-only on a port-25-open VPS → both [ok]
  • Site CI gate: .github/workflows/site.yml:51 already runs sh -n website/dist/install.sh after make build; no change needed there

Notes

  • Honors HTTPS-only trust anchor for the verify-host when scheme is https; allows http only for self-hosted dev verifiers (matches validate_verify_host in src/setup.rs:790-801).
  • Listener lifecycle: spawned with &, PID captured, killed after /probe returns; SO_REUSEADDR set so re-runs don't TIME_WAIT-block.

Run the same outbound + inbound port-25 checks as `aimx portcheck`,
then exit without installing. Lets evaluators verify a VPS can reach
SMTP before committing to install/setup:

  curl -fsSL https://aimx.email/install.sh | sh -s -- --port-check-only

- New --port-check-only / --verify-host flags + AIMX_VERIFY_HOST env
- Outbound EHLO ladder: python3 -> nc -> bash /dev/tcp
- Inbound /probe via curl with loose "reachable":true substring match
- Non-root inbound -> [skip] (does not affect exit code)
- Root + free port -> spawn temp Python SMTP listener (mirrors
  src/portcheck.rs:69-129)
- Banner switches to "aimx port 25 connectivity check"
- Exits 0/1/2 (pass-or-skip / fail / missing tool)
- Branches in main() before ensure_sudo / network / fs writes
- POSIX sh; sh -n passes; tests/install_sh.sh extended with parse_args
  + verify-host validation + banner/help assertions
- AIMX_DRY_RUN=1 unchanged

@uzyn uzyn left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standalone PR Review — PR #185 feat: add --port-check-only flag to install.sh

Changes Overview

Adds a --port-check-only mode to install.sh that runs the same outbound + inbound port-25 connectivity checks as the Rust aimx portcheck subcommand, then exits without installing anything. Adds a --verify-host flag and AIMX_VERIFY_HOST env var (default https://check.aimx.email). Outbound EHLO uses a python3 → nc → bash /dev/tcp ladder; inbound uses a curl GET against ${VERIFY_HOST}/probe with a loose "reachable":true substring match. Non-root operators get [skip] for inbound (does not affect exit code). tests/install_sh.sh is extended with parse_args, banner, validate_verify_host, and derive_smtp_host assertions.

Scope Alignment

The implementation matches the PR description well. All 13 requirement boxes are checked off, the short-circuit lives in main() before ensure_sudo/resolve_sudo_prefix/network/fs writes, the exit-code contract (0/1/2) is implemented, the banner switch is in place, and AIMX_DRY_RUN=1 (without --port-check-only) is verified to leave existing behavior unchanged. Out-of-scope items (no separate chkport25.sh, no IPv6, no auth) are honored. POSIX-sh compliance verified via sh -n install.sh.

Potential Bugs

1. Listener bind race not surfaced — Non-blocker
port_check_listener_start (lines 721-761) backgrounds python3 - <<'PYEOF' & and immediately captures $!. If s.bind(('0.0.0.0', 25)) fails (race with another binder, EACCES on some configs, etc.), the python child exits 1 immediately. The script then sleep 1 and calls /probe against an unreachable port — the verifier reports unreachable and the operator sees a generic "verifier reported port 25 unreachable" message instead of "tried to spawn a temp listener but it never bound". port_check_listener_stop is then a no-op (kill -0 returns false), so cleanup is OK but the diagnostic is poor. The Rust equivalent in src/portcheck.rs:44-53 does a synchronous bind and surfaces an explicit error before continuing. Worth a short post-spawn liveness probe (kill -0 ${_pid} after sleep 1) with a clearer message when the listener died.

2. Listener loop exits on first idle window — Non-blocker
port_check_listener_start's python loop (lines 731-739) breaks the entire while time.time() < deadline loop on the first socket.timeout from accept(). After serving one connection, if no second connection arrives within the remaining ~25 sec timeout, the listener exits. In practice /probe only does one EHLO probe so this works, but the Rust equivalent (src/portcheck.rs:73-82) keeps looping until the shutdown signal — a stricter parity claim than what the script delivers. If the verifier ever retries (e.g., one TCP probe + one EHLO probe), the second call hits a closed socket. Confirm /probe is single-attempt before merge or change the Python break on socket.timeout to continue (with the outer deadline check still bounding the loop).

3. Listener bind delay = fixed sleep 1 — Non-blocker
On slow VPS / cold cache, python3 -c '...' may not bind within one second. There is no readiness probe (e.g., a quick ss -tln | grep ':25' poll loop). Best-effort, but on a slow first-run the inbound check can flap. Acceptable for the v1 of this feature; document or harden later.

4. Occupancy detection ignores who's holding port 25 — Non-blocker, but a UX gap
port_check_detect_occupancy returns "occupied" when ss/netstat sees something on :25. The script then assumes "Existing daemon will reply to /probe" (line 808). If the operator already has Postfix/Sendmail/Exim listening, /probe will succeed (Postfix replies 220 to the verifier's EHLO) and the script reports [ok], suggesting aimx will install cleanly. The Rust aimx portcheck distinguishes this case (Port25Status::OtherProcess(name) → error with the process name). The shell script doesn't, so an operator can pass --port-check-only on a Postfix-managed VPS and walk away thinking they're ready, then have aimx setup fail later on the same port. The PR description doesn't promise this parity, so it's a non-blocker, but a one-line warning when occupancy is "occupied" would close the gap (e.g., "port 25 is held by another process; verify it's aimx before running setup").

5. Missing curl exits with code 1 instead of 2 — Non-blocker
port_check_main calls need curl (line 845), which exits 1 via err. The PR description's exit-code contract says 2 is for "missing required tool". curl is a required tool here. Minor inconsistency with the documented contract.

6. validate_verify_host runs unconditionally on every install — Non-blocker, mild UX surprise
main() runs resolve_verify_host + validate_verify_host (lines 1031-1032) before the --port-check-only short-circuit. A regular install with AIMX_VERIFY_HOST=garbage exported in the operator's shell will now fail with "verify-host must start with http:// or https://" before any install side effect — even though the env var is documented as "Verifier base URL for --port-check-only" only. Either gate the validate call behind ${PORT_CHECK_ONLY}=1, or extend the help text to note the env var also gates the install path. Mild, but worth a thought.

7. port_check_outbound_nc may hang on some nc variants — Non-blocker, comment correctly admits the constraint
The fallback explicitly avoids -q/-N/-c for portability. On some nc implementations (BusyBox, certain Solaris-derived BSD-nc) the connection won't close after stdin EOF, causing nc to wait for the server to close the TCP half. With nc -w 5 this caps at 5 seconds, so worst-case an extra 5-second wait per fallback invocation. Acceptable; the comment in source already flags this.

Security Issues

None. The script is pure read/probe behavior; no fs writes in port_check_main, no privileged ops, no shell injection vectors (verify-host is parameter-substituted into curl args, never eval'd; the python heredoc passes host/port as sys.argv not via string interpolation; bash heredoc passes them as positional args). The HTTPS-only trust anchor is honored when scheme is https; http is permitted only as documented.

Test Coverage

Covered (good):

  • parse_args for --port-check-only, --verify-host (space + equals form), and missing-value error path
  • validate_verify_host rejects ftp://
  • resolve_verify_host precedence: flag > env > default
  • Default DEFAULT_VERIFY_HOST value
  • --help mentions new flag, env var
  • --port-check-only --help exits 0 and prints help
  • derive_smtp_host strips scheme, path, port
  • print_port_check_banner prints correct title and "no install" notice
  • Negative cases: --tag/--to does not set PORT_CHECK_ONLY; AIMX_DRY_RUN=1 (without --port-check-only) does not switch to port-check banner

Not covered (explicitly out of scope, but worth noting):

  • port_check_outbound_python/nc/bash — no fake-server harness; correctness relies on manual smoke
  • port_check_inbound orchestration matrix (non-root → skip, root + occupied → /probe, root + free → spawn listener)
  • port_check_main exit codes 0/1/2 for the actual end-to-end paths
  • The "no outbound tool on PATH" branch returning 2 with the hint message
  • IP extraction from /probe body (sed parser)

These would require either a fake SMTP server fixture or a mock /probe endpoint — fair to defer, but the manual smoke note in the test plan is the only safety net for the network-touching branches.

Code Quality

Non-blocker — the _inbound_ok variable mixes numeric ("1", "0") and string ("skip") sentinel values. Using [ "${_inbound_ok}" = "1" ] (string compare) for the success path is fine, but the comparison style is asymmetric with _outbound_ok which uses -eq (numeric). Consider a single string-typed _inbound_state of pass | fail | skip for symmetry; or normalize both to strings.

Non-blocker — duplication between port_check_outbound_python and port_check_listener_start Python heredocs. Both implement near-identical SMTP state machines (220 banner, 250 EHLO/HELO, 502 on unknown). Each is small and the duplication is between roles (client vs server), so this is fine — flag for awareness only.

Documentation

Non-blocker — user-facing docs not updated. Per CLAUDE.md: "When making changes that affect CLI behavior, setup output, or MCP tools, update the corresponding guide files too." The new --port-check-only flag is a new public-facing entrypoint (the curl one-liner is in the PR summary), but:

  • website/public/index.html — still only shows curl -fsSL https://aimx.email/install.sh | sh. The new pre-install verification one-liner could be surfaced near the existing "You need sudo — port 25 is privileged" sentence.
  • book/ — no mention of --port-check-only in any chapter (installation, troubleshooting, getting-started). The installation chapter is the natural home.

This is the most actionable follow-up. Either fold a one-paragraph mention into this PR or open an explicit follow-up note in the description.

Summary and Recommended Actions

Overall verdict: Needs minor fixes

The implementation is clean, the scope is met, the test coverage of pure-shell helpers is solid, and the security posture is sound. Bugs identified are non-blockers but accumulate in three areas: listener-lifecycle robustness (#1, #2, #3), parity with the Rust portcheck's port-conflict warning (#4), and documentation (the docs gap). None of these can crash, corrupt data, or break the install path — they're operator-experience refinements.

Blockers (must fix before merge): none.

Non-blockers (should fix but not blocking):

  • (#1) Surface listener-bind failures explicitly instead of silently relying on /probe to fail.
  • (#2) Confirm /probe is single-attempt — if it ever retries, the listener loop's break-on-first-timeout will drop the second connection.
  • (#4) Warn when port 25 occupancy is "occupied" by a non-aimx process, matching aimx portcheck's Port25Status::OtherProcess handling.
  • (#5) Use exit code 2 (not 1) when curl is missing, to match the documented contract.
  • (#6) Either gate validate_verify_host behind PORT_CHECK_ONLY=1 or document that AIMX_VERIFY_HOST affects the install path too.
  • Update website/public/index.html and book/ to mention the new --port-check-only flag.

Nice-to-haves (low priority):

  • Replace fixed sleep 1 with a short readiness poll on :25.
  • Normalize _outbound_ok/_inbound_ok to a single string-state convention.
  • Add an integration-style test that fakes a /probe endpoint (e.g., python -m http.server with a stub) to exercise the inbound state matrix.

- listener: continue (not break) on socket.timeout so a second probe
  attempt within the deadline still gets served. Mirrors the
  shutdown-bound loop in src/portcheck.rs:73-82.
- listener: liveness probe (kill -0 ${_listener_pid}) after the bind
  delay surfaces a distinct "failed to spawn temp listener on :25"
  error instead of letting /probe report a generic "unreachable".
- occupancy: warn the operator when port 25 is held by another
  process — /probe can't distinguish aimx from Postfix/Exim, so a
  green [ok] against an existing daemon was misleading. Mirrors
  Port25Status::OtherProcess in src/portcheck.rs.
- missing curl now exits 2 (was 1) per the documented contract:
  0 pass-or-skip, 1 fail, 2 missing required tool. Replaced
  `need curl` in port_check_main with an explicit check that exits
  the right code.
- AIMX_VERIFY_HOST is now resolved + validated only when
  --port-check-only is set. A garbage value in the operator's shell
  no longer fails regular installs (the env var was never relevant
  to the install path).
- docs: surface --port-check-only in website/public/index.html (Quick
  start), add a Pre-install subsection in book/getting-started.md,
  and cross-reference from book/setup.md and book/troubleshooting.md.
- tests: 14 new assertions covering env-var gating, listener loop
  shape, listener liveness probe, occupancy warning, and
  missing-curl exit code (both source-grep and live PATH-scrub).
@uzyn

uzyn commented Apr 30, 2026

Copy link
Copy Markdown
Owner Author

Cycle 2 — review feedback addressed

Pushed 4cc788f addressing the 6 non-blockers flagged in the review.

Fixed in code

  • ([Sprint 1] Core Pipeline + Idea Validation #1) Listener bind failures silentport_check_inbound now does a kill -0 ${_listener_pid} liveness probe after the bind delay and surfaces a distinct failed to spawn temp listener on :25 (bind failed; another binder won the race?) message instead of letting /probe report a generic "unreachable".
  • ([Sprint 2] DKIM + Production-Quality Outbound #2) Listener loop break-on-first-timeout — Changed break on socket.timeout to continue so the deadline still bounds the loop while a second probe attempt within the deadline gets served. Mirrors src/portcheck.rs:73-82.
  • ([Sprint 3] MCP Server #4) Occupancy detection silently treats foreign processes as aimx — Added a ui_warn line ("port 25 is held by another process; verify it's aimx before running setup") when occupancy is occupied. Inbound check still runs so the operator sees the EHLO result, but the warning makes clear a green [ok] is only meaningful if the holder is aimx.
  • ([Sprint 4] Channel Manager + Inbound Trust #5) need curl exits 1 instead of 2 — Replaced need curl in port_check_main with an explicit command -v curl check that exits 2 with required command not found: curl, matching the documented contract.
  • ([Sprint 5] Setup Wizard #6) validate_verify_host ran unconditionally — Moved resolve_verify_host + validate_verify_host inside the PORT_CHECK_ONLY=1 short-circuit. A stray AIMX_VERIFY_HOST=garbage in the operator's shell no longer breaks regular installs. Verified: AIMX_VERIFY_HOST=garbage AIMX_DRY_RUN=1 sh install.sh exits 0; AIMX_VERIFY_HOST=garbage sh install.sh --port-check-only still rejects with the scheme error.
  • Docs — Added the --port-check-only curl one-liner to website/public/index.html (Quick start section), a new "Pre-install: check port 25" subsection in book/getting-started.md, plus cross-references in book/setup.md and book/troubleshooting.md.

Tests

tests/install_sh.sh gained 14 new assertions across four sections (#7d–#7h):

  • AIMX_VERIFY_HOST gating: regular install with AIMX_VERIFY_HOST=garbage exits 0; --port-check-only with the same env still fails validate_verify_host.
  • Listener loop shape (source-grep): continue on socket.timeout, no break in the timeout handler.
  • Listener liveness probe (source-grep): kill -0 ${_listener_pid} present, distinct bind-failed message present.
  • Occupancy warning (source-grep): warning string present.
  • Missing-curl exit code: source-grep proves port_check_main no longer calls need curl and uses an explicit check; a live PATH-scrub test runs --port-check-only with curl absent and asserts rc=2 plus "curl" in the error.

84/86 assertions pass. The 2 fails are the pre-existing shellcheck SC2024 finding on the sudo -v </dev/tty redirect (unchanged by this PR; baseline ran the same way before my changes).

Intentionally left as-is (deferred per task brief)

These three were called out in the review as nice-to-haves and the task brief listed them as deferrable:

  1. Replace fixed sleep 1 with a readiness poll on :25 — kept as-is; the new liveness probe (kill -0) catches the most common bind failure mode that the readiness poll would also catch, and a true readiness poll adds complexity (ss/netstat parsing) that would need its own portability matrix. Worth revisiting if reports of slow-VPS flakiness come in.
  2. Normalize _outbound_ok (numeric) / _inbound_ok (mixed) to one string convention — pure cosmetic, no behavior change. Skipped to keep this cycle focused on review-flagged issues.
  3. Integration-style tests with a fake /probe endpoint — would require a Python http-server fixture and deterministic teardown; the new live PATH-scrub test exercises one end-to-end branch (curl-missing → exit 2) without that scaffolding. Larger fixture is a follow-up.

Validation run locally

  • sh -n install.sh — POSIX OK
  • sh tests/install_sh.sh — 84/86 pass (2 pre-existing shellcheck fails)
  • sh install.sh --port-check-only --help — exit 0
  • sh install.sh --help | grep -q port-check-only — exit 0
  • AIMX_DRY_RUN=1 sh install.sh — unchanged dry-run output
  • AIMX_VERIFY_HOST=garbage AIMX_DRY_RUN=1 sh install.sh — exit 0 (regression check for [Sprint 5] Setup Wizard #6)
  • AIMX_VERIFY_HOST=garbage sh install.sh --port-check-only — exits 1 with scheme error
  • cargo fmt --check and cargo clippy -- -D warnings — clean

@uzyn uzyn left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cycle 2 re-review — APPROVED

The implementer pushed 4cc788f addressing all 6 non-blockers from cycle 1. I verified each fix in the diff and ran the test suite locally (84/86 pass; the 2 failures are the pre-existing shellcheck SC2024 finding on sudo -v </dev/tty at install.sh:251, present on main and untouched by this PR).

Resolved issues

(#1) Listener bind failures silent — RESOLVED. port_check_inbound now does if ! kill -0 "${_listener_pid}" 2>/dev/null after the bind delay and surfaces a distinct failed to spawn temp listener on :25 (bind failed; another binder won the race?) message. _listener_pid="" is set so the cleanup path correctly skips a no-op kill. Asserted by source-grep tests.

(#2) Listener loop break-on-first-timeout — RESOLVED. break on socket.timeout is now continue. The outer while time.time() < deadline still bounds the loop, and the inner except Exception: break for non-timeout errors is preserved. Two assertions confirm both the presence of continue and the absence of break in the timeout handler.

(#4) Occupancy detection silently treats foreign processes as aimx — RESOLVED. ui_warn "port 25 is held by another process; verify it's aimx before running setup" fires in the occupied branch before /probe runs. The warning ordering is correct (emitted before the inbound result line), though the warning interrupts the partial ' Inbound port 25 .... ' line — a minor cosmetic glitch, not a blocker.

(#5) need curl exits 1 instead of 2 — RESOLVED. Replaced with explicit command -v curl >/dev/null 2>&1 || return 2 and a clear error message. Verified by both source-grep (no need curl in port_check_main) and a live PATH-scrub test that asserts rc=2.

(#6) validate_verify_host ran unconditionally — RESOLVED. resolve_verify_host + validate_verify_host moved inside the if [ "${PORT_CHECK_ONLY}" = "1" ] short-circuit. Verified locally:

  • AIMX_VERIFY_HOST=garbage AIMX_DRY_RUN=1 sh install.sh --target x86_64-unknown-linux-gnu --tag 0.1.0 → exit 0 (regression check).
  • AIMX_VERIFY_HOST=garbage sh install.sh --port-check-only → exit 1 with scheme error.

(#7) Docs not updated — RESOLVED.

  • website/public/index.html Quick start now shows the --port-check-only curl one-liner alongside the standard install command.
  • book/getting-started.md has a new "Pre-install: check port 25" subsection covering the one-liner, sudo behavior, exit codes, and --verify-host / AIMX_VERIFY_HOST.
  • book/setup.md and book/troubleshooting.md cross-reference the pre-install check.

Still unresolved

None.

New issues from cycle-2 fixes

None. I checked specifically for:

  • Cleanup leak when bind-liveness fails — guarded by [ -n "${_listener_pid}" ], no spurious kill.
  • Warning ordering — ui_warn interrupts the partial ' Inbound port 25 .... ' line; cosmetic only, not a blocker.
  • Continue-on-timeout regression — the inner except Exception: break preserves the original failure exit for non-timeout errors.
  • Test PATH isolation — the live PATH-scrub test correctly seeds iso-bin-no-curl with the minimum tool set; no false-positive risk.

Explicitly deferred (acknowledged as non-blockers, not required for merge)

  • Fixed sleep 1 → readiness poll on :25. The new kill -0 liveness probe catches the most common bind-failure mode that a readiness poll would address; harden later if slow-VPS reports come in.
  • Normalize _outbound_ok (numeric) / _inbound_ok (mixed) sentinel conventions. Pure cosmetic, no behavior change.
  • Integration tests with a fake /probe endpoint. The live PATH-scrub test exercises one end-to-end branch (curl-missing → exit 2) without that scaffolding; broader fixture is a follow-up.

Local validation run

  • sh -n install.sh — POSIX OK
  • sh tests/install_sh.sh — 84/86 (pre-existing SC2024 baseline failures only)
  • sh install.sh --port-check-only --help — exit 0
  • AIMX_VERIFY_HOST=garbage AIMX_DRY_RUN=1 sh install.sh ... — exit 0 (#6 regression check)
  • AIMX_VERIFY_HOST=garbage sh install.sh --port-check-only — exit 1 with scheme error

Verdict: Ready to merge

All cycle-1 non-blockers are resolved with focused, minimal changes; no new bugs introduced; tests cover the new behavior; documentation surfaces the new entrypoint in the right places.

Recommended merge commit message

feat(install.sh): add --port-check-only flag

Run the same outbound + inbound port-25 connectivity checks as
`aimx portcheck`, then exit without installing. Lets evaluators
verify a VPS can reach SMTP before committing to install/setup:

  curl -fsSL https://aimx.email/install.sh | sh -s -- --port-check-only

- New --port-check-only / --verify-host flags + AIMX_VERIFY_HOST env
  (verify-host validation gated behind --port-check-only so a stray
  env var in the operator's shell doesn't block regular installs).
- Outbound EHLO ladder: python3 -> nc -> bash /dev/tcp.
- Inbound /probe via curl with loose "reachable":true substring match.
- Non-root inbound -> [skip] (does not affect exit code).
- Root + free port -> spawn temp Python SMTP listener with kill -0
  liveness probe surfacing distinct bind-failed error (mirrors
  src/portcheck.rs:44-53).
- Listener loop continues on socket.timeout so the deadline bounds
  the loop while a second probe within the window is served.
- Root + occupied -> ui_warn + /probe (warns the green [ok] is only
  meaningful if the holder is aimx, mirroring Port25Status::OtherProcess).
- Banner switches to "aimx port 25 connectivity check".
- Exits 0/1/2 (pass-or-skip / fail / missing tool); explicit curl
  check returns 2 per the documented contract.
- Branches in main() before ensure_sudo / network / fs writes.
- POSIX sh; sh -n passes; tests/install_sh.sh extended with parse_args
  + verify-host validation + banner/help + listener loop shape +
  liveness probe + occupancy warning + missing-curl-exits-2 assertions.
- Docs: --port-check-only surfaced on website Quick start, new
  Pre-install subsection in book/getting-started.md, cross-refs in
  book/setup.md and book/troubleshooting.md.
- AIMX_DRY_RUN=1 unchanged.

@uzyn uzyn merged commit c79865c into main May 1, 2026
10 checks passed
@uzyn uzyn deleted the feat/install-port-check-only branch May 1, 2026 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant