Skip to content

Non-fatal chown fallback for macOS bind mounts (complement to #1307)#1537

Closed
ztech-gthb wants to merge 2 commits into
coleam00:devfrom
ztech-gthb:fix/macos-virtiofs-chown-fallback
Closed

Non-fatal chown fallback for macOS bind mounts (complement to #1307)#1537
ztech-gthb wants to merge 2 commits into
coleam00:devfrom
ztech-gthb:fix/macos-virtiofs-chown-fallback

Conversation

@ztech-gthb

@ztech-gthb ztech-gthb commented May 2, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: On macOS bind mounts (VirtioFS), the entrypoint's chown -Rh appuser:appuser /.archon fails because the host controls ownership and host UIDs (e.g. 501) don't map to container's appuser (1001). The script then exit 1s and the container crash-loops.
  • Why it matters: Affects every macOS user running Archon via Docker Desktop with the default bind-mount setup. Currently the only workaround is to run Archon outside Docker entirely.
  • What changed: Failed chown is now treated as a warning. Container falls back to running as root, with IS_SANDBOX=1 exported so ClaudeProvider skips its UID-0 safety check (we're still inside Docker — sandboxed in the meaningful sense).
  • What did not change (scope boundary): No changes to safe.directory-loop (already fixed by fix(docker): register safe.directory for all repos on bind-mount restart #1307), Linux behaviour (chown succeeds → unchanged path), Kubernetes/non-root behaviour (id != 0 → unchanged path), or the security model on Linux.

UX Journey

Before

User on macOS                     Container start
─────────────                     ───────────────
docker compose up ──────────────▶ entrypoint: chown -Rh appuser /.archon
                                    │
                                    └─ FAILS (host UID, VirtioFS)
                                       │
                                       └─ "ERROR: Failed to fix ownership"
                                          exit 1
                                  Container crash-loops
container never starts ◀────────  user blocked

After

User on macOS                     Container start
─────────────                     ───────────────
docker compose up ──────────────▶ entrypoint: chown -Rh appuser /.archon
                                    │
                                    └─ FAILS (host UID, VirtioFS)
                                       │
                                       └─ [warning logged]
                                          [export IS_SANDBOX=1]
                                          [RUNNER=""]   ←── run as root
                                  Server starts as root
                                  ClaudeProvider sees IS_SANDBOX=1,
                                  skips UID-0 check
container running ◀──────────────  user can use Archon

Architecture Diagram

Before

docker-entrypoint.sh
        │
        ├─▶ chown /.archon (fail-fatal)
        ├─▶ RUNNER="gosu appuser"
        └─▶ exec $RUNNER bun run setup-auth

After

docker-entrypoint.sh
        │
        ├─▶ chown /.archon (try)
        │      ├─ ok  ──▶ RUNNER="gosu appuser"
        │      └─ fail ──▶ warn + export IS_SANDBOX=1 + RUNNER=""   [+]
        └─▶ exec $RUNNER bun run setup-auth

Connection inventory:

From To Status Notes
docker-entrypoint.sh gosu appuser unchanged (happy path) only when chown succeeds
docker-entrypoint.sh run as root + IS_SANDBOX=1 new only when chown fails
ClaudeProvider IS_SANDBOX env unchanged already reads this var

Label Snapshot

  • Risk: risk: low
  • Size: size: XS
  • Scope: core
  • Module: core:docker-entrypoint

Change Metadata

  • Change type: bug
  • Primary scope: core

Linked Issue

Validation Evidence (required)

# Locally on macOS (Docker Desktop, VirtioFS bind mount):
docker compose down
docker compose build --no-cache
docker compose up
# Container starts, "WARNING: Could not fix ownership..." appears once,
# then server reaches gh_auth.status_ok and serves on :5174.
  • Evidence: container logs show single warning then normal startup. No crash-loop. Web UI reachable. Functional integration test (chat-message → soft-sync) confirmed working in the same session.
  • Skipped commands: bun run type-check, bun run lint, bun run test — touched file is a shell script, not TS; lint-staged was a no-op for .sh.

Security Impact (required)

  • New permissions/capabilities? No — when fallback fires, the process runs as root inside the container, but it would have run as root before too (the difference is exit 1 vs continuing).
  • New external network calls? No.
  • Secrets/tokens handling changed? No.
  • File system access scope changed? Nochown itself is unchanged; only the failure handling differs.
  • Security note: running as root is undesirable but not a regression — the alternative is "container does not start", which strictly speaking is more secure but also unusable. IS_SANDBOX=1 mirrors how Claude SDK itself handles known-sandboxed contexts; the container is still isolated by Docker.

Compatibility / Migration

  • Backward compatible? Yes — Linux setups where chown succeeds are unchanged. Only the previously-fatal failure path is altered.
  • Config/env changes? Yes (passive): IS_SANDBOX=1 is exported in the fallback path. Users who already set IS_SANDBOX explicitly are not overridden if chown succeeds.
  • Database migration needed? No.

Human Verification (required)

  • Verified scenarios:
  • Edge cases checked: chown-success path (Linux Docker host emulator) still goes through the appuser branch; verified by inspection of the conditional, no code path duplication.
  • What was not verified: Kubernetes / --user-flag invocation (the else branch is unchanged structurally; behaviour-equivalent to pre-PR).

Side Effects / Blast Radius (required)

  • Affected subsystems: container startup only.
  • Potential unintended effects: a Linux user who intentionally configured chown to fail (highly unusual) would now silently get root-execution instead of exit. Mitigated by the explicit WARNING: log line.
  • Guardrails: warning is logged on stderr, visible in docker compose logs. IS_SANDBOX=1 is observable in process env.

Rollback Plan (required)

  • Fast rollback: revert this single commit (git revert <sha>). One-file change, no DB/state to migrate.
  • Feature flags: none.
  • Observable failure symptoms: macOS container would resume crash-looping with the original ERROR: Failed to fix ownership message — same symptom as before this PR.

Risks and Mitigations

  • Risk: A misconfigured Linux host where chown fails for an unrelated reason (filesystem error, capability missing) would now log a warning and run as root instead of failing fast.
    • Mitigation: Warning is loud and explicit. Discovery via logs is easy. The previous behaviour (exit 1) was equally hostile to debugging in that case (no remediation hint either).
  • Risk: IS_SANDBOX=1 skips a ClaudeProvider UID safety check that exists for a reason.
    • Mitigation: The check is for "user is running Claude as root on a host system" — that's not what's happening here (we're inside a container). The flag is purpose-named for exactly this pattern.

Summary by CodeRabbit

  • Bug Fixes
    • Container startup no longer always fails if permission adjustment fails during initialization; it now logs a warning and can continue in restricted environments.
    • Continuing as root requires explicit opt-in via ARCHON_ALLOW_ROOT_FALLBACK=1; without that the container will still exit with an error.

Complements coleam00#1307. On macOS Docker Desktop with VirtioFS bind  mounts, host UIDs (e.g. 501) don't map to appuser (1001) and chown fails — the script then exit 1s and the container crash-loops. Now treats chown failure as a warning, falls  back to running as root, and exports IS_SANDBOX=1 so ClaudeProvider skips its UID-0 check (we're still sandboxed by Docker).
@coderabbitai

coderabbitai Bot commented May 2, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

The entrypoint now attempts chown -Rh appuser:appuser /.archon when run as root and only sets RUNNER="gosu appuser" if that chown succeeds. On failure it logs the captured error and either exits or continues as root only when ARCHON_ALLOW_ROOT_FALLBACK=1, exporting IS_SANDBOX=1.

Changes

Root-Privilege Fallback Handling

Layer / File(s) Summary
Error Handling & Startup Flow
docker-entrypoint.sh
Attempt chown -Rh appuser:appuser /.archon and capture error output when running as root.
Privilege-Drop Decision
docker-entrypoint.sh
Set RUNNER="gosu appuser" only if chown succeeds; on failure, require ARCHON_ALLOW_ROOT_FALLBACK=1 to continue as root.
Logging & Environment
docker-entrypoint.sh
On chown failure log the captured error, export IS_SANDBOX=1 and leave RUNNER empty when opt-in is set; otherwise log error and exit 1.
Subsequent Startup Steps
docker-entrypoint.sh
Existing git safe-directory registration, credential helper config, CLAUDE pinning, and bun run setup-auth / bun run start remain unchanged and follow the updated RUNNER decision.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hopped inside the container's den,
Found chown that tripped now and then.
A gentle warning, sandboxed tune,
I nibble logs beneath the moon.
Startup hums — no crash at ten.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: a non-fatal chown fallback for macOS bind mounts, with a note about complementing PR #1307.
Description check ✅ Passed The description is comprehensive and thorough, covering all major template sections including problem statement, UX journey, architecture diagrams, validation evidence, security impact, compatibility, human verification, side effects, and rollback plan.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch fix/macos-virtiofs-chown-fallback

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker-entrypoint.sh`:
- Line 17: The chown invocation currently silences stderr with `2>/dev/null`,
losing the real OS error; change the `if chown -Rh appuser:appuser /.archon
2>/dev/null; then` pattern to capture stderr into a variable (e.g.
`output=$(chown -Rh appuser:appuser /.archon 2>&1)`) and use the command's exit
status to decide the branch, then include that captured `output` in the warning
message so the exact chown error (like "Operation not permitted" or "Read-only
file system") is logged.
- Around line 17-23: The script currently unconditionally exports IS_SANDBOX=1
on any chown failure, silently bypassing the root-safety guard in
ClaudeProvider; change this so IS_SANDBOX is only set when the user explicitly
opted in (e.g., check if an opt-in env var is already set like IS_SANDBOX=1 or a
new explicit flag SANDBOX_FORCE=1) instead of auto-exporting it on failure.
Update the chown block around the chown -Rh appuser:appuser /.archon check and
the RUNNER assignment so that on failure you print the warning and leave
IS_SANDBOX alone unless the opt-in env var is present; if you need to support
macOS VirtioFS convenience, document/require users to set the opt-in variable
before starting the container rather than assigning IS_SANDBOX=1 automatically.
Ensure references to RUNNER and IS_SANDBOX in docker-entrypoint.sh remain
consistent with this opt-in behavior so ClaudeProvider's getProcessUid() guard
is not silently bypassed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d93421b6-7d5c-48a9-9788-420afb96ef49

📥 Commits

Reviewing files that changed from the base of the PR and between 69b2c89 and 900270d.

📒 Files selected for processing (1)
  • docker-entrypoint.sh

Comment thread docker-entrypoint.sh Outdated
Comment thread docker-entrypoint.sh Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
docker-entrypoint.sh (1)

27-33: Consider documenting ARCHON_ALLOW_ROOT_FALLBACK=1 in user-facing docs.

The new env var only appears in the exit 1 error message. Users encountering the failure for the first time on macOS won't know to set it unless they read the error carefully. A one-liner in the README (or the docker-compose.yml example as a commented-out variable) would make the macOS path discoverable without digging through container logs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-entrypoint.sh` around lines 27 - 33, Add a short user-facing note
about ARCHON_ALLOW_ROOT_FALLBACK=1 so macOS users can discover the workaround
without reading container logs: update the README and the example
docker-compose.yml to mention ARCHON_ALLOW_ROOT_FALLBACK (what it does, when to
set it) and show a commented-out example line; also briefly document the
resulting IS_SANDBOX behavior and security implications so users understand that
setting ARCHON_ALLOW_ROOT_FALLBACK exports IS_SANDBOX=1 and allows the
entrypoint to continue as root.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docker-entrypoint.sh`:
- Around line 27-33: Add a short user-facing note about
ARCHON_ALLOW_ROOT_FALLBACK=1 so macOS users can discover the workaround without
reading container logs: update the README and the example docker-compose.yml to
mention ARCHON_ALLOW_ROOT_FALLBACK (what it does, when to set it) and show a
commented-out example line; also briefly document the resulting IS_SANDBOX
behavior and security implications so users understand that setting
ARCHON_ALLOW_ROOT_FALLBACK exports IS_SANDBOX=1 and allows the entrypoint to
continue as root.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d6e0b1f9-a316-4351-9bfc-e81c84b54667

📥 Commits

Reviewing files that changed from the base of the PR and between 900270d and 316d261.

📒 Files selected for processing (1)
  • docker-entrypoint.sh

@ztech-gthb

Copy link
Copy Markdown
Contributor Author

Superseded by [follow-up PR — number to fill] — #1518 changed the chown block in incompatible ways and removed the silent IS_SANDBOX-fallback this PR introduced. The follow-up adds an explicit ARCHON_ALLOW_ROOT_FALLBACK opt-in across both /.archon and /home/appuser chowns, which is now the cleaner shape on top of #1518's base.

@ztech-gthb

Copy link
Copy Markdown
Contributor Author

Superseded by PR #1573 — opt-in form on top of #1518's base. Functionally same intent (let macOS users start the container), structurally re-shaped: chown errors now captured into the warning, root-fallback gated on explicit ARCHON_ALLOW_ROOT_FALLBACK=1 instead of auto-ser IS_SANDBOX=1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant