fix(launch): reclaim stopped clean-exit container slots on name claim by donbeave · Pull Request #184 · jackin-project/jackin

donbeave · 2026-04-26T15:23:14Z

Root cause

claim_container_name called list_managed_agent_names, which runs docker ps -a — including stopped containers. Because docker run intentionally omits --rm (see below), containers persist after exit as stopped entries. These two facts combined to permanently block every name slot that had ever been used: the next jackin load climbed to clone-2, clone-3, etc., each with a brand-new empty state directory. The user had to re-authenticate with gh auth login on every reload.

Why `--rm` was not the fix

Adding --rm to docker run would auto-remove containers on exit, including crashed ones. That would:

Destroy logs and the container filesystem needed to diagnose crashes
Make jackin hardline crash-restart impossible — it restarts the existing stopped container in place; if Docker already removed it, there is nothing to restart

--rm is intentionally absent. The doc comment added to the docker run block in this PR makes that explicit.

The three-way decision

Instead of a blanket scan, claim_container_name now calls inspect_container_state per candidate:

State	Action
Running	Skip — active session owns this slot
Stopped / exit 0, not OOM-killed	`docker rm` best-effort, then reclaim the slot
Stopped / non-zero exit or OOM-killed	Skip — `jackin hardline` territory
NotFound	Claim the slot as before

Why state survives

The state directory (~/.jackin/data/<name>/) lives on disk independently of the Docker container. It holds .config/gh/ with the gh credentials. Reclaiming a slot mounts the same directory into the new container — credentials are intact, no re-auth needed.

The name-to-state-dir invariant

Each unique container name maps permanently to one state directory. This fix preserves that invariant exactly: it does not reassign names or move state directories. It only allows a slot to be reused when the previous container exited cleanly and has been removed.

Tests added

Five new unit tests for claim_container_name:

NotFound → claimed directly, no docker rm
Running → skipped, next slot (clone-1) claimed
Stopped / exit 0 → docker rm issued, same slot reclaimed
Stopped / non-zero exit → skipped, next slot claimed
Mixed: slot 0 crashed, slot 1 clean-exit → slot 1 reclaimed after rm, not slot 2

🤖 Generated with Claude Code

Replace the blanket `list_managed_agent_names` scan in `claim_container_name` with per-candidate `inspect_container_state` checks. Stopped containers with exit 0 and no OOM flag are removed best-effort, then their slot (and the state dir containing gh credentials) is reclaimed. Running containers and crashed/OOM-killed containers are skipped: the latter are preserved so `jackin hardline` can restart them in place. Also adds a doc comment on the `docker run` block explaining why `--rm` is intentionally absent: crashed containers need to persist for `jackin hardline` crash recovery, and clean-exit removal is handled by the naming loop (which can inspect the exit code) rather than Docker auto-removal (which cannot). Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>

Brings in main's fix(workspace) #185, fix(launch) #184, and a large documentation roadmap addition (#183 + several smaller PRs covering codebase readability, project structure gates, etc.). Adds the `isolation` field to MountConfig literals introduced by main in test fixtures inside `src/app/context.rs` and `src/workspace/resolve.rs` (2 + 3 sites, all `MountIsolation::Shared` since they're not testing isolation behavior). These literals predate this branch's `isolation`-field addition; main couldn't anticipate the new required field, so the merge picks them up unmodified and we backfill the field here. No conflicts in the merge itself. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>

…#184) Replace the blanket `list_managed_agent_names` scan in `claim_container_name` with per-candidate `inspect_container_state` checks. Stopped containers with exit 0 and no OOM flag are removed best-effort, then their slot (and the state dir containing gh credentials) is reclaimed. Running containers and crashed/OOM-killed containers are skipped: the latter are preserved so `jackin hardline` can restart them in place. Also adds a doc comment on the `docker run` block explaining why `--rm` is intentionally absent: crashed containers need to persist for `jackin hardline` crash recovery, and clean-exit removal is handled by the naming loop (which can inspect the exit code) rather than Docker auto-removal (which cannot). Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com> Co-authored-by: Claude <noreply@anthropic.com>

…#184) Replace the blanket `list_managed_agent_names` scan in `claim_container_name` with per-candidate `inspect_container_state` checks. Stopped containers with exit 0 and no OOM flag are removed best-effort, then their slot (and the state dir containing gh credentials) is reclaimed. Running containers and crashed/OOM-killed containers are skipped: the latter are preserved so `jackin hardline` can restart them in place. Also adds a doc comment on the `docker run` block explaining why `--rm` is intentionally absent: crashed containers need to persist for `jackin hardline` crash recovery, and clean-exit removal is handled by the naming loop (which can inspect the exit code) rather than Docker auto-removal (which cannot). Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com> Co-authored-by: Codex <codex@openai.com>

…#184) Replace the blanket `list_managed_agent_names` scan in `claim_container_name` with per-candidate `inspect_container_state` checks. Stopped containers with exit 0 and no OOM flag are removed best-effort, then their slot (and the state dir containing gh credentials) is reclaimed. Running containers and crashed/OOM-killed containers are skipped: the latter are preserved so `jackin hardline` can restart them in place. Also adds a doc comment on the `docker run` block explaining why `--rm` is intentionally absent: crashed containers need to persist for `jackin hardline` crash recovery, and clean-exit removal is handled by the naming loop (which can inspect the exit code) rather than Docker auto-removal (which cannot). Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com> Co-authored-by: Claude <noreply@anthropic.com>

donbeave merged commit 3f0dd96 into main Apr 26, 2026
6 checks passed

donbeave deleted the fix/claim-container-slot-drift branch April 26, 2026 15:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(launch): reclaim stopped clean-exit container slots on name claim#184

fix(launch): reclaim stopped clean-exit container slots on name claim#184
donbeave merged 1 commit into
mainfrom
fix/claim-container-slot-drift

donbeave commented Apr 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

donbeave commented Apr 26, 2026

Root cause

Why --rm was not the fix

The three-way decision

Why state survives

The name-to-state-dir invariant

Tests added

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Why `--rm` was not the fix