Skip to content

fix(launch): reclaim stopped clean-exit container slots on name claim#184

Merged
donbeave merged 1 commit into
mainfrom
fix/claim-container-slot-drift
Apr 26, 2026
Merged

fix(launch): reclaim stopped clean-exit container slots on name claim#184
donbeave merged 1 commit into
mainfrom
fix/claim-container-slot-drift

Conversation

@donbeave

Copy link
Copy Markdown
Member

Root cause

claim_container_name called list_managed_agent_names, which runs docker ps -a — including stopped containers. Because docker run intentionally omits --rm (see below), containers persist after exit as stopped entries. These two facts combined to permanently block every name slot that had ever been used: the next jackin load climbed to clone-2, clone-3, etc., each with a brand-new empty state directory. The user had to re-authenticate with gh auth login on every reload.

Why --rm was not the fix

Adding --rm to docker run would auto-remove containers on exit, including crashed ones. That would:

  • Destroy logs and the container filesystem needed to diagnose crashes
  • Make jackin hardline crash-restart impossible — it restarts the existing stopped container in place; if Docker already removed it, there is nothing to restart

--rm is intentionally absent. The doc comment added to the docker run block in this PR makes that explicit.

The three-way decision

Instead of a blanket scan, claim_container_name now calls inspect_container_state per candidate:

State Action
Running Skip — active session owns this slot
Stopped / exit 0, not OOM-killed docker rm best-effort, then reclaim the slot
Stopped / non-zero exit or OOM-killed Skip — jackin hardline territory
NotFound Claim the slot as before

Why state survives

The state directory (~/.jackin/data/<name>/) lives on disk independently of the Docker container. It holds .config/gh/ with the gh credentials. Reclaiming a slot mounts the same directory into the new container — credentials are intact, no re-auth needed.

The name-to-state-dir invariant

Each unique container name maps permanently to one state directory. This fix preserves that invariant exactly: it does not reassign names or move state directories. It only allows a slot to be reused when the previous container exited cleanly and has been removed.

Tests added

Five new unit tests for claim_container_name:

  1. NotFound → claimed directly, no docker rm
  2. Running → skipped, next slot (clone-1) claimed
  3. Stopped / exit 0 → docker rm issued, same slot reclaimed
  4. Stopped / non-zero exit → skipped, next slot claimed
  5. Mixed: slot 0 crashed, slot 1 clean-exit → slot 1 reclaimed after rm, not slot 2

🤖 Generated with Claude Code

Replace the blanket `list_managed_agent_names` scan in
`claim_container_name` with per-candidate `inspect_container_state`
checks. Stopped containers with exit 0 and no OOM flag are removed
best-effort, then their slot (and the state dir containing gh credentials)
is reclaimed. Running containers and crashed/OOM-killed containers are
skipped: the latter are preserved so `jackin hardline` can restart them in
place.

Also adds a doc comment on the `docker run` block explaining why `--rm`
is intentionally absent: crashed containers need to persist for
`jackin hardline` crash recovery, and clean-exit removal is handled by
the naming loop (which can inspect the exit code) rather than Docker
auto-removal (which cannot).

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
@donbeave donbeave merged commit 3f0dd96 into main Apr 26, 2026
6 checks passed
@donbeave donbeave deleted the fix/claim-container-slot-drift branch April 26, 2026 15:27
donbeave added a commit that referenced this pull request Apr 26, 2026
Brings in main's fix(workspace) #185, fix(launch) #184, and a large
documentation roadmap addition (#183 + several smaller PRs covering
codebase readability, project structure gates, etc.).

Adds the `isolation` field to MountConfig literals introduced by main
in test fixtures inside `src/app/context.rs` and `src/workspace/resolve.rs`
(2 + 3 sites, all `MountIsolation::Shared` since they're not testing
isolation behavior). These literals predate this branch's
`isolation`-field addition; main couldn't anticipate the new required
field, so the merge picks them up unmodified and we backfill the field
here.

No conflicts in the merge itself.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
donbeave added a commit that referenced this pull request May 6, 2026
…#184)

Replace the blanket `list_managed_agent_names` scan in
`claim_container_name` with per-candidate `inspect_container_state`
checks. Stopped containers with exit 0 and no OOM flag are removed
best-effort, then their slot (and the state dir containing gh credentials)
is reclaimed. Running containers and crashed/OOM-killed containers are
skipped: the latter are preserved so `jackin hardline` can restart them in
place.

Also adds a doc comment on the `docker run` block explaining why `--rm`
is intentionally absent: crashed containers need to persist for
`jackin hardline` crash recovery, and clean-exit removal is handled by
the naming loop (which can inspect the exit code) rather than Docker
auto-removal (which cannot).

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
donbeave added a commit that referenced this pull request May 7, 2026
…#184)

Replace the blanket `list_managed_agent_names` scan in
`claim_container_name` with per-candidate `inspect_container_state`
checks. Stopped containers with exit 0 and no OOM flag are removed
best-effort, then their slot (and the state dir containing gh credentials)
is reclaimed. Running containers and crashed/OOM-killed containers are
skipped: the latter are preserved so `jackin hardline` can restart them in
place.

Also adds a doc comment on the `docker run` block explaining why `--rm`
is intentionally absent: crashed containers need to persist for
`jackin hardline` crash recovery, and clean-exit removal is handled by
the naming loop (which can inspect the exit code) rather than Docker
auto-removal (which cannot).

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Codex <codex@openai.com>
donbeave added a commit that referenced this pull request May 7, 2026
…#184)

Replace the blanket `list_managed_agent_names` scan in
`claim_container_name` with per-candidate `inspect_container_state`
checks. Stopped containers with exit 0 and no OOM flag are removed
best-effort, then their slot (and the state dir containing gh credentials)
is reclaimed. Running containers and crashed/OOM-killed containers are
skipped: the latter are preserved so `jackin hardline` can restart them in
place.

Also adds a doc comment on the `docker run` block explaining why `--rm`
is intentionally absent: crashed containers need to persist for
`jackin hardline` crash recovery, and clean-exit removal is handled by
the naming loop (which can inspect the exit code) rather than Docker
auto-removal (which cannot).

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
donbeave added a commit that referenced this pull request May 7, 2026
…#184)

Replace the blanket `list_managed_agent_names` scan in
`claim_container_name` with per-candidate `inspect_container_state`
checks. Stopped containers with exit 0 and no OOM flag are removed
best-effort, then their slot (and the state dir containing gh credentials)
is reclaimed. Running containers and crashed/OOM-killed containers are
skipped: the latter are preserved so `jackin hardline` can restart them in
place.

Also adds a doc comment on the `docker run` block explaining why `--rm`
is intentionally absent: crashed containers need to persist for
`jackin hardline` crash recovery, and clean-exit removal is handled by
the naming loop (which can inspect the exit code) rather than Docker
auto-removal (which cannot).

Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant