fix: prefer bootstrap for missing database recovery#2940
Conversation
hilmes
left a comment
There was a problem hiding this comment.
Review: fix: prefer bootstrap for missing database recovery
Verdict: LGTM — approve with one flag
The Problem
Two bugs, both exposed during coder_dotfiles bootstrap investigation:
Bug 1 (guidance): Multiple recovery surfaces — databaseNotFoundError, handleFreshCloneError, initGuardServerMessage, CheckDatabaseVersion, checkDatabaseExists, freshCloneServerResult, CheckFreshClone, and the doctor agent enrichers — all told users to run bd init when a configured server-mode database was missing. For existing projects and fresh clones, bd init is unsafe (it creates a new empty database, potentially destroying data if --force is used). bd bootstrap is the correct recovery entry point: it inspects remote/backup/JSONL state and chooses the right action.
Bug 2 (logic): In shared-server mode, detectBootstrapAction checked whether the Dolt data directory was non-empty and short-circuited to Action=none ("database already exists"). But a shared Dolt directory can contain other databases — the configured project database might be missing. Bootstrap would falsely report success when the actual recovery target didn't exist.
The Fix
20 files changed, +545 −51.
1. Bootstrap server probe (bootstrap.go, +68 lines of new code):
checkBootstrapServerDB — a var func(...) (test-injectable) that connects to the Dolt server via MySQL protocol, runs SHOW DATABASES, and checks whether the configured database name exists. Returns a bootstrapServerDBCheck with Exists, Reachable, and Err fields.
detectBootstrapAction now branches on cfg.GetDoltMode() == DoltModeServer when the Dolt directory has entries:
- Probes the server for the configured database
- If probe errors →
Action=nonewith explicit reason (fail closed) - If database exists →
Action=none(existing behavior, correct) - If database missing → falls through to sync/restore/init detection
The probe config uses a deliberate mix of sources:
- Host/user/password/database/TLS from
cfg.Get*()methods (which respect env var overrides) - Port from
doltserver.DefaultConfig(beadsDir).Port(resolves through port file → config.yaml → metadata.json, getting the actual runtime port)
This is correct — the port file is the most reliable source for the actual listening port, especially in shared-server mode where the port is ephemeral.
2. Guidance updates (12 files):
Every user-facing and agent-facing recovery message across the codebase is updated:
bd init→bd bootstrapfor existing-project/fresh-clone recoverybd init --prefixguidance retained for brand-new projects only--dry-runsuggested when operators want to inspect the plan- Wording softened: bootstrap "may recover or initialize depending on detected state"
The consistency is thorough — I traced every occurrence:
errors.go:databaseNotFoundError— backup scenario, sync.git-remote tip, generic tipmain_errors.go:handleFreshCloneErrorinit_guard.go:initGuardServerMessage— with/without sync.git-remotedatabase.go:CheckDatabaseVersion/CheckDatabaseVersionWithStoreserver.go:checkDatabaseExistsfresh_clone_server.go:freshCloneServerResult— with/without sync.git-remotelegacy.go:CheckFreshClone— removed--prefixflag from fix commandagent.go:enrichFreshClone— both standard and sync.git-remote variants
Correctness ✅
- Fail-closed on probe errors. If the server is unreachable or
SHOW DATABASESfails, bootstrap stops with an explicit error reason instead of falling through to init. This prevents accidental re-initialization when the server is temporarily down. checkBootstrapServerDBis avar func, not a method — cleanly injectable in tests without interface overhead. All three new bootstrap tests use this to mock the probe.- DSN construction handles password-less connections correctly (
user@tcp(...)vsuser:pass@tcp(...)). TLS appended as&tls=truewhen enabled. - Context timeout (5s) on both
PingandSHOW DATABASES— bounded, won't hang. - The prefix removal from
CheckFreshClone: the old code extracted a prefix from JSONL and injected it intobd init --prefix <prefix>. Since bootstrap auto-detects the prefix, this is no longer needed.
Edge Cases 🔍
1. Shared-server directory with the correct database present. The probe returns Exists=true and bootstrap correctly reports Action=none with a message including the host:port. The HasExisting flag is set. Same behavior as before, but now validated via actual server check instead of filesystem heuristic.
2. Embedded mode (non-server). The existing else branch preserves the original behavior — non-empty Dolt directory means "database exists." Embedded mode doesn't have the shared-directory problem since each project owns its Dolt directory exclusively.
3. Server unreachable during bootstrap. Fail-closed: Action=none, reason includes the connection error. The user sees why and can retry. This is the right call — better to stop than to blindly re-initialize.
4. SHOW DATABASES permission denied. Probe returns Reachable=true, Err=<permission error>. Bootstrap stops with the error. Correct — if we can't enumerate databases, we can't safely determine whether to recover.
5. Race condition: database created between probe and bootstrap action. Minimal window (probe is fast), and the downstream sync/init actions are themselves idempotent or guarded. Not a practical concern.
⚠️ Flag: .beads/ directory addition
The PR includes 4 new files in .beads/ at the repo root (.gitignore, README.md, config.yaml, metadata.json). These come from commit 7bf9b24e feat: seed server-mode .beads for rig integration — the project is bootstrapping its own beads instance for issue tracking.
This is unrelated to the bootstrap fix and significantly increases the PR's surface area. The .beads/metadata.json hardcodes dolt_server_port: 3307 and dolt_database: "beads" — these are runtime-specific values that will propagate to all contributors via git. Consider:
- Splitting this into a separate PR for cleaner review history
- Or at minimum: the hardcoded port in
metadata.jsonmay conflict with contributors running Dolt on different ports (thoughdoltserver.DefaultConfigresolves through port file first, so the metadata.json value is a last-resort fallback)
Not blocking the fix itself, but worth addressing.
Performance ✅
The server probe adds one MySQL connection + SHOW DATABASES query during bootstrap. Bounded by 5s timeout. Only runs in server mode when the Dolt directory is non-empty. No hot-path impact.
Style ✅
- Test coverage is thorough: 6 new tests covering missing-DB-not-none, probe-error-stops, TLS-in-DSN, fresh-clone guidance, doctor agent guidance, and init-guard messaging.
- Existing tests properly updated with
t.Setenvto clear env vars that could leak between tests. - The
var funcpattern forcheckBootstrapServerDBis consistent with other test-injectable functions in the codebase. - Message wording is careful: "may recover or initialize depending on detected state" avoids overpromising.
Ship the bootstrap fix. Consider splitting the .beads/ directory seeding into its own PR.
|
Rebuilt this PR on a clean upstream main base and force-pushed it so it now contains only the intended bootstrap/recovery fix set. The previous branch included unrelated local .beads artifacts in history. |
ba71540 to
841745f
Compare
Executed-By: beads/crew/emma
Executed-By: beads/crew/emma
Executed-By: beads/crew/emma
Executed-By: beads/crew/emma
Executed-By: beads/crew/emma
Executed-By: beads/crew/emma
Executed-By: beads/crew/emma
Executed-By: beads/crew/emma
Three issues causing CI failures on macOS and Ubuntu: 1. detectBootstrapAction used isEmbeddedMode() (global state) to choose the database path, but tests set cfg.DoltMode without updating the global serverMode flag. Switch to cfg.GetDoltMode() so the detection logic respects the config object passed by the caller. 2. TestDetectBootstrapAction_NoneWhenDatabaseExists created a "dolt" directory but default (embedded) config looks for "embeddeddolt". Fix the test to create the correct directory. 3. embeddeddolt/slots.go and new_methods_test.go used //go:build embeddeddolt tag while all other files in the package use cgo, causing interface satisfaction failures. Also fix undefined isEmbeddedDolt reference (should be isEmbeddedMode()). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: beads/crew/emma
841745f to
1d106d3
Compare
Summary
bd inittobd bootstrapbd bootstrapactually recover shared-server missing-database cases instead of treating any non-empty shared Dolt dir as an existing databaseRelated Issue
bd-3a4Bug
During the coder_dotfiles bootstrap investigation, multiple Beads recovery surfaces told operators to rerun
bd initwhen a configured server-mode database was missing. That was both unsafe and misleading for existing projects and fresh clones.The first pass at switching the guidance to
bd bootstrapexposed a second real bug: in shared-server mode, bootstrap would short-circuit to "database already exists" as soon as the shared Dolt dir contained any database, even if the configured project database was missing. That made the new guidance false in exactly the scenario it was supposed to fix.Why this fix
This PR keeps the change surgical and focused on missing-database recovery for existing projects:
bd bootstrapbd bootstrapnow checks whether the configured server database actually exists before deciding it is a no-opbd bootstrapis described as the safe entry point that may recover or initialize depending on detected state, with--dry-runsuggested when operators want to confirm behavior firstBrand-new project creation guidance still points to
bd init.Testing
Focused commands run:
GOTOOLCHAIN=auto go test ./cmd/bd -run 'TestDetectBootstrapAction|TestInitGuardServerMessage|TestInitGuardDBCheck_ExistsPath|TestHandleFreshCloneError_UsesBootstrapFirstGuidance'GOTOOLCHAIN=auto go test ./cmd/bd/doctor -run 'TestFreshCloneServerResult|TestCheckFreshCloneDB_ServerUnreachable|TestCheckFreshClone_EmbeddedMode|TestCheckFreshClone|TestEnrichFreshClone'GOTOOLCHAIN=auto go test ./internal/storage/dolt -run TestDatabaseNotFoundErrorCoverage added/updated includes:
Action=noneReview process
This branch went through five pre-implementation reviews and a second review wave against the implementation branch. The final branch incorporates review findings about: