Skip to content

fix: prefer bootstrap for missing database recovery#2940

Merged
steveyegge merged 9 commits into
gastownhall:mainfrom
Bella-Giraffety:polecat/mayor-bd-3a4
Apr 3, 2026
Merged

fix: prefer bootstrap for missing database recovery#2940
steveyegge merged 9 commits into
gastownhall:mainfrom
Bella-Giraffety:polecat/mayor-bd-3a4

Conversation

@Bella-Giraffety

Copy link
Copy Markdown
Contributor

Summary

  • change missing-database recovery guidance for existing projects from bd init to bd bootstrap
  • make bd bootstrap actually recover shared-server missing-database cases instead of treating any non-empty shared Dolt dir as an existing database
  • fail closed on bootstrap server probe errors and use the resolved runtime Dolt server config for existence checks
  • add focused regression coverage for message guidance, shared-server bootstrap behavior, probe error handling, and agent-facing doctor guidance

Related Issue

  • Fixes bd-3a4

Bug

During the coder_dotfiles bootstrap investigation, multiple Beads recovery surfaces told operators to rerun bd init when a configured server-mode database was missing. That was both unsafe and misleading for existing projects and fresh clones.

The first pass at switching the guidance to bd bootstrap exposed a second real bug: in shared-server mode, bootstrap would short-circuit to "database already exists" as soon as the shared Dolt dir contained any database, even if the configured project database was missing. That made the new guidance false in exactly the scenario it was supposed to fix.

Why this fix

This PR keeps the change surgical and focused on missing-database recovery for existing projects:

  • user-facing recovery guidance now points existing-project recovery to bd bootstrap
  • bd bootstrap now checks whether the configured server database actually exists before deciding it is a no-op
  • the bootstrap probe uses the resolved runtime server configuration and fails closed on probe errors instead of silently falling through into misleading recovery plans
  • wording is softened so bd bootstrap is described as the safe entry point that may recover or initialize depending on detected state, with --dry-run suggested when operators want to confirm behavior first

Brand-new project creation guidance still points to bd init.

Testing

Focused commands run:

  • GOTOOLCHAIN=auto go test ./cmd/bd -run 'TestDetectBootstrapAction|TestInitGuardServerMessage|TestInitGuardDBCheck_ExistsPath|TestHandleFreshCloneError_UsesBootstrapFirstGuidance'
  • GOTOOLCHAIN=auto go test ./cmd/bd/doctor -run 'TestFreshCloneServerResult|TestCheckFreshCloneDB_ServerUnreachable|TestCheckFreshClone_EmbeddedMode|TestCheckFreshClone|TestEnrichFreshClone'
  • GOTOOLCHAIN=auto go test ./internal/storage/dolt -run TestDatabaseNotFoundError

Coverage added/updated includes:

  • shared-server missing configured DB should not return Action=none
  • bootstrap probe errors should stop with an explicit reason
  • bootstrap probe honors resolved runtime port/TLS behavior
  • fresh-clone doctor and agent guidance are bootstrap-first without brittle full-string snapshots

Review process

This branch went through five pre-implementation reviews and a second review wave against the implementation branch. The final branch incorporates review findings about:

  • shared-server bootstrap no-op semantics
  • probe error handling
  • resolved runtime port/TLS behavior
  • avoiding overpromising bootstrap as always recovery-only
  • keeping the PR scoped to existing-project missing-database recovery instead of broad initialization changes

@hilmes hilmes left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: fix: prefer bootstrap for missing database recovery

Verdict: LGTM — approve with one flag

The Problem

Two bugs, both exposed during coder_dotfiles bootstrap investigation:

Bug 1 (guidance): Multiple recovery surfaces — databaseNotFoundError, handleFreshCloneError, initGuardServerMessage, CheckDatabaseVersion, checkDatabaseExists, freshCloneServerResult, CheckFreshClone, and the doctor agent enrichers — all told users to run bd init when a configured server-mode database was missing. For existing projects and fresh clones, bd init is unsafe (it creates a new empty database, potentially destroying data if --force is used). bd bootstrap is the correct recovery entry point: it inspects remote/backup/JSONL state and chooses the right action.

Bug 2 (logic): In shared-server mode, detectBootstrapAction checked whether the Dolt data directory was non-empty and short-circuited to Action=none ("database already exists"). But a shared Dolt directory can contain other databases — the configured project database might be missing. Bootstrap would falsely report success when the actual recovery target didn't exist.

The Fix

20 files changed, +545 −51.

1. Bootstrap server probe (bootstrap.go, +68 lines of new code):

checkBootstrapServerDB — a var func(...) (test-injectable) that connects to the Dolt server via MySQL protocol, runs SHOW DATABASES, and checks whether the configured database name exists. Returns a bootstrapServerDBCheck with Exists, Reachable, and Err fields.

detectBootstrapAction now branches on cfg.GetDoltMode() == DoltModeServer when the Dolt directory has entries:

  • Probes the server for the configured database
  • If probe errors → Action=none with explicit reason (fail closed)
  • If database exists → Action=none (existing behavior, correct)
  • If database missing → falls through to sync/restore/init detection

The probe config uses a deliberate mix of sources:

  • Host/user/password/database/TLS from cfg.Get*() methods (which respect env var overrides)
  • Port from doltserver.DefaultConfig(beadsDir).Port (resolves through port file → config.yaml → metadata.json, getting the actual runtime port)

This is correct — the port file is the most reliable source for the actual listening port, especially in shared-server mode where the port is ephemeral.

2. Guidance updates (12 files):

Every user-facing and agent-facing recovery message across the codebase is updated:

  • bd initbd bootstrap for existing-project/fresh-clone recovery
  • bd init --prefix guidance retained for brand-new projects only
  • --dry-run suggested when operators want to inspect the plan
  • Wording softened: bootstrap "may recover or initialize depending on detected state"

The consistency is thorough — I traced every occurrence:

  • errors.go: databaseNotFoundError — backup scenario, sync.git-remote tip, generic tip
  • main_errors.go: handleFreshCloneError
  • init_guard.go: initGuardServerMessage — with/without sync.git-remote
  • database.go: CheckDatabaseVersion / CheckDatabaseVersionWithStore
  • server.go: checkDatabaseExists
  • fresh_clone_server.go: freshCloneServerResult — with/without sync.git-remote
  • legacy.go: CheckFreshClone — removed --prefix flag from fix command
  • agent.go: enrichFreshClone — both standard and sync.git-remote variants

Correctness ✅

  • Fail-closed on probe errors. If the server is unreachable or SHOW DATABASES fails, bootstrap stops with an explicit error reason instead of falling through to init. This prevents accidental re-initialization when the server is temporarily down.
  • checkBootstrapServerDB is a var func, not a method — cleanly injectable in tests without interface overhead. All three new bootstrap tests use this to mock the probe.
  • DSN construction handles password-less connections correctly (user@tcp(...) vs user:pass@tcp(...)). TLS appended as &tls=true when enabled.
  • Context timeout (5s) on both Ping and SHOW DATABASES — bounded, won't hang.
  • The prefix removal from CheckFreshClone: the old code extracted a prefix from JSONL and injected it into bd init --prefix <prefix>. Since bootstrap auto-detects the prefix, this is no longer needed.

Edge Cases 🔍

1. Shared-server directory with the correct database present. The probe returns Exists=true and bootstrap correctly reports Action=none with a message including the host:port. The HasExisting flag is set. Same behavior as before, but now validated via actual server check instead of filesystem heuristic.

2. Embedded mode (non-server). The existing else branch preserves the original behavior — non-empty Dolt directory means "database exists." Embedded mode doesn't have the shared-directory problem since each project owns its Dolt directory exclusively.

3. Server unreachable during bootstrap. Fail-closed: Action=none, reason includes the connection error. The user sees why and can retry. This is the right call — better to stop than to blindly re-initialize.

4. SHOW DATABASES permission denied. Probe returns Reachable=true, Err=<permission error>. Bootstrap stops with the error. Correct — if we can't enumerate databases, we can't safely determine whether to recover.

5. Race condition: database created between probe and bootstrap action. Minimal window (probe is fast), and the downstream sync/init actions are themselves idempotent or guarded. Not a practical concern.

⚠️ Flag: .beads/ directory addition

The PR includes 4 new files in .beads/ at the repo root (.gitignore, README.md, config.yaml, metadata.json). These come from commit 7bf9b24e feat: seed server-mode .beads for rig integration — the project is bootstrapping its own beads instance for issue tracking.

This is unrelated to the bootstrap fix and significantly increases the PR's surface area. The .beads/metadata.json hardcodes dolt_server_port: 3307 and dolt_database: "beads" — these are runtime-specific values that will propagate to all contributors via git. Consider:

  • Splitting this into a separate PR for cleaner review history
  • Or at minimum: the hardcoded port in metadata.json may conflict with contributors running Dolt on different ports (though doltserver.DefaultConfig resolves through port file first, so the metadata.json value is a last-resort fallback)

Not blocking the fix itself, but worth addressing.

Performance ✅

The server probe adds one MySQL connection + SHOW DATABASES query during bootstrap. Bounded by 5s timeout. Only runs in server mode when the Dolt directory is non-empty. No hot-path impact.

Style ✅

  • Test coverage is thorough: 6 new tests covering missing-DB-not-none, probe-error-stops, TLS-in-DSN, fresh-clone guidance, doctor agent guidance, and init-guard messaging.
  • Existing tests properly updated with t.Setenv to clear env vars that could leak between tests.
  • The var func pattern for checkBootstrapServerDB is consistent with other test-injectable functions in the codebase.
  • Message wording is careful: "may recover or initialize depending on detected state" avoids overpromising.

Ship the bootstrap fix. Consider splitting the .beads/ directory seeding into its own PR.

@Bella-Giraffety

Copy link
Copy Markdown
Contributor Author

Rebuilt this PR on a clean upstream main base and force-pushed it so it now contains only the intended bootstrap/recovery fix set. The previous branch included unrelated local .beads artifacts in history.

Thalia-geraghty and others added 9 commits April 2, 2026 19:31
Executed-By: beads/crew/emma
Three issues causing CI failures on macOS and Ubuntu:

1. detectBootstrapAction used isEmbeddedMode() (global state) to choose
   the database path, but tests set cfg.DoltMode without updating the
   global serverMode flag. Switch to cfg.GetDoltMode() so the detection
   logic respects the config object passed by the caller.

2. TestDetectBootstrapAction_NoneWhenDatabaseExists created a "dolt"
   directory but default (embedded) config looks for "embeddeddolt".
   Fix the test to create the correct directory.

3. embeddeddolt/slots.go and new_methods_test.go used //go:build
   embeddeddolt tag while all other files in the package use cgo,
   causing interface satisfaction failures. Also fix undefined
   isEmbeddedDolt reference (should be isEmbeddedMode()).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: beads/crew/emma
@steveyegge steveyegge force-pushed the polecat/mayor-bd-3a4 branch from 841745f to 1d106d3 Compare April 3, 2026 02:52
@steveyegge steveyegge merged commit 4e4d958 into gastownhall:main Apr 3, 2026
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants