
fix(dolt): distinguish fsck open-failures from integrity failures #3465

Merged
maphew merged 1 commit into gastownhall:main from seanmartinsmith:fix/fsck-distinguish-open-failure on Apr 25, 2026


@seanmartinsmith

Contributor

Problem

prePushFSCK (added in #3447) wraps any dolt fsck error as ErrDanglingReference with the message "aborting push to prevent propagating corrupt chunks". That phrasing implies the local database is corrupt, but fsck can fail for environmental reasons that have nothing to do with integrity, and reporting those failures as corruption misleads users.

Root Cause

prePushFSCK at internal/storage/dolt/store.go:1907-1925 treats every non-zero fsck exit identically, with no distinction between:

  • fsck ran and found integrity problems (the intended abort case)
  • fsck couldn't even open the database (tooling / version mismatch / environmental)

Concrete Trigger (resolved upstream, still worth hardening)

dolthub/dolt#10915 (shipped in dolt v1.86.4, 2026-04-22): fsck.go used url.Parse to parse the database file URL. On Windows this placed the drive letter into the URL's Host field rather than its Path, causing dbfactory/file.go to construct an invalid \C:\... path. Every Windows user running a post-#3447 bd with dolt older than v1.86.4 hit this: bd dolt push returned a "dangling chunk reference" error on perfectly healthy databases.
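As a minimal illustration of the parsing behavior described above, the standalone Go program below feeds a Windows-style file URL to the standard library's url.Parse and prints the resulting Host and Path fields. The input URL and the helper name splitFileURL are assumptions for illustration; the PR does not show the exact URL string dolt's fsck.go built, so this demonstrates only how net/url splits such a string, not dolt's own code.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitFileURL parses raw with net/url and returns its Host and Path fields.
// Hypothetical helper for illustration only.
func splitFileURL(raw string) (host, path string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Host, u.Path, nil
}

func main() {
	// Illustrative input only; not the exact URL dolt constructed.
	raw := "file://C:/Users/me/db"
	host, path, err := splitFileURL(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// The drive letter is parsed as the authority (Host), so Path alone
	// no longer says which drive the database lives on.
	fmt.Printf("Host=%q Path=%q\n", host, path)
}
```

With this input the drive letter lands in Host, so any code that rebuilds a filesystem path from Path alone loses track of the drive.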

The dolt bug is fixed. This PR hardens beads against this class of failure so that future dolt/bd version mismatches don't produce misleading corruption warnings.

Fix

Distinguish "fsck couldn't run" from "fsck found problems" by matching the two known dolt phrasings that mean the check never executed:

  • "Could not open dolt database": the url.Parse bug symptom (and any other open failure)
  • "repository state is invalid": an uninitialized or partial .dolt directory

For those cases, log a warning and proceed. For any other fsck failure, abort as before. Real dangling-reference errors still block propagation; only misleading wrappings of environmental failures are changed.

Test Plan

go test -tags gms_pure_go -run TestPrePushFSCK ./internal/storage/dolt/
go test -tags gms_pure_go -run TestFsckCouldNotOpen ./internal/storage/dolt/
go test -tags gms_pure_go ./internal/storage/dolt/

Updated tests:

  • TestPrePushFSCK_UnopenableDB (renamed from TestPrePushFSCK_CorruptNoms): simulates the unopenable state (.dolt/noms present, no dolt init) and verifies that prePushFSCK returns nil and logs a warning rather than wrapping the error as ErrDanglingReference.
  • TestFsckCouldNotOpen (new): a table test covering both known "couldn't open" phrasings, a real dangling-reference string (which must not match), and an empty string.
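The table-driven shape described for TestFsckCouldNotOpen might look something like the standalone sketch below. The case strings and the matcher body are assumptions (the actual test file is not shown here), and the matcher is restated under a hypothetical name so the snippet compiles on its own. In the repository this would be a _test.go file using testing.T and t.Run; it is written as a main program here so it runs without the bd package.

```go
package main

import (
	"fmt"
	"strings"
)

// fsckCouldNotOpen is a hypothetical restatement of the matcher so this
// snippet is self-contained.
func fsckCouldNotOpen(output string) bool {
	return strings.Contains(output, "Could not open dolt database") ||
		strings.Contains(output, "repository state is invalid")
}

// cases mirrors the four categories the test is described as covering.
// The strings themselves are illustrative, not copied from bd.
var cases = []struct {
	name   string
	output string
	want   bool
}{
	{"open failure", "error: Could not open dolt database", true},
	{"invalid repo state", "fatal: repository state is invalid", true},
	{"real dangling reference", "error: dangling chunk reference found", false},
	{"empty output", "", false},
}

func main() {
	failed := 0
	for _, tc := range cases {
		if got := fsckCouldNotOpen(tc.output); got != tc.want {
			fmt.Printf("FAIL %s: got %v, want %v\n", tc.name, got, tc.want)
			failed++
		}
	}
	fmt.Printf("%d/%d cases passed\n", len(cases)-failed, len(cases))
}
```

The negative cases matter as much as the positive ones: a real dangling-reference message must not match, or genuine corruption would be downgraded to a warning.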

All previously-passing fsck tests continue to pass.

Context

Fixes #3464.

Root cause in dolt: dolthub/dolt#10915 (fixed in v1.86.4).

Full discovery thread: #beads channel on the Dolt Discord, 2026-04-24. @macneale and @elianddb identified the dolt fix within minutes of the symptom being surfaced.

Scope Guards

  • Not changing prePushFSCK's core intent or #3447's safety-check goal
  • Not adding platform-specific skips (OS-agnostic class of failure)
  • Not modifying push code paths outside prePushFSCK
  • No user-facing behavior change for healthy databases on current dolt

@codecov-commenter


Codecov Report

❌ Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines          Patch %   Lines
internal/storage/dolt/store.go    92.30%    1 Missing ⚠️


prePushFSCK previously wrapped any dolt fsck error as ErrDanglingReference
with 'aborting push to prevent propagating corrupt chunks', including
cases where fsck could not even open the database (environmental /
tooling issues, not integrity problems).

This misled users into thinking their healthy databases were corrupt.
Concrete example: dolthub/dolt#10915 (Windows url.Parse bug, pre-v1.86.4)
caused fsck to construct a malformed file path and fail to open; users
hitting this saw the misleading 'dangling chunk reference' error from bd.

Now detect the two known 'couldn't open' signatures from dolt and log a
warning instead of aborting. Real integrity failures (dangling chunks
in an openable db) still abort as before.

Fixes gastownhall#3464

@maphew left a comment

Collaborator


Reviewed the fsck classification path and ran local targeted validation: go test -tags gms_pure_go -run 'TestPrePushFSCK|TestFsckCouldNotOpen' ./internal/storage/dolt/. This preserves the real integrity-failure abort path while avoiding a misleading corruption error when fsck cannot open the database at all.

@maphew merged commit 4affe45 into gastownhall:main on Apr 25, 2026
39 checks passed


Development

Successfully merging this pull request may close these issues.

bug: bd dolt push broken on Windows after #3447 (dolt fsck Windows path bug)

3 participants