Skip to content

fix: make credential file writes atomic and fail closed on corruption#244

Merged
felipefreitag merged 16 commits into
mainfrom
fix/profile-config-race-4812
May 11, 2026
Merged

fix: make credential file writes atomic and fail closed on corruption#244
felipefreitag merged 16 commits into
mainfrom
fix/profile-config-race-4812

Conversation

@bukinoshita

@bukinoshita bukinoshita commented Apr 9, 2026

Copy link
Copy Markdown
Member

Summary by cubic

Prevents silent loss of saved auth profiles by making credential writes atomic and locked, and by failing closed on corrupted JSON.

  • Bug Fixes
    • Atomic writes: temp file + fsync + rename to avoid partial/truncated credentials.json.
    • Advisory lock around read/modify/write to prevent concurrent races.
    • readCredentials() throws CorruptedCredentialsError on invalid JSON; file is preserved; mutators stop instead of rebuilding empty state.
    • Mutators (storeApiKey, removeApiKey, setActiveProfile, renameProfile, storeApiKeyAsync) use immutable updates under the lock; removeApiKey no longer deletes on parse failure.
    • Logout gracefully handles corrupted credentials and still proceeds to the removal path.

Written for commit 219f90b. Summary will update on new commits.

cursoragent and others added 2 commits April 9, 2026 17:50
…losed on corruption

- Add writeFileAtomic: writes to tmp file, fsyncs, renames over original
- Add withFileLock: advisory lock with stale detection for read/modify/write
- readCredentials now throws CorruptedCredentialsError on malformed JSON
  instead of silently returning null (which caused storeApiKey to rebuild
  an empty config and erase all existing profiles)
- removeApiKey no longer deletes the file when it can't parse JSON
- All mutators (storeApiKey, removeApiKey, setActiveProfile, renameProfile,
  storeApiKeyAsync) wrapped in file lock to prevent concurrent overwrites
- All mutations use immutable spread patterns instead of in-place mutation
- logout command gracefully handles corrupted credentials by catching the
  error and proceeding to the removal path

Fixes: BU-652

Co-authored-by: Bu Kinoshita <bukinoshita@users.noreply.github.com>
@bukinoshita

Copy link
Copy Markdown
Member Author

@cubic-dev-ai can you review?

@cubic-dev-ai

cubic-dev-ai Bot commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

@cubic-dev-ai can you review?

@bukinoshita I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 7 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/lib/file-lock.ts">

<violation number="1" location="src/lib/file-lock.ts:61">
P2: `sleepSync` burns 100% CPU while spinning. Use `Atomics.wait` for a zero-CPU synchronous sleep — it's available in all supported Node versions (≥20).</violation>
</file>

<file name="src/lib/config.ts">

<violation number="1" location="src/lib/config.ts:394">
P1: The auto-migration write inside `resolveApiKeyAsync` still mutates `creds` in-place and calls `writeCredentials` without `withFileLock`, leaving the same lost-update race this PR eliminates everywhere else. Wrap the write in `withFileLock` and use immutable updates to match the other mutators.</violation>
</file>

<file name="src/commands/auth/logout.ts">

<violation number="1" location="src/commands/auth/logout.ts:52">
P2: This change still allows an uncaught credentials-read throw via `resolveProfileName()`, so logout can crash on corrupted credentials instead of reaching the `remove_failed` handling path.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread src/lib/config.ts Outdated
Comment thread src/lib/file-lock.ts
Comment thread src/commands/auth/logout.ts
@bukinoshita bukinoshita marked this pull request as ready for review April 14, 2026 12:55
bukinoshita and others added 8 commits April 14, 2026 09:55
# Conflicts:
#	src/lib/config.ts
#	tests/lib/config.test.ts
This repo uses pnpm; the npm lockfile was committed by accident from a
Cursor Agent run. Adding it to .gitignore prevents regressions.
Every other mutator now wraps read-modify-write in withFileLock, but
removeAllApiKeys was unlinking directly. A concurrent storeApiKey
mid-rename could otherwise be silently overwritten or interleaved.
withFileLock now accepts an async function via overload, so the entire
read-modify-write of each async mutator (storeApiKeyAsync,
removeApiKeyAsync, removeAllApiKeysAsync, renameProfileAsync) executes
under a single lock — including the awaited secure-storage backend
calls.

Previously these read credentials outside the lock, mutated the OS
keychain, then re-acquired the lock for the file write. A concurrent
mutation between those steps could leave the file out of sync with
secure storage (e.g. file points at a profile whose keychain entry was
just deleted by a second process).

Async mutators inline the file-write logic instead of calling the sync
helpers, since the sync helpers re-acquire the lock and the underlying
lock file is not re-entrant.
POSIX guarantees rename atomicity across crash boundaries only after
the parent directory is also fsynced. Without this, a kernel panic
between rename and dir fsync could still surface as a missing
credentials file — which is exactly the durability claim the PR makes.

Best-effort: ignore Windows (unsupported) and EINVAL from filesystems
that reject fsync on directories.
- write-file-atomic: append a random UUID to the tmp filename so two
  processes that happen to share a PID under a shared homedir
  (containers/NFS) can't collide.
- file-lock: capture the lock file's inode at acquire time and verify
  it matches at release time, so we don't unlink a lock that was
  stolen via stale detection by another process.
- file-lock: improve the "could not acquire" error to suggest manual
  cleanup of a stuck lock file.
- tests: clear RESEND_API_KEY in renameProfile beforeEach so the two
  tests that call resolveApiKey don't pick up a developer's local env.
Adds 11 tests for the two new primitives that previously had no
direct coverage:

- write-file-atomic: write+mode, overwrite, tmp cleanup on success,
  error paths (missing dir, read-only dir).
- file-lock: sync + async happy path, release on throw/reject, stale
  lock reclaim, inode mismatch check (don't unlink a stolen lock).

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 8 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tests/lib/write-file-atomic.test.ts">

<violation number="1" location="tests/lib/write-file-atomic.test.ts:59">
P3: This test never exercises the rename-failure path it claims to cover; `chmodSync(tmpDir, 0o500)` blocks temp-file creation first, so it only checks the initial write failure cleanup.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread tests/lib/write-file-atomic.test.ts Outdated
expect(readdirSync(tmpDir)).toEqual([]);
});

it('does not leak a tmp file when rename fails', () => {

@cubic-dev-ai cubic-dev-ai Bot May 11, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: This test never exercises the rename-failure path it claims to cover; chmodSync(tmpDir, 0o500) blocks temp-file creation first, so it only checks the initial write failure cleanup.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/lib/write-file-atomic.test.ts, line 59:

<comment>This test never exercises the rename-failure path it claims to cover; `chmodSync(tmpDir, 0o500)` blocks temp-file creation first, so it only checks the initial write failure cleanup.</comment>

<file context>
@@ -0,0 +1,75 @@
+    expect(readdirSync(tmpDir)).toEqual([]);
+  });
+
+  it('does not leak a tmp file when rename fails', () => {
+    // Make the directory read-only after creating it, so openSync succeeds
+    // but renameSync fails on the existing destination.
</file context>
Fix with Cubic

The PR description claimed logout already handled corrupted credentials
gracefully, but manual QA showed it crashed with CorruptedCredentialsError
when the file was unreadable.

Two interacting bugs:

- removeAllApiKeysAsync read credentials inside the lock and re-threw on
  corruption. Now uses a new tryReadCredentials helper (exported from
  config.ts) so the "nuke everything" path proceeds with an unlink even
  when the file is corrupted.
- logout.ts called resolveProfileName() unconditionally, which threw on
  corruption when no --profile flag and no RESEND_PROFILE env var. Now
  the profile label is only computed when actually needed (single-profile
  removal), and the "profile not found" check only runs when creds are
  readable.

Single-profile removal (logout --profile X) still fails closed on
corruption, surfacing the corruption error and preserving the file —
that's the right behavior for a surgical operation.

Adds three regression tests:
- logout removes a corrupted file (logoutAll)
- logout --profile X fails closed on corruption
- removeAllApiKeysAsync directly tolerates corruption

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/commands/auth/logout.ts">

<violation number="1" location="src/commands/auth/logout.ts:62">
P1: `logoutAll` uses a truthiness check, so an explicitly provided empty `--profile` is treated as "no profile" and removes all credentials.

(Based on your team's feedback about Commander option presence checks using `!== undefined` instead of truthiness.) [FEEDBACK_USED]</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.
Fix all with cubic

// removal, the user-supplied flag is the source of truth. We deliberately
// avoid resolveProfileName() so a corrupted credentials file doesn't
// crash the "logout all" path.
const profileLabel = profileFlag ?? 'all';

@cubic-dev-ai cubic-dev-ai Bot May 11, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: logoutAll uses a truthiness check, so an explicitly provided empty --profile is treated as "no profile" and removes all credentials.

(Based on your team's feedback about Commander option presence checks using !== undefined instead of truthiness.)

View Feedback

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/commands/auth/logout.ts, line 62:

<comment>`logoutAll` uses a truthiness check, so an explicitly provided empty `--profile` is treated as "no profile" and removes all credentials.

(Based on your team's feedback about Commander option presence checks using `!== undefined` instead of truthiness.) </comment>

<file context>
@@ -65,12 +55,19 @@ If no credentials file exists, exits cleanly with no error.`,
+    // removal, the user-supplied flag is the source of truth. We deliberately
+    // avoid resolveProfileName() so a corrupted credentials file doesn't
+    // crash the "logout all" path.
+    const profileLabel = profileFlag ?? 'all';
 
-    if (!logoutAll && !creds?.profiles[profileLabel]) {
</file context>

Tip: Review your code locally with the cubic CLI to iterate faster.

Fix with Cubic

Linux ext4/tmpfs reuses just-freed inodes immediately, so a lock stolen
via stale-detection that recreates a file at the same path can share our
original inode. The inode-match check in releaseLock then mistook the
stolen lock as ours and unlinked it — a regression detected only on CI.

Replace inode identity with a randomUUID token written into the lock
file at acquire time and verified at release time. Content comparison
is robust on all platforms.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

Requires human review: Auto-approval blocked by 2 unresolved issues from previous reviews.

The main test workflow runs Linux only, but the file-lock and
atomic-write code is platform-sensitive (inode reuse on ext4 already
produced one regression). Extend test-credential-backends.yml — which
already has Windows/macOS/Linux jobs probing OS keychain CLIs — with
a cross-platform job that actually runs the vitest suite for:

- tests/lib/file-lock.test.ts
- tests/lib/write-file-atomic.test.ts
- tests/lib/config.test.ts
- tests/lib/config-async.test.ts
- tests/commands/auth/

Widen the path trigger to src/lib/** and src/commands/auth/** (plus
the parallel test paths) so any change touching the credential plumbing
or its shared libs surfaces platform-specific behavior in CI.

Three POSIX-permission-bit tests skip on win32 since chmod and
stat.mode are no-ops on Windows.
Two pre-existing tests assumed POSIX paths and broke on the new
Windows CI job:

- 'respects XDG_CONFIG_HOME' compared against a hardcoded
  '/custom/config/resend'; on Windows path.join produces backslashes.
  Build the expected path with join() so it matches either separator.
- 'falls back to ~/.config/resend on non-Windows' literally says
  non-Windows in the name. Skip it on win32 and add a parallel
  runIf-Windows test that asserts the APPDATA fallback.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".github/workflows/test-credential-backends.yml">

<violation number="1" location=".github/workflows/test-credential-backends.yml:5">
P2: Keep GitHub Actions `pull_request.paths` filters narrowly scoped to only files relevant to the workflow's target. Using broad globs like `src/lib/**` and `tests/lib/**` triggers the workflow on unrelated changes, defeating the goal of running it only when needed.

(Based on your team's feedback about keeping GitHub Actions pull_request.paths filters narrowly scoped.) [FEEDBACK_USED]</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.
Fix all with cubic

pull_request:
paths:
- src/lib/credential-backends/**
- src/lib/**

@cubic-dev-ai cubic-dev-ai Bot May 11, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Keep GitHub Actions pull_request.paths filters narrowly scoped to only files relevant to the workflow's target. Using broad globs like src/lib/** and tests/lib/** triggers the workflow on unrelated changes, defeating the goal of running it only when needed.

(Based on your team's feedback about keeping GitHub Actions pull_request.paths filters narrowly scoped.)

View Feedback

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/test-credential-backends.yml, line 5:

<comment>Keep GitHub Actions `pull_request.paths` filters narrowly scoped to only files relevant to the workflow's target. Using broad globs like `src/lib/**` and `tests/lib/**` triggers the workflow on unrelated changes, defeating the goal of running it only when needed.

(Based on your team's feedback about keeping GitHub Actions pull_request.paths filters narrowly scoped.) </comment>

<file context>
@@ -2,10 +2,35 @@ name: Test Credential Backends
   pull_request:
     paths:
-      - src/lib/credential-backends/**
+      - src/lib/**
+      - src/commands/auth/**
+      - tests/lib/**
</file context>

Tip: Review your code locally with the cubic CLI to iterate faster.

Fix with Cubic

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

Requires human review: Auto-approval blocked by 3 unresolved issues from previous reviews.

@felipefreitag felipefreitag merged commit 51d89f8 into main May 11, 2026
16 checks passed
@felipefreitag felipefreitag deleted the fix/profile-config-race-4812 branch May 11, 2026 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants