fix: make credential file writes atomic and fail closed on corruption#244
Conversation
…losed on corruption - Add writeFileAtomic: writes to tmp file, fsyncs, renames over original - Add withFileLock: advisory lock with stale detection for read/modify/write - readCredentials now throws CorruptedCredentialsError on malformed JSON instead of silently returning null (which caused storeApiKey to rebuild an empty config and erase all existing profiles) - removeApiKey no longer deletes the file when it can't parse JSON - All mutators (storeApiKey, removeApiKey, setActiveProfile, renameProfile, storeApiKeyAsync) wrapped in file lock to prevent concurrent overwrites - All mutations use immutable spread patterns instead of in-place mutation - logout command gracefully handles corrupted credentials by catching the error and proceeding to the removal path Fixes: BU-652 Co-authored-by: Bu Kinoshita <bukinoshita@users.noreply.github.com>
|
@cubic-dev-ai can you review? |
@bukinoshita I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
3 issues found across 7 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/lib/file-lock.ts">
<violation number="1" location="src/lib/file-lock.ts:61">
P2: `sleepSync` burns 100% CPU while spinning. Use `Atomics.wait` for a zero-CPU synchronous sleep — it's available in all supported Node versions (≥20).</violation>
</file>
<file name="src/lib/config.ts">
<violation number="1" location="src/lib/config.ts:394">
P1: The auto-migration write inside `resolveApiKeyAsync` still mutates `creds` in-place and calls `writeCredentials` without `withFileLock`, leaving the same lost-update race this PR eliminates everywhere else. Wrap the write in `withFileLock` and use immutable updates to match the other mutators.</violation>
</file>
<file name="src/commands/auth/logout.ts">
<violation number="1" location="src/commands/auth/logout.ts:52">
P2: This change still allows an uncaught credentials-read throw via `resolveProfileName()`, so logout can crash on corrupted credentials instead of reaching the `remove_failed` handling path.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
# Conflicts: # src/lib/config.ts # tests/lib/config.test.ts
This repo uses pnpm; the npm lockfile was committed by accident from a Cursor Agent run. Adding it to .gitignore prevents regressions.
Every other mutator now wraps read-modify-write in withFileLock, but removeAllApiKeys was unlinking directly. A concurrent storeApiKey mid-rename could otherwise be silently overwritten or interleaved.
withFileLock now accepts an async function via overload, so the entire read-modify-write of each async mutator (storeApiKeyAsync, removeApiKeyAsync, removeAllApiKeysAsync, renameProfileAsync) executes under a single lock — including the awaited secure-storage backend calls. Previously these read credentials outside the lock, mutated the OS keychain, then re-acquired the lock for the file write. A concurrent mutation between those steps could leave the file out of sync with secure storage (e.g. file points at a profile whose keychain entry was just deleted by a second process). Async mutators inline the file-write logic instead of calling the sync helpers, since the sync helpers re-acquire the lock and the underlying lock file is not re-entrant.
POSIX guarantees rename atomicity across crash boundaries only after the parent directory is also fsynced. Without this, a kernel panic between rename and dir fsync could still surface as a missing credentials file — which is exactly the durability claim the PR makes. Best-effort: ignore Windows (unsupported) and EINVAL from filesystems that reject fsync on directories.
- write-file-atomic: append a random UUID to the tmp filename so two processes that happen to share a PID under a shared homedir (containers/NFS) can't collide. - file-lock: capture the lock file's inode at acquire time and verify it matches at release time, so we don't unlink a lock that was stolen via stale detection by another process. - file-lock: improve the "could not acquire" error to suggest manual cleanup of a stuck lock file. - tests: clear RESEND_API_KEY in renameProfile beforeEach so the two tests that call resolveApiKey don't pick up a developer's local env.
Adds 11 tests for the two new primitives that previously had no direct coverage: - write-file-atomic: write+mode, overwrite, tmp cleanup on success, error paths (missing dir, read-only dir). - file-lock: sync + async happy path, release on throw/reject, stale lock reclaim, inode mismatch check (don't unlink a stolen lock).
There was a problem hiding this comment.
1 issue found across 8 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="tests/lib/write-file-atomic.test.ts">
<violation number="1" location="tests/lib/write-file-atomic.test.ts:59">
P3: This test never exercises the rename-failure path it claims to cover; `chmodSync(tmpDir, 0o500)` blocks temp-file creation first, so it only checks the initial write failure cleanup.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
| expect(readdirSync(tmpDir)).toEqual([]); | ||
| }); | ||
|
|
||
| it('does not leak a tmp file when rename fails', () => { |
There was a problem hiding this comment.
P3: This test never exercises the rename-failure path it claims to cover; chmodSync(tmpDir, 0o500) blocks temp-file creation first, so it only checks the initial write failure cleanup.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/lib/write-file-atomic.test.ts, line 59:
<comment>This test never exercises the rename-failure path it claims to cover; `chmodSync(tmpDir, 0o500)` blocks temp-file creation first, so it only checks the initial write failure cleanup.</comment>
<file context>
@@ -0,0 +1,75 @@
+ expect(readdirSync(tmpDir)).toEqual([]);
+ });
+
+ it('does not leak a tmp file when rename fails', () => {
+ // Make the directory read-only after creating it, so openSync succeeds
+ // but renameSync fails on the existing destination.
</file context>
The PR description claimed logout already handled corrupted credentials gracefully, but manual QA showed it crashed with CorruptedCredentialsError when the file was unreadable. Two interacting bugs: - removeAllApiKeysAsync read credentials inside the lock and re-threw on corruption. Now uses a new tryReadCredentials helper (exported from config.ts) so the "nuke everything" path proceeds with an unlink even when the file is corrupted. - logout.ts called resolveProfileName() unconditionally, which threw on corruption when no --profile flag and no RESEND_PROFILE env var. Now the profile label is only computed when actually needed (single-profile removal), and the "profile not found" check only runs when creds are readable. Single-profile removal (logout --profile X) still fails closed on corruption, surfacing the corruption error and preserving the file — that's the right behavior for a surgical operation. Adds three regression tests: - logout removes a corrupted file (logoutAll) - logout --profile X fails closed on corruption - removeAllApiKeysAsync directly tolerates corruption
There was a problem hiding this comment.
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/commands/auth/logout.ts">
<violation number="1" location="src/commands/auth/logout.ts:62">
P1: `logoutAll` uses a truthiness check, so an explicitly provided empty `--profile` is treated as "no profile" and removes all credentials.
(Based on your team's feedback about Commander option presence checks using `!== undefined` instead of truthiness.) [FEEDBACK_USED]</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Fix all with cubic
| // removal, the user-supplied flag is the source of truth. We deliberately | ||
| // avoid resolveProfileName() so a corrupted credentials file doesn't | ||
| // crash the "logout all" path. | ||
| const profileLabel = profileFlag ?? 'all'; |
There was a problem hiding this comment.
P1: logoutAll uses a truthiness check, so an explicitly provided empty --profile is treated as "no profile" and removes all credentials.
(Based on your team's feedback about Commander option presence checks using !== undefined instead of truthiness.)
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/commands/auth/logout.ts, line 62:
<comment>`logoutAll` uses a truthiness check, so an explicitly provided empty `--profile` is treated as "no profile" and removes all credentials.
(Based on your team's feedback about Commander option presence checks using `!== undefined` instead of truthiness.) </comment>
<file context>
@@ -65,12 +55,19 @@ If no credentials file exists, exits cleanly with no error.`,
+ // removal, the user-supplied flag is the source of truth. We deliberately
+ // avoid resolveProfileName() so a corrupted credentials file doesn't
+ // crash the "logout all" path.
+ const profileLabel = profileFlag ?? 'all';
- if (!logoutAll && !creds?.profiles[profileLabel]) {
</file context>
Tip: Review your code locally with the cubic CLI to iterate faster.
Linux ext4/tmpfs reuses just-freed inodes immediately, so a lock stolen via stale-detection that recreates a file at the same path can share our original inode. The inode-match check in releaseLock then mistook the stolen lock as ours and unlinked it — a regression detected only on CI. Replace inode identity with a randomUUID token written into the lock file at acquire time and verified at release time. Content comparison is robust on all platforms.
The main test workflow runs Linux only, but the file-lock and atomic-write code is platform-sensitive (inode reuse on ext4 already produced one regression). Extend test-credential-backends.yml — which already has Windows/macOS/Linux jobs probing OS keychain CLIs — with a cross-platform job that actually runs the vitest suite for: - tests/lib/file-lock.test.ts - tests/lib/write-file-atomic.test.ts - tests/lib/config.test.ts - tests/lib/config-async.test.ts - tests/commands/auth/ Widen the path trigger to src/lib/** and src/commands/auth/** (plus the parallel test paths) so any change touching the credential plumbing or its shared libs surfaces platform-specific behavior in CI. Three POSIX-permission-bit tests skip on win32 since chmod and stat.mode are no-ops on Windows.
Two pre-existing tests assumed POSIX paths and broke on the new Windows CI job: - 'respects XDG_CONFIG_HOME' compared against a hardcoded '/custom/config/resend'; on Windows path.join produces backslashes. Build the expected path with join() so it matches either separator. - 'falls back to ~/.config/resend on non-Windows' literally says non-Windows in the name. Skip it on win32 and add a parallel runIf-Windows test that asserts the APPDATA fallback.
There was a problem hiding this comment.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name=".github/workflows/test-credential-backends.yml">
<violation number="1" location=".github/workflows/test-credential-backends.yml:5">
P2: Keep GitHub Actions `pull_request.paths` filters narrowly scoped to only files relevant to the workflow's target. Using broad globs like `src/lib/**` and `tests/lib/**` triggers the workflow on unrelated changes, defeating the goal of running it only when needed.
(Based on your team's feedback about keeping GitHub Actions pull_request.paths filters narrowly scoped.) [FEEDBACK_USED]</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Fix all with cubic
| pull_request: | ||
| paths: | ||
| - src/lib/credential-backends/** | ||
| - src/lib/** |
There was a problem hiding this comment.
P2: Keep GitHub Actions pull_request.paths filters narrowly scoped to only files relevant to the workflow's target. Using broad globs like src/lib/** and tests/lib/** triggers the workflow on unrelated changes, defeating the goal of running it only when needed.
(Based on your team's feedback about keeping GitHub Actions pull_request.paths filters narrowly scoped.)
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/test-credential-backends.yml, line 5:
<comment>Keep GitHub Actions `pull_request.paths` filters narrowly scoped to only files relevant to the workflow's target. Using broad globs like `src/lib/**` and `tests/lib/**` triggers the workflow on unrelated changes, defeating the goal of running it only when needed.
(Based on your team's feedback about keeping GitHub Actions pull_request.paths filters narrowly scoped.) </comment>
<file context>
@@ -2,10 +2,35 @@ name: Test Credential Backends
pull_request:
paths:
- - src/lib/credential-backends/**
+ - src/lib/**
+ - src/commands/auth/**
+ - tests/lib/**
</file context>
Tip: Review your code locally with the cubic CLI to iterate faster.
Summary by cubic
Prevents silent loss of saved auth profiles by making credential writes atomic and locked, and by failing closed on corrupted JSON.
credentials.json.readCredentials()throwsCorruptedCredentialsErroron invalid JSON; file is preserved; mutators stop instead of rebuilding empty state.storeApiKey,removeApiKey,setActiveProfile,renameProfile,storeApiKeyAsync) use immutable updates under the lock;removeApiKeyno longer deletes on parse failure.Written for commit 219f90b. Summary will update on new commits.