feat(core): atomic write rollout for credentials, memory, config, JSONL (closes #3681, #4095 Phase 2)#4333
Conversation
Sync mirror of atomicWriteFile for code paths that can't await (settings persistence on exit, sync config writers). Same semantics: symlink chain resolution, permission preservation, fsync via flush:true, EPERM/EACCES rename retry, EXDEV fallback to direct write. Add forceMode option on AtomicWriteFileOptions — when true, ignore the existing target's permission bits and apply options.mode regardless. Needed for credential files that must heal historically over-permissive files (e.g. a 0o644 token restored from backup must be forced to 0o600). Honored by both async and sync paths. Default false preserves existing behavior. Reuses Atomics.wait for true blocking sleep in renameWithRetrySync — no busy-wait, no extra dep. Refs: #4095 Phase 2
…ier 1) Route all OAuth credential persistence through atomicWriteFile with forceMode: true, so a process crash mid-write cannot leave the user with a half-written token file, and historically over-permissive files (e.g. 0o644 from a manual restore) are healed to 0o600 on the next write. - oauth-token-storage.ts: setCredentials, deleteCredentials - file-token-storage.ts: saveTokens (encrypted MCP token storage) - qwenOAuth2.ts: cacheQwenCredentials (also fixes missing mode — was inheriting 0o644 from umask, now forced to 0o600) - sharedTokenManager.ts: saveCredentialsToFile — drops ~15 lines of hand-rolled tmp + rename in favor of the shared helper Lock-file writes using flag: 'wx' (sharedTokenManager.ts:720) are intentionally left untouched — they rely on exclusive-create semantics that atomic write does not preserve. Tests updated to mock atomicWriteFile instead of fs.writeFile. Refs: #4095 Phase 2
…Tier 2) Route all auto-memory state persistence through atomicWriteFile so a process crash during a dream/extract/forget cycle cannot corrupt the metadata sidecar, extraction cursor, or topic body files. Touched: manager (writeDreamMetadata), extract (writeExtractCursor + bumpMetadata), indexer (rebuild), dream (bumpDreamMetadata), forget (bumpMetadata + topic body rewrite). manager.ts:362 acquireDreamLock uses flag: 'wx' for exclusive create — left untouched, atomic write does not preserve that semantic. Uses atomicWriteFile (not atomicWriteJSON) to preserve the trailing newline these files have always had. Refs: #4095 Phase 2
…4095 Tier 3a) Route the remaining state-file write paths through atomic helpers so a crash mid-write cannot corrupt config, log, or session-scoped state: - trustedFolders.ts (sync): atomicWriteFileSync — sole path that flips workspace trust, must not half-write - logger.ts (4 sites): atomicWriteFile — full-file JSON rewrites for logs.json and per-checkpoint files - tipHistory, installationManager, projectSummary, todoWrite, trustedHooks: bonus sites with the same shape (state JSON written multiple times per session) todoWrite is on the hot path — writes every time the todo list mutates — so the added rename + fsync cost is measurable (a few ms per write on SSD). Trade-off accepted to avoid a half-written todos file silently breaking the next session's resume. Export atomicWriteFile / atomicWriteFileSync from the core public index so CLI-side callers (trustedFolders, tipHistory) can reach them. Tests updated: - logger.test.ts uses vi.importActual to re-export the real helper and override per-test via vi.mocked(atomicWriteFile).mockRejectedValueOnce - trustedFolders.test.ts and todoWrite.test.ts mock the helper directly Refs: #4095 Phase 2
#3656 fixed the read side of glued '}{' JSONL records — when a process was killed mid-appendFile, the trailing '\n' was lost and the next record was concatenated. The write side was left for a follow-up (#3681). This adds flush:true (fsync) to every per-line append: - jsonl-utils.ts writeLine / writeLineSync (session transcripts, auto-titles, prompt history) - debugLogger.ts appendFile (per-session debug log) jsonl-utils.ts write() (full-file replace) now goes through atomicWriteFileSync so a crash during overwrite cannot corrupt the session transcript either. Trade-off: fsync on every append adds disk-sync latency (single-digit ms on SSD, more on spinning disk / network FS). Acceptable for a few writes per turn; the alternative is silently losing the last record of every interrupted session, which #3681 explicitly flagged. Refs: #4095 Phase 2 Tier 3b Closes: #3681
Catch up two sites where Claude Code's equivalent path is atomic but qwen-code's isn't (verified against /Users/jinye.djy/Projects/claude-code on 2026-05-19): - extension/extensionManager.ts:533, :1073 — enablement config and install metadata writes. Claude Code's plugin install-counts and zip cache use atomic temp+rename via writeFileSyncAndFlush_DEPRECATED. - lsp/NativeLspService.ts:1351 — applying an LSP edit to a user file. Claude Code's FileWriteTool/FileEditTool both route through atomic writeTextContent → writeFileSyncAndFlush_DEPRECATED. A bare writeFileSync here could half-write the user's source file if the process is killed during an LSP-driven rename or quick-fix. Also clean up stale fs.rename mock setups in sharedTokenManager.test.ts that became no-ops after Tier 1 migration (rename is no longer called by saveCredentialsToFile). The fs.writeFile mocks stay because the wx-flag lock path still uses them. Refs: #4095 Phase 2
- packages/core/src/index.ts: move atomicFileWrite export to its alphabetical position (before browser.js) - tipHistory.ts: add forceMode: true to atomicWriteFileSync for consistency with other 0o600 sites — heals legacy 0o644 files even though tips are non-critical Refs: #4095 Phase 2
Three issues caught by post-merge Codex review of the #4095 Phase 2 branch — none had user-visible symptoms yet but all were latent bugs. 1. atomicFileWrite: forceMode without mode silently downgraded perms `if (!options?.forceMode)` skipped the existing-mode stat whenever forceMode was true, regardless of whether `mode` was also supplied. Calling `atomicWriteFile(p, data, { forceMode: true })` (no mode) on an existing 0o600 file produced 0o644 (umask default) instead of preserving 0o600. Tightened the guard to also require `options.mode` to be defined; mirrored fix in atomicWriteFileSync. Added two regression tests (async + sync) that assert mode preservation. 2. logger.test.ts: vi.resetAllMocks() blanked the atomicWriteFile shim The vi.fn(actual.atomicWriteFile) factory implementation gets reset to a no-op by `vi.resetAllMocks()` in beforeEach, which would make `logger.initialize()` silently skip creating logs.json on disk. Tests passed by coincidence (file pre-existence from prior runs). Captured the real implementation at module load and re-attach it via `mockImplementation` after each reset. 3. NativeLspService.applyTextEdits: atomic write bypassed file unwritability The read catch swallowed every error and treated it as "new empty file". With atomic write (tmp + rename), an unreadable target on a writable parent could be replaced with edits applied to an empty buffer — the old fs.writeFileSync would have errored on the target permission. Now only ENOENT is treated as new-file; other read errors (EACCES, EISDIR, etc.) propagate. Refs: #4095 Phase 2
The previous fix only handled "read failed → propagate the error". Codex round 2 caught the remaining gap: a file that's readable but chmod 0444 (read-only) would still be replaced by the atomic rename, because rename only needs parent-directory write access. Add an explicit fs.accessSync(W_OK) check before the atomic write. ENOENT is allowed through so LSP can still create new files via edits. Refs: #4095 Phase 2
…nd 3)
`saveCredentialsToFile` wrapped `atomicWriteFile` in `withTimeout(5000)`.
If the call hits the 5s budget (e.g. slow NFS home, network-backed
storage, fsync added by Phase 2), withTimeout rejects but the
atomicWriteFile internal write+rename keeps running unobserved:
1. withTimeout rejects → saveCredentialsToFile throws
2. performTokenRefresh `finally` releases the refresh lock
3. Another process acquires the lock and writes newer credentials
4. The original atomicWriteFile finally completes its rename and
overwrites the newer credentials — silent token rollback
Pre-migration the code awaited the tmp write and the rename in two
separate withTimeout calls; a timed-out tmp write never reached the
rename so there was no race against the target file. The migration
collapsed both into one inseparable atomicWriteFile, which made the
timeout actively unsafe (the work cannot be cancelled after the
timeout fires — fs.rename is not abortable).
Atomic write is durable by design — accept the I/O latency. The
mkdir and stat timeouts are kept (idempotent and read-only
respectively, no corruption risk on late completion).
Refs: #4095 Phase 2
There was a problem hiding this comment.
Pull request overview
This PR expands the atomic file-write rollout across core persistence paths, focusing on credential safety, JSONL/session durability, memory metadata, config/state files, and LSP-driven edits.
Changes:
- Adds synchronous atomic write support and
forceModepermission tightening. - Migrates high-value bare writes/appends to atomic writes or flushed appends.
- Updates affected unit tests and mocks to validate the new write paths.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
packages/core/src/utils/atomicFileWrite.ts |
Adds forceMode, sync atomic write helper, and sync rename retry support. |
packages/core/src/utils/atomicFileWrite.test.ts |
Adds sync helper and forceMode regression coverage. |
packages/core/src/utils/jsonl-utils.ts |
Flushes JSONL appends and atomically rewrites full JSONL files. |
packages/core/src/utils/debugLogger.ts |
Flushes debug log appends. |
packages/core/src/utils/debugLogger.test.ts |
Updates append expectations for flushed writes. |
packages/core/src/utils/projectSummary.ts |
Uses atomic write for welcome-back state. |
packages/core/src/utils/installationManager.ts |
Uses sync atomic write for installation ID persistence. |
packages/core/src/core/logger.ts |
Uses atomic writes for log and checkpoint JSON files. |
packages/core/src/core/logger.test.ts |
Mocks atomic writes while preserving real disk behavior by default. |
packages/core/src/qwen/qwenOAuth2.ts |
Writes cached Qwen credentials atomically with restricted mode. |
packages/core/src/qwen/qwenOAuth2.test.ts |
Adds atomic write mock. |
packages/core/src/qwen/sharedTokenManager.ts |
Replaces manual temp/rename credential write with atomic helper. |
packages/core/src/qwen/sharedTokenManager.test.ts |
Updates credential save mocks. |
packages/core/src/mcp/oauth-token-storage.ts |
Stores MCP OAuth token files with atomic restricted writes. |
packages/core/src/mcp/oauth-token-storage.test.ts |
Updates tests for atomic token writes. |
packages/core/src/mcp/token-storage/file-token-storage.ts |
Stores encrypted token file with atomic restricted write. |
packages/core/src/mcp/token-storage/file-token-storage.test.ts |
Updates encrypted token write expectations. |
packages/core/src/memory/manager.ts |
Writes auto-memory metadata atomically. |
packages/core/src/memory/indexer.ts |
Writes memory index atomically. |
packages/core/src/memory/forget.ts |
Writes memory metadata and topic files atomically. |
packages/core/src/memory/extract.ts |
Writes extraction cursor and metadata atomically. |
packages/core/src/memory/dream.ts |
Writes dream metadata atomically. |
packages/core/src/tools/todoWrite.ts |
Writes todo state atomically. |
packages/core/src/tools/todoWrite.test.ts |
Updates todo write tests for atomic helper. |
packages/core/src/lsp/NativeLspService.ts |
Applies LSP edits via sync atomic write with read/write error handling. |
packages/core/src/hooks/trustedHooks.ts |
Writes trusted hooks config atomically. |
packages/core/src/extension/extensionManager.ts |
Writes extension enablement and install metadata atomically. |
packages/core/src/index.ts |
Re-exports atomic file write utilities. |
packages/cli/src/services/tips/tipHistory.ts |
Writes tip history atomically with restricted mode. |
packages/cli/src/config/trustedFolders.ts |
Writes trusted folders config atomically with restricted mode. |
packages/cli/src/config/trustedFolders.test.ts |
Updates trusted folder save expectations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…bling-cook # Conflicts: # packages/core/src/qwen/qwenOAuth2.ts
Address PR review suggestions from wenshao (via qwen-latest /review): neither renameWithRetry/Sync nor the EXDEV cross-device fallback had direct test coverage. Both paths are critical (Windows AV contention, Docker tmpfs /tmp) and a regression would degrade silently. Vitest can't spy on ESM exports of `node:fs` (`Cannot redefine property: renameSync`), so add narrow internal test seams instead: - renameWithRetry / renameWithRetrySync take an optional `_renameImpl` parameter, defaulting to fs.rename / fs.renameSync. - atomicWriteFile / atomicWriteFileSync take an optional `_testFs` parameter with `rename` and `writeFile` overrides, forwarded to the retry helper and used in the EXDEV fallback branch. The seams are underscore-prefixed and JSDoc-tagged as "Internal test seam — production callers never pass this", which keeps the public API clean while making the behavior testable. New coverage (+9 tests, 36 → 45): - renameWithRetry: retry-EPERM-then-succeed, give-up after retries, no-retry on non-retryable (ENOSPC) - renameWithRetrySync: same 3 patterns (EACCES, EPERM exhausted, EINVAL) - EXDEV fallback: async direct write + tmp cleanup, sync ditto, non-EXDEV failure propagates without fallback (rejects EIO + tmp cleanup) Refs: #4333, #4095 Phase 2
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
…sh:true CI failure on all 3 OSes (macos / ubuntu / windows): sdk.test.ts asserted `fs.appendFile` was called with `'utf8'` as the 3rd positional argument, but commit b7badc7 (#4095 Tier 3b — JSONL fsync) changed the `debugLogger.ts` appendFile call from string-form to options-form `{ encoding: 'utf8', flush: true }` to enable per-line fsync. Update the 3 assertions in the telemetry diagnostics test to match the new shape. No behavior change — debugLogger still flushes per append; only the assertion in this previously-unrelated suite needed updating. Refs: #4333, #4095 Phase 2 Tier 3b
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] qwenOAuth2.ts cacheQwenCredentials was de-migrated from atomicWriteFile back to a hand-rolled temp+chmod+rename pattern during merge conflict resolution (commit 7b1a7ae38). The PR description still lists qwenOAuth2 under Tier 1 migration and the behavior-changes table references qwen/qwenOAuth2.ts:982, but the file is no longer in the diff and line 982 now points to pollDeviceToken (the function starts at ~1060 after the merge).
The hand-rolled version uses a predictable temp path component (process.pid) versus atomicWriteFile's crypto.randomBytes(6), and lacks EPERM rename retry and EXDEV fallback. Consider either re-migrating to atomicWriteFile (consolidating onto the shared, tested helper) or moving qwenOAuth2 to the "Deliberately not in scope" section with a rationale and fixing the stale line reference.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
…4333 review) Address three review suggestions from wenshao (via qwen-latest /review), each pointing at a real coverage gap introduced by this PR: 1. NativeLspService.applyTextEdits error branches (round-2 LSP fix): the ENOENT-only read guard and the fs.accessSync(W_OK) refusal had no automated coverage. Added 3 tests accessing applyTextEdits via a typed cast (the method is private; making it public for one verification inflates API surface). Tests use chmod 0000 / chmod 0444 reproducers and assert (a) read failure propagates EACCES without silently overwriting with empty content, (b) W_OK rejects with EACCES/EPERM before the atomic rename touches the target, (c) nonexistent files are still accepted so LSP can create via edits. 2. atomicWriteFileSync non-EXDEV rename failure cleanup: the async counterpart had an explicit EIO-rename test asserting tmp cleanup; the sync variant did not. Added the mirror — injects a sync rename throwing EIO via the existing _testFs seam and asserts `readdirSync(tmpDir).length === 0`. 3. jsonl-utils writeLine / writeLineSync / write smoke tests: the three write paths are the core fix for #3681 (the PR's headline goal) but downstream callers (chatRecordingService, sessionService) mock them entirely. Without direct unit tests, a regression that dropped `flush: true` or reverted `write()` to bare writeFileSync would go undetected. Added 3 real-fs roundtrip tests. Test count delta: - NativeLspService.test.ts: 15 → 18 - atomicFileWrite.test.ts: 45 → 46 - jsonl-utils.test.ts: 22 → 25 Refs: #4333, #4095 Phase 2
|
Re #4333 (review) ( Updated the PR body to remove the stale Why not re-migrate: upstream PR #4255 added The trade-off is real: upstream's hand-roll loses EPERM retry + EXDEV fallback + symlink chain resolution vs the shared helper. That gap is worth a follow-up that extends |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] saveCredentialsToFile at packages/core/src/qwen/sharedTokenManager.ts calls atomicWriteFile successfully (credentials persisted), then calls fs.stat (wrapped in withTimeout(5000)) to update memoryCache.fileModTime. If fs.stat fails (slow NFS, transient mount issue), the catch block throws TokenManagerError(FILE_ACCESS_ERROR, "Failed to write credentials file: ..."). The error message is misleading: the credentials were successfully written. The failure is only in the cache-update step. In a retry loop, this could trigger an unnecessary token refresh storm. Consider wrapping the fs.stat in its own try/catch that logs but does not propagate the error — memoryCache.fileModTime staleness is self-healing on next read.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
…dHooks tests Three PR #4333 round-14 review items: 1. **Path-level tryChmod / tryChmodSync catch narrowing had zero direct coverage**. Round-13 narrowed the bare `catch {}` to ENOSYS/ENOTSUP only (matching the round-8 fd-level fchmod narrowing), but every existing EXDEV test passed `options: undefined`, so `desiredMode === undefined` short-circuited before any chmod was attempted. A regression that inverted the catch condition (missing `!` prefix) would silently swallow EPERM on every sync credential write (trustedHooks, tipHistory, trustedFolders, etc.) with zero diagnostic signal. Added 6 tests via the existing `_testFs.chmod` seam: parameterized ENOSYS/ENOTSUP swallow + EPERM propagation, for both async (`atomicWriteFile`) and sync (`atomicWriteFileSync`). 2. **Orphan-cleanup unlink on the noFollow EXDEV failure path was using raw `fs.unlink` / `fsSync.unlinkSync` instead of the injected `unlinkImpl` seam**. The pre-open unlink correctly used the seam, but the round-9 orphan cleanup added later bypassed it, making it the only fs operation in `atomicWriteFile` not flowing through the test seam. Routed both async and sync orphan cleanup through `unlinkImpl`, and added 2 tests that inject a spy and assert orphan cleanup is invoked against targetPath after a simulated fchmod EPERM. 3. **`trustedHooks.ts` had no test coverage**. Round-11 migrated it to `atomicWriteFileSync` with `{ mode: 0o600, forceMode: true }` — strictly the most security-sensitive write in the PR since the file stores user-approved executable hook commands — but unlike the sibling files (trustedFolders, tipHistory) it had no test file. A regression that dropped `forceMode: true` or weakened the mode would have passed all existing tests. Created `trustedHooks.test.ts` covering: write goes through atomicWriteFileSync with `{ mode: 0o600, forceMode: true }`, write targets the global qwen dir path, the persisted content matches the hook key derived from the hook config, and round-trip trust/untrust behavior. Test results: 73/73 atomicFileWrite tests (was 65) + 5/5 trustedHooks (new). Typecheck clean. Refs: #4333, #4095 Phase 2
…eanup through seam Two PR #4333 round-15 review items: 1. **`trustedHooks.ts` and `trustedFolders.ts` were missing `noFollow: true`**. The credential write sites (`sharedTokenManager`/`oauth-token-storage`/`file-token-storage`) all pass `{ mode: 0o600, forceMode: true, noFollow: true }` to prevent pre-placed symlink attacks. The trustedHooks comment already called it "strictly more sensitive than trustedFolders / tipHistory" — yet credential paths got symlink protection it didn't. A pre-placed symlink at `~/.qwen/trusted_hooks.json` (or `trustedFolders.json`) could redirect the atomic write to an attacker-controlled path, either leaking the executable-trust list / trusted-folder list, or leaving the user's real config silently stale. Added `noFollow: true` to both write sites and updated the assertions in `trustedHooks.test.ts` and `trustedFolders.test.ts`. 2. **Tmp-file cleanup at L240 (async) and L568 (sync) used raw `fs.unlink` / `fsSync.unlinkSync` instead of the injected `unlinkImpl` seam**. Pre-open unlink and orphan cleanup correctly routed through `unlinkImpl`, making the tmp-cleanup branch the only outlier — `_testFs.unlink`-injecting tests couldn't intercept this path, weakening the seam abstraction. Behavioral impact is nil (cleanup is best-effort, errors swallowed), but consistency matters for future test authors. Routed both async and sync variants through `unlinkImpl`. Test results: 99/99 affected suites pass (atomicFileWrite 73, trustedHooks 5, trustedFolders 21). Typecheck clean on core. Refs: #4333, #4095 Phase 2
本地真实测试报告(merge 参考)在本地完成了 PR body 中描述的两阶段验证(Part 1 库级脚本 + Part 2 tmux + 测试环境
1. 构建与类型检查
2. 单元测试(针对本 PR 触及的所有路径)10 个 test 文件、373 个用例全部通过:
3. Part 1 — 库级 verify-atomic-helpers 脚本直接调用源码 逐项:
4. Part 2 — 真实 CLI(tmux +
|
| 检查 | 期望 | 实际 |
|---|---|---|
kill -9 46583 后所有 review-pr-4333 进程消失 |
是 | ✅ |
JSON.parse(logs.json) |
解析为数组 | ✅ length=29,最后一条 type=user |
JSONL 行级解析 + }{ 粘连扫描 |
全部解析 / 0 粘连 | ✅ 7 行,0 parse_failures,0 glued_records |
JSONL 末尾 }\n 收尾 |
是 | ✅ endsWithNewline=true, lastBeforeNL=} |
| 杀进程前 vs 杀进程后 JSONL/logs.json 字节数 | 不被截断 | ✅ JSONL 13413→13413,logs 6230→6230 |
重启后 /resume picker 列出被杀会话 |
是 | ✅ 「请用大约600字详细描述一个软件工程师调试一个race condition的故事...」「1 minute ago · pr-4333-fresh」 排在最上 |
| 选中后会话内容完整加载 | 是 | ✅ 已完成的 race-condition 故事完整可见,context 2.2% |
未在 JSONL 中出现 partial 流式片段——符合预期(per-turn 在 turn 完成时整体写入,未完成的 turn 不会留下半截记录)。
5. 未覆盖范围
诚实声明几项无法在本机验证的项:
- OAuth refresh + kill -9 端到端:本机使用 API Key 认证,OAuth 路径处于休眠状态。代码层面是同形
atomicWriteFile(creds, {mode:0o600, forceMode:true, noFollow:true}),由sharedTokenManager.test.ts(32) 与qwenOAuth2.test.ts(83) 间接覆盖 - Windows AV 压力下的
renameWithRetryEPERM 重试:无 Windows 主机;现有单测覆盖逻辑分支但非真实 AV atomicWriteFileSync在process.exithook 中的 SharedArrayBuffer blockingSleep:未跑 stress test,相关单测有覆盖
6. 结论
- ✅ Build / 类型检查 / 单测全绿(373 tests, 10 files, 全 PR 触及路径覆盖)
- ✅ Part 1 库级 17/17 通过——三轮 Codex review 修复(forceMode-no-mode、LSP
accessSync(W_OK)、withTimeout移除)均回归验证 - ✅ Part 2 真实 CLI kill -9 没有产生任何损坏:
logs.json仍是合法 JSON 数组,JSONL 7/7 行解析无粘连,/resume完整恢复 - 行为变化(凭据 0o600 healing、JSONL fsync 引入 +几 ms 延迟、
saveCredentialsToFile移除 5s timeout)已在 PR body 显式列出,与 PR 描述一致
从本地真实测试角度,该 PR 可以合并。 #3681 的 }{ 粘连闭环验证通过,#4095 Phase 2 的所有迁移点行为正确。
wenshao
left a comment
There was a problem hiding this comment.
Round 13 review with qwen3.7-max (second opinion after R12 with qwen-latest-series-invite-beta-v34). tsc + eslint clean. 15,507 tests pass (5 pre-existing failures unrelated to this PR). 2 new Suggestions found (test coverage gaps); 7 additional findings flagged for human review. — qwen3.7-max via Qwen Code /review
PR #4333 round-15 review item (Critical, from claude-opus-4-8 /qreview): trustedHooks.test.ts:68 hardcoded a POSIX path ('/mock/home/.qwen/trusted_hooks.json'), but production builds the path with path.join(Storage.getGlobalQwenDir(), 'trusted_hooks.json') (trustedHooks.ts:29-31). On Windows path.join emits '\' separators, so the mocked atomicWriteFileSync receives '\mock\home\.qwen\trusted_hooks.json' and the toBe assertion fails — the cause of the red Test (windows-latest, Node 22.x) check. macOS/Linux runs are green because forward slashes match. Build the expected path with path.join so it matches the platform separator, and add the node:path import. Refs: #4333, #4095 Phase 2
…st coverage + LSP concurrency note Four PR #4333 follow-up review items from wenshao: 1. tipHistory.ts was missing noFollow: true while the sibling 0o600+forceMode config sites (trustedFolders.ts:192, trustedHooks.ts:63) both set it. The cleanup commit added forceMode "for consistency with other 0o600 sites" but stopped short of noFollow. All three store user-trusted paths and should share the same pre-placed-symlink protection. Added noFollow: true. 2. annotateWriteError idempotency guard had no direct coverage. The guard switched from includes(targetPath) to startsWith(fnName+"(") because real syscall errors embed the *tmp* path (which contains the target as a substring), so the old guard silently skipped annotation on every real failure. Added a test where the rename error message embeds a tmp-style path containing the target; the correct startsWith guard still annotates it, and reverting to includes would fail it. 3. non-EXDEV rename failure tests only asserted /EIO/, not the atomicWriteFile(...): / atomicWriteFileSync(...): annotation prefix that the production re-throw applies on that path. Tightened both async and sync assertions to match the prefix. 4. applyTextEdits concurrency constraint was undocumented. The async read-modify-write has await points between read and write; atomicWriteFile prevents torn files but does not serialize writers, so concurrent same-path edits can lose updates. Latent today (no production caller / no workspace/applyEdit handler), so documented the per-file-serialization requirement instead of adding a lock. Not taking: extending the _testFs seam with open/openSync to fault-inject O_EXCL EEXIST — the noFollow/O_EXCL security behavior is already covered by the real-fs behavioral tests (noFollow EXDEV fallback refuses to follow symlinks); a seam-injected EEXIST would only exercise error propagation, not the guarantee. Test results: atomicFileWrite 74/74 pass, eslint clean. Refs: #4333, #4095 Phase 2
Round-16 review follow-up (31d4a17)
Tests: atomicFileWrite 74/74 pass, eslint clean. |
wenshao
left a comment
There was a problem hiding this comment.
No new issues found. LGTM! ✅
Reviewed the full diff (atomic-write migration of credential / memory / config / logger / JSONL paths + the forceMode/noFollow options and sync helper). All migration sites preserve prior encoding/mode/newline semantics, the async and sync helper variants are semantically equivalent on every path, CI is green, and the targeted suites pass. The exhaustive prior review rounds resolved every substantive concern.
— claude-opus-4-8 via Claude Code /qreview
…O_EXCL (PR #4333 review) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
|
@doudouOUC heads-up — this branch went stale and now conflicts with The conflict is one file: One thing to watch while resolving: #4431 added the Could you merge |
…bling-cook # Conflicts: # packages/core/src/utils/atomicFileWrite.ts
… into worktree-ethereal-bubbling-cook
tanzhenxin
left a comment
There was a problem hiding this comment.
Re-checked after the merge of main: the #4431 ownership preservation is correctly re-integrated — ownershipWouldChange() + the in-place uid fallback are back in atomicWriteFile, and all 5 ownership regression tests are present, so the stale-base concern is resolved. CI is green on the current head (Test matrix passes on macOS / Ubuntu / Windows). LGTM 👍
Summary
Phase 2 of #4095 — replace the remaining bare
fs.writeFile/fs.writeFileSync/fs.appendFilecalls in security-sensitive and data-integrity paths with the atomic helpers introduced by Phase 1 (PR #4096). Also closes #3681 (JSONL session writer durability).A process killed mid-write —
kill -9, OOM, power loss, AV scan stalling rename, FS unmount mid-flush — could previously leave OAuth credentials, memory metadata, session transcripts, or trust-folder config half-written or with glued}{JSONL records. After this PR all of those go throughatomicWriteFile/atomicWriteFileSync(temp + fsync + rename + EXDEV fallback + EPERM retry).Ref: #4095 Phase 2. Closes: #3681.
What changed
Ten commits, each independently green:
Core migration (6 commits):
feat(core): add atomicWriteFileSync + forceMode option— new sync helper mirroring async API (flush:true, symlink chain, EXDEV fallback, EPERM retry viaAtomics.waitblocking sleep). NewforceModeoption ignores existing-target permissions and appliesoptions.moderegardless — needed for credentials so a historically over-permissive file (e.g. 0o644 restored from backup) gets healed to 0o600 on next write.refactor(core): migrate credential writes (Tier 1)—oauth-token-storage,file-token-storage,sharedTokenManager. All write with{mode: 0o600, forceMode: true}. (cacheQwenCredentialsinqwenOAuth2.tswas initially included but de-migrated during merge with upstream PR feat(serve): auth device-flow route (#4175 Wave 4 PR 21) #4255 — see "Deliberately not in scope" below.)refactor(core): migrate memory state writes (Tier 2)—memory/manager,extract,indexer,dream,forget. Trailing-newline preserved.refactor: migrate config + logger + state writes (Tier 3a)—trustedFolders(sync),logger(4 sites), plus state-file siblings the issue didn't list but match the same risk profile:tipHistory,installationManager,projectSummary,todoWrite,trustedHooks.fix(core): flush JSONL appends to disk (Tier 3b, closes #3681)—jsonl-utils.tswriteLine/writeLineSyncgetflush: true;write()full-file replace goes throughatomicWriteFileSync;debugLogger.tsappendFile getsflush: true.refactor(core): migrate extension config + LSP edit to atomic write— catch-up commit after auditing Claude Code's equivalents:extensionManager.tsenablement config + install metadata,NativeLspService.tsLSP-driven user-file edit.Cosmetic cleanup (1 commit):
7.
chore: cosmetic cleanups from PR review—index.tsexport ordering;tipHistory.tsaddsforceMode: truefor consistency with other 0o600 sites.Codex-review-driven fixes (3 commits): all three were latent bugs flagged across three Codex review rounds. None had user-visible symptoms yet — caught before merge.
8.
fix(core): address Codex review findings on Phase 2 PR— three latent bugs from round 1:modesilently downgraded perms.if (!options?.forceMode)skipped the existing-mode stat wheneverforceModewas true, regardless of whethermodewas also supplied. CallingatomicWriteFile(p, data, { forceMode: true })(no mode) on an existing 0o600 file produced 0o644 (umask default). Tightened guard to also requireoptions.mode; mirrored in sync; added regression tests that assert mode preservation.vi.resetAllMocks()blanked theatomicWriteFileshim.vi.fn(actual.atomicWriteFile)factory implementation gets reset to no-op byvi.resetAllMocks()inbeforeEach, which would makelogger.initialize()silently skip creatinglogs.json. Tests passed by coincidence (file pre-existence). Captured real implementation at module load and re-attached viamockImplementation.NativeLspService.applyTextEditsswallowed all read errors. A read failure (EACCES on read-protected file, EISDIR, etc.) was treated as "new empty file" and the atomic rename could replace the original. Now only ENOENT is treated as new-file; other errors propagate.fix(lsp): refuse LSP edits to chmod 0444 files (round 2)— even after the previous fix, a file that's readable butchmod 0444(read-only) would still be replaced by atomic rename (rename only needs parent-directory write access). Added explicitfs.accessSync(W_OK)check before the atomic write; ENOENT still allowed so LSP can create new files.fix(core): drop withTimeout around atomic credential write (round 3)—saveCredentialsToFilewrappedatomicWriteFileinwithTimeout(5000). If the call hit the budget (slow NFS, fsync stall), withTimeout would reject while the atomicWriteFile internal write+rename kept running unobserved → the refresh lock got released → another process could write newer credentials → the original atomicWriteFile finally completed its rename and silently overwrote the newer token. Atomic write is durable by design andfs.renameis not abortable — accept the I/O latency.mkdirandstattimeouts kept (idempotent and read-only respectively).mcp/oauth-token-storage.ts,mcp/token-storage/file-token-storage.ts,qwen/sharedTokenManager.tsforceMode: trueheals historically over-permissive token files to 0o600utils/jsonl-utils.tswriteLine/writeLineSyncflush: trueper append}{glued records fromkill -9sharedTokenManager.saveCredentialsToFilewithTimeoutaround the writeNativeLspService.applyTextEditsW_OKcheck before writechmod 0444files now throw EACCES (matches pre-Phase-2 behavior); LSP edits on read-protected files no longer silently overwrite with an empty bufferDeliberately not in scope
Audited the repo for other bare-write call sites; the following are intentionally left as-is. For each I checked Claude Code's equivalent path; verdicts in parentheses.
flag: 'wx':sharedTokenManager:708,memory/manager:363,memory/store:49,chatRecordingService:633,sessionService:937— exclusive-create semantics, must not become atomic. (Claude Code agrees.)qwen/qwenOAuth2.tscacheQwenCredentials— initially migrated as part of Tier 1, then de-migrated during merge with upstream PR feat(serve): auth device-flow route (#4175 Wave 4 PR 21) #4255 (the daemon device-flow registry work). Upstream addedAbortSignalthreading +SharedTokenManager.getInstance().clearCache()invalidation into the hand-rolled atomic write.atomicWriteFiledoesn't accept a signal (andfs.renameis not abortable), so re-migrating would silently drop signal cancellation — the exact race I just fixed in round 3 forsaveCredentialsToFile. Kept upstream's hand-rolled atomic, accept slightly weaker guarantees (no EPERM retry / EXDEV fallback) in exchange for correct cancellation semantics.services/gitService.ts,services/gitWorktreeService.ts— both planned for deprecation (shadow-git checkpointing is being replaced byfileHistoryServiceper PR feat(rewind): add file restoration support to /rewind command #4064; worktree path is also slated to be retired). Not worth investing atomic-write effort in code on the way out. Defer permanently.extension/extensionSettings.tsenv writes,extension/claude-converter.ts,extension/gemini-converter.ts,extension/variables.ts— Claude Code has no equivalent (no per-extension env, no converters). Defer.agents/agent-transcript.ts:127,agents/arena/ArenaManager.ts:1616— Claude Code's sub-agent metadata sidecars use bare writeFile too (write-once, re-derivable). Defer.core/prompts.ts:406prompt dump,utils/openaiLogger.ts,tools/shell.ts,utils/truncation.ts,tools/modifiable-tool.ts,core/config/config.ts:2600— transient / per-request / scratch files. Claude Code's equivalent debug-dump uses appendFile too. Defer.cli/src/serve/httpAcpBridge.ts:1274— already a hand-rolled atomic write withwxtmp + rename. Worth folding into the shared helper in a follow-up PR with BOM/encoding regression testing.cli/src/config/settings.ts:875recovery write-back — has its own backup machinery; touching it risks settings-migration regressions, defer.Test plan
Automated
npx vitest run packages/core/src/utils/atomicFileWrite.test.ts— 36 tests (19 async, 13 sync, 5 forceMode including 2 round-1 regression cases for the "no mode" edge)npx vitest run packages/core/src/utils/jsonl-utils.test.ts packages/core/src/utils/debugLogger.test.ts— 44 passlogger.test.ts52 pass —beforeEachre-binds the realatomicWriteFileaftervi.resetAllMocks()(fix from Codex round 1)trustedFolders.test.ts21,todoWrite.test.ts28,extensionManager.test.ts42, LSP suite 62 passnpx tsc --noEmit -p packages/core && npx tsc --noEmit -p packages/clicleanManual smoke verification — completed
Verified on macOS arm64, Node v24.12.0, against this branch (
worktree-ethereal-bubbling-cook). Combined library-level integration script (covers crash-safety claims that don't need a model) with a realnpm run devtmux run (proves the actual code paths users hit fire correctly under kill -9).Part 1 — library-level integration script (17/17 pass)
Direct exercise of
atomicWriteFile/atomicWriteFileSync/writeLine/writeLineSyncagainst real disk, including a kill -9 subprocess for the JSONL claim. Confirms every Codex-round fix is regression-proof at the helper level.verify-atomic-helpers.mjs — full script (click to expand)
Actual output (run on this branch):
Part 2 — real CLI end-to-end (tmux +
npm run dev)Validates the helper changes work inside an actual qwen-code session — specifically that
logger.ts(logs.json full-file rewrite) andjsonl-utils.writeLine(per-turn session transcript append) survive a hard process kill and/resumecan still read the JSONL. This is the closest reproduction of the real #3681 failure mode.Environment: API key auth (
selectedType: "openai"→ DASHSCOPE → glm-5). OAuth path is dormant for API-key users, so the high-value paths under load arelogger.tsandjsonl-utils.ts— exactly what's verified here.Commands (run from the worktree root):
Observed results:
glm-5shownkill -9 <leaf PID>mid-responseJSON.parse(logs.json)}{/resumepicker请用大约200字详细描述...hellovisibleCoverage matrix — PR claim → verification
atomicWriteFileis crash-atomic (write tmp + fsync + rename)atomicWriteFileSyncmirrors async semanticsforceMode: trueheals legacy 0o644 → 0o600forceModewithoutmodepreserves existing perms (Codex round-1 fix)flush: truesurvives kill -9 — no glued}{(closes #3681)chmod 0444(Codex round-2 fix)withTimeoutremoval eliminates token-overwrite race (Codex round-3 fix)sharedTokenManager.test.ts(31 pass)/rewind-style session resume still works after a kill/resumepicker + select)Not covered (out of scope for local verification)
~/.qwen/oauth_creds.jsonin the test environment held test fixtures, and the user's CLI uses API-key auth instead. The code path is identical to the verifiedatomicWriteFile(creds, {mode:0o600, forceMode:true})shape, just driven by the OAuth refresh callback rather than the logger.renameWithRetry/renameWithRetrySyncEPERM retry path is exercised by the existing unit tests but not under live AV pressure.Out-of-scope follow-ups
cli/src/serve/httpAcpBridge.ts:1274hand-rolled atomic write into the shared helper~/.qwen/settings.json— defense in depth, low priority🤖 Generated with Qwen Code