feat: atomic file write & transaction rollback #4095

@doudouOUC


Note

Phase 1 shipped in #4096. After release, the rename-based atomic write was found to reset file ownership in Docker and shared-workspace setups: the temp file is a new inode owned by the writing process's euid:egid, and rename() swaps it in place of the original, silently dropping the original uid:gid. Mitigation PR #4431 adds an ownership-preserving fallback inside atomicWriteFile; whether to also revert user-file write paths (Write / Edit / NotebookEdit) to plain in-place fs.writeFile is still under evaluation. Full trade-off analysis: docs/design/2026-05-22-atomic-file-write-strategy.md.
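One way such an ownership-preserving fallback could work is sketched below. This is an illustration only, not the code in #4431: the function name writePreservingOwner, the temp-file naming, and the choice to fall back to an in-place write on EPERM are all assumptions, and synchronous fs calls are used for brevity.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type WriteStrategy = "atomic" | "in-place";

// Hypothetical helper: try temp + rename while keeping the original uid:gid,
// and give up atomicity (but keep ownership) when chown is not permitted.
function writePreservingOwner(target: string, data: string): WriteStrategy {
  let original: fs.Stats | undefined;
  try {
    original = fs.statSync(target);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
    // No existing file, so there is no ownership to preserve.
  }

  const tmpPath = path.join(
    path.dirname(target),
    `.${path.basename(target)}.${process.pid}.tmp`,
  );
  fs.writeFileSync(tmpPath, data, { flag: "wx" });
  try {
    if (original) {
      const tmp = fs.statSync(tmpPath);
      if (tmp.uid !== original.uid || tmp.gid !== original.gid) {
        // Throws EPERM when the process lacks the privilege to chown.
        fs.chownSync(tmpPath, original.uid, original.gid);
      }
      fs.chmodSync(tmpPath, original.mode & 0o777);
    }
    fs.renameSync(tmpPath, target);
    return "atomic";
  } catch (err) {
    fs.rmSync(tmpPath, { force: true });
    if ((err as NodeJS.ErrnoException).code !== "EPERM") throw err;
    // In-place write reuses the existing inode, so uid:gid are untouched.
    fs.writeFileSync(target, data);
    return "in-place";
  }
}
```

The trade-off is the one the note describes: the in-place branch keeps ownership intact but reopens the crash window that the atomic write was meant to close.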

Context

The core file write paths in Qwen Code (Write tool, Edit tool) use bare fs.writeFile. If the process crashes or power is lost mid-write, the file is left half-written and corrupt. The codebase already acknowledges this — write-file.ts:371-385 and edit.ts:487-497 both contain explicit TODOs:

the only way to close it is an atomic write (write-to-temp + rename)... deferred to a follow-up

Qwen Code already has atomicWriteJSON (6 call sites across 2 files) but it only supports JSON and lacks flush: true (fsync). This issue tracks the full rollout of atomic writes and data safety mechanisms.

Related issues

Reference:


Phase 1: Generic atomic write + core path integration (~120 lines / 0.5 day)

Goal: Make all Write and Edit tool file operations atomic, eliminating crash-induced file corruption.

1.1 Extend atomicFileWrite.ts — add atomicWriteFile()

File: packages/core/src/utils/atomicFileWrite.ts

  • Add generic atomicWriteFile(filePath, data: string | Buffer, options?) function
  • Support flush: true (fsync), permission preservation (stat target mode → chmod tmp), encoding
  • Symlink resolution: before writing, call fs.realpath() to resolve symlinks. Write the tmp file next to the real target (not next to the symlink) and rename to the real target path. This prevents rename('tmp', 'symlink') from replacing the symlink itself instead of writing through it. (Matches Claude Code's readlinkSync + resolve pattern in writeFileSyncAndFlush_DEPRECATED)
  • Write temp file in the same directory as the resolved target to guarantee rename stays on the same filesystem
  • Fallback to direct write on EXDEV; cleanup tmp and rethrow on other errors
  • Export renameWithRetry for reuse by other modules
  • Refactor atomicWriteJSON to delegate to atomicWriteFile internally (adding missing flush: true)
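The bullets above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the shipped implementation: it is a synchronous variant named atomicWriteFileSync for brevity (the real atomicWriteFile replaces awaited fs.writeFile calls, so it is presumably async), the AtomicWriteOptions shape and temp-file naming are hypothetical, flush defaults to on, and the retry logic of renameWithRetry is omitted.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { randomBytes } from "node:crypto";

// Hypothetical options shape for this sketch.
interface AtomicWriteOptions {
  encoding?: BufferEncoding;
  mode?: number; // explicit permission bits; overrides the target's mode
  flush?: boolean; // fsync the temp file before rename (on by default here)
}

function atomicWriteFileSync(
  filePath: string,
  data: string | Buffer,
  options: AtomicWriteOptions = {},
): void {
  // Resolve symlinks so the rename replaces the real file, not the link.
  let target = path.resolve(filePath);
  let existingMode: number | undefined;
  try {
    target = fs.realpathSync(target);
    existingMode = fs.statSync(target).mode & 0o777;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
    // Target does not exist yet: write to the path as given.
  }
  const mode = options.mode ?? existingMode;
  const encoding = options.encoding ?? "utf-8";

  // Same directory as the resolved target, so rename stays on one filesystem.
  const tmpPath = path.join(
    path.dirname(target),
    `.${path.basename(target)}.${randomBytes(6).toString("hex")}.tmp`,
  );

  try {
    const fd = fs.openSync(tmpPath, "wx", mode ?? 0o666);
    try {
      fs.writeFileSync(fd, data, { encoding });
      if (mode !== undefined) fs.fchmodSync(fd, mode); // undo umask masking
      if (options.flush !== false) fs.fsyncSync(fd); // data on disk before rename
    } finally {
      fs.closeSync(fd);
    }
    fs.renameSync(tmpPath, target);
  } catch (err) {
    fs.rmSync(tmpPath, { force: true });
    if ((err as NodeJS.ErrnoException).code !== "EXDEV") throw err;
    // Cross-device rename: fall back to a direct, non-atomic write.
    const fallback = mode !== undefined ? { encoding, mode } : { encoding };
    fs.writeFileSync(target, data, fallback);
  }
}
```

Calling atomicWriteFileSync(linkPath, "new") on a symlink leaves the link in place and replaces the file it points to. Note that this sketch does not preserve uid:gid.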

1.2 Deduplicate renameWithRetry

File: packages/core/src/utils/runtimeStatus.ts

  • Remove the private renameWithRetry at runtimeStatus.ts:220-239 (identical to atomicFileWrite.ts:50-72)
  • Replace with import { renameWithRetry } from './atomicFileWrite.js'
  • Refactor writeRuntimeStatus() inline tmp+rename (L110-121) to use atomicWriteJSON

1.3 Wire fileSystemService.writeTextFile() to atomic write

File: packages/core/src/services/fileSystemService.ts

Replace all 4 bare fs.writeFile calls in StandardFileSystemService.writeTextFile() (L214-262):

| Line | Current | Replace with |
| --- | --- | --- |
| L244-247 | fs.writeFile(filePath, Buffer.concat(...)) | atomicWriteFile(filePath, Buffer.concat(...)) |
| L249 | fs.writeFile(filePath, encoded) | atomicWriteFile(filePath, encoded) |
| L258 | fs.writeFile(filePath, Buffer.concat(...)) | atomicWriteFile(filePath, Buffer.concat(...)) |
| L260 | fs.writeFile(filePath, content, 'utf-8') | atomicWriteFile(filePath, content, { encoding: 'utf-8' }) |

Note: The second FileSystemService implementation (AcpFileSystemService in cli/src/acp-integration/service/filesystem.ts) either delegates to a remote ACP connection or falls back to a StandardFileSystemService instance — no changes needed there.

1.4 Add fsync to writeWithBackup.ts

File: packages/cli/src/utils/writeWithBackup.ts

Line 81 uses fs.writeFileSync(tempPath, content, { encoding }) (synchronous). Add flush: true:

fs.writeFileSync(tempPath, content, { encoding, flush: true });

1.5 Upgrade @types/node

The flush option for fs.writeFile/fs.writeFileSync was added in Node 21.0 (and backported to 20.10). The project requires Node >=22, so the runtime supports it, but @types/node is pinned to ^20.11.24 in packages/cli/package.json (L90), which may lack the flush type definition. Upgrade to @types/node >= 22 to avoid needing type assertions.

1.6 Tests

File: packages/core/src/utils/atomicFileWrite.test.ts (already exists with 5 tests for atomicWriteJSON)

Append new test cases for atomicWriteFile:

  • Writes string content to a new file
  • Writes Buffer content to a new file
  • Preserves existing file permissions
  • Sets explicit mode via options
  • Does not leave temp files on success
  • Cleans up temp file on write failure
  • Cleans up temp file on rename failure
  • Falls back to direct write on EXDEV error
  • Resolves symlinks correctly (writes through symlink to real target)

Phase 2: Batch fix remaining bare fs.writeFile calls (~80 lines / 0.5 day)

Goal: Replace all other high-risk bare fs.writeFile calls with atomic writes.

Tier 1 — Security-sensitive (credentials/tokens)

| File | Action |
| --- | --- |
| core/src/mcp/oauth-token-storage.ts (L102, L182) | atomicWriteFile(path, data, { mode: 0o600 }) |
| core/src/mcp/token-storage/file-token-storage.ts (L103) | atomicWriteFile(path, encrypted, { mode: 0o600 }) |
| core/src/qwen/qwenOAuth2.ts (L982) | atomicWriteFile(path, credString, { mode: 0o600 }) |
| core/src/qwen/sharedTokenManager.ts (L639) | Add flush: true + use shared renameWithRetry |

Tier 2 — Data integrity (Memory subsystem)

| File | Action |
| --- | --- |
| core/src/memory/manager.ts (L291) | atomicWriteJSON |
| core/src/memory/extract.ts (L93, L118) | atomicWriteFile |
| core/src/memory/indexer.ts (L81) | atomicWriteJSON |
| core/src/memory/dream.ts (L125) | atomicWriteFile |
| core/src/memory/forget.ts (L225, L290) | atomicWriteFile |

Tier 3 — Configuration & session durability (subsumes #3681)

| File | Action |
| --- | --- |
| cli/src/config/trustedFolders.ts (L182) | atomicWriteFile |
| core/src/core/logger.ts (L160, L231, L338) | atomicWriteFile / atomicWriteJSON; add flush: true to writeLine/writeLineSync append paths (closes #3681) |

Phase 3: FileCheckpointService (~400 lines / 2 days)

Goal: Per-turn automatic file snapshots with /rewind support for precise rollback to any message point.

Based on Claude Code's fileHistory.ts:

  • New file: packages/core/src/services/fileCheckpointService.ts
  • trackBeforeEdit(filePath, messageId) — called at the start of Write/Edit tool execute() to back up the file before modification
  • makeSnapshot(messageId) — called at the end of each agent turn
  • restoreToSnapshot(messageId) — restore all files to a given message point
  • Backup storage: ~/.qwen-code/sessions/<sessionId>/checkpoints/<contentHash>@v<N>
  • Content-hash dedup (identical content is not stored twice)
  • Incremental snapshots (unchanged files reference previous version)
  • Max 100 snapshots with LRU eviction
  • Integrate with /rewind command
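The snapshot flow above could look roughly like the sketch below. It is an illustration under several assumptions, not a port of fileHistory.ts: the class name FileCheckpointSketch is hypothetical, blobs are stored by SHA-256 hash alone (no @v<N> suffix), each snapshot records every tracked file rather than only changed ones (content-hash dedup keeps that cheap), eviction drops the oldest snapshot rather than implementing true LRU, the messageId passed to trackBeforeEdit is accepted but unused, and unreferenced blobs are never garbage-collected.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { createHash } from "node:crypto";

// Path of a stored blob, or null when the file did not exist at that point.
type BlobRef = string | null;

class FileCheckpointSketch {
  private readonly dir: string;
  private readonly maxSnapshots: number;
  // State of each tracked file before its first edit in this session.
  private readonly baseline = new Map<string, BlobRef>();
  // messageId -> (file -> blob); Map iteration order is insertion order.
  private readonly snapshots = new Map<string, Map<string, BlobRef>>();

  constructor(dir: string, maxSnapshots = 100) {
    this.dir = dir;
    this.maxSnapshots = maxSnapshots;
    fs.mkdirSync(dir, { recursive: true });
  }

  // Store the file's current content under its hash; identical content is stored once.
  private store(filePath: string): BlobRef {
    if (!fs.existsSync(filePath)) return null;
    const content = fs.readFileSync(filePath);
    const hash = createHash("sha256").update(content).digest("hex");
    const blob = path.join(this.dir, hash);
    if (!fs.existsSync(blob)) fs.writeFileSync(blob, content);
    return blob;
  }

  // Call before a tool modifies the file.
  trackBeforeEdit(filePath: string, _messageId: string): void {
    const abs = path.resolve(filePath);
    if (!this.baseline.has(abs)) this.baseline.set(abs, this.store(abs));
  }

  // Call at the end of an agent turn.
  makeSnapshot(messageId: string): void {
    const state = new Map<string, BlobRef>();
    for (const file of this.baseline.keys()) state.set(file, this.store(file));
    this.snapshots.set(messageId, state);
    while (this.snapshots.size > this.maxSnapshots) {
      const oldest = this.snapshots.keys().next().value as string;
      this.snapshots.delete(oldest); // oldest-first eviction, not true LRU
    }
  }

  // Put every tracked file back to its state at the given snapshot.
  restoreToSnapshot(messageId: string): void {
    const state = this.snapshots.get(messageId);
    if (!state) throw new Error(`no snapshot for message ${messageId}`);
    for (const [file, original] of this.baseline) {
      // Files first touched after this snapshot revert to their pre-edit state.
      const blob = state.has(file) ? (state.get(file) as BlobRef) : original;
      if (blob === null) fs.rmSync(file, { force: true });
      else fs.copyFileSync(blob, file);
    }
  }
}
```

Storing null for files that did not exist lets a rewind delete files that a later turn created, which a content-only backup would miss.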

Phase 4: Tool result disk overflow (~150 lines / 1 day)

Goal: Spill oversized tool outputs to disk to prevent OOM and context pollution.

Based on Claude Code's toolResultStorage.ts:

  • New file: packages/core/src/services/toolResultPersistence.ts
  • Threshold: OVERFLOW_THRESHOLD = 50KB (~12K tokens)
  • Overflow content written to ~/.qwen-code/sessions/<sessionId>/tool-results/<toolUseId>.txt
  • History messages retain only a summary stub (first/last 10% truncated)
  • Background cleanup of overflow files older than 7 days on /clear or session exit
  • Check size and trigger overflow in contentGenerator.ts when collecting tool results
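A rough sketch of the spill-and-stub step and the age-based cleanup follows. The function names persistIfOversized and cleanupOverflow are hypothetical, the threshold is measured in UTF-8 bytes, and "first/last 10%" is read here as keeping the first and last tenth of the output and dropping the middle, which is one interpretation of the bullet above. The sketch also assumes toolUseId is safe to use as a file name.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const OVERFLOW_THRESHOLD = 50 * 1024; // bytes, per the threshold above

// Returns the text to keep in history: the output itself when small,
// otherwise a stub pointing at the full copy on disk.
function persistIfOversized(output: string, toolUseId: string, dir: string): string {
  if (Buffer.byteLength(output, "utf-8") <= OVERFLOW_THRESHOLD) return output;

  fs.mkdirSync(dir, { recursive: true });
  const filePath = path.join(dir, `${toolUseId}.txt`);
  fs.writeFileSync(filePath, output, "utf-8");

  // Keep the first and last 10% (by string length); drop the middle.
  const keep = Math.floor(output.length * 0.1);
  const omitted = output.length - 2 * keep;
  const marker = `\n[... ${omitted} characters omitted; full output saved to ${filePath} ...]\n`;
  return output.slice(0, keep) + marker + output.slice(output.length - keep);
}

// Removes overflow files whose mtime is older than maxAgeMs; returns the count removed.
function cleanupOverflow(
  dir: string,
  maxAgeMs: number = 7 * 24 * 60 * 60 * 1000,
  now: number = Date.now(),
): number {
  if (!fs.existsSync(dir)) return 0;
  let removed = 0;
  for (const name of fs.readdirSync(dir)) {
    const filePath = path.join(dir, name);
    if (now - fs.statSync(filePath).mtimeMs > maxAgeMs) {
      fs.rmSync(filePath, { force: true });
      removed++;
    }
  }
  return removed;
}
```

Passing now as a parameter keeps the cleanup easy to test without touching file timestamps.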

Estimated effort

| Phase | Scope | Estimate | Risk eliminated |
| --- | --- | --- | --- |
| 1 | Generic atomicWriteFile + core paths | ~120 lines / 0.5 day | User file corruption on crash |
| 2 | Batch fix remaining bare writes | ~80 lines / 0.5 day | Token/memory/config corruption; closes #3681 |
| 3 | FileCheckpointService | ~400 lines / 2 days | Irreversible tool operations, enables /rewind |
| 4 | Tool result overflow | ~150 lines / 1 day | OOM prevention + token cost reduction |

Total: ~750 lines / 4 days

Recommended execution order: Phase 1 → 2 → 3 → 4. Phase 1 is the minimum viable change that directly closes the TODOs already marked in the code.

🤖 Generated with Qwen Code
