Memory leak from rapid FS events in watched directory — 50GB consumption, swap exhaustion, WindowServer crash #48968

@mrm007

Summary

Zed consumed ~50GB of memory (16GB physical + ~34GB swap) when a directory it had open contained a lock file being rapidly created/destroyed by an external process. The swap exhaustion filled the disk, starved WindowServer of resources, and required a hard reboot.

Environment

  • Zed: 0.222.4 (Stable)
  • macOS: 15.7.3 (24G419)
  • Hardware: MacBook Air M4, 16GB RAM

Steps to Reproduce

  1. Open ~/.config/opencode/ (or any directory) as a project in Zed
  2. Have multiple external processes rapidly create and destroy a .lock directory via proper-lockfile (a widely used Node.js file-locking library), all contending on the same lock

Realistic reproduction using the same locking library as the trigger plugin:

Save as thrash.mjs, run bun add proper-lockfile, then launch 8 instances:

import lockfile from "proper-lockfile";
import { writeFileSync, readFileSync, mkdirSync, existsSync, renameSync } from "node:fs";
import { randomBytes } from "node:crypto";
import { join, dirname } from "node:path";

const TARGET_DIR = "/tmp/zed-leak-test";  // Open this directory in Zed
const ACCOUNTS_FILE = join(TARGET_DIR, "state.json");
const INSTANCE = process.env.INSTANCE || "1";

const LOCK_OPTIONS = {
  stale: 10000,
  retries: { retries: 999, minTimeout: 5, maxTimeout: 50, factor: 1.1 },
};

if (!existsSync(ACCOUNTS_FILE)) {
  mkdirSync(dirname(ACCOUNTS_FILE), { recursive: true });
  writeFileSync(ACCOUNTS_FILE, JSON.stringify({ version: 1, data: [] }, null, 2));
}

async function save() {
  const release = await lockfile.lock(ACCOUNTS_FILE, LOCK_OPTIONS);
  try {
    const existing = JSON.parse(readFileSync(ACCOUNTS_FILE, "utf-8"));
    existing.data = [{ instance: INSTANCE, ts: Date.now(), rand: Math.random() }];
    const tmp = `${ACCOUNTS_FILE}.${randomBytes(6).toString("hex")}.tmp`;
    writeFileSync(tmp, JSON.stringify(existing, null, 2));
    renameSync(tmp, ACCOUNTS_FILE);
  } finally {
    await release();
  }
}

let count = 0;
while (true) {
  try {
    await save();
    if (++count % 500 === 0) console.log(`[${INSTANCE}] ${count} writes`);
  } catch {
    await new Promise(r => setTimeout(r, Math.random() * 20));
  }
}

Then, from a shell:

mkdir -p /tmp/zed-leak-test
# Open /tmp/zed-leak-test in Zed, then:
for i in $(seq 1 8); do INSTANCE=$i bun thrash.mjs & done
# Watch Activity Monitor: Zed memory grows ~50 MB/min

  3. Watch Zed's memory in Activity Monitor. It grows without bound, and the file tree sidebar visibly flickers as the .lock directory appears and disappears thousands of times per second.

Observed Results

8 instances contending on the same lock for ~5 minutes:

| Time   | Zed Footprint | Notes                   |
|--------|---------------|-------------------------|
| Start  | 600 MB        | Baseline                |
| +1 min | 760 MB        | Sidebar flickering      |
| +3 min | 850 MB        | +250 MB, still climbing |
| +5 min | 850+ MB       | Stopped test            |

Memory growth rate: ~50 MB/min with 8 contending processes. Memory is never reclaimed — vmmap shows physical footprint only grows. Extrapolating: 8 instances running for a workday would reach ~24 GB, consistent with the original 50GB incident (which ran longer with 8-10 instances).
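
The extrapolation above can be checked with a few lines of arithmetic. This is only a sketch: it assumes growth stays linear at the measured ~50 MB/min, that a "workday" means 8 hours, and that 1 GB = 1000 MB. The function names are hypothetical and exist only for this illustration.

```javascript
// Assumption: growth is linear at the rate measured during the 5-minute test.
function projectGrowthMb(rateMbPerMin, minutes) {
  return rateMbPerMin * minutes;
}

// Minutes needed to go from a baseline footprint to a target footprint.
function minutesToReach(targetMb, baselineMb, rateMbPerMin) {
  return (targetMb - baselineMb) / rateMbPerMin;
}

const RATE_MB_PER_MIN = 50;   // measured above with 8 contending processes
const WORKDAY_MIN = 8 * 60;   // assumption: 8-hour workday
const BASELINE_MB = 600;      // "Start" row of the table above

console.log(projectGrowthMb(RATE_MB_PER_MIN, WORKDAY_MIN));     // 24000 MB, i.e. ~24 GB
console.log(minutesToReach(50000, BASELINE_MB, RATE_MB_PER_MIN)); // 988 min, i.e. ~16.5 h
```

Under those assumptions, reaching the ~50 GB seen in the original incident would take roughly 16.5 hours at the measured rate, which fits the note that the incident ran longer than a single workday.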

Expected Behavior

Zed should debounce/coalesce rapid FS events. Memory should remain bounded regardless of how fast files change in a watched directory.

Actual Behavior

Memory grows without bound. In my case:

  • ~50GB total (16GB physical RAM + ~34GB swap)
  • Swap filled the boot disk
  • WindowServer crashed (starved of disk/memory)
  • System became unresponsive; required hard reboot

Root Cause Analysis

The trigger was the opencode-antigravity-auth plugin (filed upstream), which uses proper-lockfile to guard antigravity-accounts.json. Each save cycle produces 4+ FS events:

  1. mkdir antigravity-accounts.json.lock (acquire lock)
  2. utimes on lock directory (heartbeat every 5s)
  3. Write temp file + rename to antigravity-accounts.json
  4. rmdir antigravity-accounts.json.lock (release lock)

During active use with rate-limit cycling, this cycle fires dozens of times per minute. The underlying bug, though, is that Zed does not bound its response to FS events, so any rapid-change scenario would trigger the same growth.
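
To confirm how many events a watcher actually receives for the save cycle listed above, a small diagnostic along these lines may help. It is a sketch using Node's built-in fs.watch, not anything taken from Zed; createEventTally and reportEventRates are hypothetical names, and the number of callbacks per cycle will vary by platform and watcher backend.

```javascript
// Hypothetical helper: counts raw watcher callbacks per reported filename.
function createEventTally() {
  const counts = new Map();
  return {
    record(filename) {
      // fs.watch may report a null filename on some platforms.
      const key = filename ?? "(unknown)";
      counts.set(key, (counts.get(key) ?? 0) + 1);
    },
    // Return [filename, count] pairs, busiest first, and reset the tally.
    drain() {
      const entries = [...counts.entries()].sort((a, b) => b[1] - a[1]);
      counts.clear();
      return entries;
    },
  };
}

// Watch `dir` and print per-file event counts once per interval.
// Returns a function that stops the watcher and the timer.
async function reportEventRates(dir, intervalMs = 1000) {
  const { watch } = await import("node:fs");
  const tally = createEventTally();
  const watcher = watch(dir, (_eventType, filename) => tally.record(filename));
  const timer = setInterval(() => {
    for (const [name, n] of tally.drain()) {
      console.log(`${name}: ${n} events in the last ${intervalMs} ms`);
    }
  }, intervalMs);
  return () => {
    clearInterval(timer);
    watcher.close();
  };
}
```

Calling `reportEventRates("/tmp/zed-leak-test")` while the thrash.mjs instances are running should show whether the .lock directory dominates the event stream and at what rate.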

Impact

This isn't just a performance issue — it's a system stability issue. The unbounded memory growth leads to swap exhaustion, disk full, and cascading failures (WindowServer crash, system freeze). On a 16GB machine, this went from "Zed is slow" to "hard reboot required" within the span of normal usage.

Suggested Fix

Rate-limit or debounce FS event processing. A simple approach: coalesce events per-file within a window (e.g., 100-500ms) before triggering buffer reloads or tree updates. This is standard practice in VS Code (files.watcherExclude + internal debouncing) and other editors.
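
The per-file coalescing idea can be sketched as follows. This is an illustration in JavaScript (the language of the repro script), not Zed's implementation, which is written in Rust; createCoalescer, its method names, and the 200 ms default are all assumptions made for the example. Only the latest event per path is kept, so pending state is bounded by the number of distinct paths touched within one window rather than by the number of events.

```javascript
// Hypothetical coalescer: keeps only the most recent event per path and
// delivers one batch per window instead of one callback per event.
function createCoalescer(onFlush, windowMs = 200) {
  const pending = new Map(); // path -> most recent event kind
  let timer = null;

  function flush() {
    timer = null;
    if (pending.size === 0) return;
    const batch = [...pending.entries()];
    pending.clear();
    onFlush(batch);
  }

  return {
    // Record an event; later events for the same path overwrite earlier ones.
    push(path, kind) {
      pending.set(path, kind);
      // Start the window on the first event so latency is capped at windowMs.
      if (timer === null) timer = setTimeout(flush, windowMs);
    },
    // Deliver whatever is pending immediately (useful on shutdown or in tests).
    flushNow() {
      if (timer !== null) clearTimeout(timer);
      flush();
    },
    pendingCount() {
      return pending.size;
    },
  };
}

// Usage: 10,000 create/remove events on one lock path collapse to one entry.
const batches = [];
const coalescer = createCoalescer((batch) => batches.push(batch));
for (let i = 0; i < 10000; i++) {
  coalescer.push("state.json.lock", i % 2 === 0 ? "created" : "removed");
}
coalescer.push("state.json", "modified");
coalescer.flushNow();
console.log(batches.length, batches[0]);
```

The timer starts on the first event of a window rather than being reset by each new event, so a continuous stream of changes still produces a flush every windowMs instead of postponing it indefinitely.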


Labels

  • area:performance/memory leak
  • frequency:common
  • priority:P2
  • state:needs info
  • state:needs repro
