[Bug]: Gateway data loss: Silent read error on paired.json leads to overwrite of all pairings #71873

@iret77

Summary

A critical data loss bug exists in the OpenClaw Gateway's device and node pairing logic. If paired.json cannot be read on gateway startup for any reason, transient or not (e.g., a file lock, a temporary I/O error, or file corruption), the gateway proceeds with an empty in-memory list of pairings. The next time the state is persisted, this empty list overwrites the valid paired.json file, permanently deleting all existing pairings.

This is likely the underlying root cause for several historical, stale-closed issues related to pairing loss after gateway restarts (e.g., #22866, #21647).

Root Cause Analysis

The bug is located in the loadState function within src/infra/node-pairing.ts (and presumably a similar function in src/infra/device-pairing.ts).

File: src/infra/node-pairing.ts

async function loadState(baseDir?: string): Promise<NodePairingStateFile> {
  const { pairedPath } = resolvePairingPaths(baseDir, "nodes");
  const [, paired] = await Promise.all([
    // ...
    readJsonFile<Record<string, NodePairingPairedNode>>(pairedPath),
  ]);
  const state: NodePairingStateFile = {
    // ...
    pairedByNodeId: paired ?? {}, // <-- ROOT CAUSE
  };
  return state;
}

The issue stems from two design decisions:

  1. Silent Failure in readJsonFile: The utility function readJsonFile (from src/infra/json-files.ts) is designed to catch any and all read/parse errors and return null instead of throwing an error.
  2. Nullish Coalescing in loadState: The loadState function uses paired ?? {}, so the null returned by a failed read is treated exactly like a file that does not exist yet: both become an empty pairing map.
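
To make the conflation concrete, here is a minimal, self-contained sketch. readJsonFileLenient is a hypothetical stand-in written for this illustration; it is not the real readJsonFile from src/infra/json-files.ts, whose implementation is not shown in this issue. It only mimics the "catch everything, return null" behavior described above.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical stand-in for a reader that swallows every error (NOT the real readJsonFile).
function readJsonFileLenient<T>(filePath: string): T | null {
  try {
    return JSON.parse(fs.readFileSync(filePath, "utf8")) as T;
  } catch {
    return null; // missing file, I/O error, and parse error all look the same
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "pairing-demo-"));
const missingPath = path.join(dir, "missing.json"); // never created
const corruptPath = path.join(dir, "corrupt.json");
fs.writeFileSync(corruptPath, '{"node-1": {"token": "abc"'); // truncated JSON

const fromMissing = readJsonFileLenient<Record<string, unknown>>(missingPath) ?? {};
const fromCorrupt = readJsonFileLenient<Record<string, unknown>>(corruptPath) ?? {};

// Both cases collapse to the same empty state; the caller cannot tell them apart.
console.log(Object.keys(fromMissing).length, Object.keys(fromCorrupt).length); // prints "0 0"
```

The corrupt file still exists on disk at this point, yet the caller sees the same value it would see on a first-ever startup.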

Sequence of Events:

  1. Gateway starts.
  2. loadState is called.
  3. readJsonFile attempts to read paired.json but fails due to a transient error (e.g., another process holds a temporary lock). It returns null.
  4. loadState receives null and initializes state.pairedByNodeId to an empty object {}.
  5. The gateway continues to run, now believing no devices are paired.
  6. An event occurs that triggers persistState (e.g., a node attempts to reconnect, an admin action is taken).
  7. persistState calls writeJsonAtomic, writing the empty in-memory state to paired.json, permanently destroying the user's pairing data.

The writeJsonAtomic function itself is robust, which ironically ensures the data is overwritten very reliably.
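
The sequence above can be reproduced in miniature. The sketch below is an assumption-laden simulation, not gateway code: makeReader is a hypothetical helper that fakes one transient read failure, and a plain fs.writeFileSync stands in for persistState and writeJsonAtomic.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type Paired = Record<string, { name: string }>;

// Hypothetical reader that swallows errors; `failOnce` simulates a transient I/O failure.
function makeReader(failOnce: boolean) {
  let shouldFail = failOnce;
  return (filePath: string): Paired | null => {
    try {
      if (shouldFail) {
        shouldFail = false;
        throw new Error("EBUSY: simulated transient lock");
      }
      return JSON.parse(fs.readFileSync(filePath, "utf8")) as Paired;
    } catch {
      return null;
    }
  };
}

const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "pairing-seq-"));
const pairedPath = path.join(tmpDir, "paired.json");
fs.writeFileSync(pairedPath, JSON.stringify({ "node-1": { name: "kitchen" } }));

// Steps 1-4: the startup load hits the transient failure and falls back to {}.
const read = makeReader(true);
const state = { pairedByNodeId: read(pairedPath) ?? {} };

// Steps 6-7: any later persist writes the empty in-memory state over the valid file.
fs.writeFileSync(pairedPath, JSON.stringify(state.pairedByNodeId));

const onDisk = fs.readFileSync(pairedPath, "utf8");
console.log(onDisk); // prints "{}": the pairing for "node-1" is gone
```

Note that a second read would have succeeded; the data is lost only because the first failed read was silently accepted as the truth.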

Suggested Fix

The error handling in loadState needs to be more nuanced. It should not treat a read/parse failure on an existing file as an empty state.

A potential fix would be to check if the file exists when readJsonFile returns null.

// Sketch of a safer approach in loadState.
// Assumes: import fs from "node:fs/promises";
const paired = await readJsonFile<Record<string, NodePairingPairedNode>>(pairedPath);
if (paired === null) {
  // readJsonFile returned null: find out whether the file is missing or merely unreadable.
  // Only ENOENT means "missing"; any other access error (e.g., EACCES) is treated as
  // "exists" so that we fail safe.
  const fileExists = await fs
    .access(pairedPath)
    .then(() => true)
    .catch((err: NodeJS.ErrnoException) => err.code !== "ENOENT");
  if (fileExists) {
    // This is the critical error condition.
    // The gateway should log a severe error and perhaps enter a degraded state
    // where it refuses to write back to this file to prevent data loss.
    throw new Error(`Failed to read or parse existing pairing file at ${pairedPath}. Halting to prevent data loss.`);
  }
}
// If we are here, either the file was read successfully, or it legitimately doesn't exist.
// It is now safe to default to an empty object.
const state = { pairedByNodeId: paired ?? {} };

This would prevent the data loss scenario and make the underlying I/O or corruption issue visible instead of silently failing.
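
An alternative that avoids the separate existence check is to have the reader itself report why it produced no data. The following is a minimal sketch under stated assumptions: ReadResult, readJsonFileStrict, and loadPaired are hypothetical names invented for this example and do not exist in the codebase as far as this issue shows. It uses synchronous fs calls only to keep the example short.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type ReadResult<T> =
  | { kind: "ok"; value: T }
  | { kind: "missing" }
  | { kind: "unreadable"; error: unknown };

// Hypothetical strict reader: only ENOENT counts as "missing".
function readJsonFileStrict<T>(filePath: string): ReadResult<T> {
  let raw: string;
  try {
    raw = fs.readFileSync(filePath, "utf8");
  } catch (error) {
    const code = (error as { code?: string }).code;
    return code === "ENOENT" ? { kind: "missing" } : { kind: "unreadable", error };
  }
  try {
    return { kind: "ok", value: JSON.parse(raw) as T };
  } catch (error) {
    return { kind: "unreadable", error }; // file exists but is not valid JSON
  }
}

// Hypothetical loader: defaults to {} only when the file legitimately does not exist.
function loadPaired(filePath: string): Record<string, unknown> {
  const result = readJsonFileStrict<Record<string, unknown>>(filePath);
  if (result.kind === "ok") return result.value;
  if (result.kind === "missing") return {};
  throw new Error(`Refusing to load ${filePath}: existing pairing file is unreadable`);
}

const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "pairing-strict-"));
const absentPath = path.join(demoDir, "absent.json"); // never created
const brokenPath = path.join(demoDir, "paired.json");
fs.writeFileSync(brokenPath, "{not json");

const missingKind = readJsonFileStrict(absentPath).kind;
const brokenKind = readJsonFileStrict(brokenPath).kind;
console.log(missingKind, brokenKind); // prints "missing unreadable"
```

With this shape, a first-time startup still gets an empty map, while a corrupt or locked file surfaces as an error before anything can be written back over it.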

Affected Files

  • src/infra/node-pairing.ts
  • src/infra/device-pairing.ts (likely has the same logic)
  • src/infra/json-files.ts (source of the silent-failing readJsonFile)
