Summary
A critical data loss bug exists in the OpenClaw Gateway's device and node pairing logic. If paired.json cannot be read on gateway startup for any reason, transient or not (e.g., a file lock, a temporary I/O error, or file corruption), the gateway proceeds with an empty in-memory list of pairings. The next time the state is persisted, this empty list overwrites the valid paired.json file, permanently deleting all existing pairings.
This is likely the underlying root cause for several historical, stale-closed issues related to pairing loss after gateway restarts (e.g., #22866, #21647).
Root Cause Analysis
The bug is located in the loadState function within src/infra/node-pairing.ts (and presumably a similar function in src/infra/device-pairing.ts).
File: src/infra/node-pairing.ts
```typescript
async function loadState(baseDir?: string): Promise<NodePairingStateFile> {
  const { pairedPath } = resolvePairingPaths(baseDir, "nodes");
  const [, paired] = await Promise.all([
    // ...
    readJsonFile<Record<string, NodePairingPairedNode>>(pairedPath),
  ]);
  const state: NodePairingStateFile = {
    // ...
    pairedByNodeId: paired ?? {}, // <-- ROOT CAUSE
  };
  return state;
}
```
The issue stems from two design decisions:
- Silent failure in readJsonFile: The utility function readJsonFile (from src/infra/json-files.ts) is designed to catch any and all read/parse errors and return null instead of throwing an error.
- Nullish coalescing in loadState: The loadState function uses the ?? {} operator, which treats the null return from a failed read as equivalent to the file being non-existent or empty.
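The combined effect of these two decisions can be shown in isolation. A minimal sketch, assuming a hypothetical `readJsonFileSwallowing` helper that mimics the catch-all behavior described for readJsonFile (it is not the project's code, and synchronous fs is used only for brevity):

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical stand-in mimicking the described readJsonFile behavior:
// every failure (missing file, permission error, invalid JSON) collapses to null.
function readJsonFileSwallowing<T>(filePath: string): T | null {
  try {
    return JSON.parse(fs.readFileSync(filePath, "utf8")) as T;
  } catch {
    return null;
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "pairing-demo-"));
const missingPath = path.join(dir, "missing.json");
const corruptPath = path.join(dir, "paired.json");

// Simulate a truncated file: it exists and holds pairing data, but is not valid JSON.
fs.writeFileSync(corruptPath, '{"node-a": {"nodeId": "node-a"');

const fromMissing = readJsonFileSwallowing<Record<string, unknown>>(missingPath) ?? {};
const fromCorrupt = readJsonFileSwallowing<Record<string, unknown>>(corruptPath) ?? {};

// Both cases yield {}: the caller cannot tell "no pairings yet" from "pairings unreadable".
console.log(Object.keys(fromMissing).length, Object.keys(fromCorrupt).length); // 0 0
```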
Sequence of Events:
- Gateway starts.
- loadState is called.
- readJsonFile attempts to read paired.json but fails due to a transient error (e.g., another process holds a temporary lock). It returns null.
- loadState receives null and initializes state.pairedByNodeId to an empty object {}.
- The gateway continues to run, now believing no devices are paired.
- An event occurs that triggers persistState (e.g., a node attempts to reconnect, an admin action is taken).
- persistState calls writeJsonAtomic, writing the empty in-memory state to paired.json, permanently destroying the user's pairing data.
The writeJsonAtomic function itself is robust, which ironically ensures the data is overwritten very reliably.
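The sequence can be reproduced outside the gateway. A minimal sketch, assuming two hypothetical stand-ins: `readJsonFileFlaky` (a catch-all reader whose first call fails with a simulated EBUSY) and `writeJsonAtomicLike` (temp file plus rename, assumed to resemble writeJsonAtomic). Neither is the project's real code, and synchronous fs calls are used only for brevity:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type PairedNodes = Record<string, { nodeId: string }>;

// Hypothetical atomic writer: write a temp file, then rename it over the target.
function writeJsonAtomicLike(filePath: string, value: unknown): void {
  const tmp = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(value, null, 2));
  fs.renameSync(tmp, filePath);
}

// Hypothetical catch-all reader; the first call simulates a transient lock (EBUSY).
let transientFailurePending = true;
function readJsonFileFlaky<T>(filePath: string): T | null {
  try {
    if (transientFailurePending) {
      transientFailurePending = false;
      throw Object.assign(new Error("resource busy or locked"), { code: "EBUSY" });
    }
    return JSON.parse(fs.readFileSync(filePath, "utf8")) as T;
  } catch {
    return null; // the error is swallowed, as described for readJsonFile
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "pairing-repro-"));
const pairedPath = path.join(dir, "paired.json");

// Before startup: a healthy file with two existing pairings.
writeJsonAtomicLike(pairedPath, {
  "node-a": { nodeId: "node-a" },
  "node-b": { nodeId: "node-b" },
});

// Startup: the read fails transiently and `?? {}` turns the failure into an empty state.
const pairedByNodeId: PairedNodes = readJsonFileFlaky<PairedNodes>(pairedPath) ?? {};

// Later persist: the empty state is written back, reliably and atomically.
writeJsonAtomicLike(pairedPath, pairedByNodeId);

const onDisk = JSON.parse(fs.readFileSync(pairedPath, "utf8")) as PairedNodes;
console.log(Object.keys(onDisk).length); // 0
```

Nothing in this flow throws or logs, so the only visible symptom is that the pairings are gone.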
Suggested Fix
The error handling in loadState needs to be more nuanced. It should not treat a read/parse failure on an existing file as an empty state.
A potential fix would be to check if the file exists when readJsonFile returns null.
```typescript
// Sketch of a safer approach in loadState
import * as fs from "node:fs/promises"; // at the top of the module

const paired = await readJsonFile<Record<string, NodePairingPairedNode>>(pairedPath);
if (paired === null) {
  // readJsonFile returned null: the file is either absent, or present but unreadable/unparsable.
  // Only ENOENT means "absent"; any other access error (e.g. EACCES) is treated as "present".
  const fileExists = await fs
    .access(pairedPath)
    .then(() => true)
    .catch((err: NodeJS.ErrnoException) => err.code !== "ENOENT");
  if (fileExists) {
    // This is the critical error condition.
    // The gateway should log a severe error and perhaps enter a degraded state
    // where it refuses to write back to this file to prevent data loss.
    throw new Error(
      `Failed to read or parse existing pairing file at ${pairedPath}. Halting to prevent data loss.`,
    );
  }
}
// If we are here, either the file was read successfully, or it legitimately doesn't exist.
// It is now safe to default to an empty object.
const state = { pairedByNodeId: paired ?? {} };
```
This would prevent the data loss scenario and make the underlying I/O or corruption issue visible instead of silently failing.
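An alternative to the follow-up fs.access check is to have the reader itself report why it returned nothing, which avoids a second filesystem call and the window between the two calls. A minimal sketch; `readJsonFileDetailed`, `JsonReadResult`, and `loadPairedNodes` are hypothetical names, not existing project APIs:

```typescript
import * as fs from "node:fs/promises";

// Hypothetical tri-state result: separates "absent" from "present but unusable".
type JsonReadResult<T> =
  | { kind: "ok"; value: T }
  | { kind: "missing" }
  | { kind: "error"; error: unknown };

async function readJsonFileDetailed<T>(filePath: string): Promise<JsonReadResult<T>> {
  let raw: string;
  try {
    raw = await fs.readFile(filePath, "utf8");
  } catch (err) {
    // Only ENOENT means the file genuinely does not exist.
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return { kind: "missing" };
    return { kind: "error", error: err };
  }
  try {
    return { kind: "ok", value: JSON.parse(raw) as T };
  } catch (err) {
    // The file exists but is not valid JSON (e.g. truncated or corrupted).
    return { kind: "error", error: err };
  }
}

// Hypothetical loadState fragment: the empty default is reachable only for a missing file.
async function loadPairedNodes<V>(pairedPath: string): Promise<Record<string, V>> {
  const result = await readJsonFileDetailed<Record<string, V>>(pairedPath);
  if (result.kind === "ok") return result.value;
  if (result.kind === "missing") return {};
  throw new Error(
    `Failed to read or parse existing pairing file at ${pairedPath}: ${String(result.error)}. Halting to prevent data loss.`,
  );
}
```

With this shape, a locked or corrupted paired.json surfaces as an error instead of being mistaken for a first run.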
Affected Files
- src/infra/node-pairing.ts
- src/infra/device-pairing.ts (likely has the same logic)
- src/infra/json-files.ts (source of the silent-failing readJsonFile)