Summary
The external-channel-reorder migration added in #2433 (detectChannelMoves in src/server/utils/channelMoveDetection.ts) produces false-positive "moves" whenever two channels share the same PSK and name. On every config sync it then "migrates" messages between those slots, progressively scrambling which channel messages appear under. On a node that reconnects / re-syncs frequently this fires repeatedly and corrupts message↔channel attribution.
The message-apply side (migrateMessagesForChannelMoves) is not at fault — it already handles swaps correctly (temp channel -99, transactional). The bug is purely in detection.
Affected versions
Present since #2433 (~v3.10.2, Mar 2026). Confirmed still present on main (4.10.2) — the matching logic in channelMoveDetection.ts is unchanged and has no uniqueness guard. Observed in production on 4.9.x (PostgreSQL, multi-source).
Root cause
const newCh = afterSnapshot.find(ch =>
  ch.id !== oldCh.id &&
  ch.psk === oldCh.psk &&
  (ch.name || '') === (oldCh.name || ''));
For each old channel the detector flags a "move" to any other slot holding the same (psk, name) — with no uniqueness guard and no check that the channel actually vacated its slot. If two channels share (psk, name):
oldCh@1 matches after@3 → move 1→3
oldCh@3 matches after@1 → move 3→1
…a phantom bidirectional swap, even though nothing moved. It recurs on every config sync, and the swap-aware apply step dutifully swaps the messages back and forth each time, so a message's channel ends up depending on how many syncs have occurred.
(psk, name) is not a guaranteed-unique key — Meshtastic allows duplicate channel definitions, and default public channels share the well-known PSK — so this is reachable in normal use.
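The phantom swap can be reproduced in isolation. The sketch below is a minimal model, not the MeshMonitor code: the `ChannelSnapshot` and `ChannelMove` shapes, the loop around the quoted predicate, the in-memory `applyMoves` stand-in for the apply step, and the PSK strings are all assumptions made for illustration.

```typescript
// Hypothetical minimal shapes; the real snapshot type in MeshMonitor may differ.
interface ChannelSnapshot {
  id: number; // slot index
  psk: string;
  name?: string;
}

interface ChannelMove {
  from: number;
  to: number;
}

// Uses the matching predicate quoted above. The surrounding loop is an
// assumption about how that predicate is driven.
function detectMovesCurrent(before: ChannelSnapshot[], after: ChannelSnapshot[]): ChannelMove[] {
  const moves: ChannelMove[] = [];
  for (const oldCh of before) {
    const newCh = after.find(ch =>
      ch.id !== oldCh.id &&
      ch.psk === oldCh.psk &&
      (ch.name || '') === (oldCh.name || ''));
    if (newCh) moves.push({ from: oldCh.id, to: newCh.id });
  }
  return moves;
}

// Simplified stand-in for the apply step: all moves are applied at once,
// which is the effect a swap-aware migration has on message attribution.
function applyMoves(messageChannels: number[], moves: ChannelMove[]): number[] {
  const target = new Map<number, number>();
  for (const m of moves) target.set(m.from, m.to);
  return messageChannels.map(c => {
    const to = target.get(c);
    return to === undefined ? c : to;
  });
}

// Slots 1 and 3 hold the same (psk, name); the PSK values are placeholders.
const unchangedSnapshot: ChannelSnapshot[] = [
  { id: 0, psk: 'placeholder-a', name: 'Primary' },
  { id: 1, psk: 'placeholder-b', name: 'ops' },
  { id: 3, psk: 'placeholder-b', name: 'ops' },
];

// Nothing moved, yet the detector reports 1→3 and 3→1.
const phantomMoves = detectMovesCurrent(unchangedSnapshot, unchangedSnapshot);

// Three messages, stored on channels 1, 1 and 3, across three identical syncs.
let messageChannels = [1, 1, 3];
const attributionPerSync: number[][] = [];
for (let sync = 0; sync < 3; sync++) {
  messageChannels = applyMoves(
    messageChannels,
    detectMovesCurrent(unchangedSnapshot, unchangedSnapshot));
  attributionPerSync.push(messageChannels);
}
// attributionPerSync is [[3, 3, 1], [1, 1, 3], [3, 3, 1]]
```

In this model, attribution flips on every sync, so where a message ends up depends only on whether the number of syncs so far is odd or even.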
Evidence (production deployment)
audit_log recorded the identical swap dozens of times — e.g. channel_migration_on_startup → Channel moves: slot 1→3, slot 3→1 logged 8+ separate times in a single day, and slot 2→3, slot 3→2 repeatedly across prior days. Genuine reorders do not recur identically on every config sync; this is the detector mis-firing. (On the affected node, channels 0 and 1 — MediumFast/LongFast — already share the default public-key PSK and differ only by name, so a near-duplicate (psk,name) is one rename away.)
The damage rate scales with config-sync frequency, so it is worst on high-traffic / frequently-reconnecting nodes.
Impact
Messages drift to the wrong channel over time: a busy channel's history surfaces under an unrelated channel, and other channels appear empty. The damage is cumulative and hard to notice until the attribution is badly scrambled.
Steps to reproduce
- Configure (or have the device report) two channels with identical name and PSK.
- Let MeshMonitor run a config sync → audit_log shows channel_migration_on_startup with a slot X→Y, slot Y→X swap.
- Repeat syncs / reconnect → the same phantom swap re-fires and messages migrate back and forth.
Suggested fix
Detection must fail safe (decline) when an identity is ambiguous instead of guessing. Restrict matching to (psk, name) keys that are unique in both snapshots:
export function detectChannelMoves(before, after) {
  const key = ch => JSON.stringify([ch.psk, ch.name || '']);
  const counts = snap => {
    const m = new Map();
    for (const ch of snap) {
      if (ch.psk) m.set(key(ch), (m.get(key(ch)) || 0) + 1);
    }
    return m;
  };
  const beforeCounts = counts(before), afterCounts = counts(after);
  const moves = [];
  for (const oldCh of before) {
    if (!oldCh.psk) continue;
    const k = key(oldCh);
    if (beforeCounts.get(k) !== 1 || afterCounts.get(k) !== 1) continue; // ambiguous → skip
    const newCh = after.find(ch => ch.id !== oldCh.id && key(ch) === k);
    if (newCh) moves.push({ from: oldCh.id, to: newCh.id });
  }
  return moves;
}
For a unique (psk,name), the only same-identity channel at the same slot is excluded by ch.id !== oldCh.id, so no-ops no longer produce phantom moves, while genuine single-moves and unique-identity swaps still detect correctly (and the apply step's temp-channel swap handling is unchanged). Tradeoff: a genuine move of a channel whose (psk,name) is duplicated won't be migrated — but messages stay in place (no corruption), which is strictly better than today.
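The three cases named above (duplicate-identity no-op, unique-identity swap, genuine single move) can be checked with a small typed sketch of the same uniqueness guard. The type names `ChannelInfo` and `DetectedMove`, the function name `detectUniqueMoves`, and the sample channel data are hypothetical; this is a self-contained illustration under those assumptions, not a drop-in patch.

```typescript
// Hypothetical shapes used only for this sketch.
type ChannelInfo = { id: number; psk?: string; name?: string };
type DetectedMove = { from: number; to: number };

// Identity key: (psk, name), with a missing name treated as ''.
const identityKey = (ch: ChannelInfo): string =>
  JSON.stringify([ch.psk, ch.name || '']);

// Count how many slots in a snapshot carry each identity (PSK-less slots are ignored).
function countIdentities(snap: ChannelInfo[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const ch of snap) {
    if (!ch.psk) continue;
    const k = identityKey(ch);
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  return counts;
}

function detectUniqueMoves(before: ChannelInfo[], after: ChannelInfo[]): DetectedMove[] {
  const beforeCounts = countIdentities(before);
  const afterCounts = countIdentities(after);
  const moves: DetectedMove[] = [];
  for (const oldCh of before) {
    if (!oldCh.psk) continue;
    const k = identityKey(oldCh);
    // Ambiguous identity in either snapshot: decline rather than guess.
    if (beforeCounts.get(k) !== 1 || afterCounts.get(k) !== 1) continue;
    const newCh = after.find(ch => ch.id !== oldCh.id && identityKey(ch) === k);
    if (newCh) moves.push({ from: oldCh.id, to: newCh.id });
  }
  return moves;
}

// Case 1: duplicated identity, nothing moved → no moves reported.
const duplicated: ChannelInfo[] = [
  { id: 1, psk: 'placeholder-b', name: 'ops' },
  { id: 3, psk: 'placeholder-b', name: 'ops' },
];
const noOpMoves = detectUniqueMoves(duplicated, duplicated);

// Case 2: two unique identities trade slots → both directions reported.
const beforeSwap: ChannelInfo[] = [
  { id: 1, psk: 'placeholder-x', name: 'alpha' },
  { id: 2, psk: 'placeholder-y', name: 'bravo' },
];
const afterSwap: ChannelInfo[] = [
  { id: 1, psk: 'placeholder-y', name: 'bravo' },
  { id: 2, psk: 'placeholder-x', name: 'alpha' },
];
const swapMoves = detectUniqueMoves(beforeSwap, afterSwap);

// Case 3: 'alpha' moves from slot 1 to slot 4; 'bravo' stays put.
const afterSingle: ChannelInfo[] = [
  { id: 4, psk: 'placeholder-x', name: 'alpha' },
  { id: 2, psk: 'placeholder-y', name: 'bravo' },
];
const singleMoves = detectUniqueMoves(beforeSwap, afterSingle);
```

With this data, `noOpMoves` is empty, `swapMoves` contains 1→2 and 2→1, and `singleMoves` contains only 1→4.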
Environment
MeshMonitor 4.9.x, PostgreSQL, multi-source (several Meshtastic gateways). High reconnect/config-sync frequency on the affected high-traffic node amplified the corruption rate.