Commit 8f65faf

feat(API): cross-reference datalayer-owned stores in /diagnostics
Add owned-store cross-reference to chia.datalayer to detect the "datalayer or chia wallet forgot about a store CADT owns" failure mode, which leaves stores unwritable until the wallet is restored.

Three new fields appear under chia.datalayer:
- ownedStores: actual list from datalayer get_owned_stores RPC
- totalOwnedStores: count of ownedStores
- expectedOwnedStores: [{storeId, label, owned}] sourced from CADT's home org records (V1 + V2) and the local Meta tables for governance body stores when this node IS the governance body

Status escalation: chia.datalayer.status becomes critical when any expected store has owned=false, with message "expected owned store is not owned by datalayer and can't be written to: <id> (<label>)". owned=null (datalayer RPC failure) does not escalate -- the existing datalayer-unreachable critical already covers that case.

V2 governance entries are gated on MetaV2.mainGoveranceBodyId being present. That key is upserted only by the V2 governance create paths (createGoveranceBody, addV2ToExistingGovernanceBody) and never by GovernanceV2.subscribeToGovernanceBody, so its presence is the binary "this node IS the V2 governance body" signal. Without this gate, every V2 subscriber would generate a false-positive critical for the remote governance body store they do not own. V1 has no subscribe-to-governance-body flow, so both V1 governance Meta keys (mainGoveranceBodyId, governanceBodyId) are always owned when present and pass straight through.

Lazy stores (V1 fileStoreId / dataModelVersionStoreId, V2 file_store_subscribed / data_model_version_store_id) are skipped when null -- datalayer cannot have forgotten a store CADT has not asked it to create yet, so listing them would generate false-positive criticals.

readHomeOrgRecord falls back to a direct raw findOne if getHomeOrg throws (V1's JSON.parse(metadata) has no try/catch), so corrupted org metadata does not silently drop the entire owned-store cross-reference -- exactly the failure operators are likely hitting /diagnostics to debug.

Tests: 11 pure-function unit tests for collectOwnedStoreExpectations, 1 pure-function test for the extracted escalateLostOwnedStores helper, 1 shape test for the new response fields, and 1 end-to-end test that seeds a V2 home org and asserts datalayer.status=critical against the real production wiring.
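
To make the tri-state `owned` flag concrete, here is a minimal sketch of the cross-reference step in isolation. It is not the production helper: `crossReference` and the store IDs (`aaa111`, `bbb222`) are hypothetical names invented for illustration, and only the `{storeId, label, owned}` entry shape is taken from the commit message.

```javascript
// Hypothetical stand-in for the cross-reference step. `ownedStoreIds` is the
// list reported by datalayer, or null when the RPC failed.
const crossReference = (ownedStoreIds, expected) => {
  const ownedSet = ownedStoreIds ? new Set(ownedStoreIds) : null;
  return expected.map((entry) => ({
    ...entry,
    // true = owned, false = lost, null = unknown (RPC failure)
    owned: ownedSet ? ownedSet.has(entry.storeId) : null,
  }));
};

// Made-up store IDs, for illustration only.
const expected = [
  { storeId: 'aaa111', label: 'v2 home org' },
  { storeId: 'bbb222', label: 'v2 registry' },
];

const healthy = crossReference(['aaa111', 'bbb222'], expected); // every entry owned
const lost = crossReference(['aaa111'], expected); // registry missing from datalayer
const unknown = crossReference(null, expected); // RPC failed, ownership unknown
```

Only the `lost` case should lead to a critical status. The `unknown` case stays at `owned: null` so that an unreachable datalayer is reported once, by the existing unreachable check, rather than once per store.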
1 parent ac348fa commit 8f65faf

3 files changed

Lines changed: 534 additions & 15 deletions

src/routes/diagnostics.js

Lines changed: 252 additions & 14 deletions
@@ -80,16 +80,63 @@ const settle = async (label, producer, timeoutMs = DEFAULT_TIMEOUT_MS) => {
   }
 };

-const readHomeOrgId = async (Model, getter = 'getHomeOrg') => {
+/**
+ * Returns the full home-org record (raw object) for the given Model, or null
+ * when no home org is configured. Used to populate both the cadtSection
+ * homeOrgId field and the expected-owned-stores helper. We prefer
+ * Model.getHomeOrg(false) (includeAddress=false) so any model-level
+ * processing (metadata parsing, field normalization) is applied -- and to
+ * avoid the wallet RPC call that includeAddress=true triggers, which
+ * /diagnostics must not require.
+ *
+ * If `getHomeOrg` throws (e.g. V1 throws on malformed `metadata` JSON --
+ * see organizations.model.js, no try/catch around JSON.parse), we fall
+ * back to a direct raw findOne on the home-org row. That recovers the
+ * owned-store cross-reference for the very failure mode operators are
+ * likely hitting /diagnostics to debug.
+ */
+const readHomeOrgRecord = async (Model, getter = 'getHomeOrg', whereClause) => {
   try {
-    // includeAddress=false avoids a wallet RPC call inside getHomeOrg --
-    // /diagnostics must keep working when the wallet is unreachable.
-    const homeOrg = await Model[getter](false);
-    if (!homeOrg) return null;
-    // V1 uses `orgUid`, V2 uses `org_uid`
-    return homeOrg.orgUid || homeOrg.org_uid || null;
+    // `null` is a legitimate "no home org configured" answer (common on
+    // fresh installs) -- return it without triggering the fallback, which
+    // is only useful when getHomeOrg threw on otherwise-valid data.
+    return (await Model[getter](false)) || null;
   } catch (error) {
-    logger.debug(`[diagnostics]: home org lookup failed: ${error.message}`);
+    logger.warn(`[diagnostics]: ${getter} failed: ${error.message}; trying raw findOne`);
+  }
+  if (!whereClause) return null;
+  try {
+    return (await Model.findOne({ where: whereClause, raw: true })) || null;
+  } catch (error) {
+    logger.warn(`[diagnostics]: raw home org fallback failed: ${error.message}`);
+    return null;
+  }
+};
+
+const homeOrgUid = (record) =>
+  record ? record.orgUid || record.org_uid || null : null;
+
+/**
+ * Read a single string value out of the V1 Meta table, or null when the key
+ * is absent. Used for governance store IDs that only exist when this node
+ * IS the governance body (the local creation flow upserts these keys).
+ */
+const readMetaValueV1 = async (Meta, metaKey) => {
+  try {
+    const row = await Meta.findOne({ where: { metaKey }, raw: true });
+    return row?.metaValue || null;
+  } catch (error) {
+    logger.debug(`[diagnostics]: V1 Meta[${metaKey}] lookup failed: ${error.message}`);
+    return null;
+  }
+};
+
+const readMetaValueV2 = async (MetaV2, metaKey) => {
+  try {
+    const row = await MetaV2.findOne({ where: { meta_key: metaKey }, raw: true });
+    return row?.meta_value || null;
+  } catch (error) {
+    logger.debug(`[diagnostics]: V2 Meta[${metaKey}] lookup failed: ${error.message}`);
     return null;
   }
 };
@@ -176,6 +223,111 @@ const collectSubscriptions = async (persistance) => {
   };
 };

+/**
+ * Pure helper: given the list of expected-owned-store entries (as produced
+ * by `collectOwnedStoreExpectations`), escalate `accumulator` to critical
+ * when one or more entries have `owned === false`. Entries with
+ * `owned === null` (datalayer RPC failure -> ownership unknown) are not
+ * escalated -- the existing datalayer-unreachable critical already covers
+ * that case. Returns the list of escalated entries for tests/log purposes.
+ *
+ * Exported through `__test` so the integration test can drive the real
+ * escalation path (instead of re-implementing it).
+ */
+const escalateLostOwnedStores = (accumulator, expectedOwnedStores) => {
+  const lostStores = (expectedOwnedStores || []).filter((s) => s.owned === false);
+  if (lostStores.length > 0) {
+    const detail = lostStores.map((s) => `${s.storeId} (${s.label})`).join(', ');
+    accumulator.escalate(
+      'critical',
+      `expected owned store is not owned by datalayer and can't be written to: ${detail}`,
+    );
+  }
+  return lostStores;
+};
+
+/**
+ * Pure helper: cross-reference the set of datalayer-owned store IDs against
+ * the stores CADT itself created (home org + registry + lazily-created
+ * file/data-model stores + locally-created governance stores). Detects the
+ * "datalayer/chia wallet forgot about a store we created" failure mode,
+ * which leaves the store unwritable until the wallet is restored.
+ *
+ * Inputs are pre-resolved so this function is sync and trivially testable.
+ * - ownedStoresResult: result from persistance.getOwnedStores() or null on RPC failure
+ * - v1HomeOrg / v2HomeOrg: home org records or null
+ * - v1GovernanceBodyStoreId / v1GovernanceVersionStoreId / v2GovernanceBodyStoreId /
+ *   v2GovernanceVersionStoreId: governance store IDs that this node owns. The
+ *   caller is responsible for distinguishing owner vs subscriber: it must
+ *   pass null for any key the local node does not own. V1 has no
+ *   subscribe-to-governance-body flow so both V1 keys are always owned when
+ *   present; V2 *does* have one, and `MetaV2.governanceBodyId` is upserted
+ *   on subscribe -- so the V2 caller must gate both V2 entries on the
+ *   presence of `MetaV2.mainGoveranceBodyId`, which subscribers do not write.
+ *
+ * Lazily-created stores (V1 fileStoreId / dataModelVersionStoreId, V2
+ * file_store_subscribed / data_model_version_store_id) are skipped when
+ * NULL -- datalayer cannot have "forgotten" a store CADT hasn't asked it
+ * to create yet, so listing them would generate false-positive criticals.
+ */
+const collectOwnedStoreExpectations = ({
+  ownedStoresResult,
+  v1HomeOrg,
+  v2HomeOrg,
+  v1GovernanceBodyStoreId,
+  v1GovernanceVersionStoreId,
+  v2GovernanceBodyStoreId,
+  v2GovernanceVersionStoreId,
+}) => {
+  const ownedStores =
+    ownedStoresResult && ownedStoresResult.success ? ownedStoresResult.storeIds || [] : null;
+  const ownedSet = ownedStores ? new Set(ownedStores) : null;
+
+  const expected = [];
+  const push = (storeId, label) => {
+    if (storeId) expected.push({ storeId, label });
+  };
+
+  if (v1HomeOrg) {
+    push(v1HomeOrg.orgUid, 'v1 home org');
+    push(v1HomeOrg.registryId, 'v1 registry');
+    push(v1HomeOrg.fileStoreId, 'v1 file store');
+    push(v1HomeOrg.dataModelVersionStoreId, 'v1 data model version store');
+  }
+
+  if (v2HomeOrg) {
+    push(v2HomeOrg.org_uid, 'v2 home org');
+    push(v2HomeOrg.registry_id, 'v2 registry');
+    // file_store_subscribed is overloaded: on the home org row it stores the
+    // locally-created file store ID (filestore-v2.model.js
+    // createDataLayerStoreWithRetry path) and is genuinely owned. On a
+    // non-home/subscribed org row it stores a REMOTE org's file store ID
+    // (organizations-v2.model.js upsert path). The caller only passes the
+    // home-org record here, so this is safe today -- if a future code path
+    // ever lands a non-owned ID on the home-org row, it will produce a
+    // false-positive critical, so any future writer of this column must
+    // continue to use a locally-created store ID for is_home=true rows.
+    push(v2HomeOrg.file_store_subscribed, 'v2 file store');
+    push(v2HomeOrg.data_model_version_store_id, 'v2 data model version store');
+  }
+
+  push(v1GovernanceBodyStoreId, 'v1 governance body');
+  push(v1GovernanceVersionStoreId, 'v1 governance version store');
+  push(v2GovernanceBodyStoreId, 'v2 governance body');
+  push(v2GovernanceVersionStoreId, 'v2 governance version store');
+
+  const expectedOwnedStores = expected.map((entry) => ({
+    ...entry,
+    owned: ownedSet ? ownedSet.has(entry.storeId) : null,
+  }));
+
+  return {
+    ownedStores,
+    totalOwnedStores: ownedStores ? ownedStores.length : null,
+    expectedOwnedStores,
+  };
+};
+
 /**
  * Normalize a chia peer node_id for comparison: strip a leading 0x and
  * lowercase. Both the chia config's `wallet.trusted_peers` keys and the
@@ -251,8 +403,8 @@ export const getDiagnosticsResponse = async () => {
   const wallet = (await import('../datalayer/wallet.js')).default;
   const fullNodeRpc = (await import('../datalayer/fullNodeRpc.js')).default;
   const persistance = await import('../datalayer/persistance.js');
-  const { Organization } = await import('../models/index.js');
-  const { OrganizationsV2 } = await import('../models/v2/index.js');
+  const { Organization, Meta } = await import('../models/index.js');
+  const { OrganizationsV2, MetaV2 } = await import('../models/v2/index.js');
   const fullNodeModule = await import('../datalayer/fullNode.js');

   // Phase 1: fast local probes (process scan, system info, chia-tools).
@@ -303,8 +455,52 @@ export const getDiagnosticsResponse = async () => {
       () => collectSubscriptions(persistance),
       SUBSCRIPTION_BUDGET_MS + 1000,
     ),
-    settle('Organization.getHomeOrg', () => readHomeOrgId(Organization), DEFAULT_TIMEOUT_MS),
-    settle('OrganizationsV2.getHomeOrg', () => readHomeOrgId(OrganizationsV2), DEFAULT_TIMEOUT_MS),
+    settle(
+      'Organization.getHomeOrg',
+      () => readHomeOrgRecord(Organization, 'getHomeOrg', { isHome: true }),
+      DEFAULT_TIMEOUT_MS,
+    ),
+    settle(
+      'OrganizationsV2.getHomeOrg',
+      () => readHomeOrgRecord(OrganizationsV2, 'getHomeOrg', { is_home: true }),
+      DEFAULT_TIMEOUT_MS,
+    ),
+    // Detect "datalayer/chia wallet forgot about a store CADT owns" failures
+    // (see chia.datalayer.expectedOwnedStores in the response). Falls back to
+    // ownedStoresResult=null on RPC failure, which makes per-expected owned
+    // flags null (unknown) rather than false -- the existing datalayer-
+    // unreachable critical already covers that case.
+    settle('persistance.getOwnedStores', () => persistance.getOwnedStores(), DEFAULT_TIMEOUT_MS),
+    // V1 governance: both Meta keys are upserted only by
+    // Governance.createGoveranceBody (there is no V1 subscribe-to-body
+    // flow), so both are always owned when present.
+    settle(
+      'Meta.mainGoveranceBodyId',
+      () => readMetaValueV1(Meta, 'mainGoveranceBodyId'),
+      DEFAULT_TIMEOUT_MS,
+    ),
+    settle(
+      'Meta.governanceBodyId',
+      () => readMetaValueV1(Meta, 'governanceBodyId'),
+      DEFAULT_TIMEOUT_MS,
+    ),
+    // V2 governance: `mainGoveranceBodyId` is upserted ONLY by the create
+    // paths (createGoveranceBody / addV2ToExistingGovernanceBody), never by
+    // GovernanceV2.subscribeToGovernanceBody. `governanceBodyId` is upserted
+    // by both create AND subscribe, so it cannot be treated as an
+    // owned-store marker on its own. We use the presence of
+    // `mainGoveranceBodyId` below as the binary "this node IS the V2
+    // governance body" gate before publishing either V2 entry.
+    settle(
+      'MetaV2.mainGoveranceBodyId',
+      () => readMetaValueV2(MetaV2, 'mainGoveranceBodyId'),
+      DEFAULT_TIMEOUT_MS,
+    ),
+    settle(
+      'MetaV2.governanceBodyId',
+      () => readMetaValueV2(MetaV2, 'governanceBodyId'),
+      DEFAULT_TIMEOUT_MS,
+    ),
   ];

   const [
const [
@@ -320,6 +516,11 @@ export const getDiagnosticsResponse = async () => {
     subscriptionsRes,
     homeOrgV1Res,
     homeOrgV2Res,
+    ownedStoresRes,
+    metaV1MainGovBodyRes,
+    metaV1GovBodyRes,
+    metaV2MainGovBodyRes,
+    metaV2GovBodyRes,
   ] = await Promise.all(rpcSettles);

   // walletReachable: derived from the get_network_info RPC return value, NOT
@@ -340,6 +541,9 @@ export const getDiagnosticsResponse = async () => {
   const enableV2 = configV2?.ENABLE !== false;
   const chiaRoot = getChiaRoot();

+  const v1HomeOrg = homeOrgV1Res.ok ? homeOrgV1Res.value : null;
+  const v2HomeOrg = homeOrgV2Res.ok ? homeOrgV2Res.value : null;
+
   // ---- CADT section -------------------------------------------------------
   const cadtSection = {
     version: packageJson.version,
@@ -359,15 +563,15 @@ export const getDiagnosticsResponse = async () => {
       isGovernanceBody: configV1.IS_GOVERNANCE_BODY === true,
       apiKeyConfigured: !!(configV1.CADT_API_KEY && configV1.CADT_API_KEY !== ''),
       governanceBodyId: configV1.GOVERNANCE?.GOVERNANCE_BODY_ID || null,
-      homeOrgId: homeOrgV1Res.ok ? homeOrgV1Res.value : null,
+      homeOrgId: homeOrgUid(v1HomeOrg),
     },
     v2: {
       enabled: enableV2,
       readOnly: configV2.READ_ONLY === true,
       isGovernanceBody: configV2.IS_GOVERNANCE_BODY === true,
       apiKeyConfigured: !!(configV2.CADT_API_KEY && configV2.CADT_API_KEY !== ''),
       governanceBodyId: configV2.GOVERNANCE?.GOVERNANCE_BODY_ID || null,
-      homeOrgId: homeOrgV2Res.ok ? homeOrgV2Res.value : null,
+      homeOrgId: homeOrgUid(v2HomeOrg),
     },
   };

@@ -446,13 +650,40 @@ export const getDiagnosticsResponse = async () => {
     const subscriptionsValue = subscriptionsRes.ok
       ? subscriptionsRes.value
       : { available: false, subscriptions: [], truncated: false, totalSubscriptions: 0, error: subscriptionsRes.error };
+
+    // Owned-store cross-reference: detect the "datalayer/chia wallet forgot
+    // about a store CADT owns" failure mode. The pure helper handles all
+    // null/missing cases (no home org, RPC failure, lazy stores not yet
+    // created). V2 governance store IDs are gated on
+    // MetaV2.mainGoveranceBodyId being present -- that key is set ONLY by
+    // the create paths, never by GovernanceV2.subscribeToGovernanceBody, so
+    // its presence is the binary "this node IS the V2 governance body"
+    // signal. V1 has no subscribe path, so both V1 governance keys are
+    // always owned when present.
+    const v2IsGovernanceBody = !!(metaV2MainGovBodyRes.ok && metaV2MainGovBodyRes.value);
+    const ownedStoreView = collectOwnedStoreExpectations({
+      ownedStoresResult: ownedStoresRes.ok ? ownedStoresRes.value : null,
+      v1HomeOrg,
+      v2HomeOrg,
+      v1GovernanceBodyStoreId: metaV1MainGovBodyRes.ok ? metaV1MainGovBodyRes.value : null,
+      v1GovernanceVersionStoreId: metaV1GovBodyRes.ok ? metaV1GovBodyRes.value : null,
+      v2GovernanceBodyStoreId: v2IsGovernanceBody ? metaV2MainGovBodyRes.value : null,
+      v2GovernanceVersionStoreId:
+        v2IsGovernanceBody && metaV2GovBodyRes.ok ? metaV2GovBodyRes.value : null,
+    });
+    const ownedStoresError = ownedStoresRes.ok ? null : ownedStoresRes.error;
+
     return {
       rpcUrl: appConfig.DATALAYER_URL || null,
       reachable,
       subscriptions: subscriptionsValue.subscriptions,
       totalSubscriptions: subscriptionsValue.totalSubscriptions,
       truncated: subscriptionsValue.truncated,
       ...(subscriptionsValue.error ? { subscriptionsError: subscriptionsValue.error } : {}),
+      ownedStores: ownedStoreView.ownedStores,
+      totalOwnedStores: ownedStoreView.totalOwnedStores,
+      expectedOwnedStores: ownedStoreView.expectedOwnedStores,
+      ...(ownedStoresError ? { ownedStoresError } : {}),
     };
   })();

@@ -545,6 +776,11 @@ export const getDiagnosticsResponse = async () => {
     if (datalayerSection.subscriptions?.some((s) => s.synced === false)) {
       dlStatus.escalate('warning', 'One or more DataLayer subscriptions are not synced');
     }
+    // Critical when CADT believes it owns a store the datalayer has lost
+    // track of -- that store cannot be written to and usually requires
+    // operator intervention (the chia wallet occasionally "forgets" a
+    // store it created).
+    escalateLostOwnedStores(dlStatus, datalayerSection.expectedOwnedStores);
     Object.assign(datalayerSection, dlStatus.result());
   }

@@ -644,5 +880,7 @@ export const __test = {
   collectSubscriptions,
   buildTrustedPeerView,
   normalizeNodeId,
+  collectOwnedStoreExpectations,
+  escalateLostOwnedStores,
   StatusAccumulator,
 };
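
As a rough illustration of how the exported escalation helper might be exercised in a unit test, the sketch below re-creates its logic locally and drives it with a recording stub. The diff does not show the `StatusAccumulator` class, so `makeRecorder` is a hypothetical stand-in that only mimics the `escalate(level, message)` call used above. `escalateLost` is a local copy for this sketch rather than an import of the real module, and the store IDs are made up.

```javascript
// Local copy of the escalation logic, for illustration only.
const escalateLost = (accumulator, expectedOwnedStores) => {
  const lostStores = (expectedOwnedStores || []).filter((s) => s.owned === false);
  if (lostStores.length > 0) {
    const detail = lostStores.map((s) => `${s.storeId} (${s.label})`).join(', ');
    accumulator.escalate(
      'critical',
      `expected owned store is not owned by datalayer and can't be written to: ${detail}`,
    );
  }
  return lostStores;
};

// Hypothetical stub: records escalate() calls instead of tracking a status.
const makeRecorder = () => {
  const calls = [];
  return { calls, escalate: (level, message) => calls.push({ level, message }) };
};

// One lost store (owned === false) triggers a single critical escalation.
const recorder = makeRecorder();
const lostEntries = escalateLost(recorder, [
  { storeId: 'aaa111', label: 'v2 home org', owned: true },
  { storeId: 'bbb222', label: 'v2 registry', owned: false },
]);

// Unknown ownership (owned === null) must not escalate.
const quietRecorder = makeRecorder();
const quietEntries = escalateLost(quietRecorder, [
  { storeId: 'aaa111', label: 'v2 home org', owned: null },
]);
```

The strict `=== false` comparison is what keeps the `null` (RPC failure) case from escalating; a truthiness check such as `!s.owned` would flag unknown entries as lost.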
