fix: init mirror models eagerly and recover lost writes on reconnect#1585
Conversation
Fixes a silent startup race in CADT's MySQL mirror: if MySQL was not
yet reachable when CADT started, every mirror Model.init() call was
permanently skipped. Subsequent mirror writes threw
"Cannot read properties of undefined (reading 'constructor')" from
deep inside Sequelize and the mirror stayed non-functional for the
life of the process. Observed on the cadt-testneta-observer k8s pod.
Core fix (V1 + V2):
- safeMirrorDbHandler[V2] no longer wraps Model.init(). A new
initMirrorModel[V2] helper initialises models synchronously at
module-load time, which is safe because Sequelize.init is pure
metadata registration and does not require a live connection.
All 34 V1+V2 *.model.mirror.js files updated.
- safeMirrorDbHandler[V2] now detects disconnect->reconnect
transitions (and "setup never ran at startup") and runs
prepareMysqlMirror[V2] + backfillMirror[V2] before the current
callback, so writes dropped during an outage are recovered
without requiring a CADT restart.
- backfillMirror[V2] also sweeps orphan rows (mirror rows whose PK
no longer exists in source) so DELETEs issued during an outage
eventually propagate. Pagination now has an ORDER BY on the PK
so concurrent writes don't cause skipped rows. V1 previously
had no backfill at all - that is added.
- validateMirrorDbNames now fails loud instead of being swallowed
by the mirror-setup try/catch, so a V1/V2 DB_NAME collision
aborts startup the same way it did before this PR's refactor.
Tests:
- New integration tests: assert every mirror model is initialised
at module load (with a single-column-PK invariant guarding the
orphan sweep); orphan sweep removes source-deleted rows from
the mirror; reconnect detection runs backfill; steady-state
connects do NOT run backfill.
- V1 live-api data-short.js now verifies records land in the V1
MySQL mirror (symmetric with V2). Ports mysql-mirror-helpers to
V1 and fixes a date-then-numeric comparison fallback in both
helpers so '2024-01-01' no longer equals '2024-06-15' by
parseFloat.
CI:
- test-v2-live-api and test-v1-live-api now stop MariaDB mid-setup
and restart it ~30s after CADT starts, exercising the startup
race plus the reconnect+backfill recovery path end-to-end.
Both jobs hard-fail if MariaDB is still reachable after
service mariadb stop so a degraded race test cannot silently
pass. V1 Start CADT now polls the V1 API endpoint with
exit-on-timeout, matching V2.
Related fix (out of scope for the bug but same test surface):
- wallet-health.live.spec.js and datalayer-test-helpers.js now
honour CERTIFICATE_FOLDER_PATH via a new shared live-api helper,
matching src/datalayer/wallet.js. Previously hardcoded
\${chiaRoot}/config/ssl would false-negative in deployments
using a non-default Chia SSL directory.
Sequelize's default datetime format "YYYY-MM-DD HH:mm:ss.SSS +00:00"
includes a trailing "+00:00" offset that recent MariaDB strict mode
rejects with:
Incorrect datetime value: '2026-04-16 22:41:27.490 +00:00'
for column `cadt_mirror_test`.`audit`.`createdAt` at row 1
Every mirror write touching a DATE/DATETIME column was silently dropped
by the fire-and-forget safeMirrorDbHandler[V2]. This affected both V1
and V2 mirrors in production - the v2 live-api CI run exposed 59 such
errors but still passed because V2 test assertions only check
project/methodology-level records, not audit. V1 live-api verification
added in this PR surfaced the bug directly on the verified records
(projects.projectStatusDate, labels.creditingPeriodStartDate, etc.).
The fix is to set `timezone: '+00:00'` at the Sequelize constructor
level for both `mirror` and `v2Mirror` configs. At the constructor
level (unlike Model.init, where the option is silently ignored),
Sequelize converts Date values to UTC and emits the MariaDB-safe
"YYYY-MM-DD HH:MM:SS[.SSS]" format.
Also updates prepareMysqlMirrorV2's JSDoc to reflect that it does
throw (intentionally) when V1/V2 share a MIRROR_DB name - the earlier
"Never throws" claim was left over from before validateMirrorDbNames
was moved to fail-loud. Both callers (prepareV2Db and
startV2ReconnectBackfill) are wrapped in try/catch at the right level.
Two addressable bugbot + CI findings on the mirror fixes branch: 1. MariaDB "Incorrect datetime value" errors continued after the prior `timezone: "+00:00"` config tweak. Root cause: Sequelize 6's base `DATE._stringify` emits "YYYY-MM-DD HH:mm:ss.SSS Z" (e.g. "2026-04-17 17:07:13.399 +00:00") which MariaDB strict mode rejects, and some runtime paths reach the base serialiser even when a dialect- specific MySQL class is wired up. Patch the base DATE._stringify to emit the MariaDB-safe "YYYY-MM-DD HH:mm:ss" format; shadow an own- property `_stringify` on `sqlite.DATE.prototype` that preserves the original offset-bearing format (SQLite's parse() round-trip depends on it). Add `tests/v2/integration/mirror-datetime-format.spec.js` to guard the patch against future Sequelize upgrades. 2. Backfill used offset-based pagination, which can silently skip rows if a row is deleted concurrently (the next OFFSET lands one row later than intended). Switch to keyset pagination (WHERE pk > :lastSeenPk ORDER BY pk ASC LIMIT N) in both V1 and V2. Composite-PK mirrors now throw explicitly instead of silently loading the whole table; the single-PK invariant is still asserted at load time by mirror-model-init.spec.js. Add a defensive break if a terminal row returns a null PK. Tighten V2's outer-catch log formatting to match V1's (defensive `error?.message` fallback plus stack on debug).
|
Addressed the Bugbot pagination finding in 5d23732: Root fix: Switched both V1 ( Documented caveat: Keyset pagination is still not immune to concurrent INSERTs with a PK less than Hardening: Composite-PK mirrors now throw explicitly instead of silently loading the whole source table into memory. Also bundled in the same commit (same CI failure, different root cause):
|
The prior revision patched Sequelize's base DATE._stringify to force
a MariaDB-safe format, on the hypothesis that some steady-state write
paths bypassed the mysql.DATE subclass. Empirical tracing (all three
write paths - single INSERT, bulk INSERT, INSERT ... ON DUPLICATE KEY
UPDATE - instrumented against the query generator) shows that is NOT
the case: every MySQL write dispatches through mysql.DATE.stringify,
which already emits "YYYY-MM-DD HH:mm:ss".
The actual source of the residual "Incorrect datetime value" CI errors
was the reconnect-backfill path. `source.findAll({ raw: true })`
returns SQLite DATE columns as raw strings ("YYYY-MM-DD HH:mm:ss.SSS
+00:00") without running sqlite.DATE.parse. The strings were forwarded
verbatim into `mirror.bulkCreate`, which MariaDB strict mode rejected.
Fix: drop `raw: true` from backfill's findAll calls. Model instances
run sqlite.DATE.parse during construction, so DATE columns become Date
objects on dataValues; mysql.DATE then serialises them safely.
Use `.get({ plain: true, raw: true })` to extract the plain payload -
plain:true alone runs attribute-level `get()` accessors, which would
corrupt columns like ProjectV2.projectSector (stored as JSON, returned
as Array by the getter) when bulkCreate'd into the mirror's STRING/TEXT
column.
The sqlite PK-only orphan sweep keeps `raw: true` - its projection
excludes DATE and custom-getter columns, and a comment now documents
that invariant.
Revert the previous base-class `_stringify` monkey-patch and the
no-op `timezone: "+00:00"` constructor options (that value is already
Sequelize's default). Simplify the datetime regression test to pin
only the mysql subclass's output contract - the one Sequelize API we
actually depend on.
|
Root-cause fix pushed as 4aa768b. What the prior CI run revealed Actual root cause What the prior patch actually did What 4aa768b does Empirical verification of the steady-state path Bonus fix caught by pre-push review |
V1 live-api CI on 4aa768b still reproduced 33 "Incorrect datetime value: '... +00:00'" errors on Audit mirror writes - the exact base- DATE._stringify signature. Local instrumentation shows every MySQL INSERT path (bulkInsert, insertQuery, upsert) dispatches through mysql.DATE._stringify, so the CI observation points to some Sequelize-internal path we can't isolate from here that prototype- invokes the base method on a mysql attribute type. Rather than continue bisecting Sequelize internals, restore a targeted patch on Sequelize.DataTypes.DATE.prototype._stringify that strips the " +00:00" offset suffix (MariaDB strict mode rejects it) while keeping "YYYY-MM-DD HH:mm:ss.SSS" fractional seconds so local SQLite round-trips preserve millisecond precision. Legacy SQLite rows still round-trip via sqlite.DATE.parse's `date.includes("+")` branch. Uses a static ESM `import moment from 'moment'` (the prior attempt with `require('moment')` threw ReferenceError under ESM and was silently swallowed by safeMirrorDbHandler's catch). Also addresses the two open Bugbot findings: - V1 `mirrorDBEnabled()` always returned true in test mode. The in-process cost was the on-startup 11-table backfill iterating every source/mirror pair. Gate the startup backfill on isMysqlMirrorConfiguredForReconnect() instead of mirrorDBEnabled() so SQLite-fallback test runs skip it. Keep mirrorDBEnabled() true in mirrorTest mode - V1 integration tests rely on that implicit test enablement for SQLite-fallback mirror writes to land. - sweepMirrorOrphans[V2] loaded every PK from both mirror and source tables in a single unbounded findAll. Replace with keyset-paginated scans. Peak memory is now O(|mirror|) for the orphan-candidate set rather than O(|mirror|+|source|); the source side is streamed. Restore the original mirror-first scan order (a pre-push review caught that an earlier refactor inverted it, which allowed false- positive orphan deletes for rows inserted concurrently between the source scan and the mirror scan).
|
Addressed the two outstanding Bugbot findings in 9571d6e: Medium: Low: Plus one pre-push-review catch that would have regressed: the first refactor inverted the scan order (source-first, mirror-second). That violates the documented concurrency-safety invariant — a row inserted concurrently between the source scan and the mirror scan would have been misclassified as an orphan and deleted. Restored the original mirror-first order; the orphan-candidate set pattern preserves the streaming memory win without losing the safety property. Also in the same commit: restored a targeted |
Adds a gated (CADT_DATE_STRINGIFY_DIAG=1) stack-trace logger on the patched base _stringify, capped at 8 invocations per process so it can't spam logs or disk. Enabled in the V1 and V2 live-api Start-CADT steps so one CI run captures the full call stack for whatever Sequelize internal path is reaching the base serialiser on a mysql attribute type despite the mysql.DATE subclass being wired up. Remove this commit after the root cause is understood and the explanatory comment in src/config/config.js is updated with the concrete Sequelize call site.
CI stack-trace diagnostic (enabled in 6088b1b) captured 12 base DATE._stringify invocations in the V1 live-api run. All 12 came from SQLiteQueryGenerator / SQLiteQueryInterface frames - i.e. the SQLite source writes. Zero came from a MySQL path. The run produced 0 MariaDB datetime rejections, confirming that: 1. Every MySQL mirror write correctly dispatches through mysql.DATE._stringify (which emits the safe "YYYY-MM-DD HH:mm:ss" form). The base class is not reached on a MySQL write. 2. The base _stringify IS called for every SQLite source write. Before this patch, SQLite therefore persisted "YYYY-MM-DD HH:mm:ss.SSS +00:00" - the exact offset-bearing form MariaDB strict mode rejects. 3. Some code path (backfill was one; empirical comparison between 4aa768b and 9571d6e shows at least one more that I was unable to isolate) forwards those SQLite-stored strings to MariaDB verbatim. The backfill fix (drop raw:true, use `.get({plain:true,raw:true})`) closed the backfill path. The base-class patch closes the unknown remaining path by making SQLite never persist the " +00:00" form in the first place. Remove the diagnostic counter / env-gated stack logger and the CADT_DATE_STRINGIFY_DIAG wiring in the V1/V2 live-api CI jobs. Rewrote the src/config/config.js comment to state the concrete empirical finding instead of the previous "presumably some internal path" speculation.
|
Diagnostic stack-trace logger (6088b1b) captured the empirical root cause in V1 live-api CI. Removed the diagnostic in 7b059e2 and rewrote the What the diagnostic showed 12 base Conclusion
The patch is therefore necessary, not just defense-in-depth. |
Three unresolved bugbot findings on PR #1585: 1. Skip mirror callback when MySQL setup has not completed yet (V1+V2). In safeMirrorDbHandler[V2], after the post-reconnect await, the callback previously ran unconditionally. If MySQL setup was still failing (mirrorSetupSucceeded=false), the write hit tables that did not exist and the error was swallowed by the outer catch. Add a guard that returns early when MySQL is configured but setup has not succeeded. The next authenticate() success re-enters the reconnect path and retries setup. 2. Prevent mirrorDBEnabledV2 from desyncing from sequelizeV2Mirror. sequelizeV2Mirror is built once at module load from the memoized config. mirrorDBEnabledV2() previously re-read the live config on every call, which could flip the enablement true while writes still go to the originally-selected SQLite fallback. Snapshot the enablement at module load (v2MirrorTargetsMysql) so the check stays consistent with the Sequelize instance. 3. Apply the same snapshot pattern to V1 for symmetry (v1MirrorConfiguredAtLoad + mirrorDBEnabled). Add test-only overrides (__setMysqlConfiguredForReconnect*, __resetMirrorAuthState*) and regression tests for the new skip-callback guard in both the V1 and V2 reconnect specs. The auth-state reset is required to prevent the new test from leaving mirrorAuthState in 'disconnected' and leaking into downstream integration specs.
|
Addressed remaining bugbot findings in Resolved1. Reconnect backfill runs callback even after setup failure (Low, V1+V2)After 2. Mirror enable check desynchronizes mirror target (Medium, V2)
3. V1/V2 sweep and backfill logic fully duplicated (Low, code style)Replied on the thread — duplication is intentional for this PR; a shared helper would need to parameterise six distinct things (Sequelize instances, loggers, model barrels, state variables, Op imports, and asymmetric test-mode behaviour) and would obscure the code more than inlined copies do. Follow-up refactor issue will be opened separately. Regression testsAdded a regression test to both Also added Test results (local)
Pre-push diff review: 2 warnings, both addressed (V1/V2 symmetry + test coverage for the new guard). |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit dfab796. Configure here.

Summary
Fixes a silent startup race in CADT's MySQL mirror: if MySQL wasn't reachable when CADT started, every mirror
Model.init()call was permanently skipped and subsequent mirror writes threwCannot read properties of undefined (reading 'constructor')from deep inside Sequelize. The mirror stayed non-functional for the life of the process. Observed on thechiamanaged-cadt-testneta-observer-officialk8s namespace —cadt_mirror_v1andcadt_mirror_v2databases contained zero tables despite ~60 mirror errors per sync cycle.This PR addresses both the immediate init race AND the broader durability gap where writes during a MySQL outage are silently dropped.
Core fix (V1 + V2)
safeMirrorDbHandler[V2]no longer wrapsModel.init(). A newinitMirrorModel[V2]helper runs init synchronously at module load — safe becauseSequelize.init()is pure metadata registration and doesn't need a live connection. All 34 V1+V2*.model.mirror.jsfiles updated.safeMirrorDbHandler[V2]now detectsdisconnected → connectedtransitions (and "setup never ran at startup") and runsprepareMysqlMirror[V2]+backfillMirror[V2]before the current callback. Writes dropped during an outage are recovered without a CADT restart.backfillMirror[V2]now also sweeps orphan rows (mirror rows whose PK is no longer in source) soDELETEs issued during an outage eventually propagate. Pagination now hasORDER BYon the PK so concurrent writes don't cause skipped rows. V1 previously had no backfill at all — added in this PR.validateMirrorDbNamesfails loud on startup instead of being swallowed by the mirror-setuptry/catch.Tests
data-short.jsnow verifies records land in the V1 MySQL mirror (symmetric with V2). Portsmysql-mirror-helpersto V1 and fixes a date-then-numeric fallback in both helpers so'2024-01-01'no longer equals'2024-06-15'viaparseFloat.CI
test-v2-live-apiandtest-v1-live-apinow stop MariaDB mid-setup and restart it ~30s after CADT starts, exercising the startup race + reconnect+backfill recovery end-to-end.service mariadb stopso a degraded race test cannot silently pass.Start CADTpolls the V1 API with exit-on-timeout, matching V2's/healthpolling.Related fix (same test surface)
wallet-health.live.spec.jsanddatalayer-test-helpers.jsnow honourCERTIFICATE_FOLDER_PATHvia a new shared live-api helper, matchingsrc/datalayer/wallet.js. The previous hardcoded${chiaRoot}/config/sslgave a false-negative in deployments using a non-default Chia SSL directory.Test plan
npm run test:v2passes (1465 tests, +34 from new single-PK invariant guards)npm run test:v1passes (115 tests)mirror-model-init+mirror-reconnect+mirror-backfillsuites pass × 3 stability runsnode --checkall 46 changed filesorigin/v2-rc2baseline)js-yamlvalidation of.github/workflows/tests.yamlpre-push-diff-review(4 slices × 2 = 8 subagent reviews); all critical + high-impact findings fixed; info-level findings triaged and filed for follow-up where appropriatetest-v2-live-apiCI job exercises the delayed-MariaDB race pathtest-v1-live-apiCI job exercises the V1 delayed-MariaDB race path and verifies V1 mirror record landingContext
v2-rc2:c0eb64fb(handle MySQL mirror connection failure gracefully in v1 prepareDb). This PR's reconnect logic supersedes the"will be retried by safeMirrorDbHandler"assumption in that commit's message — retries now actually work because models are initialised independently of authenticate.Note
High Risk
High risk because it changes V1/V2 mirror database enablement, setup/migration, and outage recovery paths, which can affect data consistency and startup behavior under partial MySQL availability.
Overview
Fixes MySQL mirror startup and outage durability by initializing mirror models eagerly (new
initMirrorModel/initMirrorModelV2) and adding reconnect-aware recovery insafeMirrorDbHandler/safeMirrorDbHandlerV2that retries mirror setup and runs catch-up backfills before applying new writes.Adds/expands mirror backfill logic: V1 gains a full
backfillMirrorimplementation; V2’sbackfillMirrorV2is upgraded to keyset pagination and now also sweeps orphan rows to propagate deletes missed during outages, with guards/overrides for tests and consistent mirror enablement snapshots.Hardens CI and tests around these scenarios: live-api workflows now deliberately start CADT with MariaDB down then restart it to assert recovery; new integration specs cover model-init, datetime serialization, reconnect backfill, and orphan sweep; V1 live-api runner now verifies MySQL-mirror contents and closes the mirror pool; live-api TLS helpers now honor
CERTIFICATE_FOLDER_PATH.Reviewed by Cursor Bugbot for commit dfab796. Bugbot is set up for automated code reviews on this repo. Configure here.