[migration] DB transaction hardening: withTransaction reentrancy + split bulk migration (C/8)#579
Conversation
…ulk migration Two DB-correctness fixes: * withTransaction reentrancy (Martian-Engineering#474): the async function awaited operation() between BEGIN IMMEDIATE and COMMIT, letting concurrent callers race into a mid-transaction BEGIN. ConversationStore.withTransaction already routes through the per-connection async mutex from src/transaction-mutex.ts (PR Martian-Engineering#261), but Martian-Engineering#474 shows the regression surface needs explicit coverage. Add a setImmediate-interleaved stress test that ticks the event loop between every step inside the transaction body so concurrent callers race aggressively — under the current wiring all transactions complete cleanly; without it the test reproduces the SqliteError from the issue. * runLcmMigrations bulk-exec split (Martian-Engineering#569): PR Martian-Engineering#482 added ensureMessagePartsTable as belt-and-suspenders against Node db.exec()'s silent-abort failure mode. The same exposure existed for every other table in the same bulk block (summaries, summary_messages, summary_links, lcm_migration_state, plus their indexes). Split the bulk exec into per-statement calls (still inside the BEGIN EXCLUSIVE wrapper from Martian-Engineering#455) so each per-table SQL error throws on its own instead of being swallowed. Add a test that simulates a SQL failure on summary_messages and verifies runLcmMigrations now throws. Closes Martian-Engineering#569. Refs Martian-Engineering#474, Martian-Engineering#455, Martian-Engineering#261, Martian-Engineering#482.
There was a problem hiding this comment.
Pull request overview
Hardens SQLite transaction/migration correctness by adding regression coverage for withTransaction reentrancy under aggressive async interleaving, and by refactoring initial schema creation to avoid multi-statement db.exec() silent-abort behavior during migrations.
Changes:
- Add a
setImmediate-interleaved stress test to guard againstwithTransaction()reentrancy errors under concurrent async load (#474). - Split the initial schema bulk
db.exec()into per-statementdb.exec()calls viaLCM_INITIAL_SCHEMA_STATEMENTSto ensure failures surface immediately (#569). - Expand/adjust migration tests to validate split-bulk failure behavior and independent table creation.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/db/migration.ts |
Introduces LCM_INITIAL_SCHEMA_STATEMENTS and iterates per statement inside the existing BEGIN EXCLUSIVE migration wrapper. |
test/migration.test.ts |
Updates the message_parts guard test for per-statement exec and adds split-bulk regression tests. |
test/transaction-mutex.test.ts |
Adds an event-loop-interleaving stress test to detect nested-transaction errors under concurrency. |
.changeset/pr9-migration-hardening.md |
Adds a patch changeset documenting the two hardening changes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
`messages.UNIQUE(conversation_id, seq)` and `context_items.PRIMARY KEY (conversation_id, ordinal)` already create implicit indexes on those exact columns; the explicit `messages_conv_seq_idx` and `context_items_conv_idx` were pure write/maintenance overhead. Removed from `LCM_INITIAL_SCHEMA_STATEMENTS` so fresh installs no longer create them, and added `DROP INDEX IF EXISTS` cleanup steps so existing databases also shed the redundancy on next migration. Addresses code review on Martian-Engineering#579.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…c failure mode * src/db/migration.ts: `ensureMessagePartsTable` JSDoc now explains the post-Martian-Engineering#569 split-bulk approach exec's statements individually, so the legacy silent-abort failure mode no longer applies on new installs. The guard remains as defense-in-depth for legacy DBs upgraded across the silent-abort window. Inline comment at the registration site rewritten to match. * test/migration.test.ts: drop the `PR #9` reference (an internal agent-generated PR-numbering artifact); reword to PR-agnostic phrasing and reference the actual issue/PR numbers (Martian-Engineering#569, Martian-Engineering#482) for the migration change. No code change. Addresses review on Martian-Engineering#579.
|
Maintainer review: needs rebase before deeper review. Summary Next step Evidence checked
|
Summary
Closes #569. Two related DB-correctness fixes for the v0.9.3 stability sweep.
D2 —
withTransactionreentrancy regression test (#474)Issue #474 reports
SqliteError: cannot start a transaction within a transactionfromConversationStore.withTransaction()4-6×/hour under multi-bridge load.withTransactionalready routes through the per-connection async mutex fromsrc/transaction-mutex.ts(added in #261), but #474 shows the regression surface needs explicit coverage.This PR adds a
setImmediate-interleaved stress test that ticks the event loop between every step inside the transaction body, forcing aggressive concurrency. Under the current wiring all transactions complete cleanly — without the mutex the test would reproduce the SqliteError reliably.D3 — split bulk migration exec (#569 + #482 root cause)
PR #482 added
ensureMessagePartsTableas belt-and-suspenders against Nodedb.exec()'s silent-abort failure mode. The same exposure existed for every other table in the same bulk block (summaries, summary_messages, summary_parents, large_files, conversation_*, lcm_migration_state, plus their indexes).This PR (option 2 from the audit) splits the bulk
db.exec()into per-statement calls held in aLCM_INITIAL_SCHEMA_STATEMENTSarray, iterated under the existingBEGIN EXCLUSIVEwrapper from #455. Each per-table SQL error throws on its own — silent abort is no longer possible.A regression test simulates a SQL failure on
summary_messagesand assertsrunLcmMigrationsnow throws. A second test verifies every table from the previous bulk block is still created independently.Files changed
src/db/migration.ts— extracted bulkdb.exec()block (lines ~815-1003) intoLCM_INITIAL_SCHEMA_STATEMENTSarray (~17 statements), iterated per-statement inside the existingBEGIN EXCLUSIVEwrapper.test/migration.test.ts— updated existing message_parts belt-and-suspenders test to match per-statement interception, plus two new tests for split-bulk behavior.test/transaction-mutex.test.ts— addedsetImmediate-interleaved regression test for withTransaction() reentrant callers throw "cannot start a transaction within a transaction" #474..changeset/pr9-migration-hardening.md— changeset (patch).Test plan
npm testclean: 836 passing (up from 833 baseline; +3 new tests)npm run buildcleansetImmediateinterleaving test for withTransaction() reentrant callers throw "cannot start a transaction within a transaction" #474Refs #474, #455, #261, #482.