Skip to content

fix(db/sqlite): defer user_id index creation until after column ALTER (v0.4.1 hotfix)#1792

Merged
Wirasm merged 1 commit into
devfrom
fix/sqlite-migration-order-v0.4.1
May 28, 2026
Merged

fix(db/sqlite): defer user_id index creation until after column ALTER (v0.4.1 hotfix)#1792
Wirasm merged 1 commit into
devfrom
fix/sqlite-migration-order-v0.4.1

Conversation

@Wirasm

@Wirasm Wirasm commented May 28, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Problem: Every existing user upgrading from v0.3.x to v0.4.0 has a completely non-functional binary — every operation throws Error: no such column: user_id.
  • Why it matters: v0.4.0 just shipped. Anyone running brew upgrade archon or curl install.sh right now ends up with a broken install. New users (fresh ~/.archon/archon.db) are unaffected.
  • What changed: The two CREATE INDEX statements for user_id on conversations and workflow_runs are moved out of createSchema() and into migrateColumns(), right after the ALTER TABLE that adds the column.
  • What did NOT change: idx_user_identities_user_id stays in createSchema()user_identities is a new table created by the same exec block, so the column always exists when the index is created. No behavior change for new installs.

UX Journey

Before (v0.4.0)

  Existing v0.3.x user           Archon v0.4.0
  ────────────────────           ─────────────
  brew upgrade archon ─────────▶ installs binary
  archon isolation list ───────▶ SqliteAdapter()
                                   createSchema()
                                     CREATE INDEX idx_conversations_user_id
                                     ❌ "no such column: user_id"
                                   constructor throws
                                 ❌ Error: no such column: user_id
  every command fails ◀────────

After (v0.4.1)

  Existing v0.3.x user           Archon v0.4.1
  ────────────────────           ─────────────
  brew upgrade archon ─────────▶ installs binary
  archon isolation list ───────▶ SqliteAdapter()
                                   createSchema()  ✓ no user_id refs
                                   migrateColumns()
                                     ALTER TABLE ... ADD COLUMN user_id ✓
                                     CREATE INDEX idx_conversations_user_id ✓
                                     (same for workflow_runs)
                                   constructor returns
                                 isolation list returns environments ✓

Architecture Diagram

Before

SqliteAdapter constructor
  └─ initSchema()
      ├─ createSchema()         [contains user_id index refs] ❌ aborts on pre-0.4.0 DB
      └─ migrateColumns()       [adds user_id column]        — never reached

After

SqliteAdapter constructor
  └─ initSchema()
      ├─ createSchema()         [contains only safe index refs] ✓
      └─ migrateColumns()
          ├─ conversations migration: ALTER + [+] CREATE INDEX user_id
          └─ workflow_runs migration: ALTER + [+] CREATE INDEX user_id

Connection inventory:

From To Status Notes
createSchema() idx_conversations_user_id removed moved to migrateColumns
createSchema() idx_workflow_runs_user_id removed moved to migrateColumns
migrateColumns() idx_conversations_user_id new after ALTER TABLE
migrateColumns() idx_workflow_runs_user_id new after ALTER TABLE

Label Snapshot

  • Risk: risk: low (3-line code change + comprehensive test)
  • Size: size: S
  • Scope: core
  • Module: core:db/sqlite-adapter

Change Metadata

  • Change type: bug
  • Primary scope: core

Linked Issue

  • Closes
  • Related: caught by /test-release brew 0.4.0 immediately after v0.4.0 was published

Validation Evidence (required)

Commands and result:

bun run validate
# exit 0 — all checks (type-check, lint, format:check, tests, bundled checks) pass
  • New regression test added: SqliteAdapter > upgrade from pre-0.4.0 schema > migrates user_id columns and indexes onto an existing pre-0.4.0 database
  • Verified that the new test FAILS when the production fix is reverted (confirmed by git stash push <prod file> + re-running the test)
  • Manually validated against my own broken ~/.archon/archon.db (which was in the v0.4.0-broken state): all user_id columns added, all indexes created, archon isolation list returned 6 environments cleanly.

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Database migration needed? Yes — automatic. SQLite migration runs on first SqliteAdapter construction. No user action required beyond upgrading the binary.

Human Verification (required)

  • Verified scenarios:
    • Existing v0.3.x DB → v0.4.1 binary: schema migration completes, all 4 user_id/created_by_user_id columns added, both new indexes created, isolation list succeeds.
    • Fresh DB → v0.4.1 binary: clean init, all schema elements present.
  • Edge cases checked:
    • Test asserts the WHERE user_id IS NOT NULL partial index syntax is preserved on the migration path.
    • Test confirms the constructor doesn't throw when ALTER TABLE adds the column.
  • What was not verified:
    • PostgreSQL backend (not affected — PostgreSQL uses numbered SQL migrations in migrations/, not this in-process schema sync).

Side Effects / Blast Radius (required)

  • Affected subsystems: anything that opens an SQLite connection — i.e., the entire app when DATABASE_URL is unset.
  • Potential unintended effects: very small. The CREATE INDEX is idempotent (IF NOT EXISTS) and runs inside the same try/catch as the ALTER TABLE, so any unexpected failure is swallowed and logged like the existing migration errors.
  • Guardrails for early detection: the new regression test exercises the exact pre-0.4.0 → post-0.4.0 upgrade path that broke in production.

Rollback Plan (required)

  • Fast rollback: revert this commit + cut v0.4.2 (or downgrade users to v0.3.12 via brew install coleam00/archon/archon@0.3.12).
  • Observable failure symptoms: Error: no such column: user_id in any DB-touching command.

Risks and Mitigations

  • Risk: the migration's catch-block swallows errors silently. If the ALTER TABLE fails for some reason, the CREATE INDEX inside the same try will also fail and be swallowed. The DB will be left in a half-migrated state.
    • Mitigation: not in scope for this hotfix — that's pre-existing code shape (db.sqlite_migration_*_columns_failed warnings already exist). Fixing the silent-failure pattern is a separate refactor. Releasing v0.4.1 is the priority.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed database schema migration for upgrades from pre-0.4.0 versions, ensuring all required columns and indexes are properly created during the upgrade process.

Review Change Stack

createSchema() ran two CREATE INDEX statements referencing user_id on
remote_agent_conversations and remote_agent_workflow_runs. On databases
created before v0.4.0 those columns don't exist yet — they're added by
migrateColumns(), which runs AFTER createSchema(). The index creation
aborted the entire createSchema() exec block, the constructor threw, and
every subsequent operation failed with "no such column: user_id". New
installs were unaffected because the columns exist in the same schema.

Moves both CREATE INDEX statements into migrateColumns() so they run
after the matching ALTER TABLE. idx_user_identities_user_id stays in
createSchema() because user_identities is a new table whose user_id
column always exists.

Adds a regression test that seeds a pre-0.4.0 schema (no user_id columns
on conversations/workflow_runs/messages, no created_by_user_id on
isolation_environments) and asserts SqliteAdapter construction completes,
migrates the columns, and creates both indexes.

Caught by /test-release brew 0.4.0.
@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

SQLite schema migration is reorganized to create user_id indexes in the migration path rather than in initial schema setup, ensuring pre-0.4.0 databases upgrade successfully. A regression test validates the upgrade by seeding a pre-0.4.0 database, confirming the migrated schema contains the new columns and indexes, and verifying queryability.

Changes

Pre-0.4.0 SQLite Schema Upgrade

Layer / File(s) Summary
Index creation migration in schema updates
packages/core/src/db/adapters/sqlite.ts
migrateColumns() now creates idx_conversations_user_id and idx_workflow_runs_user_id indexes using CREATE INDEX IF NOT EXISTS with partial WHERE user_id IS NOT NULL clauses after column additions. createSchema() no longer creates these indexes; comments clarify that they are created during migration.
Regression test for pre-0.4.0 schema upgrade
packages/core/src/db/adapters/sqlite.test.ts
New regression test seeds a pre-0.4.0-shaped database, initializes SqliteAdapter, and asserts user_id columns and indexes exist post-migration. Helper functions (raw_pragma, raw_indexes, raw_query) inspect table schema and run SELECT queries to validate the upgraded state is correct and queryable.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Indexes now migrate with grace,
Old databases find their new place.
Pre-0.4.0 schemas stay intact,
Columns and indexes perfectly tracked!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: deferring user_id index creation until after the column ALTER statement to fix v0.4.0 upgrade breakage.
Description check ✅ Passed The description is comprehensive and closely follows the template structure with all required sections properly filled: Summary, UX Journey, Architecture Diagram, metadata, validation evidence, security/compatibility checks, human verification, blast radius analysis, and rollback plan.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/sqlite-migration-order-v0.4.1

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/core/src/db/adapters/sqlite.test.ts`:
- Around line 206-267: The seed Database instance named raw (created via new
Database(dbPath)) is closed only after raw.exec(...), so if exec throws the
handle is leaked; wrap the exec call in a try/finally (or try/catch/finally) so
that raw.close() is always called in the finally block (i.e., try {
raw.exec(...); } finally { raw.close(); }) to guarantee cleanup of the Database
regardless of errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4f2d7caa-78be-4282-9309-bf727a8f0e65

📥 Commits

Reviewing files that changed from the base of the PR and between 765417b and dac3563.

📒 Files selected for processing (2)
  • packages/core/src/db/adapters/sqlite.test.ts
  • packages/core/src/db/adapters/sqlite.ts

Comment on lines +206 to +267
const raw = new Database(dbPath);
raw.exec(`
CREATE TABLE remote_agent_codebases (
id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
name TEXT NOT NULL,
default_cwd TEXT NOT NULL,
repository_url TEXT,
created_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE remote_agent_conversations (
id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
platform_type TEXT NOT NULL,
platform_conversation_id TEXT NOT NULL,
ai_assistant_type TEXT,
codebase_id TEXT,
cwd TEXT,
isolation_env_id TEXT,
hidden INTEGER DEFAULT 0,
deleted_at TEXT,
last_activity_at TEXT,
created_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE remote_agent_workflow_runs (
id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
workflow_name TEXT NOT NULL,
conversation_id TEXT,
codebase_id TEXT,
status TEXT DEFAULT 'pending',
user_message TEXT,
metadata TEXT DEFAULT '{}',
parent_conversation_id TEXT,
last_activity_at TEXT,
created_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE remote_agent_messages (
id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
conversation_id TEXT,
role TEXT NOT NULL,
content TEXT NOT NULL,
metadata TEXT DEFAULT '{}',
created_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE remote_agent_isolation_environments (
id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
codebase_id TEXT NOT NULL,
workflow_type TEXT NOT NULL,
workflow_id TEXT NOT NULL,
provider TEXT NOT NULL DEFAULT 'worktree',
working_path TEXT NOT NULL,
branch_name TEXT NOT NULL,
created_by_platform TEXT,
metadata TEXT DEFAULT '{}',
status TEXT NOT NULL DEFAULT 'active',
created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now'))
);
`);
raw.close();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Close the seed Database in a finally block.

If raw.exec(...) throws, raw.close() is skipped, which can leak the handle and make cleanup brittle.

Proposed fix
-      const raw = new Database(dbPath);
-      raw.exec(`
+      const raw = new Database(dbPath);
+      try {
+        raw.exec(`
         CREATE TABLE remote_agent_codebases (
           id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
           name TEXT NOT NULL,
           default_cwd TEXT NOT NULL,
           repository_url TEXT,
           created_at TEXT DEFAULT (datetime('now'))
         );
@@
         CREATE TABLE remote_agent_isolation_environments (
           id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
           codebase_id TEXT NOT NULL,
           workflow_type TEXT NOT NULL,
           workflow_id TEXT NOT NULL,
           provider TEXT NOT NULL DEFAULT 'worktree',
           working_path TEXT NOT NULL,
           branch_name TEXT NOT NULL,
           created_by_platform TEXT,
           metadata TEXT DEFAULT '{}',
           status TEXT NOT NULL DEFAULT 'active',
           created_at TEXT DEFAULT (datetime('now')),
           updated_at TEXT DEFAULT (datetime('now'))
         );
-      `);
-      raw.close();
+        `);
+      } finally {
+        raw.close();
+      }

As per coding guidelines, “For unit tests, test pure functions with mocked external dependencies; for integration tests, test database operations with test database and end-to-end flows with real orchestrator; clean up test data after each test”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/db/adapters/sqlite.test.ts` around lines 206 - 267, The
seed Database instance named raw (created via new Database(dbPath)) is closed
only after raw.exec(...), so if exec throws the handle is leaked; wrap the exec
call in a try/finally (or try/catch/finally) so that raw.close() is always
called in the finally block (i.e., try { raw.exec(...); } finally { raw.close();
}) to guarantee cleanup of the Database regardless of errors.

@Wirasm Wirasm merged commit bbf2788 into dev May 28, 2026
4 checks passed
@Wirasm

Wirasm commented May 28, 2026

Copy link
Copy Markdown
Collaborator Author

PR Review Summary (multi-agent)

Reviewed by: code-reviewer, pr-test-analyzer, comment-analyzer, silent-failure-hunter, code-simplifier.

Verdict: APPROVE WITH MINOR FIXES — the production fix is correct and the regression test exercises the exact failure path. Two findings are worth addressing in this commit since the lines are already being touched; the rest are low-impact.


Critical Issues

None.

Important Issues (2)

# Agent(s) Issue Location
I1 silent-failure-hunter, code-reviewer CREATE INDEX now shares a try with the per-table ALTER TABLE block. Before this PR the index lived in createSchema()'s all-or-nothing batch; now an index failure (disk full, transient I/O, future name collision) is silently swallowed with db.sqlite_migration_*_columns_failed while the column ALTER succeeded. New steady state: column present, index absent, no observable error — only degraded perf. Self-heals on next startup if conditions recover, but invisible to the user in the meantime. packages/core/src/db/adapters/sqlite.ts:190-195, 220-225
I2 code-reviewer, code-simplifier Test helpers raw_pragma, raw_indexes, raw_query use snake_case; rest of file (and codebase) uses camelCase. packages/core/src/db/adapters/sqlite.test.ts:301,311,323

Suggested fix for I1 — split the index call into its own try with a distinct event name so an index failure is independently observable from a column-migration failure:

try {
  this.db.run(
    'CREATE INDEX IF NOT EXISTS idx_conversations_user_id ON remote_agent_conversations(user_id) WHERE user_id IS NOT NULL'
  );
} catch (e: unknown) {
  getLog().warn({ err: e as Error }, 'db.sqlite_migration_conversations_user_id_index_failed');
}

Same split for idx_workflow_runs_user_id. This does not touch the pre-existing ALTER TABLE catch-swallow (deferred by author, acknowledged).


Suggestions (5)

# Agent Suggestion Location
S1 comment-analyzer "exec block" terminology is incorrect — createSchema() uses db.run(...), not db.exec(...). Future readers grepping for .exec will find nothing. Replace with "the db.run() call" or "all subsequent statements in the same batch." sqlite.ts:188-189
S2 code-simplifier, comment-analyzer // Same rationale as idx_conversations_user_id above. uses a positional cue ("above"). Robust today, brittle if the migration blocks are ever reordered. Restate the one-sentence rationale inline. sqlite.ts:219
S3 silent-failure-hunter Single catch covers PRAGMA / ALTER / CREATE INDEX with no statement field in the log payload. Add table: 'remote_agent_conversations' and columnsPresent: [...colNames] to the warn so a half-migrated state is debuggable from logs. sqlite.ts:193-195, 223-225
S4 comment-analyzer "pre-0.4.0" in the production code comment will age badly. Test-name version anchors are fine; production-code version refs aren't. Use a structural invariant ("databases where this column has not yet been added") instead of a version label. sqlite.ts:187
S5 pr-test-analyzer Test verifies index names exist in sqlite_master but does not assert the partial-index predicate WHERE user_id IS NOT NULL. Low priority — correctness-neutral, only a perf concern. One-line addition: SELECT sql FROM sqlite_master WHERE name='idx_conversations_user_id' and assert it contains the predicate. sqlite.test.ts:287-289

Strengths

  • The core fix is minimal and correct: 3 lines of production code, faithful root-cause remediation.
  • idx_user_identities_user_id is correctly left in createSchema() (new table, column always exists).
  • Regression test reproduces the exact pre-0.4.0 shape — CREATE TABLE IF NOT EXISTS is a no-op for the seeded tables, so the migration path is the one under test.
  • raw.close() before new SqliteAdapter(dbPath) prevents the bun:sqlite locked-file conflict; currentDbPath = dbPath is set before seeding so afterEach cleanup targets the right file (including -wal/-shm).
  • Inline 60-line DDL heredoc is the correct call — it's load-bearing documentation of the pre-0.4.0 schema and would lose value if extracted.
  • Per-table catch-block silent failure is explicitly acknowledged and deferred in the PR body — out of scope for this hotfix.
  • createSchema() index-split comment (lines 457-462) is well-structured: names symbols rather than positional cues; will survive future refactors.

Documentation Impact

No documentation changes required. The fix is internal to the SQLite adapter; the migration is described as "automatic" in the PR body and matches the existing CLAUDE.md guidance ("SQLite is the default — zero setup").


Recommended Actions

  1. (Important) Split the two CREATE INDEX calls into isolated try blocks with distinct event names (I1) — 4 lines of boilerplate, prevents a new silent partial-migration state.
  2. (Important) Rename raw_pragma / raw_indexes / raw_query to camelCase (I2).
  3. (Minor, in-scope while touching these lines) Fix "exec block" → "db.run() call" (S1) and replace "Same rationale as ... above" with a self-contained sentence (S2).
  4. (Optional) Improve log payload (S3) and drop the version-specific production comment (S4) — same diff, low cost.

Given this is a v0.4.1 release-blocker hotfix, I1 + I2 + S1 + S2 are reasonable to land in the same PR. S3-S5 are optional polish.

@Wirasm Wirasm mentioned this pull request May 28, 2026
jojorgen pushed a commit to cognoco/zdx-archon that referenced this pull request Jun 4, 2026
Upstream base advanced 0adec22c839bcc (40 commits). Notable: GitHub App
multi-installation routing, user_id plumbing, Slack UX upgrade, OpenCode +
Copilot community providers, experimental /console UI, Codex AbortController
crash fixes, large-node-output temp-file spill (coleam00#1717), sqlite user_id
migration + index-order hotfix (coleam00#1792).

Conflicts resolved (4):
- packages/providers/package.json: kept upstream's new deps (copilot, opencode,
  pi 0.73.1) + retained our Patch-4 codex-sdk 0.132.0 pin.
- packages/web/src/App.tsx: kept our ThemeProvider wrapper around upstream's
  new /console route.
- packages/web/src/components/layout/TopNav.tsx: kept both upstream's console
  CTA and our ThemeToggle.
- bun.lock: regenerated via bun install against merged package.json.

Fork patches preserved: dag-executor.ts auto-merged cleanly — our 100MB
maxBuffer, 30min subprocess timeout + idle_timeout fallback, SIGTERM error
classifier, and per-workflow ARCHON_DEPLOYMENT_ENVIRONMENT injection all coexist
with upstream's new NODE_OUTPUT_FILE_THRESHOLD temp-file logic. Verified present
in the rebuilt darwin-arm64 binary.

KNOWN CAVEAT (pre-existing on upstream/dev, not introduced here): the Pi
community provider fails type-check — pi-coding-agent 0.73.1 dropped the
`codingTools` export and changed ExtensionUIContext, but upstream's
options-translator.ts / ui-context-stub.ts still use the old API. Pi is a
lazy-loaded community provider (builtIn:false) we don't use; bun build strips
types so the compiled binary is unaffected. `bun run validate` will be red on
this until upstream fixes pi or we pin pi back.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant