fix: tighten SQLite runtime truth in session refactor#79971

Open
100yenadmin wants to merge 409 commits into openclaw:exp-vfs from 100yenadmin:feat/openclaw-78595-sqlite-correctness

Conversation

@100yenadmin
Contributor

@100yenadmin 100yenadmin commented May 9, 2026

Summary

This follow-up tightens runtime-truth behavior on top of #78595 without reopening the companion-seam work.

It fixes five concrete correctness gaps in the SQLite session refactor path:

  • onboarding freshness now treats canonical SQLite runtime state as an existing setup
  • doctor integrity now treats SQLite as operational transcript truth instead of letting stale legacy .jsonl files satisfy health checks
  • exported session JSONL now reconstructs the canonical type: "session" header
  • rotated checkpoint trimming now deletes snapshot metadata by source-session ownership
  • recent-window transcript reads no longer prepend off-branch compaction markers by scan order alone
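
A minimal sketch of the last bullet, assuming a hypothetical `TranscriptEntry` shape with `id`, `parentId`, and `type` fields (OpenClaw's real transcript rows may look different). The idea is to resolve the active branch by walking parent links from the current leaf, and to prepend a compaction marker only when it is an ancestor on that branch, never merely because it appeared earlier in scan order:

```typescript
// Hypothetical entry shape for illustration; not OpenClaw's actual schema.
interface TranscriptEntry {
  id: string;
  parentId: string | null;
  type: "message" | "compaction";
}

// Walk parent links from the active leaf and return the branch oldest-first.
function activeBranch(entries: TranscriptEntry[], leafId: string): TranscriptEntry[] {
  const byId = new Map(entries.map((e) => [e.id, e] as const));
  const branch: TranscriptEntry[] = [];
  const seen = new Set<string>(); // guards against accidental parent cycles
  let cursor: string | null = leafId;
  while (cursor !== null && !seen.has(cursor)) {
    seen.add(cursor);
    const entry = byId.get(cursor);
    if (!entry) break;
    branch.push(entry);
    cursor = entry.parentId;
  }
  return branch.reverse();
}

// Return the last `limit` entries on the active branch. The nearest older
// compaction marker is prepended only if it lies on that same branch.
function recentWindow(
  entries: TranscriptEntry[],
  leafId: string,
  limit: number,
): TranscriptEntry[] {
  const branch = activeBranch(entries, leafId);
  const window = limit > 0 ? branch.slice(-limit) : [];
  const older = branch.slice(0, branch.length - window.length);
  const marker = [...older].reverse().find((e) => e.type === "compaction");
  return marker ? [marker, ...window] : window;
}
```

With this shape, a compaction entry that hangs off a sibling branch is never visited by `activeBranch`, so it cannot leak into the window regardless of where it sits in storage order.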

Why this is better

These changes help OpenClaw itself, not just LCM:

  • setup/import decisions now reflect the real runtime store
  • doctor output is aligned with the database-first design instead of legacy leftovers
  • export round-trips against the canonical transcript contract
  • compaction cleanup is safer across rotated sessions
  • lightweight readers stay ancestry-safe for runtime consumers
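
To illustrate the export round-trip point, here is a hedged sketch of a JSONL export whose first line is a `type: "session"` header. Only the `type: "session"` value comes from this PR; the `id` field, the `ExportedEntry` shape, and both function names are assumptions made for the example:

```typescript
// Hypothetical shapes; only `type: "session"` is taken from the PR description.
interface SessionHeader {
  type: "session";
  id: string;
}

interface ExportedEntry {
  type: string;
  [key: string]: unknown;
}

// Serialize a session as JSONL with the session header as the first line.
function exportSessionJsonl(sessionId: string, entries: ExportedEntry[]): string {
  const header: SessionHeader = { type: "session", id: sessionId };
  return [header, ...entries].map((line) => JSON.stringify(line)).join("\n") + "\n";
}

// Parse JSONL back, rejecting files whose first line is not a session header.
function parseSessionJsonl(text: string): { header: SessionHeader; entries: ExportedEntry[] } {
  const lines = text
    .split("\n")
    .filter((l) => l.trim() !== "")
    .map((l) => JSON.parse(l) as ExportedEntry);
  const [first, ...rest] = lines;
  const id = first?.id;
  if (!first || first.type !== "session" || typeof id !== "string") {
    throw new Error('expected first JSONL line to be a type: "session" header');
  }
  return { header: { type: "session", id }, entries: rest };
}
```

A reader that insists on the header, as `parseSessionJsonl` does here, is what makes a missing or mistyped header a visible failure rather than a silent mismatch.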

Scope

Included:

  • setup.migration-import SQLite freshness checks
  • doctor-state-integrity SQLite-truth transcript checks
  • export HTML download header type fix
  • rotated compaction checkpoint snapshot cleanup fix
  • off-branch compaction marker fallback removal in recent transcript readers

Not included:

  • companion cursor/frontier seam work from #79904
  • lineage/discovery seam work from #79903
  • typed transcript projection work from #79905

Review Follow-up

The doctor cleanup path is now explicitly non-destructive.

Fix:

  • orphan legacy transcript remediation now renames files to *.deleted.<timestamp> backups instead of deleting them
  • the warning and confirmation text now say archive rather than delete
  • tests now prove the archived file still contains the original transcript contents

Why this was necessary:

  • SQLite should be treated as operational truth, but doctor should not permanently remove the only remaining legacy transcript copy without import proof
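
The archive-instead-of-delete behavior can be sketched as below. The review notes confirm that the branch uses `fs.renameSync` and a `.deleted.<timestamp>` suffix; the function name and the exact timestamp formatting are assumptions, chosen so the result matches the shape of the filename in the runtime proof:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: rename an orphan transcript to `<name>.deleted.<timestamp>`
// in the same directory and return the archived path. Nothing is deleted.
function archiveOrphanTranscript(filePath: string, now: Date = new Date()): string {
  // ISO timestamps contain ":", which is not portable in filenames, so swap it for "-".
  const stamp = now.toISOString().replace(/:/g, "-");
  const archivedPath = path.join(
    path.dirname(filePath),
    `${path.basename(filePath)}.deleted.${stamp}`,
  );
  fs.renameSync(filePath, archivedPath);
  return archivedPath;
}
```

Because the rename stays within one directory, the original bytes are preserved on the same filesystem and the operation can be undone by renaming the file back.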

Primary follow-up files:

  • src/commands/doctor-state-integrity.ts
  • src/commands/doctor-state-integrity.test.ts

Real Behavior Proof

Runtime proof from a direct module invocation on the PR branch:

{
  "transcriptStillPresent": false,
  "archived": [
    "orphan-proof.jsonl.deleted.2026-05-09T21-15-17.596Z"
  ],
  "notes": [
    "- Found 1 orphan transcript file in ~/state/agents/main/sessions.\n  These legacy .jsonl files are no longer referenced by SQLite session rows, so they are not part of any active session history.\n  Doctor can archive them safely by renaming each file to *.deleted.<timestamp>.\n  Examples: orphan-proof.jsonl",
    "- Archived 1 orphan transcript file in ~/state/agents/main/sessions as .deleted timestamped backups."
  ]
}

That proof shows:

  • the original orphan transcript is no longer left in place as an active file
  • doctor archives it as a timestamped backup instead of deleting it
  • the operator-facing note text matches the non-destructive behavior
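
The note text above describes orphans as legacy .jsonl files that no SQLite session row references. A small sketch of that classification, assuming (hypothetically) that the set of referenced transcript file names has already been read from SQLite and that matching is done by file name:

```typescript
// Hypothetical helper: given a directory listing and the transcript file names
// referenced by SQLite session rows, return the legacy .jsonl files that are orphans.
function findOrphanTranscripts(
  filesOnDisk: string[],
  referencedBySqlite: ReadonlySet<string>,
): string[] {
  return filesOnDisk
    .filter((name) => name.endsWith(".jsonl")) // skips *.deleted.<timestamp> backups too
    .filter((name) => !referencedBySqlite.has(name))
    .sort();
}
```

Filtering on the `.jsonl` suffix means previously archived backups are not reported again on a later doctor run.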

Validation

cd /Volumes/LEXAR/repos/openclaw-pr78595-review
OPENCLAW_TEST_PROJECTS_SERIAL=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=120000 \
  node scripts/run-vitest.mjs \
  src/wizard/setup.migration-import.test.ts \
  src/commands/doctor-state-integrity.test.ts \
  src/auto-reply/reply/commands-export-session.test.ts \
  src/gateway/session-compaction-checkpoints.test.ts \
  src/gateway/session-transcript-readers.test.ts \
  src/agents/main-session-restart-recovery.test.ts \
  src/auto-reply/status.test.ts

Observed locally:

  • 12 files passed, 172 tests passed

Stack placement

This PR is intentionally the correctness lane on top of the database-first runtime refactor.

Related stack:

Why this is separate:

  • OpenClaw runtime truth should be correct even if no companion consumer ever attaches
  • LCM depends on this indirectly, but this PR does not add a new public seam

Context

This PR is intentionally scoped as a correctness follow-up on top of openclaw/openclaw#78595.

steipete added 30 commits May 9, 2026 18:01
@clawsweeper
Contributor

clawsweeper Bot commented May 9, 2026

Codex review: needs maintainer review before merge.

Summary
The branch updates the SQLite session refactor so setup freshness, doctor integrity, session export, compaction checkpoint cleanup, and recent transcript readers treat SQLite state as canonical while archiving orphan legacy transcripts.

Reproducibility: yes, source-reproducible for the reviewed gaps. The base branch deleted orphan legacy transcripts, allowed stale legacy transcripts to satisfy doctor checks, and could include off-branch compaction markers by scan order. I did not execute the runtime paths in this read-only review.

Real behavior proof
Sufficient (live_output): The PR body now includes copied runtime output showing the after-fix orphan-transcript archive behavior and matching operator notes.

Next step before merge
Protected security label, stacked exp-vfs base, and mergeable=false make this maintainer/security review work rather than an autonomous repair candidate.

Security
Cleared: No concrete security or supply-chain regression remains in the focused diff after the orphan-transcript cleanup changed from deletion to timestamped archive.

Review details

Best possible solution:

Keep the SQLite-truth fixes with the base runtime-state refactor, then have a maintainer/security reviewer validate the stacked branch and land it only with the broader database-first migration plan.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible for the reviewed gaps: the base branch deleted orphan legacy transcripts, allowed stale legacy transcript truth in doctor checks, and could include off-branch compaction markers by scan order. I did not execute the runtime paths in this read-only review.

Is this the best way to solve the issue?

Yes for the focused slice: archiving orphan transcripts is the safer fix for the prior data-loss finding, and the other changes keep SQLite as the canonical runtime truth without reintroducing JSONL compatibility paths. Final merge remains a maintainer/security decision because the PR is stacked and protected.

Acceptance criteria:

  • OPENCLAW_TEST_PROJECTS_SERIAL=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=120000 node scripts/run-vitest.mjs src/wizard/setup.migration-import.test.ts src/commands/doctor-state-integrity.test.ts src/auto-reply/reply/commands-export-session.test.ts src/gateway/session-compaction-checkpoints.test.ts src/gateway/session-transcript-readers.test.ts src/agents/main-session-restart-recovery.test.ts src/auto-reply/status.test.ts
  • pnpm check:changed on the final stacked branch if maintainer chooses to merge

What I checked:

  • Protected review gate: Live issue metadata shows the PR is open, authored by a contributor, and labeled security, which requires explicit maintainer handling rather than cleanup close. (49e21721bcb0)
  • Stacked PR metadata: Live PR metadata shows head feat/openclaw-78595-sqlite-correctness at 49e2172, base exp-vfs at 9f3ee80, mergeable=false, and a very broad reported diff. (49e21721bcb0)
  • Previous data-loss finding addressed: The head branch now warns that orphan legacy .jsonl files are outside SQLite session rows, asks to archive, and uses fs.renameSync to move each file to a .deleted.<timestamp> backup instead of deleting it. (src/commands/doctor-state-integrity.ts:971, 49e21721bcb0)
  • Base branch had destructive cleanup: The base branch asked to delete orphan transcript files and used fs.rmSync(orphanPath, { force: true }), matching the prior ClawSweeper data-loss finding that the follow-up removed. (src/commands/doctor-state-integrity.ts:1017, 9f3ee805820f)
  • Archive regression coverage: The updated doctor test confirms the active orphan filename is gone, exactly one archived backup exists, and the archived file still contains the original transcript contents. (src/commands/doctor-state-integrity.test.ts:406, 49e21721bcb0)
  • SQLite freshness check: Setup freshness now checks for state/openclaw.sqlite and per-agent openclaw-agent.sqlite, with a test confirming that canonical SQLite runtime state is not treated as a fresh setup even when no legacy session folders exist. (src/wizard/setup.migration-import.ts:28, 49e21721bcb0)
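
As a rough illustration of the freshness check in the last item, the sketch below treats a state directory as an existing setup when either SQLite file is present. The file names `openclaw.sqlite` and `openclaw-agent.sqlite` come from the review notes; the `agents/<name>/` location of the per-agent database and both function names are assumptions:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical check: does the state directory already hold SQLite runtime state?
// Assumes per-agent databases live at <stateDir>/agents/<agent>/openclaw-agent.sqlite.
function hasSqliteRuntimeState(stateDir: string): boolean {
  if (fs.existsSync(path.join(stateDir, "openclaw.sqlite"))) return true;
  const agentsDir = path.join(stateDir, "agents");
  if (!fs.existsSync(agentsDir)) return false;
  return fs
    .readdirSync(agentsDir, { withFileTypes: true })
    .some(
      (entry) =>
        entry.isDirectory() &&
        fs.existsSync(path.join(agentsDir, entry.name, "openclaw-agent.sqlite")),
    );
}

// A setup counts as fresh only when no SQLite runtime state exists,
// regardless of whether legacy session folders are present.
function isFreshSetup(stateDir: string): boolean {
  return !hasSqliteRuntimeState(stateDir);
}
```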

Likely related people:

  • steipete: Current-main blame and local history for the affected doctor/setup/session paths point to Peter Steinberger, and the related database-first runtime refactor commits in the PR stack are authored by Peter Steinberger. (role: recent maintainer and SQLite refactor owner; confidence: high; commits: ae81d2aa09a8, 2fff5bd1f6eb, 79e5e67327bd; files: src/commands/doctor-state-integrity.ts, src/wizard/setup.migration-import.ts, src/gateway/session-compaction-checkpoints.ts)

Remaining risk / open question:

  • The PR is stacked on the active exp-vfs SQLite refactor and live metadata reports mergeable=false, so merge safety depends on maintainer review of the broader base stack.
  • The protected security label means this should stay in the maintainer/security lane even though the earlier destructive cleanup concern appears fixed.
  • I did not run tests or live OpenClaw commands because this was a read-only review; validation here is from source, tests, history, PR discussion, and supplied runtime output.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c3d98153202d.

@100yenadmin
Contributor Author

Architecture fit for this slice:

flowchart TD
    A["#78595 database-first runtime"] --> B["#79971 runtime-truth fixes"]
    B --> C["setup freshness uses SQLite truth"]
    B --> D["doctor integrity uses SQLite truth"]
    B --> E["export emits canonical session header"]
    B --> F["checkpoint cleanup uses source-session ownership"]
    B --> G["recent readers stay on active branch"]

Why this slice exists:

  • it hardens OpenClaw's own SQLite runtime behavior before any companion-specific seam work
  • it removes places where stale JSONL-era assumptions could still win over canonical SQLite truth
  • it keeps future LCM ingestion safer by making setup, doctor, export, and replay-adjacent readers align with the same runtime source of truth

Why it was chosen as a separate PR:

  • these are correctness fixes on top of #78595, not a public SDK expansion
  • maintainers can evaluate them as OpenClaw runtime hardening without needing to accept the companion-seam stack at the same time

Stack placement:

  • base refactor: #78595
  • upstream umbrella: #79902
  • this PR intentionally does not implement #79903, #79904, or #79905

Lossless Claw connection:

  • LCM should consume OpenClaw as canonical truth, so the canonical truth must be correct first
  • this PR reduces false setup/doctor/export signals that would otherwise poison downstream ingest or operator trust

@100yenadmin
Contributor Author

Follow-up on the live review finding:

  • addressed the destructive orphan-transcript cleanup path
  • pushed in 49e21721bc
  • updated the PR body with direct runtime proof and refreshed validation counts

What changed:

  • doctor now archives orphan legacy transcripts as *.deleted.<timestamp> backups instead of deleting them
  • operator text now says archive, not delete
  • tests now prove the archived file still contains the original transcript contents

Runtime proof is now in the PR body and shows:

  • the original orphan transcript is removed from the active path
  • a timestamped archived backup is kept
  • the operator-facing notes match the non-destructive behavior

This keeps the SQLite-truth direction from #78595 while removing the data-loss risk from doctor remediation.

@clawsweeper clawsweeper Bot added the "proof: sufficient" label (ClawSweeper judged the real behavior proof convincing.) May 9, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the "proof: sufficient" label (ClawSweeper judged the real behavior proof convincing.) May 9, 2026
@steipete steipete force-pushed the exp-vfs branch 9 times, most recently from 66da4a2 to 7e59d6c May 10, 2026 01:13

Labels

agents (Agent runtime and tooling); app: macos (App: macos); app: web-ui (App: web-ui); channel: discord (Channel integration: discord); channel: feishu (Channel integration: feishu); channel: googlechat (Channel integration: googlechat); channel: imessage (Channel integration: imessage); channel: irc; channel: line (Channel integration: line); channel: matrix (Channel integration: matrix); channel: mattermost (Channel integration: mattermost); channel: msteams (Channel integration: msteams); channel: nextcloud-talk (Channel integration: nextcloud-talk); channel: nostr (Channel integration: nostr); channel: qa-channel (Channel integration: qa-channel); channel: qqbot; channel: signal (Channel integration: signal); channel: slack (Channel integration: slack); channel: synology-chat; channel: telegram (Channel integration: telegram); channel: tlon (Channel integration: tlon); channel: twitch (Channel integration: twitch); channel: voice-call (Channel integration: voice-call); channel: whatsapp-web (Channel integration: whatsapp-web); channel: zalo (Channel integration: zalo); channel: zalouser (Channel integration: zalouser); cli (CLI command changes); commands (Command implementations); docker (Docker and sandbox tooling); docs (Improvements or additions to documentation); extensions: acpx; extensions: anthropic; extensions: cloudflare-ai-gateway; extensions: codex; extensions: device-pair; extensions: diagnostics-otel (Extension: diagnostics-otel); extensions: kilocode; extensions: kimi-coding; extensions: llm-task (Extension: llm-task); extensions: lmstudio; extensions: memory-core (Extension: memory-core); extensions: memory-wiki; extensions: minimax; extensions: openai; extensions: phone-control; extensions: qa-lab; extensions: tts-local-cli; gateway (Gateway runtime); plugin: azure-speech (Azure Speech plugin); plugin: file-transfer; plugin: google-meet; plugin: migrate-claude; plugin: migrate-hermes; scripts (Repository scripts); security (Security documentation); size: XL; triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.)

2 participants