Skip to content

Meter memory for SQLite on a per-DO basis#6380

Merged
joshthoward merged 1 commit into
mainfrom
jhoward/STOR-5096
Apr 10, 2026
Merged

Meter memory for SQLite on a per-DO basis#6380
joshthoward merged 1 commit into
mainfrom
jhoward/STOR-5096

Conversation

@joshthoward

Copy link
Copy Markdown
Contributor

No description provided.

@ask-bonk ask-bonk Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This PR adds per-actor SQLite memory metering by installing a custom sqlite3_mem_methods allocator that routes accounting through jsg::ExternalMemoryAdjustment via a thread-local RAII scope.

Issues (highest severity first)

  1. [High] sqliteMemFree can assert-fail when freeing memory allocated outside a scope. SQLite may allocate memory outside any SqliteMemoryScope (e.g., during sqlite3_open_v2() in the ActorSqlite constructor, before any JS turn installs a scope). When that memory is later freed during a JS turn (inside a scope), sqliteMemFree will decrement the adjustment by the full block size — even though the adjustment was never incremented for that allocation. This calls ExternalMemoryAdjustment::adjust(-N) which invokes maybeDeferAdjustment() containing KJ_ASSERT(amount >= -static_cast<ssize_t>(this->amount)). Since this->amount could be 0 (or less than N), this will assert-fail. The symmetric case (allocated in scope, freed outside scope) causes memory to leak from accounting but won't crash.

  2. [High] No tests. This is a new subsystem that replaces the process-wide SQLite allocator and intercepts every malloc/free/realloc call made by SQLite. A unit test exercising the accounting (allocations tracked, frees decremented, realloc delta correct, no-scope passthrough) would catch issues like #1 above and prevent regressions.

  3. [Medium] sqliteMemRealloc under-accounts when oldActual is 0 but scope is active. On line 90, the condition scope != nullptr && oldActual > 0 means if ptr was nullptr (realloc-as-malloc) or usableSize returned 0 for the old pointer, the new allocation is never accounted for. While SQLite internally redirects realloc(NULL, n) to malloc(n), the defensive guard still creates an inconsistency: if usableSize returns 0 for a valid old pointer (the #else fallback), the new allocation's size is also lost. The fix is to account for the new allocation even when oldActual is 0.

  4. [Medium] adjustment is a public data member on SqliteMemoryScope. The KJ style convention is to minimize public surface area. This reference is only accessed by the three friend allocator functions. Making it private (the friends already have access) would prevent misuse.

  5. [Low] Stale comment referencing nonexistent ensureMonkeyPatchedSqlite. The comment on line 138 of sqlite-metering.c++ says "We rely on callers (Vfs constructors, ensureMonkeyPatchedSqlite)" but ensureMonkeyPatchedSqlite does not exist in the codebase.

  6. [Low] Comment references nonexistent SQLITE_MEMORY_METERING autogate. The header comment on line 19 of sqlite-metering.h mentions a SQLITE_MEMORY_METERING autogate, but no such autogate exists in autogate.h. If this feature is meant to be gated behind an autogate for staged rollout (which seems wise given it replaces the global SQLite allocator), the autogate should be added in this PR. If it's planned for a follow-up, the comment should say so.

  7. [Low] Unnecessary #include <workerd/jsg/setup.h> in actor-sqlite.c++. The ExternalMemoryTarget class is fully defined in jsg/jsg.h (which is already transitively included via actor-cache.h). The setup.h header pulls in significantly more than needed.

Comment thread src/workerd/util/sqlite-metering.c++ Outdated
Comment thread src/workerd/util/sqlite-metering.c++
Comment thread src/workerd/util/sqlite-metering.h Outdated
Comment thread src/workerd/util/sqlite-metering.c++ Outdated
Comment thread src/workerd/util/sqlite-metering.h Outdated
Comment thread src/workerd/io/actor-sqlite.c++ Outdated
@ask-bonk

ask-bonk Bot commented Mar 22, 2026

Copy link
Copy Markdown
Contributor

Review posted successfully on PR #6380. Here's a summary of what I found:

7 issues identified, with the two highest severity being:

  1. [High] Assertion crash risk in sqliteMemFree — Memory allocated by SQLite outside a SqliteMemoryScope (e.g., during sqlite3_open_v2()) that is later freed inside a scope will call adjust(-N) when the adjustment's tracked amount may be 0, triggering a KJ_ASSERT failure in ExternalMemoryAdjustment::maybeDeferAdjustment().

  2. [High] No tests — This replaces the global SQLite allocator for the entire process. A unit test covering the basic accounting paths would catch the assertion issue and prevent regressions.

  3. [Medium] Realloc under-accounting when oldActual is 0 but scope is active.

  4. [Medium] Public data member on SqliteMemoryScope that should be private (friends already have access).

5-7. [Low] Stale comment referencing nonexistent ensureMonkeyPatchedSqlite, comment referencing nonexistent SQLITE_MEMORY_METERING autogate, and unnecessary #include <workerd/jsg/setup.h>.

All issues with concrete fixes were posted as inline suggestion comments on the PR.

github run

@github-actions

github-actions Bot commented Mar 22, 2026

Copy link
Copy Markdown

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉

@kentonv

kentonv commented Mar 25, 2026

Copy link
Copy Markdown
Member

Are we sure this approach actually works?

I'm worried what happens if sqlite does a lot of caching, and evicts the cache as needed. Worker A does a query, populating the cache with Worker A's database pages. Worker B does a query that requires a lot of memory. This causes Worker A's pages to be evicted. But because Worker B is the one running at that time, the evictions are counted as memory deallocation on behalf of Worker B, even though the pages were originally allocated by Worker A.

@joshthoward joshthoward marked this pull request as ready for review March 26, 2026 00:41
@joshthoward joshthoward requested review from a team as code owners March 26, 2026 00:41
@joshthoward

Copy link
Copy Markdown
Contributor Author

From the SQLite core team each database within a database connection operates independently of all the others. As long as we do not do the following things (which we do not):

  • We do not using shared-cache mode (forcibly disabled using -DSQLITE_OMIT_SHARED_CACHE in this PR).
  • We do not allocate separate cache memory via sqlite3_config(SQLITE_CONFIG_PCACHE).
  • We do not used sqlite3_config(SQLITE_CONFIG_PCACHE2) to implement our own customized page cache algorithm.
  • We do not build SQLite using SQLITE_ENABLE_MEMORY_MANAGEMENT.
    We can assume that one connection will never free pages from another connection's page cache.

@joshthoward joshthoward changed the title Attribute SQLite memory consumed to isolate memory consumed Meter memory for SQLite on a per-DO basis Mar 26, 2026
Comment thread src/workerd/util/sqlite-metering.c++ Outdated
@joshthoward joshthoward force-pushed the jhoward/STOR-5096 branch 2 times, most recently from 05cac88 to 8c91e03 Compare April 2, 2026 16:28
@codecov-commenter

codecov-commenter commented Apr 3, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 77.35294% with 77 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.83%. Comparing base (6fc69a6) to head (a09e565).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
src/workerd/util/sqlite-metering-test.c++ 66.66% 0 Missing and 42 partials ⚠️
src/workerd/util/sqlite-metering.c++ 84.54% 7 Missing and 10 partials ⚠️
src/workerd/util/sqlite-test.c++ 82.00% 0 Missing and 9 partials ⚠️
src/workerd/util/sqlite.c++ 87.50% 4 Missing ⚠️
src/workerd/tests/test-fixture.c++ 0.00% 3 Missing ⚠️
src/workerd/server/server.c++ 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6380      +/-   ##
==========================================
+ Coverage   70.81%   70.83%   +0.01%     
==========================================
  Files         435      437       +2     
  Lines      122668   123224     +556     
  Branches    19268    19393     +125     
==========================================
+ Hits        86872    87283     +411     
- Misses      24383    24439      +56     
- Partials    11413    11502      +89     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread src/workerd/io/actor-sqlite-test.c++ Outdated
@joshthoward joshthoward requested a review from jclee April 3, 2026 18:44
Comment thread src/workerd/io/limit-enforcer.h
Comment thread src/workerd/io/limit-enforcer.h Outdated
Comment thread src/workerd/util/sqlite-metering.c++ Outdated
Comment thread src/workerd/util/sqlite-metering.c++
Comment thread src/workerd/util/sqlite-metering.c++ Outdated
Comment thread src/workerd/util/sqlite-metering.h Outdated
Comment thread src/workerd/util/sqlite-metering.c++
Comment thread src/workerd/util/sqlite-metering.h Outdated
Comment thread src/workerd/util/sqlite-metering.h Outdated
Comment thread src/workerd/util/sqlite.c++ Outdated
@jclee

jclee commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

FWIW, it looks like the windows test failures happen because the sqlite build process ends up generating a file named "VERSION" with the raw sqlite version in it, and then other dependencies like boringssl fail when they try to do #include <version>, because sqlite's "VERSION" file is somehow in the include path and Windows is using a case-insensitive file system.

Since the windows tests build seems to be passing on main, I assume the failure is due to something in this PR. But since the only thing this PR changes about the sqlite3 build process is the "SQLITE_OMIT_SHARED_CACHE" flag, I assume the issue is only arising because the PR needs to rebuild sqlite because of the change. I'm not sure whether we saw the same problems on earlier sqlite3 build changes or what we might have done to resolve them.

Comment thread src/workerd/util/sqlite.h
Comment thread src/workerd/util/sqlite.h Outdated
Comment thread src/workerd/util/sqlite-test.c++
Comment thread src/workerd/util/sqlite-metering.h
Comment thread src/workerd/util/sqlite-metering.h Outdated
Comment thread src/workerd/util/sqlite.c++ Outdated
Comment thread src/workerd/util/sqlite-test.c++
Comment thread src/workerd/util/sqlite-test.c++
Comment thread src/workerd/util/sqlite.c++ Outdated
Comment thread src/workerd/util/sqlite.c++ Outdated
Comment thread src/workerd/util/sqlite.c++
Comment thread src/workerd/util/sqlite-test.c++
@joshthoward joshthoward force-pushed the jhoward/STOR-5096 branch 2 times, most recently from c6d5177 to 8909534 Compare April 8, 2026 19:28
Comment thread src/workerd/util/sqlite-metering.c++
@joshthoward joshthoward force-pushed the jhoward/STOR-5096 branch 2 times, most recently from 5906267 to 2b9eb9a Compare April 9, 2026 19:00
@codspeed-hq

codspeed-hq Bot commented Apr 9, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 70 untouched benchmarks
⏩ 129 skipped benchmarks1


Comparing jhoward/STOR-5096 (a09e565) with main (9dbd5fa)

Open in CodSpeed

Footnotes

  1. 129 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@joshthoward joshthoward force-pushed the jhoward/STOR-5096 branch 2 times, most recently from 5f72f54 to e7f69e4 Compare April 9, 2026 19:27
@joshthoward joshthoward enabled auto-merge April 10, 2026 13:56
@joshthoward joshthoward merged commit 1764d60 into main Apr 10, 2026
25 checks passed
@joshthoward joshthoward deleted the jhoward/STOR-5096 branch April 10, 2026 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants