Meter memory for SQLite on a per-DO basis#6380
Conversation
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This PR adds per-actor SQLite memory metering by installing a custom sqlite3_mem_methods allocator that routes accounting through jsg::ExternalMemoryAdjustment via a thread-local RAII scope.
Issues (highest severity first)
-
[High]
sqliteMemFreecan assert-fail when freeing memory allocated outside a scope. SQLite may allocate memory outside anySqliteMemoryScope(e.g., duringsqlite3_open_v2()in theActorSqliteconstructor, before any JS turn installs a scope). When that memory is later freed during a JS turn (inside a scope),sqliteMemFreewill decrement the adjustment by the full block size — even though the adjustment was never incremented for that allocation. This callsExternalMemoryAdjustment::adjust(-N)which invokesmaybeDeferAdjustment()containingKJ_ASSERT(amount >= -static_cast<ssize_t>(this->amount)). Sincethis->amountcould be 0 (or less than N), this will assert-fail. The symmetric case (allocated in scope, freed outside scope) causes memory to leak from accounting but won't crash. -
[High] No tests. This is a new subsystem that replaces the process-wide SQLite allocator and intercepts every
malloc/free/realloccall made by SQLite. A unit test exercising the accounting (allocations tracked, frees decremented, realloc delta correct, no-scope passthrough) would catch issues like #1 above and prevent regressions. -
[Medium]
sqliteMemReallocunder-accounts whenoldActualis 0 but scope is active. On line 90, the conditionscope != nullptr && oldActual > 0means ifptrwasnullptr(realloc-as-malloc) orusableSizereturned 0 for the old pointer, the new allocation is never accounted for. While SQLite internally redirectsrealloc(NULL, n)tomalloc(n), the defensive guard still creates an inconsistency: ifusableSizereturns 0 for a valid old pointer (the#elsefallback), the new allocation's size is also lost. The fix is to account for the new allocation even whenoldActualis 0. -
[Medium]
adjustmentis a public data member onSqliteMemoryScope. The KJ style convention is to minimize public surface area. This reference is only accessed by the threefriendallocator functions. Making itprivate(the friends already have access) would prevent misuse. -
[Low] Stale comment referencing nonexistent
ensureMonkeyPatchedSqlite. The comment on line 138 ofsqlite-metering.c++says "We rely on callers (Vfs constructors, ensureMonkeyPatchedSqlite)" butensureMonkeyPatchedSqlitedoes not exist in the codebase. -
[Low] Comment references nonexistent
SQLITE_MEMORY_METERINGautogate. The header comment on line 19 ofsqlite-metering.hmentions aSQLITE_MEMORY_METERINGautogate, but no such autogate exists inautogate.h. If this feature is meant to be gated behind an autogate for staged rollout (which seems wise given it replaces the global SQLite allocator), the autogate should be added in this PR. If it's planned for a follow-up, the comment should say so. -
[Low] Unnecessary
#include <workerd/jsg/setup.h>inactor-sqlite.c++. TheExternalMemoryTargetclass is fully defined injsg/jsg.h(which is already transitively included viaactor-cache.h). Thesetup.hheader pulls in significantly more than needed.
|
Review posted successfully on PR #6380. Here's a summary of what I found: 7 issues identified, with the two highest severity being:
5-7. [Low] Stale comment referencing nonexistent All issues with concrete fixes were posted as inline suggestion comments on the PR. |
|
The generated output of |
a1a4a48 to
2b496fc
Compare
|
Are we sure this approach actually works? I'm worried what happens if sqlite does a lot of caching, and evicts the cache as needed. Worker A does a query, populating the cache with Worker A's database pages. Worker B does a query that requires a lot of memory. This causes Worker A's pages to be evicted. But because Worker B is the one running at that time, the evictions are counted as memory deallocation on behalf of Worker B, even though the pages were originally allocated by Worker A. |
2b496fc to
98ce28c
Compare
|
From the SQLite core team each database within a database connection operates independently of all the others. As long as we do not do the following things (which we do not):
|
98ce28c to
472cffd
Compare
05cac88 to
8c91e03
Compare
8c91e03 to
23fc2d1
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #6380 +/- ##
==========================================
+ Coverage 70.81% 70.83% +0.01%
==========================================
Files 435 437 +2
Lines 122668 123224 +556
Branches 19268 19393 +125
==========================================
+ Hits 86872 87283 +411
- Misses 24383 24439 +56
- Partials 11413 11502 +89 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
23fc2d1 to
66c58ad
Compare
|
FWIW, it looks like the windows test failures happen because the sqlite build process ends up generating a file named "VERSION" with the raw sqlite version in it, and then other dependencies like boringssl fail when they try to do Since the windows tests build seems to be passing on main, I assume the failure is due to something in this PR. But since the only thing this PR changes about the sqlite3 build process is the "SQLITE_OMIT_SHARED_CACHE" flag, I assume the issue is only arising because the PR needs to rebuild sqlite because of the change. I'm not sure whether we saw the same problems on earlier sqlite3 build changes or what we might have done to resolve them. |
c6d5177 to
8909534
Compare
5906267 to
2b9eb9a
Compare
Merging this PR will not alter performance
Comparing Footnotes
|
5f72f54 to
e7f69e4
Compare
e7f69e4 to
a09e565
Compare
No description provided.