[MP][Refactor] Centralize L2 adapter byte accounting via AdapterUsage #3045
ApostaC merged 8 commits into LMCache:dev
Conversation
Code Review
This pull request introduces per-user cache isolation by adding a "cache_salt" field to ObjectKey and IPCCacheEngineKey, along with validation to prevent the use of the separator character "@". The L2 adapter interface is refactored to centralize byte accounting and provide detailed usage reports through a new AdapterUsage class. Additionally, serialization logic in both C++ and Python is updated to handle salted keys using an "@@" prefix for backward compatibility. Review feedback highlights several opportunities to improve code robustness by using "strict=True" in zip() calls and identifies missing type hints in new or modified constructors as per the project's style guide.
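The salted-key scheme described above can be sketched as follows. This is a hypothetical standalone model, not LMCache's actual `ObjectKey`/`IPCCacheEngineKey` code: the field layout, function names, and the exact placement of the salt are assumptions; only the `"@"` separator, the `"@"`-rejection validation, and the `"@@"` backward-compatibility prefix come from the review summary.

```python
# Hypothetical sketch of the "@@"-prefixed salted-key serialization; the
# real logic lives in the C++ and Python key classes and may differ.
SEPARATOR = "@"
SALT_PREFIX = "@@"


def validate_cache_salt(salt: str) -> None:
    # The PR rejects the separator character inside a salt so that
    # parsing the serialized key stays unambiguous.
    if SEPARATOR in salt:
        raise ValueError(f"cache_salt must not contain {SEPARATOR!r}")


def serialize_key(model: str, rank: int, chunk_hash: str, salt: str = "") -> str:
    base = f"{model}{SEPARATOR}{rank:x}{SEPARATOR}{chunk_hash}"
    if not salt:
        return base  # legacy 3-field shape, readable by old parsers
    validate_cache_salt(salt)
    return f"{SALT_PREFIX}{salt}{SEPARATOR}{base}"


def parse_salt(serialized: str) -> str:
    # Backward compatible: legacy keys never start with "@@", so the
    # absence of the prefix means "no salt".
    if serialized.startswith(SALT_PREFIX):
        return serialized[len(SALT_PREFIX):].split(SEPARATOR, 1)[0]
    return ""
```

Under this sketch, `serialize_key("llama", 1, "abc")` keeps the legacy shape while `serialize_key("llama", 1, "abc", "userA")` gains the `@@userA@` prefix that a salt-aware parser can strip.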
```python
# pool ran out of slots mid-batch (the loop above ``break``s
# in that case), so we notify only the keys we actually stored.
if stored_keys:
    self._notify_keys_stored(stored_keys, stored_sizes)
```
Nixl store double-counts bytes on key re-stores
Medium Severity
The Nixl adapter's _execute_store_in_the_loop unconditionally adds every stored key and its full size to stored_keys/stored_sizes, then calls _notify_keys_stored. Unlike the mock adapter (which skips existing keys) and the native connector (which passes size=0 for re-stores via the _key_sizes check), the Nixl adapter has no deduplication — re-storing the same key adds its full byte size to _total_bytes_used again, inflating usage_fraction and potentially triggering premature eviction.
Reviewed by Cursor Bugbot for commit eca6cfe.
Will we double-count here?
Force-pushed: eca6cfe → 6648638 → 3d4d1a6 → aaedb2b
Replaces ``L2AdapterInterface.get_usage() -> tuple[float, float]`` with a structured ``AdapterUsage`` dataclass (aggregate + per-user byte counts) and moves byte accounting into the base class. Adapters pass ``max_capacity_bytes`` to ``super().__init__()`` and fire ``_notify_keys_stored(keys, sizes)`` / ``_notify_keys_deleted(keys, sizes)``; the base class maintains both ``_total_bytes_used`` and ``_per_user_size_bytes`` (keyed by ``ObjectKey.cache_salt`` from PR #3042). Existing LRU eviction behavior is unchanged — this is purely an internal refactor that sets up PR5's per-user quota policy. PR3 in a 6-PR series.

Key changes:

- ``AdapterUsage`` dataclass with ``total_bytes_used``, ``total_capacity_bytes``, ``per_user_bytes`` (read-only ``MappingProxyType`` snapshot), and a ``usage_fraction`` property that preserves the legacy ``-1.0`` "no eviction signal" sentinel.
- ``L2AdapterInterface.supports_eviction`` property — ``True`` only when ``max_capacity_bytes > 0``. ``StorageManager`` skips creating an ``L2AdapterEvictionState`` for adapters that don't support eviction (logs a warning if ``eviction_config`` was nevertheless set).
- Underflow clamp: ``_notify_keys_deleted`` clamps ``_total_bytes_used`` to 0 with a warning if a buggy caller would drive it negative — without the clamp the ``-1`` sentinel would silently disable eviction forever.
- Native connector preserves listener notification on duplicate stores (passes ``size=0``) so LRU policies still ``move_to_end`` on re-store.
- Each adapter retains whatever per-key bookkeeping it needs internally (mock keeps ``_current_size_bytes`` for the within-batch capacity gate; native connector keeps ``_key_sizes`` so the demux thread can look up sizes at delete time).
- ``NixlObjPool.get_usage`` renamed to ``get_slot_usage`` to avoid colliding with the byte-based adapter ``get_usage``.

Tests: 24 new across two files. ``test_l2_adapter_base.py`` covers ``AdapterUsage`` semantics, ``supports_eviction``, and base-class accounting (aggregate + per-user, bucket cleanup at zero, underflow clamp, ``MappingProxyType`` immutability, listener-only ``size=0`` notify, concurrent thread safety); ``test_mock_l2_adapter.py`` adds an end-to-end test that ``cache_salt`` flows from a real adapter store through to ``AdapterUsage.per_user_bytes``.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
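A minimal sketch of the ``AdapterUsage`` shape this commit describes. Field names and the ``-1.0`` sentinel come from the commit message; the exact ``usage_fraction`` arithmetic and the ``frozen`` dataclass choice are assumptions consistent with the described read-only semantics, not LMCache's real definition.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class AdapterUsage:
    """Illustrative snapshot of adapter byte usage (hypothetical sketch)."""

    total_bytes_used: int
    total_capacity_bytes: int
    # Read-only view so callers cannot mutate the adapter's accounting.
    per_user_bytes: Mapping[str, int] = field(
        default_factory=lambda: MappingProxyType({})
    )

    @property
    def usage_fraction(self) -> float:
        # Legacy sentinel: non-positive capacity means "no eviction signal".
        if self.total_capacity_bytes <= 0:
            return -1.0
        return self.total_bytes_used / self.total_capacity_bytes
```

With this sketch, an adapter constructed with ``max_capacity_bytes == 0`` reports ``usage_fraction == -1.0``, and attempts to write into ``per_user_bytes`` raise ``TypeError``.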
Force-pushed: aaedb2b → a62b3b4
```diff
 stored_sizes: list[int] = []
 try:
-    for key, obj in zip(keys, objects, strict=False):
+    for key, obj in zip(keys, objects, strict=True):
```
Mock adapter zip strictness change may crash event loop
Low Severity
The zip(keys, objects, strict=True) change from strict=False raises a ValueError inside the asyncio event loop if keys and objects differ in length. The generic except Exception on line 422 catches it and sets success = False, but unlike other adapters (e.g. nixl_store_l2_adapter.py line 732 which kept strict=False), this changes the error-handling contract for the mock adapter inconsistently.
Reviewed by Cursor Bugbot for commit a62b3b4.
ApostaC
left a comment
Please see the comments below, thanks!
Two fixes from PR LMCache#3045 review:

1. base.py ``_notify_keys_stored`` / ``_notify_keys_deleted``: aggregate per-salt deltas into a small local dict first, then apply one read+write per unique salt to ``_per_cache_salt_size_bytes`` (instead of one per key). Reduces hits on the potentially large registry dict from O(N keys) to O(U unique salts).
2. eviction_controller ``report_status``: stop embedding the full ``per_cache_salt_bytes`` map in the status payload. A deployment can have 10k+ salts; surface only the bucket count here and leave per-salt inspection to a paginated endpoint.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
The ``per_*`` prefix chained with a multi-word identifier produces long, awkward names (``per_cache_salt_bytes``, ``_per_cache_salt_size_bytes``). Switch to ``by_cache_salt`` as the grouping-dimension qualifier:

- ``AdapterUsage.per_cache_salt_bytes`` → ``AdapterUsage.bytes_by_cache_salt``
- ``L2AdapterInterface._per_cache_salt_size_bytes`` → ``_bytes_by_cache_salt``
- ``_EMPTY_PER_CACHE_SALT`` → ``_EMPTY_BY_CACHE_SALT``

No behavior change. Tests and the design doc are updated in lockstep.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
```
Return (current_usage, usage_after_ongoing_eviction) in [0, 1] for
the slot pool. Renamed from ``get_usage`` to avoid colliding with
the byte-based ``L2AdapterInterface.get_usage`` shape — this is
an internal pool helper, not the adapter-level usage report.
```
Renamed get_slot_usage method appears to be unused
Low Severity
NixlObjPool.get_usage() was renamed to get_slot_usage(), and the comment at the adapter level claims it is "still used internally for capacity-check style decisions." However, the only caller was the now-removed NixlStoreL2Adapter.get_usage() override. No call site for get_slot_usage() exists in the adapter — slot exhaustion is detected via batched_allocate returning an empty list. This makes get_slot_usage dead code with a misleading comment.
Triggered by project rule: LMCache Code Review Style Guide
Reviewed by Cursor Bugbot for commit c59f18b.
```python
if new_total <= 0:
    self._bytes_by_cache_salt.pop(salt, None)
else:
    self._bytes_by_cache_salt[salt] = new_total
```
Per-salt underflow silently swallowed without logging warning
Medium Severity
_notify_keys_deleted logs a warning and clamps when _total_bytes_used goes negative, but silently drops per-salt entries that go negative (the if new_total <= 0: pop branch). After a total underflow clamp, _total_bytes_used can be 0 while sum(_bytes_by_cache_salt.values()) remains positive for other salts, creating an accounting inconsistency. Since this PR sets up the per-user quota policy in a follow-up PR, silent per-salt underflows could hide bugs that corrupt per-user quota decisions.
Reviewed by Cursor Bugbot for commit c59f18b.
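A sketch of the logging fix the bot suggests, as a hypothetical standalone helper (the real code updates ``_bytes_by_cache_salt`` inside ``_notify_keys_deleted``): warn on per-salt underflow the same way the aggregate clamp already does, instead of silently popping the bucket.

```python
import logging

logger = logging.getLogger(__name__)


def apply_delete_delta(buckets: dict[str, int], salt: str, size: int) -> None:
    new_total = buckets.get(salt, 0) - size
    if new_total < 0:
        # Mirror the aggregate clamp: warn instead of silently swallowing,
        # so a buggy caller can't skew per-user quota decisions unnoticed.
        logger.warning(
            "per-salt byte accounting underflow for %r (%d); clamping to 0",
            salt, new_total,
        )
    if new_total <= 0:
        buckets.pop(salt, None)
    else:
        buckets[salt] = new_total
```

The pop-at-zero behavior from the quoted snippet is unchanged; only the silent negative case now leaves a trace in the logs.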
```python
"current_usage": usage.usage_fraction,
"total_bytes_used": usage.total_bytes_used,
"total_capacity_bytes": usage.total_capacity_bytes,
"num_cache_salt_buckets": len(usage.bytes_by_cache_salt),
```
Breaking status API field removal not called out
Medium Severity
report_status removes the usage_after_ongoing_eviction key from the HTTP status response and replaces it with total_bytes_used, total_capacity_bytes, and num_cache_salt_buckets. This is a backwards-incompatible change to a public status API consumed by monitoring tools and dashboards. The project conventions require breaking changes to be explicitly called out, but the PR description does not mention this field removal.
Triggered by project rule: LMCache Code Review Style Guide
Reviewed by Cursor Bugbot for commit 84cc379.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 5 total unresolved issues (including 4 from previous reviews).
Reviewed by Cursor Bugbot for commit 256b493.
```diff
     deleted_sizes.append(size)
 if deleted_keys:
-    self._notify_keys_deleted(deleted_keys)
+    self._notify_keys_deleted(deleted_keys, deleted_sizes)
```
Dynamic nixl adapter has dual capacity tracking divergence risk
Low Severity
The adapter now maintains two independent byte counters — _total_bytes (local, for synchronous capacity checks) and _total_bytes_used (base class, for get_usage()). The delete() method decrements _total_bytes inside self._lock but calls _notify_keys_deleted (which decrements _total_bytes_used under _usage_lock) outside it. If a concurrent store reads _total_bytes_used via get_usage() between these two decrements, the counters are temporarily inconsistent. The dual-counter pattern is fragile and undocumented unlike the mock adapter which explicitly comments on this.
Triggered by project rule: LMCache Code Review Style Guide
Reviewed by Cursor Bugbot for commit 256b493.
YaoJiayi
left a comment
LGTM in general. Left a few minor comments
```python
obj.increase_pin_count()
self._completed_lookup_tasks[task_id] = bitmap
if recovered_keys:
    self._notify_keys_stored(recovered_keys, recovered_sizes)
```
Why do we do _notify_keys_stored in lookup?
This is used to report usage that persists from previous runs. Otherwise we would lose these sizes if the LMCache engine restarts (e.g., report 0 bytes used even though the user already stored some cache in a previous LMCache run).
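The recovery idea in this reply can be sketched as a hypothetical helper (the real code runs in the adapter's lookup path; the names here are invented): the first time a lookup rediscovers a persistent object after a restart, count its bytes once.

```python
def recover_usage_on_lookup(found, tracked: set, notify) -> None:
    """After a restart the in-memory counters start at zero; count bytes
    for persistent objects the first time a lookup rediscovers them.

    `found` is an iterable of (key, size) pairs; `notify(keys, sizes)`
    stands in for _notify_keys_stored.
    """
    recovered_keys, recovered_sizes = [], []
    for key, size in found:
        if key not in tracked:
            tracked.add(key)
            recovered_keys.append(key)
            recovered_sizes.append(size)
    if recovered_keys:
        notify(recovered_keys, recovered_sizes)
```

Because ``tracked`` deduplicates, a second lookup of the same keys notifies nothing, so recovered bytes are counted exactly once.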
sammshen
left a comment
native adapter changes LGTM!
Address three high-severity Cursor-bot findings on PR LMCache#3064 that were introduced when PR LMCache#3045 centralized L2 adapter byte accounting via ``AdapterUsage``:

- ``_notify_keys_stored`` / ``_notify_keys_deleted`` now receive the required ``sizes`` argument, and ``__init__`` passes ``max_capacity_bytes`` to the base class so the inherited byte counters are initialized. The redundant local ``_current_size_bytes`` tracker is removed; ``_key_sizes`` is retained solely so ``delete`` can recover per-key sizes to forward to the base class.
- The custom ``get_usage()`` returning ``tuple[float, float]`` is dropped in favor of the inherited implementation that returns an ``AdapterUsage`` snapshot (``usage_fraction == -1.0`` when ``max_capacity_gb == 0``), and ``report_status`` now reads from it.
- ``_object_key_to_string`` appends ``@<cache_salt>`` when non-empty, matching ``native_connector_l2_adapter`` and ``fs_l2_adapter`` so two users sharing a model/rank/chunk no longer collide on the same S3 object. Empty ``cache_salt`` preserves the 3-field shape.

Tests updated to the ``AdapterUsage`` API and a new ``test_cache_salt_appended`` guards against the collision regression. Full suite: 20 passed (was 13 passed / 6 failed).

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
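The ``_object_key_to_string`` behavior described in this commit can be sketched as follows. The ``<model>@<kv_rank_hex>@<chunk_hash_hex>`` field layout comes from the S3 adapter's documented object naming; the function body itself is an assumption, not the adapter's real code.

```python
def object_key_to_string(model: str, kv_rank: int, chunk_hash: int,
                         cache_salt: str = "") -> str:
    # Base 3-field shape: "<model>@<kv_rank_hex>@<chunk_hash_hex>".
    name = f"{model}@{kv_rank:x}@{chunk_hash:x}"
    # Append "@<cache_salt>" only when non-empty, so unsalted keys keep the
    # legacy 3-field shape while two users sharing model/rank/chunk map to
    # distinct S3 objects.
    if cache_salt:
        name += f"@{cache_salt}"
    return name
```

In this sketch, two users with salts ``"userA"`` and ``"userB"`` storing the same chunk produce different object names, which is exactly the collision the fix targets.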
* [Feat] Add S3 L2 adapter for MP mode

  Wraps the same ``awscrt.s3`` client used by the non-MP ``S3Connector`` in the ``L2AdapterInterface`` contract: asyncio loop on a daemon thread + 3 eventfds, refcounted ``submit_unlock``, real DeleteObject for eviction, and ``get_usage()`` driven by a configurable ``max_capacity_gb``. The existing non-MP S3 path is untouched. Registered via the ``*_l2_adapter.py`` auto-discovery convention, selected in the MP server via ``--l2-adapter`` with ``"type": "s3"``. The ported circuit breaker (3 consecutive connection errors -> disabled) short-circuits all submit paths to all-zero bitmaps. Object naming is ``<model>@<kv_rank_hex>@<chunk_hash_hex>`` and is not wire-compatible with non-MP S3 naming.

  Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* [Fix][S3] Adapt S3 L2 adapter to AdapterUsage accounting API

  Address three high-severity Cursor-bot findings on PR #3064 that were introduced when PR #3045 centralized L2 adapter byte accounting via ``AdapterUsage`` (full message in the standalone commit above). Tests updated to the ``AdapterUsage`` API; a new ``test_cache_salt_appended`` guards against the collision regression. Full suite: 20 passed (was 13 passed / 6 failed).

  Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* [Fix][S3] Use logical get_size() for byte accounting

  ``_execute_store`` tracked bytes via ``obj.get_physical_size()`` but the payload actually uploaded to S3 comes from ``obj.byte_array``, whose length is ``obj.get_size()`` — physical size includes alignment padding that is never sent on the wire. The mismatch inflated ``AdapterUsage.total_bytes_used`` relative to on-S3 bytes and would cause premature aggregate-watermark eviction whenever the padding was non-zero. Switch to ``get_size()`` to match both the on-wire payload and the convention used by ``native_connector_l2_adapter`` and ``mock_l2_adapter``.

  Signed-off-by: Samuel Shen <slshen@uchciago.edu>

---------

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>


Summary

Replaces ``L2AdapterInterface.get_usage() -> tuple[float, float]`` with a structured ``AdapterUsage`` dataclass (aggregate + per-user byte counts) and centralizes byte accounting in the base class. Adapters pass ``max_capacity_bytes`` to ``super().__init__()`` and call ``_notify_keys_stored(keys, sizes)`` / ``_notify_keys_deleted(keys, sizes)``; the base class maintains both ``_total_bytes_used`` and ``_per_user_size_bytes`` (keyed by ``ObjectKey.cache_salt`` from #3042) under a dedicated ``_usage_lock``. Existing LRU eviction behavior is unchanged — this is purely an internal refactor that sets up the per-user quota policy in PR5. PR3 of a 6-PR series.

Changes

Base class (``l2_adapters/base.py``):

- ``AdapterUsage`` dataclass: ``total_bytes_used``, ``total_capacity_bytes``, ``per_user_bytes`` (read-only ``MappingProxyType`` snapshot), and a ``usage_fraction`` property that preserves the legacy ``-1.0`` "no eviction signal" sentinel.
- ``supports_eviction`` property: ``True`` only when ``max_capacity_bytes > 0``.
- ``_notify_keys_stored(keys, sizes)`` and ``_notify_keys_deleted(keys, sizes)`` drive aggregate + per-user accounting under ``_usage_lock``. Listener callbacks fire outside the lock so a slow listener can't stall further notifies.
- ``_notify_keys_deleted`` clamps ``_total_bytes_used`` to 0 with a warning if a buggy caller would drive it negative — without the clamp the ``-1`` sentinel would silently disable eviction forever.
- ``get_usage()`` returns ``AdapterUsage`` derived from base counters; adapters no longer override it.

Adapters:

- Mock adapter: passes ``max_capacity_bytes`` to super, passes sizes to notifies. Keeps local ``_current_size_bytes`` for the synchronous within-batch capacity check (clearly documented as not the source-of-truth for external reporting).
- Nixl store adapter: passes ``pool.total_objs * align_bytes`` to super, passes sizes. Only notifies keys actually stored when the pool runs out of slots mid-batch (fixes a pre-existing partial-batch over-notify). ``NixlObjPool.get_usage`` renamed to ``get_slot_usage`` to avoid colliding with the byte-based adapter API.
- Native connector adapter: passes ``max_capacity_gb * 1024**3`` to super; the demux thread collects sizes alongside keys and notifies after the lock. The synchronous ``delete()`` no longer notifies — the demux thread does it with accurate per-key sizes drawn from ``_key_sizes``. Re-stores notify with ``size=0`` so LRU policies still ``move_to_end`` without double-counting bytes.
- FS adapter: no ``get_usage`` override (defaults to base, ``supports_eviction == False``).

Storage manager / eviction controller:

- ``StorageManager`` skips ``L2AdapterEvictionState`` for adapters where ``supports_eviction == False`` even if ``eviction_config`` was set (logs a warning).
- ``eviction_controller.report_status`` and ``_check_and_evict`` consume ``AdapterUsage.usage_fraction`` instead of unpacking the old tuple. ``report_status`` now exposes ``total_bytes_used``, ``total_capacity_bytes``, and ``per_user_bytes``.

Test plan

- ``test_l2_adapter_base.py`` (new): ``AdapterUsage`` semantics, ``supports_eviction``, base-class accounting (aggregate + per-user, bucket cleanup at zero, underflow clamp, ``MappingProxyType`` immutability, snapshot detachment, listener-only ``size=0`` notify, concurrent thread safety)
- ``test_mock_l2_adapter.py``: end-to-end test that ``cache_salt`` flows from a real adapter store through to ``AdapterUsage.per_user_bytes``
- Updated ``get_usage`` tests in ``test_mock_l2_adapter.py`` and ``test_native_connector_l2_adapter.py`` for the new dataclass shape
- ``tests/v1/distributed`` + ``tests/v1/multiprocess`` suites: 734 passed, 26 skipped

Related

- #3042 (``cache_salt`` on ``ObjectKey``) — required because per-user accounting keys by ``cache_salt``
- Design doc ``docs/design/l2_adapters/l2_per_user_quota.md`` (unchanged in this PR)
- PR5: per-user quota policy (``QuotaManager``)

Note

Medium Risk: touches core L2 adapter interfaces and eviction plumbing; while intended as an internal refactor, incorrect size reporting or missed notify paths could skew usage/eviction behavior across adapters.
Overview

Replaces ``L2AdapterInterface.get_usage()``'s tuple return with an ``AdapterUsage`` dataclass that reports aggregate bytes, capacity, ``usage_fraction``, and per-``cache_salt`` byte buckets (``bytes_by_cache_salt``) maintained centrally in ``l2_adapters/base.py`` via ``_notify_keys_stored(keys, sizes)`` / ``_notify_keys_deleted(keys, sizes)``.

Updates the L2 adapter implementations (mock, native connector, Nixl store, dynamic Nixl store, FS) to pass ``max_capacity_bytes`` to ``super().__init__``, forward per-key sizes into the new notify helpers, and drop per-adapter ``get_usage()`` overrides; adds ``supports_global_eviction`` and filters eviction setup in ``StorageManager`` accordingly, while ``L2EvictionController`` status/eviction logic now consumes ``AdapterUsage`` fields.

Adds focused tests covering base-class accounting semantics (immutability, underflow clamp, concurrency) and updates existing adapter usage tests to the new API, plus an end-to-end check that ``cache_salt`` populates ``bytes_by_cache_salt``.

Reviewed by Cursor Bugbot for commit ee62411. Bugbot is set up for automated code reviews on this repo.