[Feat] Add cache_salt parameter to MP adapter interfaces#3029
[Feat] Add cache_salt parameter to MP adapter interfaces#3029royyhuang merged 5 commits intoLMCache:devfrom
Conversation
Add cache_salt="" default parameter to scheduler and worker adapter methods to prepare for per-user cache isolation. This is a no-op change — all defaults are empty string, preserving existing behavior. Scheduler adapter: - maybe_submit_lookup_request: add cache_salt param - free_lookup_locks: add cache_salt param - _create_key: add cache_salt param (accepted, not yet forwarded) Worker adapter: - submit_store_request: add cache_salt param - submit_retrieve_request: add cache_salt param - batched_submit_store_requests: add cache_salts list param - batched_submit_retrieve_requests: add cache_salts list param - _create_key: add cache_salt param (accepted, not yet forwarded) cache_salt will be forwarded to IPCCacheEngineKey once that field is added in a follow-up PR. Signed-off-by: royyhuang <royyhuang@gmail.com> Signed-off-by: royyhuang <roy.y.huang@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request introduces cache_salt (per-user isolation salt) across various methods in the VLLMMultiProcessAdapter, including lookup, store, retrieve, and their batched counterparts. Feedback suggests using strict=True in the zip calls within the batched store and retrieve methods to ensure that the input lists (request_ids, ops, and cache_salts) are of equal length, preventing potential silent truncation.
| self.submit_store_request(request_id, op, event) | ||
| if cache_salts is None: | ||
| cache_salts = [""] * len(request_ids) | ||
| for request_id, op, salt in zip(request_ids, ops, cache_salts, strict=False): |
There was a problem hiding this comment.
Using strict=True in the zip call ensures that request_ids, ops, and cache_salts have the same length. This prevents silent truncation of the batch if there is a mismatch between these lists, which is particularly important now that a third list (cache_salts) is involved in the iteration.
| for request_id, op, salt in zip(request_ids, ops, cache_salts, strict=False): | |
| for request_id, op, salt in zip(request_ids, ops, cache_salts, strict=True): |
| self.submit_retrieve_request(request_id, op, event) | ||
| if cache_salts is None: | ||
| cache_salts = [""] * len(request_ids) | ||
| for request_id, op, salt in zip(request_ids, ops, cache_salts, strict=False): |
There was a problem hiding this comment.
Using strict=True in the zip call ensures that request_ids, ops, and cache_salts have the same length. This prevents silent truncation of the batch if there is a mismatch between these lists, ensuring all requests in the batch are processed as expected.
| for request_id, op, salt in zip(request_ids, ops, cache_salts, strict=False): | |
| for request_id, op, salt in zip(request_ids, ops, cache_salts, strict=True): |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 656976a. Configure here.
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
…ota-pr1' into feat/l2-per-user-quota-pr1
Adds a ``cache_salt: str = ""`` field to ``ObjectKey`` and ``IPCCacheEngineKey`` so same-content/different-user cache entries produce distinct keys. This is the data-model piece of the per-user L2 cache quota feature (PR2 of 6). PR1a (LMCache#3029) wired cache_salt through the adapter surface but left it a no-op until the data model landed; this PR makes it load-bearing. Key changes: - ``ObjectKey`` and ``IPCCacheEngineKey`` gain ``cache_salt`` field participating in eq/hash. ``__post_init__`` on both rejects ``@`` in the salt — this is the invariant the L2 adapter parsers rely on. - ``ipc_key_to_object_keys(ipc_key, chunk_hashes)`` reads ``ipc_key.cache_salt`` internally. No separate parameter, so callers cannot accidentally drop the salt. - Native connector, FS adapter Python, and FS adapter C++ all use a leading ``@@`` marker to disambiguate salted from legacy keys. Legacy files and wire payloads (empty salt) remain readable — existing stored data is preserved. - msgspec encodes dataclasses as maps, so an old 7-field ``IPCCacheEngineKey`` payload decodes on new code with ``cache_salt`` defaulting to ``""``. New-on-old works because msgspec ignores unknown map keys by default. Tests: 32 new cases covering serialization roundtrip (native + FS), validation (``@`` rejection), isolation semantics (eq/hash per salt), msgspec wire compat (old→new, new→new), and the FS C++ parser's legacy/salted/``@``-in-model paths. Signed-off-by: royyhuang <roy.y.huang@gmail.com>
* [Feat] Add cache_salt to ObjectKey for per-user cache isolation Adds a ``cache_salt: str = ""`` field to ``ObjectKey`` and ``IPCCacheEngineKey`` so same-content/different-user cache entries produce distinct keys. This is the data-model piece of the per-user L2 cache quota feature (PR2 of 6). PR1a (#3029) wired cache_salt through the adapter surface but left it a no-op until the data model landed; this PR makes it load-bearing. Key changes: - ``ObjectKey`` and ``IPCCacheEngineKey`` gain ``cache_salt`` field participating in eq/hash. ``__post_init__`` on both rejects ``@`` in the salt — this is the invariant the L2 adapter parsers rely on. - ``ipc_key_to_object_keys(ipc_key, chunk_hashes)`` reads ``ipc_key.cache_salt`` internally. No separate parameter, so callers cannot accidentally drop the salt. - Native connector, FS adapter Python, and FS adapter C++ all use a leading ``@@`` marker to disambiguate salted from legacy keys. Legacy files and wire payloads (empty salt) remain readable — existing stored data is preserved. - msgspec encodes dataclasses as maps, so an old 7-field ``IPCCacheEngineKey`` payload decodes on new code with ``cache_salt`` defaulting to ``""``. New-on-old works because msgspec ignores unknown map keys by default. Tests: 32 new cases covering serialization roundtrip (native + FS), validation (``@`` rejection), isolation semantics (eq/hash per salt), msgspec wire compat (old→new, new→new), and the FS C++ parser's legacy/salted/``@``-in-model paths. Signed-off-by: royyhuang <roy.y.huang@gmail.com> * [Fix] Expand cache_salt validation: reject /, \, NUL; add length cap Addresses review feedback on #3042: - cache_salt must not contain '/', '\', or NUL (in addition to '@'). The FS adapter embeds salt into filenames — '/' would inject directory separators; '\' is a Windows path separator; NUL terminates C strings in the C++ connector. - Max length capped at 128 chars to stay within NAME_MAX (255) after model_name, kv_rank, chunk_hash, and the .data extension are added. - Updated example external connector template (examples/lmc_external_native_connector) to handle the @@ salted key format, matching the in-tree csrc/storage_backends/fs/connector.cpp. - Added tests for /, \, NUL rejection and length cap on both ObjectKey and IPCCacheEngineKey. Signed-off-by: royyhuang <roy.y.huang@gmail.com> * [Refactor] Always use @@-prefixed key format; deprecate legacy All new writes use the unified @@-prefixed format for both salted and unsalted keys: @@<cache_salt>@<model>@<rank>@<hash> For empty cache_salt this becomes @@@model@rank@hash. The legacy 3-field format (no @@ prefix) is still readable by all parsers for backward compatibility but emits a deprecation warning on the FS adapter read path. Re-storing a legacy entry automatically migrates it to the new format. This simplifies parsing — the @@ prefix is always present so there is no need to disambiguate based on field count (model_name may contain @ and would otherwise be ambiguous). Updated Python (native_connector + fs_l2_adapter), C++ (in-tree fs/connector.cpp), and the example external connector template. Tests updated: serialization format assertions reflect the @@@ prefix for empty salt, and roundtrip tests verify the parser still handles legacy filenames. Signed-off-by: royyhuang <roy.y.huang@gmail.com> * [Refactor] Example connector: always emit @@-prefixed filename Aligns the example external native connector template with the in-tree csrc/storage_backends/fs/connector.cpp convention from the previous commit. Previously the example only emitted the @@ prefix when cache_salt was non-empty, leaving empty-salt keys in the legacy 3-field format. Now all files on disk use the unified @@-prefixed format (empty salt produces @@@model@...), and legacy inputs without @@ are still accepted on the read path for backward compatibility. Third-party connectors cloned from this template will produce migrate-compatible filenames out of the box. Signed-off-by: royyhuang <roy.y.huang@gmail.com> --------- Signed-off-by: royyhuang <roy.y.huang@gmail.com>
* [Feat] Add cache_salt parameter to MP adapter interfaces Add cache_salt="" default parameter to scheduler and worker adapter methods to prepare for per-user cache isolation. This is a no-op change — all defaults are empty string, preserving existing behavior. Scheduler adapter: - maybe_submit_lookup_request: add cache_salt param - free_lookup_locks: add cache_salt param - _create_key: add cache_salt param (accepted, not yet forwarded) Worker adapter: - submit_store_request: add cache_salt param - submit_retrieve_request: add cache_salt param - batched_submit_store_requests: add cache_salts list param - batched_submit_retrieve_requests: add cache_salts list param - _create_key: add cache_salt param (accepted, not yet forwarded) cache_salt will be forwarded to IPCCacheEngineKey once that field is added in a follow-up PR. Signed-off-by: royyhuang <royyhuang@gmail.com> Signed-off-by: royyhuang <roy.y.huang@gmail.com> * add per user l2 usage design docs Signed-off-by: royyhuang <roy.y.huang@gmail.com> * address comments Signed-off-by: royyhuang <roy.y.huang@gmail.com> --------- Signed-off-by: royyhuang <royyhuang@gmail.com> Signed-off-by: royyhuang <roy.y.huang@gmail.com>

Summary
cache_salt=""default parameter to all scheduler and worker adapter methods in the MP multi-process adaptercache_saltfrom vLLM'sRequest.cache_salt)Changed methods
Scheduler adapter (
LMCacheMPSchedulerAdapter):maybe_submit_lookup_request(request_id, token_ids, cache_salt="")free_lookup_locks(..., cache_salt="")_create_key(..., cache_salt="")Worker adapter (
LMCacheMPWorkerAdapter):submit_store_request(request_id, op, event, cache_salt="")submit_retrieve_request(request_id, op, event, cache_salt="")batched_submit_store_requests(request_ids, ops, event, cache_salts=None)batched_submit_retrieve_requests(request_ids, ops, event, cache_salts=None)_create_key(..., cache_salt="")cache_saltis accepted but not yet forwarded toIPCCacheEngineKey— that field will be added in a follow-up PR.Test plan
Context
PR1a of the per-user L2 quota feature. See design doc:
docs/design/l2_adapters/l2_per_user_quota.mdNote
Low Risk
Low risk interface expansion with default values, intended to be behavior-preserving; main risk is downstream callers needing to pass/ignore the new optional args correctly.
Overview
Introduces a behavior-preserving interface extension to the vLLM multi-process adapter by adding optional
cache_saltparameters to scheduler lookup/free-lock calls and worker store/retrieve APIs (including batched variants), threading the value through to key construction for future per-user cache isolation.Adds a detailed design doc (
docs/design/l2_adapters/l2_per_user_quota.md) describing the upcoming per-user quota + eviction approach and the planned cross-component changes.Reviewed by Cursor Bugbot for commit 829689a. Bugbot is set up for automated code reviews on this repo. Configure here.