[MP] feat: support different kv cache shape and dtype across layers #2926
maobaolong merged 2 commits into LMCache:dev
Conversation
Code Review
This pull request implements support for multiple KV layer groups with distinct shapes and dtypes. Key changes include group-aware memory copy operations in gpu_ops.py, a new method in KVLayerGroupsManager to build groups from tensor lists, and a refactored GPUCacheContext that manages per-group buffers and pointers. The multiprocess server now performs transfers iteratively across these groups. Feedback was provided to replace a platform-dependent and potentially unsafe use of array.array and torch.frombuffer with a direct torch.tensor call for collecting data pointers to ensure 64-bit compatibility and memory safety.
```python
import array
return torch.frombuffer(array.array("l", pointers), dtype=torch.long)
```
The use of array.array("l", ...) is platform-dependent; on some systems (like 64-bit Windows), long is 32-bit, which would truncate 64-bit pointers. Additionally, torch.frombuffer does not manage the lifetime of the underlying array.array object, which could lead to memory safety issues if the tensor is used after the temporary array is garbage collected. It is safer and more idiomatic to use torch.tensor(..., dtype=torch.long). This also removes the need for the local import array.
Suggested change:
```diff
-import array
-return torch.frombuffer(array.array("l", pointers), dtype=torch.long)
+return torch.tensor([kv_caches[i].data_ptr() for i in group.layer_indices], dtype=torch.long)
```
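For reference, a tiny self-contained illustration of the recommended pattern (the tensors below are placeholders, not the PR's variables):

```python
import torch

# Collect 64-bit data pointers directly into a torch.long tensor; no
# intermediate array.array and no lifetime issues from torch.frombuffer.
tensors = [torch.empty(4), torch.empty(8)]
pointers = torch.tensor([t.data_ptr() for t in tensors], dtype=torch.long)
assert pointers.dtype == torch.int64  # torch.long is always 64-bit
```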
Force-pushed e5ea198 to e53a747
Force-pushed e53a747 to 634f596
@liuyumoye It seems there is a conflict with dev now, would you like to resolve the conflict first? Hope to merge this PR so that MP mode can support DSA.
Force-pushed 75af69c to 14a720d
ApostaC left a comment
High-level comments:
- Tmp buffer and lmcache_async_memcpy_h2d/d2h should not be aware of kv cache group information. This can reduce the number of code changes by a lot.
- Please add some unit tests for 132 int8s to test the kernel support (a rough sketch follows this comment).
Please see the detailed comments below
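On the second point above, a rough sketch of the kind of round-trip check such a test might perform; this uses plain torch only and does not reproduce the lmc_ops kernel call that the real test would exercise:

```python
import torch

def test_int8_head_size_132_roundtrip():
    # head_size=132 with int8 elements is 132 bytes per head: divisible by 4
    # but not by 16, which is exactly the alignment case under discussion.
    if not torch.cuda.is_available():
        return  # skip on CPU-only machines
    src = torch.randint(-128, 128, (2, 256, 132), dtype=torch.int8)
    gpu = src.to("cuda", non_blocking=True)
    back = gpu.to("cpu")
    torch.cuda.synchronize()
    assert torch.equal(src, back)
```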
I don't think we need to touch this file.
lmcache_memcpy_async_h2d/d2h doesn't need to know the layout inside the memory object, and it should be called outside the for group in kv_groups: loop.
```python
# Backward-compat scalar aliases (group 0)
self.hidden_dim_size_ = self.hidden_dim_sizes_[0]
self.num_heads_ = self.group_num_heads_[0]
self.head_size_ = self.group_head_sizes_[0]
self.shape_desc_ = self.shape_descs_[0]
```
Do we really need to keep this backward compatibility? I feel like we can force all the code in server.py to use the new interfaces.
```diff
@@ -119,17 +154,27 @@ def __init__(
         0, self.block_size_, dtype=torch.long, device=self.device_
     ).unsqueeze(0)
     self.slot_mapping_tensor_ = (offsets + block_ids * self.block_size_).reshape(
-        (self.num_blocks, self.block_size_)
+        (self.num_blocks_, self.block_size_)
     )
```
Actually, this can be dropped.
And also the old slot_mapping-related APIs.
```diff
-        tmp_buffer_shape = self.get_kv_buffer_shape(
-            lmcache_chunk_size * self.max_batch_size
-        )
-        self.tmp_gpu_buffer_ = torch.empty(
-            tmp_buffer_shape, dtype=self.dtype, device=self.device_
-        )
+        self.tmp_gpu_buffers_: list[torch.Tensor] = [
+            torch.empty(
+                self.get_kv_buffer_shape(
+                    lmcache_chunk_size * self.max_batch_size, group_idx
+                ),
+                dtype=group.dtype,
+                device=self.device_,
+            )
+            for group_idx, group in enumerate(
+                self.kv_layer_groups_manager_.kv_layer_groups
+            )
+        ]
+        # Single-group alias for backward compatibility
+        self.tmp_gpu_buffer_ = self.tmp_gpu_buffers_[0]
```
As I mentioned above, we don't need to create a tmp_gpu_buffer for each group; a single "flat" one covering all the groups is enough.
We can have a helper function called something like _get_kv_buffer_shape_unified_group() to get the shapes.
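A rough sketch of what such a helper might look like, assuming the per-group attributes this PR introduces (kv_layer_groups_manager_.kv_layer_groups and get_kv_buffer_shape(num_tokens, group_idx)) and a flat uint8 layout; this is the reviewer's idea sketched out, not code from the PR:

```python
import math
import torch

def _get_kv_buffer_shape_unified_group(self, num_tokens: int) -> torch.Size:
    # Sum the byte sizes of every per-group buffer so one flat uint8 buffer
    # can hold a single chunk for all layer groups at once.
    total_bytes = 0
    for group_idx, group in enumerate(
        self.kv_layer_groups_manager_.kv_layer_groups
    ):
        shape = self.get_kv_buffer_shape(num_tokens, group_idx)
        elem_size = torch.empty((), dtype=group.dtype).element_size()
        total_bytes += math.prod(shape) * elem_size
    return torch.Size([total_bytes])
```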
```python
for group_idx in range(num_groups):
    tmp_buffers = gpu_context.get_tmp_gpu_buffer_batched(
        self.chunk_size, batch_len, group_idx
    )
    group_kv_pointers = gpu_context.get_group_kv_pointers(group_idx)

    # H2D copy for all chunks in the batch
    for tmp_buffer, memory_obj in zip(
        tmp_buffers, memory_obj_batch, strict=False
    ):
        lmcache_memcpy_async_h2d(memory_obj, tmp_buffer, group_idx)

    lmc_ops.multi_layer_block_kv_transfer(
        group_kv_pointers,
        [tb.data_ptr() for tb in tmp_buffers],
        chunk_block_ids_gpu,
        gpu_context.device,
        lmc_ops.TransferDirection.H2D,
        gpu_context.get_shape_desc(group_idx),
        self.chunk_size,
        gpu_context.gpu_kv_format_,
        skip_blocks_in_chunk,
    )
```
With my proposal above, the code will be something like this:
(The per-group tmp buffer allocation and per-group H2D copies shown above are dropped; the memcpy runs once on the flat buffer, and only the kernel launch is per-group.)

```python
# H2D copy for all chunks in the batch, using the single flat tmp buffer
# (no group awareness in the memcpy helper)
tmp_buffers = gpu_context.get_tmp_gpu_buffer_batched(
    self.chunk_size, batch_len
)
for tmp_buffer, memory_obj in zip(
    tmp_buffers, memory_obj_batch, strict=False
):
    lmcache_memcpy_async_h2d(memory_obj, tmp_buffer)

for group_idx in range(num_groups):
    group_kv_pointers = gpu_context.get_group_kv_pointers(group_idx)
    # New code to get buffer offset from gpu_context by group_idx
    tmp_buffer_offsets = gpu_context.get_tmp_gpu_buffer_offset(group_idx)
    lmc_ops.multi_layer_block_kv_transfer(
        group_kv_pointers,
        [tb.data_ptr() + tmp_buffer_offsets for tb in tmp_buffers],
        chunk_block_ids_gpu,
        gpu_context.device,
        lmc_ops.TransferDirection.H2D,
        gpu_context.get_shape_desc(group_idx),
        self.chunk_size,
        gpu_context.gpu_kv_format_,
        skip_blocks_in_chunk,
    )
```
Force-pushed 14a720d to ecf01b6
Force-pushed d4ba284 to c363473
Force-pushed c363473 to 2b762e3
```python
self.hidden_dim_sizes_.append(hidden_dim)
self.group_num_heads_.append(nh)
self.group_head_sizes_.append(hs)
```
Unused attributes stored but never read
Low Severity
group_num_heads_ and group_head_sizes_ are populated in the constructor but never read anywhere in the codebase. These lists are dead stores — the equivalent values (nh and hs) are already stored inside each PageBufferShapeDesc in shape_descs_, which is what callers actually use. Keeping unused state in the class adds confusion for future maintainers who may wonder where these are consumed.
Triggered by project rule: LMCache Code Review Style Guide
Reviewed by Cursor Bugbot for commit 2b762e3.
Force-pushed 2b762e3 to b108af3
Thanks for pointing that out! I've resolved the merge conflict with the latest dev branch. The PR is ready for review again. Please let me know if there are any other issues.
ApostaC left a comment
All are nit comments. Otherwise LGTM!
nit note: usually we put #define and #undef outside the function body.
```python
"""
assert memory_obj.tensor is not None
assert memory_obj.tensor.numel() == gpu_buffer.numel()
src_tensor = memory_obj.raw_data
```
nit: not sure whether MemoryObj.raw_data is a public & stable API or not. But I do see there is a data_ptr() property defined in the MemoryObj base class. We can use that directly when calling lmc_ops.lmcache_memcpy_async instead.
```python
"""
assert memory_obj.tensor is not None
assert memory_obj.tensor.numel() == gpu_buffer.numel()
dst_tensor = memory_obj.raw_data
```
```python
    given group."""
    return self.group_kv_pointers_[group_idx]

def get_tmp_gpu_buffer_flat(self, chunk_idx: int = 0) -> torch.Tensor:
```
nit: let's avoid using a default parameter for chunk_idx. We should make sure that the caller understands it needs to pass in chunk_idx because it directly relates to the batching logic.
```python
The returned slice has exactly ``tmp_chunk_bytes_`` bytes and its
layout matches ``MemoryObj.raw_data`` (groups concatenated in order),
so it can be copied to/from a MemoryObj with a single memcpy.
```
nit: Let's avoid using tmp_chunk_bytes_ and MemoryObj.raw_data in the docstring to avoid confusion for other developers. Ideally, we can say something like:
The returned tensor will fit a full memory object corresponding to ``self.chunk_size`` tokens.
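As a side note on the concatenated layout, a rough sketch of how a per-group view could be carved out of such a flat uint8 slice; every name below is illustrative (offsets, shapes, and dtypes are assumed to come from the per-group shape descriptors), not the PR's actual helper:

```python
import math
import torch

def group_view(flat_chunk: torch.Tensor, offsets: list[int],
               shapes: list[tuple[int, ...]], dtypes: list[torch.dtype],
               group_idx: int) -> torch.Tensor:
    # flat_chunk is a 1-D uint8 buffer with all groups concatenated in order.
    elem_size = torch.empty((), dtype=dtypes[group_idx]).element_size()
    nbytes = math.prod(shapes[group_idx]) * elem_size
    start = offsets[group_idx]
    # Reinterpret the byte slice as the group's dtype, then reshape it.
    # Assumes `start` is aligned to elem_size, as a packed layout would ensure.
    return (flat_chunk[start:start + nbytes]
            .view(dtypes[group_idx])
            .view(shapes[group_idx]))
```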
```python
num_elems = shape.numel()
return self.tmp_gpu_buffer_.flatten()[:num_elems].view(shape)

Returns a view of the temporary GPU buffer for the given group,
sized for a single request of ``num_tokens`` tokens (chunk 0).
```
nit: num_tokens --> lmcache_chunk_size. Also, the (chunk 0) at the end is a bit confusing.
```diff
     "num_layers": ctx.num_layers,
     "block_size": ctx.block_size,
-    "hidden_dim_size": ctx.hidden_dim_size,
+    "hidden_dim_sizes": str(ctx.hidden_dim_sizes_),
```
nit: we should not use private members here. Let's have a property defined as hidden_dim_sizes in GPUCacheContext
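A minimal sketch of such a property on GPUCacheContext (the name follows the reviewer's suggestion; it is illustrative, not the merged code):

```python
@property
def hidden_dim_sizes(self) -> list[int]:
    # Read-only public copy so describe()/server code never touches the
    # private hidden_dim_sizes_ list directly.
    return list(self.hidden_dim_sizes_)
```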
Force-pushed b108af3 to 91c4630
- gpu_ops: add group_idx param to lmcache_memcpy_async_h2d/d2h, use memory_obj.get_tensor(group_idx) instead of memory_obj.tensor
- kv_layer_groups: add build_kv_layer_groups_from_list() to group layers by (shape, dtype) from a plain tensor list
- gpu_context: introduce per-group shape_descs_, hidden_dim_sizes_, group_kv_pointers_, and tmp_gpu_buffers_; update get_kv_buffer_shape, get_tmp_gpu_buffer, get_tmp_gpu_buffer_batched to accept group_idx; add get_shape_desc(group_idx) and get_group_kv_pointers(group_idx)
- server: update get_layout_desc, _store_loop, _retrieve_loop to iterate over all groups; fix skip_tokens_in_chunk upper bound to use batch_len instead of _BATCH_SIZE

Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
Force-pushed 91c4630 to cbf4d52
Thanks for the review! All nit comments have been addressed.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit cbf4d52.
```python
lmc_ops.TransferDirection.D2H,
gpu_context.get_shape_desc(group_idx),
self.chunk_size,
gpu_context.gpu_kv_format_,
```
Private member access across class boundary in enforced directory
Medium Severity
New code in server.py accesses gpu_context.gpu_kv_format_ (a private _-suffixed attribute) from outside GPUCacheContext. This violates the project's SLF rule, which is enforced by CI in lmcache/v1/multiprocess/. GPUCacheContext exposes gpu_kv_format_name() for the string name but has no public accessor for the format enum itself. A public property like gpu_kv_format is needed to pass the value to the kernel without cross-class private member access.
Triggered by project rule: LMCache Code Review Style Guide
Reviewed by Cursor Bugbot for commit cbf4d52.
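A public accessor along the lines Bugbot suggests could look like this (a sketch; the property name and its placement are assumed, not part of the PR):

```python
@property
def gpu_kv_format(self):
    # Expose the KV format enum publicly so server.py can pass it to the
    # kernel without reaching into the private gpu_kv_format_ attribute.
    return self.gpu_kv_format_
```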
maobaolong left a comment
@liuyumoye Thanks for this feature, LGTM.
```cpp
TORCH_CHECK(head_bytes % sizeof(uint16_t) == 0, "head_size * element_size (",
            head_bytes, ") must be divisible by 2 for vectorized access");

if (head_bytes % sizeof(uint4) == 0) {
```
@liuyumoye could we add some comments to indicate how many bytes?
```cpp
            head_bytes, ") must be divisible by 2 for vectorized access");

if (head_bytes % sizeof(uint4) == 0) {
  LAUNCH_TEMPLATED(uint4);
```
Besides, could we add 8-byte or 1-byte copies?
After discussing offline, there is no need to add 8-byte or 1-byte copies.
Add scalar type fallback hierarchy for block KV transfer kernel:

- head_bytes % 16 == 0 -> uint4 (16B, fastest)
- head_bytes % 4 == 0 -> uint32_t (4B)
- head_bytes % 2 == 0 -> uint16_t (2B)

This fixes the runtime error for MLA models where head_size=132 (uint8), giving head_bytes=132, which is not divisible by 16 but is divisible by 4.

Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
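To make the fallback rule concrete, here is a small Python illustration of the selection logic (the real dispatch happens in the CUDA kernel via the LAUNCH_TEMPLATED macro; this only sketches the arithmetic):

```python
def pick_vector_width(head_bytes: int) -> int:
    # Widest aligned type wins: 16B (uint4), then 4B (uint32_t), then 2B
    # (uint16_t); anything else fails, matching the kernel's TORCH_CHECK.
    if head_bytes % 16 == 0:
        return 16
    if head_bytes % 4 == 0:
        return 4
    if head_bytes % 2 == 0:
        return 2
    raise ValueError("head_bytes must be divisible by 2 for vectorized access")

# MLA example from the commit message: head_size=132 with uint8 elements.
assert pick_vector_width(132) == 4  # 132 is not divisible by 16, but is by 4
```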
Force-pushed cbf4d52 to 89ece3c
So can I use MP LMCache when I deploy GLM-5 now?
@princepride yes.
Thank you! Have you seen I joined the latest
I left comments on your Slack
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
* fix typo bug

  Signed-off-by: princepride <wangzhipeng628@gmail.com>

* fix: rename hidden_dim_size to hidden_dim_sizes in describe and server

  Align with the rename introduced in #2926 where hidden_dim_size was changed to hidden_dim_sizes (List[int]) to support kv_groups.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  Signed-off-by: princepride <wangzhipeng628@gmail.com>

* fix: update test fixture to use hidden_dim_sizes key

  Update test fixture and assertion in test_describe.py to match the hidden_dim_size -> hidden_dim_sizes rename from #2926.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  Signed-off-by: princepride <wangzhipeng628@gmail.com>

---------

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…MCache#2926)

* multiprocess: support per-group KV cache transfer with group_idx

  - gpu_ops: add group_idx param to lmcache_memcpy_async_h2d/d2h, use memory_obj.get_tensor(group_idx) instead of memory_obj.tensor
  - kv_layer_groups: add build_kv_layer_groups_from_list() to group layers by (shape, dtype) from a plain tensor list
  - gpu_context: introduce per-group shape_descs_, hidden_dim_sizes_, group_kv_pointers_, and tmp_gpu_buffers_; update get_kv_buffer_shape, get_tmp_gpu_buffer, get_tmp_gpu_buffer_batched to accept group_idx; add get_shape_desc(group_idx) and get_group_kv_pointers(group_idx)
  - server: update get_layout_desc, _store_loop, _retrieve_loop to iterate over all groups; fix skip_tokens_in_chunk upper bound to use batch_len instead of _BATCH_SIZE

  Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>

* fix: support vectorized KV transfer for non-16B-aligned head sizes

  Add scalar type fallback hierarchy for block KV transfer kernel:
  head_bytes % 16 == 0 -> uint4 (16B, fastest)
  head_bytes % 4 == 0 -> uint32_t (4B)
  head_bytes % 2 == 0 -> uint16_t (2B)

  This fixes the runtime error for MLA models where head_size=132 (uint8), giving head_bytes=132 which is not divisible by 16 but is divisible by 4.

  Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>

---------

Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>


support different kv cache shape and dtype across layers
What this PR does / why we need it:
This PR adds support for heterogeneous KV cache shapes and dtypes across layers (e.g., models where different layers have different KV head dimensions or data types).
Previously, GPUCacheContext assumed all layers share the same shape and dtype. This PR introduces KVLayerGroupsManager to group layers by (shape, dtype), and updates the D2H/H2D transfer logic in server.py to iterate over each group independently, using per-group tmp_gpu_buffer, kv_pointers, and tensor views.
Key changes:
- kv_layer_groups.py: Add build_kv_layer_groups_from_list() to build layer groups from a raw list of KV cache tensors (no layer names required), grouping by (shape, dtype).
- gpu_context.py: Replace the single hidden_dim_size_ / tmp_gpu_buffer_ with per-group lists (hidden_dim_sizes_, tmp_gpu_buffers_); expose get_tmp_gpu_buffer(num_tokens, group_idx), get_kv_buffer_shape(num_tokens, group_idx), and a kv_layer_groups_manager property.
- server.py: Update get_layout_desc() to produce per-group shapes/dtypes; refactor the D2H (store) and H2D (retrieve) loops to iterate over all groups.
- memory_management.py: Improve error handling in the tensor property and get_tensor(): replace bare assert with a descriptive ValueError; make tensor fall back to get_tensor(0) in multi-group scenarios.
- mock_l2_adapter.py: Replace bare assert with ValueError for cleaner error messages.

Special notes for your reviewers:
- The grouping logic in build_kv_layer_groups_from_list() is order-preserving: groups are sorted by the first layer index they contain (a sketch of this grouping idea follows these notes).
- get_tmp_gpu_buffer and get_kv_buffer_shape are backward-compatible: group_idx defaults to 0, so single-group models are unaffected.
The tensor property on TensorMemoryObj now delegates to get_tensor(0) when per-group metadata is present, maintaining backward compatibility with callers that only use the single-tensor interface.
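A minimal sketch of the grouping idea under the behavior described above; the real build_kv_layer_groups_from_list() lives on KVLayerGroupsManager and returns group objects rather than bare index lists, so treat this purely as an illustration:

```python
import torch

def group_layers_by_shape_and_dtype(kv_caches: list[torch.Tensor]) -> list[list[int]]:
    # Group layer indices by (shape, dtype); dict insertion order keeps
    # groups sorted by the first layer index they contain.
    groups: dict[tuple, list[int]] = {}
    for layer_idx, kv in enumerate(kv_caches):
        key = (tuple(kv.shape), kv.dtype)
        groups.setdefault(key, []).append(layer_idx)
    return list(groups.values())

# Example: three fp16 layers plus one MLA-style uint8 layer with head size 132.
caches = [torch.empty(2, 8, 64, dtype=torch.float16) for _ in range(3)]
caches.append(torch.empty(2, 8, 132, dtype=torch.uint8))
assert group_layers_by_shape_and_dtype(caches) == [[0, 1, 2], [3]]
```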
Note
Medium Risk
Updates CUDA transfer kernels and the MP server store/retrieve path to handle per-layer-group shapes/dtypes, which can affect correctness and performance of GPU<->CPU KV transfers. Added tests mitigate risk but changes touch core memory copy and layout logic.
Overview
Enables multiprocessing cache store/retrieve to support heterogeneous KV cache shapes and dtypes across layers by grouping layers with identical (shape, dtype) and transferring each group independently. GPUCacheContext now builds KVLayerGroupsManager, maintains per-group PageBufferShapeDesc and pointer arrays, and replaces the old single temporary KV buffer with a flat uint8 chunk buffer that concatenates all groups (with helpers to view per-group/per-batch slices). server.py's layout description, store (D2H), and retrieve (H2D) paths are updated to iterate over groups and copy via the new flat buffer.

CUDA multi_layer_block_kv_transfer is generalized to dispatch over vector widths (uint4/uint32_t/uint16_t) based on alignment instead of requiring uint4, and the Python GPU memcpy helpers now validate sizes and copy raw bytes for non-lazy allocators. New unit tests cover multi-group temp-buffer layout and non-4-byte-aligned lmcache_memcpy_async copies (e.g. int8 hidden size 132).

Reviewed by Cursor Bugbot for commit 89ece3c. Bugbot is set up for automated code reviews on this repo.