Refactor KV cache shape/dtype extraction for robustness#2537
deng451e merged 16 commits into LMCache:dev from
Conversation
- Support `kv_cache` as a tuple or list of tensors (e.g., HPU K/V caches).
- Skip None tensors when extracting shape and dtype.
- Preserve original behavior for single tensor inputs.
- Improves robustness and maintainability without changing existing logic.

Signed-off-by: Tony Lin <tony.lin@intel.com>
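The extraction behavior described above can be sketched as follows. This is an illustrative sketch, not the actual LMCache code: the function name is invented, and a lightweight `FakeTensor` dataclass stands in for `torch.Tensor` so the example is self-contained.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Union


@dataclass
class FakeTensor:
    """Stand-in for torch.Tensor exposing only the attributes we read."""
    shape: tuple
    dtype: str


def extract_shape_dtype(
    kv_cache: Union[FakeTensor, Sequence[Optional[FakeTensor]]],
) -> Tuple[tuple, str]:
    """Return (shape, dtype) of the first usable tensor in kv_cache.

    - Tuple/list input (e.g., HPU K/V caches): skip None entries and
      take the first real tensor.
    - Single tensor input: use it directly (original behavior).
    """
    if isinstance(kv_cache, (tuple, list)):
        for t in kv_cache:
            if t is not None:
                return t.shape, t.dtype
        raise ValueError("kv_cache contains no usable tensors")
    return kv_cache.shape, kv_cache.dtype
```

With this shape, a single-tensor cache and a tuple cache containing None placeholders both resolve to the same (shape, dtype) pair.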
Summary of Changes: This pull request refactors the KV cache shape and dtype extraction mechanism to enhance its robustness and flexibility.
Activity
Code Review
This pull request refactors the KV cache shape and dtype extraction to be more robust by supporting kv_cache as a tuple or list of tensors and skipping None values. The change is well-contained and improves maintainability. I've added one comment with a suggestion to simplify the implementation further and to update the related type hint for better code clarity.
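The simplification the review hints at might look something like the sketch below: normalize the single-tensor case to a one-element list, then pick the first non-None entry in a single pass with `next`, with the type hint widened to cover sequences. Names, the stand-in `StubTensor` class, and the exact hint are assumptions for illustration, not the reviewed code.

```python
from typing import Optional, Sequence, Tuple, Union


class StubTensor:
    """Minimal stand-in for torch.Tensor with .shape and .dtype."""

    def __init__(self, shape: tuple, dtype: str) -> None:
        self.shape = shape
        self.dtype = dtype


def extract_kv_shape_dtype(
    kv_cache: Union[StubTensor, Sequence[Optional[StubTensor]]],
) -> Tuple[tuple, str]:
    # Treat a bare tensor as a one-element sequence, then scan once.
    tensors = kv_cache if isinstance(kv_cache, (tuple, list)) else [kv_cache]
    first = next((t for t in tensors if t is not None), None)
    if first is None:
        raise ValueError("kv_cache contains no usable tensors")
    return first.shape, first.dtype
```

The branching collapses into one lookup, and the widened hint documents that callers may pass tuples or lists, not just tensors.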
Signed-off-by: Tony Lin <tony.lin@intel.com>
Hi @maobaolong, would you mind taking a look at this one? Thanks
maobaolong
left a comment
@hlin99 Thanks for introducing the TPU case and supporting it in the existing kv_layer_group. It would be very helpful if you could add a comment explaining the TPU use case and what the layer tensor shapes look like for TPU. I guess there is a list of same-shape tensors for each layer on TPU. Feel free to correct me if I misunderstood.
Thanks @maobaolong for the comments. I will update the PR soon.
Signed-off-by: Tony Lin <tony.lin@intel.com>
Hi @maobaolong, I added comments for the possible kv_cache types on non-CUDA-like devices. Would you take a look again? Thanks for your time.
Signed-off-by: Tony Lin <tony.lin@intel.com>
* Refactor KV cache shape/dtype extraction for robustness
  - Support `kv_cache` as a tuple or list of tensors (e.g., HPU K/V caches).
  - Skip None tensors when extracting shape and dtype.
  - Preserve original behavior for single tensor inputs.
  - Improves robustness and maintainability without changing existing logic.
* changes to address gemini's comments
* add comments for possible kv_cache types on non cuda alike devices
* streamline the logic and clear comments

Signed-off-by: Tony Lin <tony.lin@intel.com>
What this PR does / why we need it:
- Support `kv_cache` as a tuple or list of tensors (e.g., HPU K/V caches).