Resolve invalid argument index error for SDPA backend execution by byungilm · Pull Request #35021 · openvinotoolkit/openvino

byungilm · 2026-03-29T13:14:00Z

Details:

Description: The benchmark fails at execution time when the KV-cache SDPA backend is enabled on a dGPU (systolic array):
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:246: Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_stream.cpp:277: [GPU] [CL_EXT] setArgUsm in KernelIntel failed, error code: -49 CL_INVALID_ARG_INDEX

The code and line that caused this issue

StorageType is selected based on supports_immad at kv_cache_compression.cpp:145-148.
StorageType::Planar requires separate scale and zero-point (zp) buffers.
The sdpa_opt kernel is missing the zp input parameter, so the host binds more arguments than the kernel declares.
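The failure mode listed above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `kernel_model`, `set_arg_checked`, `bind_all`, and the parameter counts used with them are hypothetical and do not reproduce the plugin's real argument list. The one fact taken from the OpenCL specification is that setting an argument at an index the kernel does not declare returns CL_INVALID_ARG_INDEX (-49), which is the code in the error message.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined by the OpenCL headers; -49 matches the log above. */
#define CL_SUCCESS 0
#define CL_INVALID_ARG_INDEX (-49)

/* Hypothetical model of a compiled kernel: only its parameter count matters. */
typedef struct {
    size_t num_params;
} kernel_model;

/* Mimics the index validation performed when a kernel argument is set. */
static int set_arg_checked(const kernel_model *k, size_t arg_index) {
    assert(k != NULL);
    return arg_index < k->num_params ? CL_SUCCESS : CL_INVALID_ARG_INDEX;
}

/* Binds num_host_args buffers in order and returns the first error, if any. */
static int bind_all(const kernel_model *k, size_t num_host_args) {
    for (size_t i = 0; i < num_host_args; ++i) {
        int err = set_arg_checked(k, i);
        if (err != CL_SUCCESS) {
            return err;
        }
    }
    return CL_SUCCESS;
}
```

With a kernel that declares fewer parameters than the host supplies (for example, because the two zp buffers are not in its signature), `bind_all` returns -49 on the first extra buffer. Once the kernel declares the extra parameters, the same call succeeds.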

Reproduction step and snapshot

Can be reproduced with benchmark.py:
python benchmark.py -m qwen2.5-7b-instruct/pytorch/ov/OV_FP16-INT8_ASYM -d GPU.1 -n 0 --genai -mc 1 -pf repo-prompts/32_1024/qwen2.5-7b-instruct.jsonl -lc enable_sdpa_cache_u8_by-channel.json
cat enable_sdpa_cache_u8_by-channel.json
{ "ATTENTION_BACKEND": "SDPA", "KV_CACHE_PRECISION": "u8", "KEY_CACHE_QUANT_MODE": "BY_CHANNEL" }

Tickets:

CVS-183903

AI Assistance:

AI assistance used: yes
If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks).
Generated unit-tests for this fix

Signed-off-by: Min, Byungil <byungil.min@intel.com>

isanghao · 2026-03-31T03:45:12Z

src/plugins/intel_gpu/src/graph/impls/ocl_v2/sdpa_opt.cl

+#if HAS_KV_CACHE_ZP_INPUT
+            VALUE_COMPRESSION_SCALE_TYPE comp_zp = val_zp[comp_offset];
+#else
            VALUE_COMPRESSION_SCALE_TYPE comp_zp = val_scale[comp_offset + 1];


What about introducing a macro like this? I think it would be easier to read.

#if HAS_KV_CACHE_ZP_INPUT
#define GET_ZP(zp, scale, comp_offset) ((zp)[(comp_offset)])
#else
#define GET_ZP(zp, scale, comp_offset) ((scale)[(comp_offset) + 1])
#endif

byungilm · 2026-04-01

Looks good. I also added a GET_SCALE macro alongside it.
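To make the two layouts concrete, here is a minimal plain-C sketch of how the suggested accessor behaves in each branch. Several things are assumptions rather than the merged code: `GET_ZP_PLANAR` and `GET_ZP_COMBINED` are hypothetical names for the two branches of the `#if` (the real kernel selects one at compile time through HAS_KV_CACHE_ZP_INPUT), the shape of `GET_SCALE` is guessed because the thread does not show its definition, and `dequantize` uses the conventional asymmetric formula `(q - zp) * scale`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Planar layout: zero-points live in their own buffer. */
#define GET_ZP_PLANAR(zp, scale, comp_offset) ((zp)[(comp_offset)])
/* Combined layout: the zero-point follows its scale inside the scale buffer. */
#define GET_ZP_COMBINED(zp, scale, comp_offset) ((scale)[(comp_offset) + 1])
/* Assumed shape of GET_SCALE; its definition is not shown in the thread. */
#define GET_SCALE(scale, comp_offset) ((scale)[(comp_offset)])

/* Conventional asymmetric dequantization: (q - zp) * scale. */
static float dequantize(uint8_t q, float scale, float zp) {
    return ((float)q - zp) * scale;
}

/* Planar: scale and zero-point are read from two separate buffers. */
static float dequant_planar(uint8_t q, const float *scale, const float *zp, int off) {
    assert(scale != NULL && zp != NULL);
    return dequantize(q, GET_SCALE(scale, off), GET_ZP_PLANAR(zp, scale, off));
}

/* Combined: a single buffer holds the scale at off and the zero-point at off + 1. */
static float dequant_combined(uint8_t q, const float *scale_zp, int off) {
    assert(scale_zp != NULL);
    /* The zp argument is unused by this branch, so NULL is passed and never read. */
    return dequantize(q, GET_SCALE(scale_zp, off),
                      GET_ZP_COMBINED((const float *)NULL, scale_zp, off));
}
```

Both helpers return the same value for equivalent data, which is the point of hiding the layout behind one accessor: the dequantization expression at the call site no longer needs its own `#if`.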

Copilot

Pull request overview

This PR fixes an “invalid argument index” failure when executing the GPU SDPA optimized OpenCL kernel (sdpa_opt) with KV-cache compression in Planar output storage mode (used when supports_immad=true), where zero-points (ZP) are provided as separate buffers rather than interleaved with scales.

Changes:

Update sdpa_opt.cl kernel signatures and dequantization logic to accept optional key_zp / val_zp inputs when asymmetric quantization is used with non-interleaved (Planar) scale/ZP storage.
Add a unit transformation test that validates the KV-cache compression rewrite for Planar storage, including separate ZP buffers passed into IndirectSDPA.
Extend functional KV-cache+SDPA dynamic tests to include compressed beam-search cases with batch > 1 to exercise the indirect sdpa_opt path.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `src/plugins/intel_gpu/tests/unit/transformations/kv_cache_compression.cpp` | Adds a Planar-mode KV-cache compression transformation test that wires separate scale and ZP buffers into `IndirectSDPA`. |
| `src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/kv_cache_sdpa.cpp` | Adds compressed beam-search parameter sets (`batch=2`) to cover the indirect optimized SDPA path. |
| `src/plugins/intel_gpu/src/graph/impls/ocl_v2/sdpa_opt.cl` | Adds conditional kernel args for separate key/value ZP buffers and uses them in asymmetric dequantization when scales and ZP are not combined. |

Signed-off-by: Min, Byungil <byungil.min@intel.com>

Resolve invalid argument index error for SDPA backend execution

91f1cc4

Signed-off-by: Min, Byungil <byungil.min@intel.com>

byungilm requested review from a team as code owners March 29, 2026 13:14

github-actions bot added the category: GPU OpenVINO GPU plugin label Mar 29, 2026

Add an unit-test

41be0a3

Signed-off-by: Min, Byungil <byungil.min@intel.com>

byungilm force-pushed the fix_sdpa_invalid_arg_index branch from 872430c to 41be0a3 Compare March 29, 2026 15:21

Bugfix to resolve accuracy issue

20078df

Signed-off-by: Min, Byungil <byungil.min@intel.com>

isanghao reviewed Mar 31, 2026


isanghao requested a review from Copilot March 31, 2026 03:45

Copilot started reviewing on behalf of isanghao March 31, 2026 03:46

Copilot AI reviewed Mar 31, 2026


Apply comments to add macro

66742bb

Signed-off-by: Min, Byungil <byungil.min@intel.com>

isanghao approved these changes Apr 1, 2026


isanghao enabled auto-merge April 1, 2026 11:08

isanghao added this pull request to the merge queue Apr 1, 2026

Merged via the queue into openvinotoolkit:master with commit b5b4f08 Apr 1, 2026
187 checks passed

isanghao merged 4 commits into openvinotoolkit:master from byungilm:fix_sdpa_invalid_arg_index
