Resolve invalid argument index error for SDPA backend execution #35021

Merged
isanghao merged 4 commits into openvinotoolkit:master from byungilm:fix_sdpa_invalid_arg_index
Apr 1, 2026

@byungilm byungilm commented Mar 29, 2026

Details:

Description: the benchmark fails at execution time when the KV-cache SDPA backend is enabled on a dGPU (systolic array):
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:246: Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_stream.cpp:277: [GPU] [CL_EXT] setArgUsm in KernelIntel failed, error code: -49 CL_INVALID_ARG_INDEX
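
As a rough illustration of how this error can arise, the sketch below mocks the index check a runtime performs when binding kernel arguments. It is not the OpenVINO or OpenCL API: `mock_kernel`, `mock_set_arg`, and `mock_bind_all` are hypothetical names, and only the value -49 is taken from the error message above. The assumption is that the host binds one more buffer (the separate zero-point) than the compiled kernel declares.

```c
#include <assert.h>
#include <stddef.h>

/* -49 is the code reported in the error message (CL_INVALID_ARG_INDEX). */
#define MOCK_CL_SUCCESS 0
#define MOCK_CL_INVALID_ARG_INDEX (-49)

/* Hypothetical stand-in for a compiled kernel: only the number of
 * declared arguments matters for this illustration. */
typedef struct {
    unsigned num_args;
} mock_kernel;

/* Mimics the bounds check done when an argument is bound by index. */
static int mock_set_arg(const mock_kernel *k, unsigned index) {
    assert(k != NULL);
    return index < k->num_args ? MOCK_CL_SUCCESS : MOCK_CL_INVALID_ARG_INDEX;
}

/* Binds host_arg_count arguments in order; returns the first error, if any. */
static int mock_bind_all(const mock_kernel *k, unsigned host_arg_count) {
    for (unsigned i = 0; i < host_arg_count; ++i) {
        int err = mock_set_arg(k, i);
        if (err != MOCK_CL_SUCCESS) {
            return err;
        }
    }
    return MOCK_CL_SUCCESS;
}
```

With a kernel that declares five arguments and a host that binds six, `mock_bind_all` returns -49 on the last index. Once the kernel declares the extra argument, the same host-side sequence succeeds.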

The code and line that caused this issue

  • StorageType is selected by supports_immad at kv_cache_compression.cpp:145~148
  • StorageType::Planar needs separate scale and zp buffers
  • The sdpa_opt kernel was missing the zp input parameter
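
The difference between the two storage layouts can be sketched in plain C. This is a simplified host-side illustration, not the kernel code. The function names are hypothetical, the combined layout is assumed to keep the zero-point at `comp_offset + 1` in the scale buffer (as in the kernel snippet quoted in the review thread), and the dequantization is assumed to be the usual asymmetric form `(q - zp) * scale`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Asymmetric u8 dequantization (assumed form): (q - zp) * scale. */
static float dequant_u8(uint8_t q, float scale, float zp) {
    return ((float)q - zp) * scale;
}

/* Combined layout (assumption): the zero-point lives in the same buffer
 * as the scale, one element after it. */
static float dequant_combined(uint8_t q, const float *scale_zp,
                              size_t comp_offset) {
    assert(scale_zp != NULL);
    return dequant_u8(q, scale_zp[comp_offset], scale_zp[comp_offset + 1]);
}

/* Planar layout: scale and zero-point come from two separate buffers,
 * so the caller has to pass both. */
static float dequant_planar(uint8_t q, const float *scale, const float *zp,
                            size_t comp_offset) {
    assert(scale != NULL && zp != NULL);
    return dequant_u8(q, scale[comp_offset], zp[comp_offset]);
}
```

The planar variant cannot be computed without a second pointer. That is why a kernel whose signature only takes the scale buffer does not fit Planar storage.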

Reproduction step and snapshot

  • Can be reproduced by benchmark.py
    python benchmark.py -m qwen2.5-7b-instruct/pytorch/ov/OV_FP16-INT8_ASYM -d GPU.1 -n 0 --genai -mc 1 -pf repo-prompts/32_1024/qwen2.5-7b-instruct.jsonl -lc enable_sdpa_cache_u8_by-channel.json
  • cat enable_sdpa_cache_u8_by-channel.json
    { "ATTENTION_BACKEND": "SDPA", "KV_CACHE_PRECISION": "u8", "KEY_CACHE_QUANT_MODE": "BY_CHANNEL" }

Tickets:

AI Assistance:

  • AI assistance used: yes
  • If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks).
    Generated unit-tests for this fix

Signed-off-by: Min, Byungil <byungil.min@intel.com>
@byungilm byungilm requested review from a team as code owners March 29, 2026 13:14
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Mar 29, 2026
@byungilm byungilm force-pushed the fix_sdpa_invalid_arg_index branch from 872430c to 41be0a3 Compare March 29, 2026 15:21
#if HAS_KV_CACHE_ZP_INPUT
VALUE_COMPRESSION_SCALE_TYPE comp_zp = val_zp[comp_offset];
#else
VALUE_COMPRESSION_SCALE_TYPE comp_zp = val_scale[comp_offset + 1];

What about introducing a macro like this? I think it would be easier to read.

  #if HAS_KV_CACHE_ZP_INPUT
    #define GET_ZP(zp, scale, comp_offset) ((zp)[(comp_offset)])
  #else
    #define GET_ZP(zp, scale, comp_offset) ((scale)[(comp_offset) + 1])
  #endif


Looks good. I also added GET_SCALE alongside it.
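
For illustration, here is how the suggested macro might read at a call site, written as host-side C rather than OpenCL C. `GET_ZP` is copied from the suggestion above. The `GET_SCALE` definition is an assumption, since its actual form is not shown in this thread, and `dequant_with_macros` is a hypothetical helper. `HAS_KV_CACHE_ZP_INPUT` defaults to 1 here only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Default to the separate-ZP (Planar) path for this standalone sketch. */
#ifndef HAS_KV_CACHE_ZP_INPUT
#define HAS_KV_CACHE_ZP_INPUT 1
#endif

#if HAS_KV_CACHE_ZP_INPUT
  #define GET_ZP(zp, scale, comp_offset) ((zp)[(comp_offset)])
#else
  #define GET_ZP(zp, scale, comp_offset) ((scale)[(comp_offset) + 1])
#endif

/* Assumed shape of GET_SCALE; the merged definition may differ. */
#define GET_SCALE(scale, comp_offset) ((scale)[(comp_offset)])

/* Hypothetical helper: the call site is the same for both layouts,
 * and the preprocessor decides where the zero-point is read from. */
static float dequant_with_macros(float q, const float *scale,
                                 const float *zp, size_t comp_offset) {
    assert(scale != NULL);
    float comp_scale = GET_SCALE(scale, comp_offset);
    float comp_zp = GET_ZP(zp, scale, comp_offset);
    return (q - comp_zp) * comp_scale;
}
```

The layout choice lives in one place instead of being repeated as `#if`/`#else` blocks at every dequantization site.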

Copilot AI left a comment

Pull request overview

This PR fixes an “invalid argument index” failure when executing the GPU SDPA optimized OpenCL kernel (sdpa_opt) with KV-cache compression in Planar output storage mode (used when supports_immad=true), where zero-points (ZP) are provided as separate buffers rather than interleaved with scales.

Changes:

  • Update sdpa_opt.cl kernel signatures and dequantization logic to accept optional key_zp / val_zp inputs when asymmetric quantization is used with non-interleaved (Planar) scale/ZP storage.
  • Add a unit transformation test that validates the KV-cache compression rewrite for Planar storage, including separate ZP buffers passed into IndirectSDPA.
  • Extend functional KV-cache+SDPA dynamic tests to include compressed beam-search cases with batch > 1 to exercise the indirect sdpa_opt path.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/plugins/intel_gpu/tests/unit/transformations/kv_cache_compression.cpp | Adds a Planar-mode KV-cache compression transformation test that wires separate scale and ZP buffers into IndirectSDPA. |
| src/plugins/intel_gpu/tests/functional/subgraph_tests/dynamic/kv_cache_sdpa.cpp | Adds compressed beam-search parameter sets (batch=2) to cover the indirect optimized SDPA path. |
| src/plugins/intel_gpu/src/graph/impls/ocl_v2/sdpa_opt.cl | Adds conditional kernel args for separate key/value ZP buffers and uses them in asymmetric dequantization when scales and ZP are not combined. |

@isanghao isanghao enabled auto-merge April 1, 2026 11:08
@isanghao isanghao added this pull request to the merge queue Apr 1, 2026
Merged via the queue into openvinotoolkit:master with commit b5b4f08 Apr 1, 2026
187 checks passed