[GPU][PTL] WA: Set max WG size to 512 for reduce_ref kernel in PTL #35072

Merged
isanghao merged 1 commit into openvinotoolkit:master from jade-cho:wa_reduce_ref_wg_issue_in_ptl
Apr 2, 2026

@jade-cho jade-cho commented Mar 31, 2026

Description of the issue (symptom, root-cause, how it was resolved)

Reduce kernel CL_INVALID_WORK_GROUP_SIZE on PTL (-54)

  • Symptom: The reduce kernel fails with CL_INVALID_WORK_GROUP_SIZE (-54) on Panther Lake (PTL, Xe3 integrated GPU).
  • Root-cause: IGC (the Intel Graphics Compiler) may allocate more than 128 GRF registers per thread on PTL, which lowers the per-kernel maximum work-group size below the device-reported maximum. The runtime still dispatches with a local work size derived from the larger device-reported limit, which triggers the OpenCL error.
  • Resolution: Limit maxWorkGroupSize to 512 before computing optimal LWS in reduce_kernel_ref.cpp
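
The resolution above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code merged in this PR: `EngineInfoSketch`, `ClampMaxWorkGroupSize`, `PickLocalSizes`, and `kReduceRefMaxWgSize` are hypothetical names, and `PickLocalSizes` is a deliberately simplified stand-in for `GetOptimalLocalWorkGroupSizes`, whose real heuristics are not reproduced here. The only point it shows is that capping the limit before choosing the LWS keeps the work-group size at or below 512.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the engine info consulted by the kernel selector.
struct EngineInfoSketch {
    std::size_t maxWorkGroupSize;  // device-reported maximum work-group size
};

// Cap taken from the PR description; the constant name is an assumption.
constexpr std::size_t kReduceRefMaxWgSize = 512;

// Returns a copy of the engine info with the work-group limit capped.
// Devices that already report a smaller limit are left unchanged.
inline EngineInfoSketch ClampMaxWorkGroupSize(EngineInfoSketch info,
                                              std::size_t cap = kReduceRefMaxWgSize) {
    info.maxWorkGroupSize = std::min(info.maxWorkGroupSize, cap);
    return info;
}

// Simplified LWS picker (NOT the real GetOptimalLocalWorkGroupSizes):
// for each dimension, take the largest divisor of the global size that
// still fits in the remaining work-group budget.
inline std::array<std::size_t, 3> PickLocalSizes(const std::array<std::size_t, 3>& gws,
                                                 const EngineInfoSketch& info) {
    assert(info.maxWorkGroupSize > 0);
    std::array<std::size_t, 3> lws{1, 1, 1};
    std::size_t budget = info.maxWorkGroupSize;
    for (std::size_t d = 0; d < gws.size(); ++d) {
        std::size_t best = 1;
        for (std::size_t cand = std::min(gws[d], budget); cand >= 1; --cand) {
            if (gws[d] % cand == 0) {  // always true at cand == 1, so the loop terminates
                best = cand;
                break;
            }
        }
        lws[d] = best;
        budget /= best;  // best <= budget, so budget stays >= 1
    }
    return lws;
}
```

With a device that reports 1024 and a global size of {1024, 1, 1}, the unclamped picker chooses a work-group of 1024, while passing `ClampMaxWorkGroupSize(info)` yields 512. OpenCL also exposes the per-kernel limit directly through `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`; the fixed cap is the simpler option the PR describes for a temporary workaround.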

The code and line that caused this issue (if it is not changed directly)

  • intel_gpu/src/kernel_selector/kernels/reduce/reduce_kernel_ref.cpp

Checklist

  • Is it a proper fix? (not a workaround)
    • Add WA code for CL_INVALID_WORK_GROUP_SIZE.
    • This workaround will be removed after the IGC driver issue is fixed.
  • Did you include test case for this fix, if necessary?
  • Did you review existing test that can be extended to cover this scenario? Which test did you review?

Tickets:

  • 183530

@jade-cho jade-cho requested review from a team as code owners March 31, 2026 03:17
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Mar 31, 2026
@p-durandin p-durandin added this to the 2026.2 milestone Mar 31, 2026
Contributor

@Lyamin-Roman Lyamin-Roman left a comment

LGTM as a WA

    dispatchData.lws = GetOptimalLocalWorkGroupSizes(dispatchData.gws, engineInfo, in_layout, out_layout, dims_by_gws);
} else {
    dispatchData.lws = GetOptimalLocalWorkGroupSizes(dispatchData.gws, params.engineInfo, in_layout, out_layout, dims_by_gws);
}
Contributor

What about just limiting it to 512 unconditionally? Since this is a ref kernel, I expect that to be fine.

Contributor Author

This is just workaround code. Since it will be removed once the driver issue is fixed, I tried to keep the changes as simple as possible. Is it necessary to add a new function or logic here?

Contributor Author

Changed

try {
    _command_queue.enqueueNDRangeKernel(kern, cl::NullRange, global, local, dep_events_ptr, set_output_event ? &ret_ev : nullptr);
} catch (cl::Error const& err) {
    if (err.err() == CL_INVALID_WORK_GROUP_SIZE && local != cl::NullRange) {
Contributor

I'm not sure this is the proper place to handle this failure. There are other error codes too, so why should this one be handled differently?

Contributor Author

It will be removed after fixing the driver issue. If this debugging code doesn’t seem appropriate, I’ll remove it.

Contributor Author

Removed.

@jade-cho jade-cho force-pushed the wa_reduce_ref_wg_issue_in_ptl branch from 9f99a0f to 8d24ed8 Compare April 1, 2026 10:10
Contributor

@isanghao isanghao left a comment

lgtm

@isanghao isanghao added this pull request to the merge queue Apr 2, 2026
Merged via the queue into openvinotoolkit:master with commit bfb47d6 Apr 2, 2026
188 checks passed