
[GPU] Fix rank-changing reorder fusion for depth_to_space #35099

Merged
isanghao merged 1 commit into openvinotoolkit:master from ahnyoung-paul:fix_depth_to_space_issue on Apr 2, 2026


@ahnyoung-paul (Contributor) commented Apr 1, 2026

Description of the issue:

Issues

Symptom:

A clBuildProgram kernel compilation failure occurs when running the super-resolution model (pnat-v1-fp16-576x672-2x-ox.onnx) on GPU. The failing node is depthtospace:/gnet/DepthToSpace, which uses the depth_to_space_ref implementation.

compile_graph: depthtospace:/gnet/DepthToSpace
  Type: depth_to_space
  Input layout 0: f16:b_fs_yx_fsv16:8x12x576x672 (4D)
  Input layout 1: f16:bfyx:8x3x1152x1344:nopad (4D)
  Output layout 0: f16:bfzyx:8x3x1x1152x1344 (5D)   ← problem here
  Selected impl: depth_to_space_ref
error: too many arguments provided to function-like macro invocation
  INPUT0_GET_INDEX(batch, input_feature, input_z, input_y, input_x)  ← 5 args
note: macro 'INPUT0_GET_INDEX' defined here
  #define INPUT0_GET_INDEX(b, f, y, x)  ← only accepts 4 args
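The error is an arity mismatch: the INPUT0_GET_INDEX macro is generated from the 4D input layout, while the call inside the kernel passes one coordinate per dimension of the (now 5D) output. The sketch below models that mechanism under a loud assumption: `make_get_index_macro` and `index_call_is_valid` are hypothetical helpers, not the real cldnn JIT generator, and the "call arity follows output rank" rule is inferred from the log rather than taken from the kernel source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not the real cldnn JIT generator: the parameter list
// of INPUT0_GET_INDEX is derived from the *input* tensor's rank.
std::string make_get_index_macro(const std::string& tensor, std::size_t rank) {
    assert(rank == 4 || rank == 5);
    const std::vector<std::string> dims =
        rank == 5 ? std::vector<std::string>{"b", "f", "z", "y", "x"}
                  : std::vector<std::string>{"b", "f", "y", "x"};
    std::string params;
    for (std::size_t i = 0; i < dims.size(); ++i) {
        if (i > 0) params += ", ";
        params += dims[i];
    }
    return "#define " + tensor + "_GET_INDEX(" + params + ")";
}

// Assumption suggested by the log: the kernel passes one coordinate per
// *output* dimension (batch, feature, [z,] y, x) to INPUT0_GET_INDEX.
// The OpenCL preprocessor rejects the call when the two counts differ.
bool index_call_is_valid(std::size_t input_rank, std::size_t output_rank) {
    const std::size_t macro_params = input_rank;  // from make_get_index_macro
    const std::size_t call_args = output_rank;    // from the kernel body
    return macro_params == call_args;
}
```

With a 4D input and a 4D output the call matches the macro; once the fused reorder makes the output 5D, `index_call_is_valid(4, 5)` is false, which corresponds to the "too many arguments" error above.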

Root causes

depth_to_space is rank-preserving by op semantics: input and output must have the same number of dimensions. However, can_fuse_reorder() and can_fuse_reorder_to_prev() in layout_optimizer.cpp unconditionally returned true for depth_to_space, which allowed a rank-changing reorder (bfyx → bfzyx) to be fused into it.
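The rank-preservation property follows directly from the DepthToSpace output-shape rule: an input [N, C, D1, ..., DK] becomes [N, C / bs^K, D1 * bs, ..., DK * bs], with no dimension added or removed. A minimal shape-only sketch (the helper name `depth_to_space_shape` is hypothetical; block size 2 for this model is inferred from the 8x12x576x672 → 8x3x1152x1344 shapes in the log):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Output shape of DepthToSpace for an N-D input [N, C, D1, ..., DK]:
// [N, C / bs^K, D1 * bs, ..., DK * bs]. The rank never changes.
std::vector<std::int64_t> depth_to_space_shape(const std::vector<std::int64_t>& in,
                                               std::int64_t block_size) {
    assert(in.size() >= 3 && block_size > 0);
    const std::size_t spatial = in.size() - 2;  // K spatial dimensions
    std::int64_t divisor = 1;
    for (std::size_t i = 0; i < spatial; ++i) divisor *= block_size;
    assert(in[1] % divisor == 0);  // C must be divisible by bs^K

    std::vector<std::int64_t> out(in.size());  // same rank as the input
    out[0] = in[0];
    out[1] = in[1] / divisor;
    for (std::size_t i = 2; i < in.size(); ++i) out[i] = in[i] * block_size;
    return out;
}
```

For the failing node, `depth_to_space_shape({8, 12, 576, 672}, 2)` yields the 4D shape {8, 3, 1152, 1344}, so any 5D output can only come from something outside the op itself, here the fused reorder.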

The fusion chain:

  1. The reorder_inputs pass inserts reorder_42 (bfyx → bfzyx) after depth_to_space for the downstream Reshape_2.
  2. The remove_redundant_reorders pass calls can_fuse_reorder_to_prev(), which returns true unconditionally.
  3. reorder_42 is fused into depth_to_space, changing its output from 4D (bfyx) to 5D (bfzyx).
  4. Kernel compilation: INPUT0 is 4D (b_fs_yx_fsv16), but OUTPUT is now 5D (bfzyx), so the kernel calls INPUT0_GET_INDEX with 5 arguments while the macro accepts only 4 → clBuildProgram failure.
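The chain above can be modeled with a heavily simplified, hypothetical graph: `Node`, `Reorder`, `fuse_if_allowed` and `old_can_fuse` are illustrative stand-ins, not cldnn types. Step 1 is modeled by constructing the reorder, steps 2 and 3 by folding it into the producer when the predicate says yes, and step 4 by a rank-equality check standing in for kernel compilation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical, heavily simplified model of the fusion chain (not cldnn code).
struct Node {
    std::size_t input_rank;   // rank of the kernel's INPUT0
    std::size_t output_rank;  // rank of the kernel's OUTPUT
};

struct Reorder {
    std::size_t target_rank;  // e.g. 5 for bfyx -> bfzyx
};

using FusePredicate = std::function<bool(const Node&, const Reorder&)>;

// Steps 2-3: ask the predicate; on "true", fold the reorder into the
// producer, which then writes the reorder's (possibly higher-rank) layout.
Node fuse_if_allowed(Node producer, const Reorder& r, const FusePredicate& can_fuse) {
    if (can_fuse(producer, r)) producer.output_rank = r.target_rank;
    return producer;
}

// Step 4: the ref kernel only compiles when INPUT0 and OUTPUT ranks agree.
bool kernel_compiles(const Node& n) { return n.input_rank == n.output_rank; }

// Pre-fix behaviour: fusion is allowed for depth_to_space unconditionally.
bool old_can_fuse(const Node&, const Reorder&) { return true; }
```

Feeding a 4D depth_to_space node and a 4D → 5D reorder through `fuse_if_allowed` with `old_can_fuse` produces a node whose ranks disagree, mirroring the clBuildProgram failure.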

How to fix it

Add a rank-equality guard in two functions of layout_optimizer.cpp (can_fuse_reorder() and can_fuse_reorder_to_prev()): separate depth_to_space from the blanket "return true" group and allow reorder fusion only when the source and target ranks match.
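A sketch of the guard's shape, assuming a switch over primitive kinds. `Prim`, `blanket_true_op` and `can_fuse_reorder_sketch` are hypothetical placeholders, not the actual layout_optimizer.cpp code, and the `default` branch stands in for whatever other checks the real functions perform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical primitive kinds; blanket_true_op stands in for the ops that
// still take the unconditional "return true" path.
enum class Prim { blanket_true_op, depth_to_space, other };

// depth_to_space is pulled out of the blanket group and only accepts
// reorders whose target rank equals its own output rank.
bool can_fuse_reorder_sketch(Prim prev, std::size_t src_rank, std::size_t dst_rank) {
    switch (prev) {
        case Prim::blanket_true_op:
            return true;  // unchanged blanket behaviour
        case Prim::depth_to_space:
            return src_rank == dst_rank;  // rank-preserving op: same-rank reorders only
        default:
            return false;  // placeholder for the remaining checks
    }
}
```

Under this sketch, a bfyx → b_fs_yx_fsv16 reorder (4D → 4D) is still fusible into depth_to_space, while bfyx → bfzyx (4D → 5D) is rejected.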

Same-rank reorder fusion (e.g., bfyx → b_fs_yx_fsv16) remains allowed. The rank-changing reorder stays as a separate optimized node with memory sharing (zero-copy), so there is no performance penalty.
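Why a bfyx → bfzyx reorder can be zero-copy in principle: for plain, unpadded layouts, inserting a size-1 z axis (z = 0) does not change any element's row-major offset, so the 5D view can alias the 4D buffer. This is only an illustration of the arithmetic under that no-padding assumption; the helper names are hypothetical, and it does not show how the GPU plugin actually decides to share memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major (plain) linear offset of a coordinate in a tensor of given dims.
std::size_t linear_offset(const std::vector<std::size_t>& dims,
                          const std::vector<std::size_t>& coord) {
    assert(dims.size() == coord.size());
    std::size_t off = 0;
    for (std::size_t i = 0; i < dims.size(); ++i) {
        assert(coord[i] < dims[i]);
        off = off * dims[i] + coord[i];
    }
    return off;
}

// Checks that bfyx [b, f, y, x] and bfzyx [b, f, 1, y, x] map every element
// to the same offset, i.e. the reorder is a pure reinterpretation (a view).
bool bfyx_to_bfzyx_is_view(std::size_t b, std::size_t f, std::size_t y, std::size_t x) {
    const std::vector<std::size_t> d4{b, f, y, x};
    const std::vector<std::size_t> d5{b, f, 1, y, x};
    for (std::size_t i_b = 0; i_b < b; ++i_b)
        for (std::size_t i_f = 0; i_f < f; ++i_f)
            for (std::size_t i_y = 0; i_y < y; ++i_y)
                for (std::size_t i_x = 0; i_x < x; ++i_x)
                    if (linear_offset(d4, {i_b, i_f, i_y, i_x}) !=
                        linear_offset(d5, {i_b, i_f, 0, i_y, i_x}))
                        return false;
    return true;
}
```

Algebraically, ((b·F + f)·1 + 0)·Y + y equals (b·F + f)·Y + y, so the check holds for any shape; blocked formats such as b_fs_yx_fsv16 or padded buffers do not get this guarantee.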

After the fix to rank-changing reorder fusion
[image] The Reorder and Reshape nodes following depth_to_space are optimized out.

Steps to reproduce

benchmark_app -d GPU -m pnat-v1-fp16-576x672-2x-ox.onnx -nireq 1 -nstreams 1 -hint none

Tickets:

AI Assistance:

  • AI assistance used: yes
  • AI was used for debug log analysis (identifying the reorder fusion chain in 130K lines of GPU debug logs), an OpenVINO op spec audit (confirming depth_to_space is rank-preserving), and drafting test cases. All code changes were reviewed, the fix was validated by running the model end-to-end (121 iterations, 2.00 FPS), and unit tests were manually verified.

@ahnyoung-paul requested review from a team as code owners April 1, 2026 00:44
@ahnyoung-paul added the category: GPU label (OpenVINO GPU plugin) Apr 1, 2026
@wilson-seok (Contributor) left a comment

LGTM

@isanghao (Contributor) left a comment

lgtm

@isanghao enabled auto-merge April 2, 2026 05:17
@ahnyoung-paul force-pushed the fix_depth_to_space_issue branch from 6a13e4d to 7976ad4 April 2, 2026 05:58
@isanghao added this pull request to the merge queue Apr 2, 2026
Merged via the queue into openvinotoolkit:master with commit 0914d74 Apr 2, 2026
188 checks passed
@isanghao deleted the fix_depth_to_space_issue branch April 2, 2026 10:56

3 participants