docs(gdn): document -1 padding index semantics for pool+indices path #3019
Clarify per-backend behavior for negative indices in gated_delta_rule_decode_pretranspose:
- bf16/MTP path: redirects -1 to slot 0 (null buffer), output undefined
- float32 legacy path: skips -1 entries entirely, output written as zero

Also note that the bf16 path requires slot 0 to be reserved as a sacrificial null buffer (pool_size = num_real_slots + 1).

AI-assisted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
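For integrators, the reserved-slot requirement can be sketched on the caller side as below. This is a minimal illustration in plain Python, not FlashInfer code: `pool_size_for`, `to_pool_indices`, and `mask_padded_outputs` are hypothetical helper names, and mapping logical slots to pool indices by adding 1 is just one assumed way of keeping slot 0 free.

```python
from typing import List, Optional, Sequence

NULL_SLOT = 0  # reserved sacrificial slot; the bf16 path redirects -1 here
PAD = -1       # padding index for inactive sequences


def pool_size_for(num_real_slots: int) -> int:
    """Pool size when slot 0 is reserved: pool_size = num_real_slots + 1."""
    return num_real_slots + 1


def to_pool_indices(
    logical_slots: Sequence[Optional[int]], num_real_slots: int
) -> List[int]:
    """Map logical slots 0..num_real_slots-1 to pool indices 1..num_real_slots.

    Hypothetical convention: None marks an inactive sequence and becomes -1.
    """
    indices = []
    for slot in logical_slots:
        if slot is None:
            indices.append(PAD)
            continue
        if not 0 <= slot < num_real_slots:
            raise ValueError(f"logical slot {slot} out of range")
        indices.append(slot + 1)  # shift past the reserved null slot
    return indices


def mask_padded_outputs(
    outputs: Sequence[Sequence[float]], indices: Sequence[int]
) -> List[List[float]]:
    """Zero the output rows of padded entries so the result does not depend
    on which backend ran (undefined on bf16, zero on float32 legacy)."""
    return [
        [0.0] * len(row) if idx == PAD else list(row)
        for row, idx in zip(outputs, indices)
    ]
```

Masking after the call is optional on the float32 legacy path, since the description above says it already writes zeros. Doing it unconditionally keeps downstream code independent of the selected backend.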
Code Review
This pull request updates the documentation for the gated_delta_rule_decode_pretranspose function to clarify how padding indices are handled across different backends. Specifically, it adds details on the sacrificial null buffer used in the bf16 fast path and the zeroing behavior in the float32 legacy path. Feedback was provided to refine the description of the legacy path to ensure technical accuracy regarding how the output slot is modified.
neither the state pool nor the output are touched for that batch entry;
the output slot is written as **zero**.
The phrase "neither the state pool nor the output are touched" is slightly misleading because the output slot is indeed written to (it is explicitly zeroed by the kernel). It would be clearer to state that the state pool is not modified and the output is zeroed.
Suggested change:
- neither the state pool nor the output are touched for that batch entry;
- the output slot is written as **zero**.
+ the state pool is not modified for that batch entry, and the output slot
+ is written as **zero**.
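The distinction the reviewer draws (state untouched, output zeroed) and its contrast with the bf16 path can be illustrated with a toy model. This is only a sketch of the documented semantics, not the kernel: the function names are hypothetical, the `+=` update is a stand-in for the real gated delta rule, and the value the bf16 model returns for a padded entry is arbitrary, since the real output there is described as undefined.

```python
from typing import List, Sequence


def decode_legacy_f32_model(
    pool: List[float], indices: Sequence[int], xs: Sequence[float]
) -> List[float]:
    """Toy float32 legacy path: -1 entries are skipped, output written as zero."""
    out = []
    for idx, x in zip(indices, xs):
        if idx < 0:
            out.append(0.0)  # state pool not modified; output slot zeroed
            continue
        pool[idx] += x       # placeholder for the real state update
        out.append(pool[idx])
    return out


def decode_bf16_model(
    pool: List[float], indices: Sequence[int], xs: Sequence[float]
) -> List[float]:
    """Toy bf16 path: -1 is redirected to slot 0, the sacrificial null buffer."""
    out = []
    for idx, x in zip(indices, xs):
        slot = 0 if idx < 0 else idx
        pool[slot] += x        # slot 0 absorbs updates from padded entries
        out.append(pool[slot])  # meaningless for padded entries
    return out


# Slot 0 is the reserved null buffer; slots 1 and 2 hold real state.
legacy_pool = [0.0, 10.0, 20.0]
legacy_out = decode_legacy_f32_model(legacy_pool, [1, -1, 2], [1.0, 5.0, 2.0])

bf16_pool = [0.0, 10.0, 20.0]
bf16_out = decode_bf16_model(bf16_pool, [1, -1, 2], [1.0, 5.0, 2.0])
```

In this model the real slots end up identical on both paths; only slot 0 and the padded output row differ, which is why slot 0 must never hold live state when the bf16 path is in use.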
vLLM uses |
…lashinfer-ai#3019)

Document the `-1` padding index semantics for `gated_delta_rule_decode_pretranspose`'s pool+indices path. The two backends differ in a subtle but important way that framework integrators must be aware of when handling inactive sequences: the bf16 fast path redirects `-1` to a sacrificial slot 0 null buffer and leaves the output undefined, while the float32 legacy path skips the entry entirely and zeroes the output.

cc. @kahyunnam @hlu1

## Summary by CodeRabbit

* **Documentation**
  * Clarified documentation regarding edge case handling and behavior across different computational backends, including how padding sequences are processed and what output to expect in specific scenarios.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>