GDN rejects packed_seq_params in BSHD mode — upstream fixed in Megatron-LM PR #2645, missed by fork sync

## Summary

The `radixark/Megatron-LM` fork (`miles-main`) raises on **any** non-`None` `packed_seq_params` in `GatedDeltaNet.forward`, but Megatron passes a `PackedSeqParams` object to all attention modules uniformly, even in BSHD mode. As a result, GDN (Gated Delta Net) training and inference for Qwen3.5/3.6 crash when `qkv_format="bshd"`. **Upstream NVIDIA/Megatron-LM has already fixed this (PR #2645, merged 2026-05-14); our fork missed it because its last upstream sync predates that fix.**

## Symptom

```
NotImplementedError: GDN does not support packed sequence for now.
  File ".../megatron/core/ssm/gated_delta_net.py", line 300, in forward
```

## Root cause

The fork's `megatron/core/ssm/gated_delta_net.py` (at pinned commit `23924a0b`) has a blanket rejection:

```python
if packed_seq_params is not None:
    # TODO: support packed sequence
    raise NotImplementedError("GDN does not support packed sequence for now.")
```

In BSHD mode (`qkv_format="bshd"`), Megatron still passes a `PackedSeqParams` object to every attention module uniformly. GDN has no packed-sequence work to do in that mode, but the check only tests `is not None` and never looks at `qkv_format`, so it raises anyway.
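The behaviour can be reproduced in isolation without Megatron installed. The sketch below is illustrative only: `FakePackedSeqParams`, `fork_guard`, and `try_guard` are hypothetical stand-ins, and only the `if` condition and error message are taken from the fork's code quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakePackedSeqParams:
    """Hypothetical stand-in for Megatron's PackedSeqParams (only the field used here)."""
    qkv_format: str = "bshd"


def fork_guard(packed_seq_params: Optional[FakePackedSeqParams]) -> str:
    # Mirrors the blanket check in the fork: any non-None object is rejected,
    # regardless of its qkv_format.
    if packed_seq_params is not None:
        raise NotImplementedError("GDN does not support packed sequence for now.")
    return "ok"


def try_guard(params: Optional[FakePackedSeqParams]) -> str:
    # Helper for the demo: report the outcome instead of propagating the error.
    try:
        return fork_guard(params)
    except NotImplementedError as exc:
        return f"raised: {exc}"


results = {
    "none": try_guard(None),
    "bshd": try_guard(FakePackedSeqParams("bshd")),
    "thd": try_guard(FakePackedSeqParams("thd")),
}
print(results)
```

Only the `None` case passes; the BSHD object is rejected exactly like a THD one, which matches the symptom.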

## Upstream already fixed this — it's a sync gap

NVIDIA/Megatron-LM `main` now guards on the THD format specifically and **implements** packed-sequence support for GDN:

```python
if packed_seq_params is not None and packed_seq_params.qkv_format == 'thd':
    ... # actual cu_seqlens handling
```

In BSHD mode it simply proceeds. This landed in:
- **NVIDIA/Megatron-LM PR #2645 "[main] feat(moe): Support packed sequence for gated delta net (GDN)"**, merged **2026-05-14**.

The fork's last upstream sync was commit `038e8e5a` **"Upgrade Megatron from Dec 17 to Feb 13"** (2026-03-04, syncing to upstream's ~Feb-13 state). PR #2645 (May 14) postdates that, so the fork never received it.

## Proposed fix

Re-sync `radixark/Megatron-LM` to an upstream commit that includes PR #2645 (≥ 2026-05-14). This brings in the proper guard (and real THD packed-seq support for GDN).
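After the re-sync, a small check along these lines could pin the expected behaviour. This is a sketch under stated assumptions, not the upstream implementation: `takes_thd_path` is a hypothetical helper that models only the upstream condition quoted earlier, and `SimpleNamespace` objects stand in for real `PackedSeqParams` instances.

```python
from types import SimpleNamespace
from typing import Any, Optional


def takes_thd_path(packed_seq_params: Optional[Any]) -> bool:
    # Models the upstream guard: packed-sequence handling is entered
    # only when params are present AND the format is THD.
    return packed_seq_params is not None and packed_seq_params.qkv_format == "thd"


# Stand-in inputs covering the three situations GDN can see.
cases = {
    "none": None,
    "bshd": SimpleNamespace(qkv_format="bshd"),
    "thd": SimpleNamespace(qkv_format="thd"),
}

thd_path = {name: takes_thd_path(params) for name, params in cases.items()}
print(thd_path)
```

With this condition, a BSHD `PackedSeqParams` is treated the same as `None` and the forward pass proceeds on the regular path.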

This is the same incomplete-sync class of bug as the MTP naming mismatch (see #1289). Both would be resolved by a proper re-sync of the fork.

## Affected versions

- megatron-core `0.16.0rc0` @ radixark/Megatron-LM commit `23924a0b`

## References

- Upstream fix: NVIDIA/Megatron-LM PR #2645
- Related: #1289, #1293

