Skip to content

Fix EP post merge #43730

Merged
ArthurZucker merged 35 commits into main from fix-ep on Feb 5, 2026

ArthurZucker (Collaborator) commented Feb 4, 2026

What does this PR do?

Fix GPT-OSS (partial)

EP sharding works, but the forward pass is still broken: it fails outright with the eager implementation and produces gibberish with grouped_mm.
I will work on a fix to include in the patch release!
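For readers new to expert parallelism (EP), here is a minimal, framework-free sketch of what an EP MoE forward has to compute. It assumes scalar tokens, scalar "experts" (each one just multiplies its input by a weight), an even contiguous split of experts across ranks, and a plain Python sum standing in for the cross-rank all-reduce; all function names are hypothetical and this is not the transformers implementation. Each rank adds only the contributions of the experts it owns, and summing the partial outputs over ranks must reproduce the single-device result, which is a useful check when a sharded forward "runs" but its outputs look wrong.

```python
# Framework-free simulation of an expert-parallel (EP) MoE forward pass.
# Hypothetical helpers; scalars stand in for tensors and a Python sum for all-reduce.


def experts_for_rank(num_experts, rank, world_size):
    """Contiguous block of global expert ids owned by `rank` (assumes an even split)."""
    per_rank = num_experts // world_size
    return range(rank * per_rank, (rank + 1) * per_rank)


def moe_forward_reference(tokens, routing, expert_weights):
    """Single-device MoE: every expert is local.

    routing[i] is a list of (expert_id, routing_weight) pairs for token i (top-k).
    """
    return [sum(w * expert_weights[e] * x for e, w in choices)
            for x, choices in zip(tokens, routing)]


def moe_forward_ep_rank(tokens, routing, expert_weights, rank, world_size):
    """Partial output of one rank: only contributions from the experts it owns.

    A real rank would store only its local expert weights; indexing the global
    list here keeps the simulation short.
    """
    local = set(experts_for_rank(len(expert_weights), rank, world_size))
    return [sum(w * expert_weights[e] * x for e, w in choices if e in local)
            for x, choices in zip(tokens, routing)]


def all_reduce_sum(partials):
    """Element-wise sum over ranks, standing in for an all-reduce with op=SUM."""
    return [sum(values) for values in zip(*partials)]


tokens = [1.0, 2.0, 3.0]
routing = [[(0, 0.5), (3, 0.5)], [(1, 1.0)], [(2, 0.25), (3, 0.75)]]
expert_weights = [1.0, 2.0, 3.0, 4.0]
world_size = 2

partials = [moe_forward_ep_rank(tokens, routing, expert_weights, r, world_size)
            for r in range(world_size)]
print(all_reduce_sum(partials))                                # [2.5, 4.0, 11.25]
print(moe_forward_reference(tokens, routing, expert_weights))  # [2.5, 4.0, 11.25]
```

Skipping the final reduction, or looking experts up with the wrong (global vs. local) id, makes the two results diverge even though every rank's computation completes.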

Fix tensor parallelism (TP) for any model:

  • fix get shard tensor
  • fix get packed shard weights
  • fix tensor_idx
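The TP fixes listed above concern how each rank cuts its local shard out of a weight. Below is a minimal sketch, assuming an even ceil-split along one dimension and plain Python lists in place of tensors; the helper names are hypothetical, not the transformers functions touched by this PR. It also assumes a packed layout where the gate and up blocks of a fused gate_up weight are stacked one after the other along dim 0 (some models interleave them instead, which needs a different rule). With that layout each block must be sharded on its own and the pieces re-concatenated; slicing the fused weight contiguously would give rank 0 only gate rows.

```python
def shard_bounds(dim_size, rank, world_size):
    """[start, end) range of a dimension owned by `rank`, using a ceil-split."""
    chunk = -(-dim_size // world_size)  # ceiling division
    start = min(rank * chunk, dim_size)
    end = min(start + chunk, dim_size)
    return start, end


def shard_shape(shape, dim, rank, world_size):
    """Shape of the local shard when `shape` is split along `dim`."""
    start, end = shard_bounds(shape[dim], rank, world_size)
    shape = list(shape)
    shape[dim] = end - start
    return tuple(shape)


def packed_shard_rows(rows, num_packed, rank, world_size):
    """Shard a packed weight whose `num_packed` equal blocks are stacked along dim 0.

    Each block (e.g. gate, then up) is sharded independently and the local
    pieces are concatenated, so every rank keeps matching gate and up rows.
    """
    block = len(rows) // num_packed  # assumes len(rows) % num_packed == 0
    local = []
    for b in range(num_packed):
        block_rows = rows[b * block:(b + 1) * block]
        start, end = shard_bounds(block, rank, world_size)
        local.extend(block_rows[start:end])
    return local


rows = ["g0", "g1", "g2", "g3", "u0", "u1", "u2", "u3"]  # gate block, then up block
print(packed_shard_rows(rows, 2, rank=0, world_size=2))  # ['g0', 'g1', 'u0', 'u1']
print(rows[0:4])  # naive contiguous shard for rank 0: ['g0', 'g1', 'g2', 'g3']
print(shard_shape((8, 16), dim=0, rank=2, world_size=3))  # (2, 16)
```

The ceil-split leaves the last rank with a smaller shard when the dimension does not divide evenly, which is why the shard shape is derived from `end - start` rather than from `dim_size // world_size`.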

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker changed the title from "up" to "Fix EP post merge" Feb 4, 2026
shape[dim] = end - start
return tuple(shape)

class AllReduce(TensorParallelLayer):

Member

We should rename it; otherwise it becomes hard to understand, given that we already have:

def all_reduce_forward(x, device_mesh):
    """All-reduce forward, identity backward. Use after rowwise layers."""
    return _AllReduceForward.apply(x, device_mesh)

Member

We will use moe_tp_experts instead (it handles the backward pass).
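The docstring quoted in this thread ("All-reduce forward, identity backward. Use after rowwise layers.") can be unpacked with a small plain-Python sketch. Assumptions: lists stand in for tensors, a Python sum stands in for torch.distributed.all_reduce, and all helper names are hypothetical. In row-wise parallelism (PyTorch's RowwiseParallel terminology), nn.Linear is sharded along its input dimension, so each rank produces a partial output and the partials must be summed. Because every partial enters that sum with coefficient 1 and the result is replicated on all ranks, the gradient each rank passes back is just the incoming gradient, hence the identity backward.

```python
def matvec(weight, x):
    """y = W @ x for a matrix stored as a list of rows ([out][in])."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def split_input_dim(weight, x, world_size):
    """Row-wise TP: rank r gets a block of W's input columns and the matching slice of x."""
    chunk = len(x) // world_size  # assumes len(x) % world_size == 0
    shards = []
    for r in range(world_size):
        cols = slice(r * chunk, (r + 1) * chunk)
        shards.append(([row[cols] for row in weight], x[cols]))
    return shards


def all_reduce_forward(partials):
    """Forward: sum the per-rank partial outputs; every rank ends up with the full result."""
    total = [sum(values) for values in zip(*partials)]
    return [list(total) for _ in partials]


def all_reduce_backward(grad_output, world_size):
    """Backward: identity. Each rank's partial receives the replicated grad_output unchanged."""
    return [list(grad_output) for _ in range(world_size)]


W = [[1, 2, 3, 4], [5, 6, 7, 8]]  # shape [out=2, in=4]
x = [1, 1, 2, 2]
partials = [matvec(w, xs) for w, xs in split_input_dim(W, x, world_size=2)]
print(partials)                      # [[3, 11], [14, 30]]
print(all_reduce_forward(partials))  # [[17, 41], [17, 41]]
print(matvec(W, x))                  # [17, 41]
```

In real PyTorch code the same pair of rules would live in a torch.autograd.Function whose forward calls torch.distributed.all_reduce and whose backward returns the gradient as-is.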

Comment thread src/transformers/models/solar_open/modular_solar_open.py Outdated
Comment thread src/transformers/models/minimax_m2/modular_minimax_m2.py Outdated
Comment thread src/transformers/quantizers/quantizer_fbgemm_fp8.py Outdated
Comment thread src/transformers/core_model_loading.py Outdated
@github-actions

github-actions Bot commented Feb 5, 2026

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: glm4_moe_lite, gpt_oss, minimax_m2, solar_open, fbgemm_fp8

@3outeille

Member

LGTM, nice job!

@ArthurZucker ArthurZucker merged commit e1f3766 into main Feb 5, 2026
24 of 26 checks passed
@ArthurZucker ArthurZucker deleted the fix-ep branch February 5, 2026 15:23
ArthurZucker added a commit that referenced this pull request Feb 5, 2026
* restore

* add all reduce for ep

* fix init and bias sharding

* fix finalize weight init

* add full stacktracing

* fix

* okay big improvement here

* the only case shard index should be used is when we are actually collecting for mergeModuleList

* more fixes

* fix EP forward gpt oss

* revert some shit

* when you are stupid sometimes you really need a brain :) :) :) :)

* fix TP

* Ok GPT oss is fixed now

* try to fix perms

* attempt to fix

* am I a doomer and AI is not that bad?

* fix

* it "passes" but the output is shit

* style my man

* outputs are gonna be gibberish but at least the forward pass "works"

* style

* fix mixtral

* okay shape fixes

* tensor idx is only for grouped gemm / EP

* fix gate_up shard

* fix :)

* revert some EP changes that are breaking other stuff

* style

* use moe_tp_experts

* revert unrelated, last nits and style

* good?

* fix modular

---------

Co-authored-by: 3outeille <ferdinand.mom@epita.fr>
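The commit log above notes that the shard index should only be used when collecting for mergeModuleList, and that tensor idx only matters for grouped GEMM / EP. A minimal sketch of why such an index exists, assuming a checkpoint that stores one tensor per expert under keys containing `experts.<id>.` and a model that expects them stacked into a single grouped tensor. Strings stand in for tensors, and the function name and key pattern are hypothetical, not the transformers loader. The expert id parsed from the key decides the slot in the stacked tensor; under EP it must also be translated from a global id to a local one, and experts owned by other ranks are skipped.

```python
import re

# Hypothetical key layout: "...experts.<global_id>.<param name>"
EXPERT_KEY = re.compile(r"experts\.(\d+)\.")


def merge_expert_tensors(state, num_experts, rank=0, world_size=1):
    """Stack per-expert tensors (one projection of one layer) by local expert slot.

    With world_size == 1 every expert is local (plain stacking for grouped GEMM).
    With EP, rank r keeps only its contiguous block of experts and converts the
    global id parsed from the key into a local index.
    """
    per_rank = num_experts // world_size  # assumes an even split
    first = rank * per_rank
    merged = [None] * per_rank
    for key, tensor in state.items():
        match = EXPERT_KEY.search(key)
        if match is None:
            continue  # not a per-expert weight; not part of this merge
        local_idx = int(match.group(1)) - first
        if 0 <= local_idx < per_rank:  # skip experts owned by other ranks
            merged[local_idx] = tensor
    return merged


state = {f"model.layers.0.mlp.experts.{i}.down_proj.weight": f"w{i}" for i in range(4)}
state["model.layers.0.mlp.gate.weight"] = "router"  # not an expert weight
print(merge_expert_tensors(state, num_experts=4))                        # ['w0', 'w1', 'w2', 'w3']
print(merge_expert_tensors(state, num_experts=4, rank=1, world_size=2))  # ['w2', 'w3']
```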
jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026

3 participants