
🐛 CI failure: 'functools.partial' has no attribute 'submodules' in test_paged_stashing #4935

@ko3n1g


Describe the bug

CI tests in tests/unit_tests/transformer/moe/test_paged_stashing.py are failing with an AttributeError because the test treats a functools.partial as if it were a ModuleSpec:

tests/unit_tests/transformer/moe/test_paged_stashing.py:139: AttributeError
E   AttributeError: 'functools.partial' object has no attribute 'submodules'

Failing nodes:

  • TestPagedStashing::test_forward_backward_4_layers
  • TestPagedStashingOverBudget::test_overload_factor_and_over_budget

Tag @NVIDIA/mcore-oncall to get oncall's attention to this issue.

Root cause (likely)

At tests/unit_tests/transformer/moe/test_paged_stashing.py:139 the test does:

transformer_layer_spec = get_gpt_layer_with_transformer_engine_spec(
    num_experts=self.config.num_moe_experts, moe_grouped_gemm=True
)
...
MoELayer(self.config, transformer_layer_spec.submodules.mlp.submodules)

In the MoE branch, transformer_layer_spec.submodules.mlp is set to partial(MoELayer, submodules=MoESubmodules(...)) (see megatron/core/models/gpt/moe_module_specs.py:66, introduced by #3435 on 2026-05-10). A functools.partial does not expose .submodules, so the attribute access at line 139 crashes.
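The failure mode can be reproduced outside Megatron with a minimal sketch. The `MoELayer` and `MoESubmodules` classes below are simplified stand-ins written for illustration, not the real Megatron-LM classes. Only the `functools.partial` behavior is the point: a partial exposes its bound arguments through `.func`, `.args`, and `.keywords`, not as attributes named after the keywords.

```python
from dataclasses import dataclass
from functools import partial


@dataclass
class MoESubmodules:
    """Stand-in for the real MoESubmodules (hypothetical field)."""
    experts: str = "experts"


class MoELayer:
    """Stand-in for the real MoELayer; only the call shape matters here."""

    def __init__(self, config, submodules=None):
        self.config = config
        self.submodules = submodules


# Same shape as the MoE branch described above: the mlp entry is a partial.
mlp = partial(MoELayer, submodules=MoESubmodules())

# Attribute access fails because 'submodules' is a bound keyword, not an attribute.
try:
    mlp.submodules
except AttributeError as exc:
    error_message = str(exc)

# The bound value is still reachable through the partial's keywords mapping.
bound_submodules = mlp.keywords["submodules"]
```

Here `error_message` carries the same "has no attribute 'submodules'" text as the CI log, while `bound_submodules` is the `MoESubmodules` instance that was bound when the partial was created.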

Commit history:

  • 5e31514165 (#3435, 2026-05-10, @nick-schank) switched MoE spec to return a partial.
  • f007db77b9 (#4247, 2026-05-22, @nanz-nv) added the test that assumes the old ModuleSpec shape.

Failing run

PR:  #4931: test: enable NVTE_CUTEDSL_FUSED_GROUPED_MLP via pytest fixture (surfaced here; the PR itself does not touch paged stashing)
Run: 26290097610
Job: tests/unit_tests/transformer/moe/**/*.py - latest

Error (verbatim, abridged)

tests/unit_tests/transformer/moe/test_paged_stashing.py:139: AttributeError
E   AttributeError: 'functools.partial' object has no attribute 'submodules'

FAILED tests/unit_tests/transformer/moe/test_paged_stashing.py::TestPagedStashing::test_forward_backward_4_layers
FAILED tests/unit_tests/transformer/moe/test_paged_stashing.py::TestPagedStashingOverBudget::test_overload_factor_and_over_budget

Steps/Code to reproduce bug

Re-run the failing CI job linked above, or locally inside the dev container:

pytest tests/unit_tests/transformer/moe/test_paged_stashing.py

Additional context

Triaged automatically via /create-issue. Suggested fix: update _create_moe_layer to construct MoESubmodules directly, or to read it from the partial via .keywords["submodules"], instead of chaining .submodules.mlp.submodules.
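One way the `.keywords["submodules"]` option could look is sketched below. `resolve_submodules` is a hypothetical helper name that does not exist in Megatron-LM, and the body of `_create_moe_layer` is not shown in this issue, so this is only an illustration of accepting both spec shapes (a partial, or an object with a `.submodules` attribute).

```python
from functools import partial
from types import SimpleNamespace


def resolve_submodules(spec):
    """Return the submodules carried by `spec` (hypothetical helper).

    Accepts either a functools.partial with a bound `submodules` keyword
    or any object that exposes a `.submodules` attribute.
    """
    if isinstance(spec, partial):
        try:
            return spec.keywords["submodules"]
        except KeyError:
            raise ValueError("partial has no bound 'submodules' keyword") from None
    return spec.submodules


# Usage with stand-in objects (not real Megatron specs):
from_partial = resolve_submodules(partial(dict, submodules="moe-submodules"))
from_spec = resolve_submodules(SimpleNamespace(submodules="moe-submodules"))
```

Under these assumptions, the test would call something like `MoELayer(self.config, resolve_submodules(transformer_layer_spec.submodules.mlp))`. Constructing `MoESubmodules` directly in the test avoids the dependency on the spec's shape altogether, at the cost of duplicating the spec's contents.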

Labels: bug
