Skip to content

[data] fix: Support Energon 7 metadata and stateless cookers#4090

Merged
yaoyu-33 merged 5 commits into
mainfrom
ko3n1g/fix/cook-stateless
May 30, 2026
Merged

[data] fix: Support Energon 7 metadata and stateless cookers#4090
yaoyu-33 merged 5 commits into
mainfrom
ko3n1g/fix/cook-stateless

Conversation

@ko3n1g

@ko3n1g ko3n1g commented May 30, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Bumps MCore to 52d1d681d6000e5da867b9b100c14648e4f85de9 and updates uv.lock, including the move to megatron-energon 7.3.2.
  • Adds Energon metadata helpers that construct sample and batch kwargs from the installed Energon base dataclasses, keeping Energon 6 compatibility guards.
  • Updates Bridge Energon task encoders and focused tests for the Energon 7 metadata contract.
  • Marks diffusion Cooker.cook functions @stateless for the Energon 7 cooker contract.
  • Supersedes [data] fix: Support Energon 7 metadata fields #4089.

Original Bump

  • Bump PR: chore(beep boop 🤖): Bump uv.lock (main, mcore-main) (2026-05-30) #4087
  • Target: main
  • Classification: Fix must ship with the bump
  • Root cause: MCore commit 52d1d681d6000e5da867b9b100c14648e4f85de9 updates the MCore Energon requirement from ~=6.0 to ~=7.0, and the bump lockfile moves megatron-energon from 6.0.1 to 7.3.2. Energon 7 removes the singular sample __subflavor__ field, requires batch __key__ / __restore_key__ metadata, and asserts cooker functions are stateless.
  • Guards: Added local Energon 6 compatibility guards in src/megatron/bridge/data/energon/metadata.py with TODO removal condition: remove when Bridge no longer supports megatron-energon 6.x. Removed stale guards: none.

Validation

  • 2026-05-30, CW interactive partition: uv run pre-commit run --all-files - passed.
  • 2026-05-30, CW interactive partition: uv run python -m pytest tests/unit_tests/data/energon/test_hf_encoder_task_encoder.py tests/unit_tests/data/energon/test_nemotron_omni_task_encoder.py tests/unit_tests/recipes/qwen_vl/data/energon/test_task_encoder.py tests/unit_tests/diffusion/data/common/test_diffusion_sample.py tests/unit_tests/diffusion/data/common/test_diffusion_task_encoder.py tests/unit_tests/diffusion/data/flux/test_flux_taskencoder.py tests/unit_tests/diffusion/data/wan/test_wan_taskencoder.py tests/unit_tests/models/nemotron_omni/test_nemotron_omni_conversion.py -q - passed, 59 passed, 32 warnings in 1.24s.
  • 2026-05-30, original [data] fix: Support Energon 7 metadata and stateless cookers #4090 validation: uvx ruff check / uvx ruff format --check on the three diffusion files - clean.

dimapihtar and others added 3 commits May 30, 2026 06:10
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Energon's get_stateless(cooker.cook) check fires per-sample and aborts the run with 'AssertionError: Cooker must be stateless'. The three diffusion cook functions (wan, flux, common SP base) are pure dict reshapes, so wrap them with @stateless to satisfy the contract.

Signed-off-by: oliver könig <okoenig@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 30, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@copy-pr-bot

copy-pr-bot Bot commented May 30, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ko3n1g ko3n1g marked this pull request as ready for review May 30, 2026 16:00
keys = [s.__key__ for s in samples]
batch_kwargs: Dict = dict(
**batch_metadata_kwargs(keys=keys),
__keys__=[s.__key__ for s in samples],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: keys is already computed on the line above but __keys__ re-creates the same list. The other two callers (nemotron_omni_task_encoder.py and qwen_vl/task_encoder.py) reuse the variable. Suggest:

Suggested change
__keys__=[s.__key__ for s in samples],
__keys__=keys,

@claude

claude Bot commented May 30, 2026

Copy link
Copy Markdown
Contributor

Review — Light

Clean fix: @stateless on the three diffusion cook functions resolves the Energon 7 assertion, and the new metadata.py compatibility layer centralizes the Energon 6/7 field differences nicely.

Observations

  1. hf_encoder_task_encoder.py:348 — duplicate list comprehension (inline comment posted): keys is computed on line 345 but __keys__ re-creates the same list instead of reusing keys. The other two callers (nemotron_omni_task_encoder.py, qwen_vl/task_encoder.py) already use __keys__=keys.

  2. No dedicated unit test for metadata.py — The sample_metadata_kwargs and batch_metadata_kwargs helpers contain conditional logic for Energon 6 vs 7 field presence (__subflavor__, __key__ on Batch, etc.). These code paths are exercised indirectly through existing tests, but a small focused test would make the compat contract explicit and catch regressions if field names change again in a future Energon release.

  3. Functional test uses inline compat logic — tests/functional_tests/test_groups/data/energon/test_hf_encoder_task_encoder.py (lines 40-57) has its own hand-rolled version of sample_metadata_kwargs. Consider using the new helper there too to avoid the two implementations drifting apart.

Suggested test cases

No perf tests impacted.

@ko3n1g

ko3n1g commented May 30, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 96bd844

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33 yaoyu-33 changed the title fix(data): mark diffusion Cooker.cook functions @stateless [data] fix: Support Energon 7 metadata and stateless cookers May 30, 2026
@yaoyu-33 yaoyu-33 added bug Something isn't working area:data Dataset builders, preprocessing, and samplers needs-review PR is ready for code review and waiting on a reviewer needs-more-tests Requires additional L0 and L1 test coverage before merge full-test-suite labels May 30, 2026
@yaoyu-33

Copy link
Copy Markdown
Contributor

/ok to test 37dac61

@yaoyu-33 yaoyu-33 merged commit e00a65e into main May 30, 2026
166 checks passed
@yaoyu-33 yaoyu-33 deleted the ko3n1g/fix/cook-stateless branch May 30, 2026 19:17
@ko3n1g ko3n1g restored the ko3n1g/fix/cook-stateless branch May 30, 2026 19:18
vasunvidia pushed a commit to vasunvidia/Megatron-Bridge that referenced this pull request Jun 10, 2026
…NeMo#4090)

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: dimapihtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:data Dataset builders, preprocessing, and samplers bug Something isn't working full-test-suite needs-more-tests Requires additional L0 and L1 test coverage before merge needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants