[examples] Add dynamic context parallel example #3892

Merged: cuichenx merged 3 commits into NVIDIA-NeMo:main from ilml:codex/dynamic-cp-example on May 28, 2026
@ilml (Contributor) commented May 19, 2026

Summary

  • Add a minimal long-context Dynamic CP packing demo; it does not train a model or launch distributed workers.
  • Use Megatron-Core dev DefaultDynamicCPScheduler to schedule toy variable-length samples, then print the packed THD metadata per DPxCP rank: tokens.shape, cu_seqlens, max_seqlen, and local_cp_size.
  • Add examples/training_features/long_context/README.md explaining how to run the demo, how to read the output, and how the printed metadata maps to the real DCP forward-step path.
  • Show the Bridge config knobs users set in a real run: dynamic_context_parallel=True, sequence_packing_scheduler="default_dynamic_cp", max_seqlen_per_dp_cp_rank, min_dynamic_context_parallel_size, and micro_batch_size=1.
  • Pass Dynamic CP initialization kwargs through Bridge distributed setup when the installed MCore supports them, so dev Dynamic CP groups are created correctly after the MCore bump.
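The packed THD metadata listed in the summary (`tokens.shape`, `cu_seqlens`, `max_seqlen`) can be illustrated without Megatron-Core. The following is a minimal sketch, not the demo script from this PR: `pack_thd` is a hypothetical helper, the `(1, total_tokens)` shape assumes the leading dimension comes from `micro_batch_size=1`, and the per-rank sharding implied by `local_cp_size` is ignored. It relies only on the usual convention that `cu_seqlens` holds cumulative sequence offsets starting at 0.

```python
from itertools import accumulate


def pack_thd(sample_lengths):
    """Describe one packed microbatch built from variable-length samples.

    Hypothetical helper for illustration; it does not call Megatron-Core.
    """
    if not sample_lengths or any(n <= 0 for n in sample_lengths):
        raise ValueError("expected a non-empty list of positive lengths")

    # cu_seqlens[i] is the offset where sample i starts in the packed buffer;
    # the final entry equals the total number of packed tokens.
    cu_seqlens = [0, *accumulate(sample_lengths)]

    return {
        # Assumption: leading dimension of 1 mirrors micro_batch_size=1.
        "tokens_shape": (1, cu_seqlens[-1]),
        "cu_seqlens": cu_seqlens,
        # Longest single sample in the pack, not the packed length.
        "max_seqlen": max(sample_lengths),
    }


packed = pack_thd([3, 5, 2])
print(packed)
# {'tokens_shape': (1, 10), 'cu_seqlens': [0, 3, 8, 10], 'max_seqlen': 5}
```

Sample `i` occupies positions `cu_seqlens[i]` up to (but not including) `cu_seqlens[i + 1]`, which is how attention kernels recover sample boundaries inside a packed buffer.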

Testing

  • python -m py_compile examples/training_features/long_context/dynamic_context_parallel.py
  • git diff --check
  • tmux window 1, Docker container 7366a763896a after ./scripts/switch_mcore.sh dev + uv sync: uv run python examples/training_features/long_context/dynamic_context_parallel.py (prints per-sample gpus_needed and scheduled packed microbatches with local_cp_size)
  • tmux window 1: uv run ruff check examples/training_features/long_context/dynamic_context_parallel.py src/megatron/bridge/training/initialize.py
  • tmux window 1: uv run ruff format --check examples/training_features/long_context/dynamic_context_parallel.py src/megatron/bridge/training/initialize.py
  • tmux window 1: uv run pre-commit run --all-files
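The per-sample `gpus_needed` value mentioned in the demo output can be read roughly as "how many DP×CP ranks a sample needs so that no rank holds more than `max_seqlen_per_dp_cp_rank` tokens". The sketch below shows that reading only. It is an assumption and not the `DefaultDynamicCPScheduler` logic: the function body, the ceiling-division rule, and the way `min_dynamic_context_parallel_size` is applied as a lower bound are all hypothetical.

```python
def gpus_needed(seq_len, max_seqlen_per_dp_cp_rank, min_dynamic_context_parallel_size=1):
    """Estimate how many ranks one sample would be split across.

    Assumed rule (not taken from Megatron-Core): ceiling division by the
    per-rank token budget, clamped below by the minimum dynamic CP size.
    """
    if seq_len <= 0 or max_seqlen_per_dp_cp_rank <= 0:
        raise ValueError("lengths and per-rank budget must be positive")
    if min_dynamic_context_parallel_size < 1:
        raise ValueError("minimum CP size must be at least 1")

    # Integer ceiling division avoids floating-point rounding.
    ranks = -(-seq_len // max_seqlen_per_dp_cp_rank)
    return max(ranks, min_dynamic_context_parallel_size)


# Toy variable-length samples with a 4096-token budget per rank.
for length in [1024, 5000, 16384]:
    print(length, gpus_needed(length, max_seqlen_per_dp_cp_rank=4096))
# 1024 1
# 5000 2
# 16384 4
```

Under this reading, short samples stay on a single rank while long ones are spread over a larger CP group, which is the behavior the printed `local_cp_size` is meant to expose.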

copy-pr-bot (Bot) commented May 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ilml marked this pull request as draft May 19, 2026 21:13
ilml force-pushed the codex/dynamic-cp-example branch 5 times, most recently from ec5a517 to 529ce57, May 27, 2026 17:39
Signed-off-by: ilml <tolong@nvidia.com>
ilml force-pushed the codex/dynamic-cp-example branch from 529ce57 to c2df21f, May 27, 2026 21:34
ilml marked this pull request as ready for review May 27, 2026 22:33
yaoyu-33 previously approved these changes May 27, 2026
yaoyu-33 added the area:training, community-request, feature, and needs-review labels May 27, 2026
Signed-off-by: ilml <tolong@nvidia.com>
@cuichenx (Contributor) commented

/ok to test 51bb3c5

cuichenx merged commit d1067ce into NVIDIA-NeMo:main May 28, 2026
98 of 99 checks passed
vasunvidia pushed a commit to vasunvidia/Megatron-Bridge that referenced this pull request Jun 10, 2026
Signed-off-by: ilml <tolong@nvidia.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Labels

  • area:training: Training loop, callbacks, and runtime integration
  • feature: New capabilities, enhancements, or enablement work
  • needs-review: PR is ready for code review and waiting on a reviewer

3 participants