Add named layouts to HyperCommGrid for heterogeneous parallelism #5148

Merged
yashaswikarnati merged 6 commits into NVIDIA:main from yashaswikarnati:ykarnati/upstream-hypercommgrid-multilayout on Jun 9, 2026

Conversation

@yashaswikarnati commented Jun 4, 2026 (Contributor)

What

Add named views to HyperCommGrid: one grid (a single rank span) can register extra factorizations beyond the implicit base view, then create / retrieve / enumerate process groups against any of them via a keyword-only view=.

  • register_view(name, shape, dim_names, shared_dims=None) — register and validate a named factorization over the same ranks.
  • create_pg / get_pg / get_rank_enum accept view= (defaults to base, so the single-view path is unchanged).
  • Base-view group keys are byte-for-byte unchanged; view-private groups use namespaced keys, and dims listed in shared_dims reuse the base group instead of duplicating it.

Why

Foundational for heterogeneous / non-colocated parallelism, where a dense (tp/cp/dp/pp) and an expert (expt_tp/ep/expt_dp/pp) factorization span the same ranks with different shapes. These are alternate tilings of one rank set — not orthogonal axes, so they can't be a single cube. register_view models each as a separate factorization that must agree on any shared_dims.
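
To make the "alternate tilings" point concrete, here is a small pure-Python sketch over 8 ranks. The two shapes and the convention that the first-listed dimension varies fastest are assumptions chosen for illustration; they are not taken from this PR, and the helper names `coords` and `groups` are hypothetical.

```python
import math

WORLD = 8  # assumed rank span: ranks 0..7

# Two hypothetical factorizations of the same 8 ranks.
# Assumption: the first-listed dimension varies fastest.
dense = {"tp": 2, "cp": 1, "dp": 2, "pp": 2}
expert = {"expt_tp": 1, "ep": 4, "expt_dp": 1, "pp": 2}
assert math.prod(dense.values()) == math.prod(expert.values()) == WORLD


def coords(rank, layout):
    """Decompose a flat rank into per-dimension coordinates for one layout."""
    out = {}
    for name, size in layout.items():
        out[name] = rank % size
        rank //= size
    return out


def groups(layout, dim):
    """Ranks that differ only in `dim` belong to the same group."""
    buckets = {}
    for rank in range(WORLD):
        c = coords(rank, layout)
        key = tuple(v for k, v in c.items() if k != dim)
        buckets.setdefault(key, []).append(rank)
    return sorted(buckets.values())


# The pipeline groups coincide, so "pp" could be declared shared.
print(groups(dense, "pp") == groups(expert, "pp"))
# The expert "ep" groups have no single-dimension counterpart in the dense tiling.
print(groups(expert, "ep"), groups(dense, "tp"), groups(dense, "dp"))
```

Under these assumed shapes the `pp` groups are the same rank lists in both tilings, while the `ep` groups match neither the dense `tp` nor the dense `dp` groups, which is why the two tilings cannot be expressed as axes of one cube.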

How

  • _RankViewSpec holds each factorization; the base view is auto-registered from the constructor args.
  • Rank enumeration is generalized to any view's shape/dim_names via np.moveaxis + reshape (drops the einops dependency).
  • register_view proves each shared_dim enumerates identically to the base view, so shared groups are reused rather than rebuilt.
  • Robustness: destroy() skips groups this rank isn't a member of (NON_GROUP_MEMBER sentinel) and frees each shared group once; the rank-0 log is guarded by is_initialized().
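
The enumeration and shared-dim check described above can be sketched with `np.moveaxis` and `reshape`. This is a minimal illustration, not the code in `hyper_comm_grid.py`: the function name `rank_enum`, the `rank_offset` parameter, and the convention that the first-listed dimension varies fastest are all assumptions.

```python
import numpy as np


def rank_enum(shape, dim_names, dims, rank_offset=0):
    """List the rank groups spanning `dims` for one factorization.

    Assumption: the first-listed dimension varies fastest, so the grid is
    built with the axes in reversed order.
    """
    size = int(np.prod(shape))  # derived from the passed shape only
    grid = np.arange(rank_offset, rank_offset + size).reshape(tuple(shape)[::-1])
    axis_names = list(dim_names)[::-1]
    # Axes of the requested dims, kept in grid order.
    src = [i for i, name in enumerate(axis_names) if name in dims]
    # Move them to the trailing positions so each group is contiguous.
    dst = list(range(grid.ndim - len(src), grid.ndim))
    moved = np.moveaxis(grid, src, dst)
    group_size = int(np.prod([grid.shape[i] for i in src]))
    return moved.reshape(-1, group_size).tolist()


# Hypothetical base and expert factorizations of the same 8 ranks.
base = ([2, 1, 2, 2], ["tp", "cp", "dp", "pp"])
expert = ([1, 4, 1, 2], ["expt_tp", "ep", "expt_dp", "pp"])

# A dim may be shared only if both factorizations enumerate it identically.
same_pp = rank_enum(*base, ["pp"]) == rank_enum(*expert, ["pp"])
print(same_pp)
```

When the two enumerations agree, a request for the shared dimension can safely return the base group instead of building a second group over the same ranks.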

Fully backward compatible — all new params are optional and keyword-only. Covered by tests/unit_tests/test_hyper_comm_grid.py, including a real-distributed 8-GPU view test.

@copy-pr-bot (bot) commented Jun 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yashaswikarnati yashaswikarnati force-pushed the ykarnati/upstream-hypercommgrid-multilayout branch from 22c7543 to a17e366 Compare June 4, 2026 05:05
@yashaswikarnati yashaswikarnati changed the title from "Add multi-layout support to HyperCommGrid" to "Add named layouts to HyperCommGrid for heterogeneous parallelism" Jun 4, 2026
@yashaswikarnati commented (Contributor, Author)

Reworked per review — thanks for the careful read. Summary of changes:

  • Shared dims are reused, not duplicated. register_layout(..., shared_dims=["pp"]) validates that a shared dimension's rank enumeration matches the base layout's, and a group spanning only shared dims now returns the base grid's group object (same ranks). This honors the invariant that dense and expert pipeline groups must be identical (in parallel_state.py, decoder_rank_generator.get_ranks("pp") == expert_decoder_rank_generator.get_ranks("pp")). The previous "distinct expert:pp" behavior is gone.
  • No implicit layout inference. Removed base-precedence resolution. Layout-private groups are reachable only through an explicit GridLayout handle (grid.get_layout("expert").get_pg(...)); the base grid's create_pg/get_pg/get_rank_enum are unchanged and operate on the base layout only.
  • Smaller surface / internal keys. Dropped has_layout and the "<layout>:<dims>" string namespacing; layout-private groups are keyed by (layout_name, ordered_dims) tuples. Base-grid behavior is byte-for-byte unchanged for existing callers.
  • Real distributed coverage. Replaced the monkey-patched create_pg tests with a real 8-rank integration test (test_real_distributed_registered_layout) that registers base + expert layouts, creates groups in both, asserts actual rank membership of an expert-private group (with a real all_reduce), asserts the shared pp group is the same object/ranks as base, and that destroy() frees reused groups exactly once. Pure-Python register_layout validation tests are kept.
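
The keying scheme from the list above can be illustrated with a toy registry. Everything here is a hypothetical stand-in: `GroupRegistry`, its methods, and the "-"-joined string used for base keys are assumptions for illustration, and plain Python objects take the place of real process groups.

```python
BASE = "base"


class GroupRegistry:
    """Toy model of per-layout group keys; not the HyperCommGrid class."""

    def __init__(self, base_dims):
        self._dim_order = {BASE: list(base_dims)}
        self._shared = {BASE: set()}
        self._groups = {}

    def register_layout(self, name, dim_names, shared_dims=()):
        self._dim_order[name] = list(dim_names)
        self._shared[name] = set(shared_dims)

    def key_for(self, layout, dims):
        # Order the requested dims by their position in the layout.
        ordered = tuple(d for d in self._dim_order[layout] if d in dims)
        only_shared = bool(ordered) and set(ordered) <= self._shared[layout]
        if layout == BASE or only_shared:
            # Assumed base key format: dims in base order, joined by "-".
            return "-".join(d for d in self._dim_order[BASE] if d in dims)
        # Layout-private groups get a (layout_name, ordered_dims) tuple key.
        return (layout, ordered)

    def create(self, layout, dims, factory=object):
        key = self.key_for(layout, dims)
        if key not in self._groups:
            self._groups[key] = factory()  # stand-in for a process group
        return self._groups[key]


reg = GroupRegistry(["tp", "cp", "dp", "pp"])
reg.register_layout("expert", ["expt_tp", "ep", "expt_dp", "pp"], shared_dims=["pp"])

# A shared-only request resolves to the base group object.
print(reg.create("expert", ["pp"]) is reg.create(BASE, ["pp"]))
# A layout-private request gets its own tuple key.
print(reg.key_for("expert", ["ep", "expt_tp"]))
```

Because shared-only requests are canonicalized onto the base key, both layouts hold the same object rather than two groups over identical ranks.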

Validated on 1 node × 8 GPUs (torch.distributed.run --nproc-per-node 8): all green.

Allow a single HyperCommGrid (one rank span) to carry additional named
factorizations beyond its base layout, so dense and expert parallel
factorizations can share the same ranks with different shapes.

- register_layout(name, shape, dim_names, shared_dims=None) returns a
  GridLayout handle; get_layout(name) retrieves it. The handle exposes
  explicit create_pg / get_pg / get_rank_enum against that layout. The
  base grid's own methods are unchanged and operate only on the base
  layout (no implicit cross-layout inference).
- shared_dims declares dimensions that must coincide with the base
  layout (e.g. pipeline parallelism, which must span identical ranks for
  the dense and expert parts). Registration validates that a shared
  dimension's rank enumeration matches the base layout's, and a group
  spanning only shared dims reuses the base grid's group object rather
  than creating a duplicate.
- Layout-private groups are keyed by (layout_name, ordered_dims).

Also add two partial-participation robustness guards: skip the rank-0
log when torch.distributed is not initialized, and in destroy() only
tear down groups this rank is a member of (deduping reused groups by
identity so a shared group is not freed twice).
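
The teardown guard described above can be sketched as follows. This is an illustration under assumptions, not the PR's `destroy()`: `destroy_all` and its `destroy_fn` parameter are hypothetical, and a plain sentinel object stands in for `torch.distributed.GroupMember.NON_GROUP_MEMBER` so the example runs without a distributed setup.

```python
# Stand-in for torch.distributed.GroupMember.NON_GROUP_MEMBER (assumption).
NON_GROUP_MEMBER = object()


def destroy_all(groups, destroy_fn):
    """Tear down each distinct group once, skipping non-member entries.

    `groups` maps keys to group handles; several keys may point at the same
    handle when a shared dim reuses the base group.
    """
    seen = set()
    destroyed_keys = []
    for key, pg in groups.items():
        if pg is None or pg is NON_GROUP_MEMBER:
            continue  # this rank is not a member, so there is nothing to free
        if id(pg) in seen:
            continue  # already freed under another key
        seen.add(id(pg))
        destroy_fn(pg)
        destroyed_keys.append(key)
    groups.clear()
    return destroyed_keys


shared_pp = object()
groups = {
    "pp": shared_pp,
    ("expert", ("pp",)): shared_pp,  # reused base group
    "tp": NON_GROUP_MEMBER,          # this rank is outside the tp group
    ("expert", ("ep",)): object(),
}
calls = []
destroyed = destroy_all(groups, calls.append)
print(len(calls), destroyed)
```

Deduplicating by identity rather than by key is what keeps a reused group from being freed twice.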

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yashaswikarnati yashaswikarnati force-pushed the ykarnati/upstream-hypercommgrid-multilayout branch from a17e366 to 98f093f Compare June 4, 2026 16:44
yashaswikarnati and others added 3 commits June 5, 2026 13:20
- Make `view` keyword-only on create_pg/get_pg/get_rank_enum so a stray
  positional arg cannot silently bind to it.
- Accept numpy integer shape entries in register_view, matching the base
  grid constructor (isinstance check now uses numbers.Integral).
- Reject duplicate shared_dims with a clear ValueError instead of a cryptic
  numpy "repeated axis" error.
- Drive the rank-0 creation log off the canonical key so a shared-dim
  request canonicalized onto the base group is labelled as base, not view.
- Derive the enumeration size from the passed shape in _gen_rank_enum_for
  instead of self.size, removing an implicit instance coupling.
- Remove the now-dead _order_dims wrapper (no production callers); point its
  unit tests at _order_dims_for, collapsing three ordering helpers to two.

Add regression tests for the numpy-int shape and duplicate shared_dims cases.
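
The two validation fixes in this commit (numpy integer shape entries and duplicate shared_dims) can be sketched as below. `validate_view` and its `world_size` parameter are hypothetical names used for illustration; the actual checks in `register_view` may differ in detail and wording.

```python
import math
import numbers

import numpy as np


def validate_view(shape, dim_names, shared_dims=None, world_size=None):
    """Check a named factorization and return the number of ranks it spans."""
    shape = list(shape)
    dim_names = list(dim_names)
    shared_dims = list(shared_dims or [])
    if len(shape) != len(dim_names):
        raise ValueError("shape and dim_names must have the same length")
    for entry in shape:
        # numbers.Integral covers both Python ints and numpy integer scalars.
        if isinstance(entry, bool) or not isinstance(entry, numbers.Integral) or entry < 1:
            raise ValueError(f"shape entries must be positive integers, got {entry!r}")
    if len(set(shared_dims)) != len(shared_dims):
        # Fail early instead of surfacing a numpy "repeated axis" error later.
        raise ValueError(f"duplicate entries in shared_dims: {shared_dims}")
    unknown = [d for d in shared_dims if d not in dim_names]
    if unknown:
        raise ValueError(f"shared_dims not present in dim_names: {unknown}")
    size = math.prod(int(s) for s in shape)
    if world_size is not None and size != world_size:
        raise ValueError(f"shape spans {size} ranks, expected {world_size}")
    return size


names = ["expt_tp", "ep", "expt_dp", "pp"]
# numpy integer entries are accepted, like plain ints.
print(validate_view(np.array([1, 4, 1, 2]), names, shared_dims=["pp"], world_size=8))

try:
    validate_view([1, 4, 1, 2], names, shared_dims=["pp", "pp"])
except ValueError as err:
    print(err)
```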

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merge test_register_view_stores_rank_view into the copy-semantics test
(renamed test_register_view_success_stores_copied_metadata), which already
covered the same registration path. The merged test now also asserts the
None return value and the stored view name, so no coverage is lost.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yashaswikarnati yashaswikarnati marked this pull request as ready for review June 8, 2026 18:55
@yashaswikarnati yashaswikarnati requested review from a team as code owners June 8, 2026 18:55
@svcnvidia-nemo-ci svcnvidia-nemo-ci added the "Final Review" label (PR is in the "final review" stage) Jun 8, 2026
Address review feedback: collapse the _is_process_group_member docstring to
one line and condense the class-level views paragraph to two lines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@svcnvidia-nemo-ci svcnvidia-nemo-ci removed the "Final Review" label (PR is in the "final review" stage) Jun 8, 2026
@svcnvidia-nemo-ci svcnvidia-nemo-ci added the "Approved" label (All necessary approvals have been made) Jun 8, 2026
@yashaswikarnati commented (Contributor, Author)

/ok to test 644aab2

@yashaswikarnati yashaswikarnati enabled auto-merge June 9, 2026 17:37
@yashaswikarnati commented (Contributor, Author)

/ok to test dd90fc6

@svcnvidia-nemo-ci commented

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/27233418142

Merged via the queue into NVIDIA:main with commit ba71ec2 Jun 9, 2026
178 checks passed
@yashaswikarnati yashaswikarnati deleted the ykarnati/upstream-hypercommgrid-multilayout branch June 9, 2026 21:04

Labels: Approved (All necessary approvals have been made), complexity: medium, Run tests