Summary
Introduce a default, annotation-based mechanism for injecting NVIDIA ComputeDomain membership into pods created from a PodClique. This provides a low-friction and explicit way for users to control MNNVL participation without requiring verbose ComputeDomain specifications, while maintaining clear precedence and error semantics when interacting with existing automatic MNNVL setup mechanisms in Grove.
Background
The first implementation of automatic MNNVL setup in Grove is currently underway and nearing completion.
This implementation:
- Assumes NVIDIA DRA drivers are installed and correctly configured on all nodes
- Requires explicit opt-in by the cluster administrator via operator configuration
- Preserves the familiar “it just works” NVLink experience, where:
  - NVLink automatically exists between GPUs on NVLink-enabled nodes
  - Users do not need to manually define domains or topology for same-node NVLink
This cluster-level automatic behavior is necessary and should remain supported.
However, this model alone is insufficient for several scenarios:
- Users may want fine-grained control over MNNVL participation, where only a subset of pods (or PodCliques) within a PodCliqueSet replica should participate in a shared MNNVL domain
- Grove needs a way to offer ComputeDomain injection in default configurations, without requiring fully automatic cluster-wide MNNVL enablement
- Heterogeneous clusters with mixed hardware from multiple vendors may not be able to install NVIDIA DRA drivers on all nodes, making cluster-wide automatic MNNVL infeasible while still requiring explicit MNNVL support on NVIDIA-capable subsets
Consequently, Grove needs to provide a simple mechanism to:
- Explicitly control which pods participate in a given MNNVL domain
- Avoid authoring verbose, PVC-style ComputeDomain resources
- Express intent in a way that composes naturally with Grove primitives and heterogeneous cluster environments
Proposal
Add annotation-based ComputeDomain injection at the PodClique level, using a vendor-scoped annotation.
Core Behavior
If a PodClique includes the annotation:
nvidia.com/computedomain: <name>
then pods created from that PodClique will have the following behavior applied:
- The pods claim a ComputeDomain with the specified name
- The pods participate in the same MNNVL domain for NVLink / interconnect setup
- If the ComputeDomain the pods claim does not already exist, the operator will create it
The exact set of pods affected is defined by the PodClique and PodCliqueSet replica semantics, rather than being implicitly inferred.
This provides a minimal and explicit contract for MNNVL participation without requiring users to define full ComputeDomain specifications, while allowing users to opt into MNNVL on a per-PodClique basis even when broader automation is undesirable or unavailable.
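As a minimal sketch of the core behavior, the Go code below uses hypothetical in-memory stand-ins (`PodClique`, `Pod`, `ComputeDomainStore`) instead of Grove's real API types or the NVIDIA ComputeDomain resource. It reads the annotation, creates the named ComputeDomain if it is missing, and records the same claim on every pod created from the PodClique. Rejecting an empty annotation value is an assumption of this sketch, not something the proposal specifies.

```go
package main

import (
	"errors"
	"strings"
)

// ComputeDomainAnnotation is the vendor-scoped annotation key proposed above.
const ComputeDomainAnnotation = "nvidia.com/computedomain"

// PodClique is a hypothetical stand-in for Grove's PodClique; only annotations matter here.
type PodClique struct {
	Name        string
	Annotations map[string]string
}

// Pod is a hypothetical stand-in for a pod created from a PodClique.
type Pod struct {
	Name          string
	ComputeDomain string // name of the claimed ComputeDomain; empty if none
}

// ComputeDomainStore is a hypothetical in-memory registry of existing ComputeDomains.
type ComputeDomainStore struct {
	domains map[string]bool
}

func NewComputeDomainStore() *ComputeDomainStore {
	return &ComputeDomainStore{domains: map[string]bool{}}
}

// Ensure creates the named ComputeDomain if it does not exist yet and
// reports whether a new ComputeDomain was created.
func (s *ComputeDomainStore) Ensure(name string) bool {
	if s.domains[name] {
		return false
	}
	s.domains[name] = true
	return true
}

// computeDomainName returns the requested ComputeDomain name if the annotation is set.
func computeDomainName(pclq PodClique) (string, bool, error) {
	name, ok := pclq.Annotations[ComputeDomainAnnotation]
	if !ok {
		return "", false, nil
	}
	// Assumption: an empty value is treated as a user error.
	if strings.TrimSpace(name) == "" {
		return "", false, errors.New("annotation " + ComputeDomainAnnotation + " must name a ComputeDomain")
	}
	return name, true, nil
}

// injectComputeDomain applies the core behavior to the pods created from pclq.
func injectComputeDomain(pclq PodClique, pods []Pod, store *ComputeDomainStore) error {
	name, ok, err := computeDomainName(pclq)
	if err != nil || !ok {
		return err // nil when the annotation is simply absent
	}
	store.Ensure(name) // create the ComputeDomain if it is missing
	for i := range pods {
		pods[i].ComputeDomain = name // every pod claims the same ComputeDomain
	}
	return nil
}
```

Because `Ensure` is idempotent, reconciling the same PodClique repeatedly does not create duplicate ComputeDomains, and a PodClique without the annotation is left untouched.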
Defaults and Precedence
Default Availability
- Annotation-based ComputeDomain injection must be enabled by default
- It must be available regardless of whether cluster-level automatic MNNVL is enabled
This ensures that operators can provide baseline ComputeDomain injection behavior out of the box, while still allowing users to selectively opt into MNNVL only where appropriate.
Interaction with Automatic MNNVL Configuration
Cluster Admin Has Not Opted In to Automatic MNNVL
- Users may freely use the nvidia.com/computedomain annotation
- Annotation-based ComputeDomain injection is honored unconditionally
This enables MNNVL usage in clusters where:
- Automatic MNNVL cannot be enabled globally
- NVIDIA DRA drivers are present only on a subset of nodes
- Multiple accelerator vendors coexist within the same cluster
Cluster Admin Has Opted In to Automatic MNNVL
When automatic MNNVL is enabled at the cluster level:
- Automatic ComputeDomain injection applies by default
- The user API allows opting out of automatic MNNVL injection at the PodCliqueSet replica level
Valid Usage
- If a user opts out of automatic MNNVL injection for a given PodCliqueSet replica:
  - The user may specify the nvidia.com/computedomain annotation
  - The annotation must be honored and result in ComputeDomain injection
This allows users to retain fine-grained control over MNNVL participation when the default automatic behavior is too coarse for a given workload.
Invalid Usage (Must Error)
- If a user does not opt out of automatic MNNVL injection and specifies the nvidia.com/computedomain annotation:
  - This configuration is invalid
  - The system must reject the workload
  - The error must clearly explain that:
    - Automatic MNNVL injection is enabled for this PodCliqueSet replica
    - Explicit ComputeDomain annotations require opting out of automatic injection
This enforces a clear separation between automatic cluster-driven behavior and explicit user-driven intent, and avoids ambiguous or conflicting configuration.
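The precedence rules above reduce to a small decision table. The Go sketch below encodes it; `ResolveInjection`, `InjectionMode` and its inputs are hypothetical names, and this proposal leaves open how Grove surfaces the replica-level opt-out in its API and at which validation stage the error is raised. The case of an opted-out replica with no annotation is not spelled out above; the sketch assumes no injection.

```go
package main

import "fmt"

// InjectionMode is the outcome of resolving ComputeDomain injection for one
// PodClique within one PodCliqueSet replica.
type InjectionMode int

const (
	NoInjection         InjectionMode = iota // nothing to inject
	AutomaticInjection                       // cluster-driven automatic MNNVL
	AnnotationInjection                      // user-driven nvidia.com/computedomain annotation
)

// ResolveInjection encodes the precedence rules:
//   autoMNNVLEnabled: the cluster admin has opted in to automatic MNNVL
//   replicaOptedOut:  the user opted this PodCliqueSet replica out of automatic injection
//   annotation:       value of nvidia.com/computedomain on the PodClique ("" if absent)
func ResolveInjection(autoMNNVLEnabled, replicaOptedOut bool, replica, annotation string) (InjectionMode, error) {
	hasAnnotation := annotation != ""
	switch {
	case !autoMNNVLEnabled && hasAnnotation:
		// Admin has not opted in: the annotation is honored unconditionally.
		return AnnotationInjection, nil
	case !autoMNNVLEnabled:
		return NoInjection, nil
	case replicaOptedOut && hasAnnotation:
		// Replica opted out of automatic injection: the annotation is honored.
		return AnnotationInjection, nil
	case replicaOptedOut:
		// Assumption: opted out and no annotation means no injection.
		return NoInjection, nil
	case hasAnnotation:
		// Automatic injection is active and the annotation is also set: reject.
		return NoInjection, fmt.Errorf(
			"PodCliqueSet replica %q: automatic MNNVL injection is enabled for this replica; "+
				"the nvidia.com/computedomain annotation (%q) requires opting out of automatic injection",
			replica, annotation)
	default:
		// Automatic injection applies by default.
		return AutomaticInjection, nil
	}
}
```

Under these rules the only rejected combination is automatic MNNVL enabled, no replica opt-out, and the annotation present; every other combination resolves to exactly one injection source, so automatic and annotation-driven injection never apply together.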
Rationale for the Proposed Vendor-Specific Annotation Key
The ComputeDomain resource and its semantics are currently NVIDIA-specific, used to support MNNVL / NVLink setup via NVIDIA DRA.
To make ownership explicit and avoid collisions with other accelerator vendors or future abstractions, the annotation used to request ComputeDomain injection must be vendor-scoped:
nvidia.com/computedomain: <name>
Multi-Vendor Extensibility and Future Support
Although this proposal introduces an NVIDIA-scoped annotation and implementation, the design and implementation must not assume that Grove only supports NVIDIA ComputeDomains.
Specifically:
- The injection mechanism should be structured so that:
  - Vendor-specific ComputeDomain implementations are pluggable
  - Annotation handling can be extended to support equivalent abstractions for other hardware vendors
- The use of a vendor-scoped annotation allows:
  - MNNVL and NVIDIA DRA support where available
  - Coexistence with other accelerator types in heterogeneous clusters
This proposal explicitly does not preclude Grove from supporting equivalent domain or interconnect abstractions for other accelerators as they become available.
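One way to keep the injection path pluggable is to key vendor-specific injectors by their annotation key. The Go sketch below is hypothetical throughout: `DomainInjector`, `Registry`, `Pod` and `nvidiaComputeDomainInjector` are illustrative names, not existing Grove types, and a real NVIDIA injector would modify the pod spec rather than a map.

```go
package main

import "sort"

// Pod is a hypothetical stand-in recording which domains a pod has joined.
type Pod struct {
	Name    string
	Domains map[string]string // annotation key -> domain name claimed by this pod
}

// DomainInjector is a hypothetical plug-in interface, one implementation per
// vendor-specific domain abstraction.
type DomainInjector interface {
	// AnnotationKey returns the vendor-scoped annotation this injector handles.
	AnnotationKey() string
	// Inject makes the pod a member of the named domain.
	Inject(pod *Pod, domain string)
}

// nvidiaComputeDomainInjector handles nvidia.com/computedomain.
type nvidiaComputeDomainInjector struct{}

func (nvidiaComputeDomainInjector) AnnotationKey() string { return "nvidia.com/computedomain" }

func (i nvidiaComputeDomainInjector) Inject(pod *Pod, domain string) {
	// A real implementation would add a ComputeDomain claim to the pod spec;
	// this sketch only records the membership.
	pod.Domains[i.AnnotationKey()] = domain
}

// Registry maps vendor-scoped annotation keys to injectors.
type Registry struct {
	injectors map[string]DomainInjector
}

func NewRegistry(injectors ...DomainInjector) *Registry {
	r := &Registry{injectors: map[string]DomainInjector{}}
	for _, inj := range injectors {
		r.injectors[inj.AnnotationKey()] = inj
	}
	return r
}

// Apply runs every registered injector whose annotation is present and returns
// the applied keys in sorted order. Annotations without a registered injector
// are ignored, so other vendors' keys coexist without conflict.
func (r *Registry) Apply(annotations map[string]string, pod *Pod) []string {
	var applied []string
	for key, domain := range annotations {
		inj, ok := r.injectors[key]
		if !ok {
			continue
		}
		if pod.Domains == nil {
			pod.Domains = map[string]string{}
		}
		inj.Inject(pod, domain)
		applied = append(applied, key)
	}
	sort.Strings(applied) // map iteration order is random; sort for determinism
	return applied
}
```

With only the NVIDIA injector registered, a PodClique carrying both `nvidia.com/computedomain` and another vendor's key (for example a hypothetical `example.com/fabric`) joins only the ComputeDomain; supporting that other vendor would mean registering one more `DomainInjector` without touching the NVIDIA path.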
Goals
- Preserve the zero-configuration NVLink experience under automatic MNNVL
- Provide fine-grained user control over MNNVL participation at the PodClique level
- Enable partial and selective MNNVL adoption within a PodCliqueSet replica
- Support heterogeneous clusters where cluster-wide automatic MNNVL is not possible
- Avoid forcing users into verbose ComputeDomain definitions
- Make MNNVL participation obvious, intentional, and composable with PodClique semantics
- Enforce a clear and debuggable intent hierarchy between cluster and user configuration
- Ensure Grove remains extensible across accelerator vendors
Non-Goals
- This enhancement does not deprecate:
  - The current automatic MNNVL implementation
  - Fully specified ComputeDomain resources
- This enhancement does not define:
  - Scheduling or topology placement policy
- This enhancement does not lock Grove into supporting only NVIDIA's custom resource (CR) for scale-up high-bandwidth domains