# Enhancement: MNNVL V2, ComputeDomain Injection Via Annotation (#417)

## Summary

Introduce a default, annotation-based mechanism for injecting NVIDIA `ComputeDomain` membership into pods created from a `PodClique`. This provides a low-friction and explicit way for users to control MNNVL participation without requiring verbose `ComputeDomain` specifications, while maintaining clear precedence and error semantics when interacting with existing automatic MNNVL setup mechanisms in Grove.

---

## Background

The **first implementation of automatic MNNVL setup** in Grove is currently underway and nearing completion.

This implementation:
- Assumes **NVIDIA DRA drivers are installed and correctly configured on all nodes**
- Requires **explicit opt-in by the cluster administrator via operator configuration**
- Preserves the familiar **“it just works” NVLink experience**, where:
  - NVLink automatically exists between GPUs on NVLink-enabled nodes
  - Users do not need to manually define domains or topology for same-node NVLink

This cluster-level automatic behavior is necessary and should remain supported.

However, this model alone is insufficient for a few scenarios:
- Users may want **fine-grained control over MNNVL participation**, where only a subset of pods (or PodCliques) within a `PodCliqueSet` replica should participate in a shared MNNVL domain
- Grove needs a way to provide **some level of automatic ComputeDomain injection in default configurations**, without requiring fully automatic cluster-wide MNNVL enablement
- Heterogeneous clusters with **mixed hardware from multiple vendors** may not be able to install NVIDIA DRA drivers on all nodes. This makes cluster-wide automatic MNNVL infeasible, even though explicit MNNVL support is still needed on the NVIDIA-capable subset of nodes

Consequently, Grove needs to provide a simple mechanism to:
- Explicitly control which pods participate in a given MNNVL domain
- Spare users from authoring verbose, PVC-style `ComputeDomain` resources
- Express intent in a way that composes naturally with Grove primitives and heterogeneous cluster environments

---

## Proposal

Add **annotation-based ComputeDomain injection** at the **PodClique level**, using a vendor-scoped annotation.

### Core Behavior

If a `PodClique` includes the annotation:

```yaml
nvidia.com/computedomain: <name>
```

then the following applies to pods created from that `PodClique`:
- The pods claim a `ComputeDomain` with the specified name
- The pods participate in the same MNNVL domain for NVLink / interconnect setup
- If the claimed `ComputeDomain` does not already exist, the operator creates it

The exact set of pods affected is defined by the `PodClique` and `PodCliqueSet` replica semantics, rather than being implicitly inferred.
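For illustration, here is a minimal sketch of a `PodCliqueSet` in which only one of two PodCliques opts into a ComputeDomain.

- The field layout follows Grove's published samples as we understand them. This covers `spec.template.cliques[].annotations`, `roleName`, and `podSpec`. Check it against the current CRD.
- The example assumes that clique template annotations are carried onto the generated `PodClique`.
- The names `disagg-inference`, `prefill`, `frontend`, and `prefill-nvl`, and the images, are hypothetical.

```yaml
# Sketch only: Grove field layout assumed from its samples; all names are hypothetical.
apiVersion: grove.io/v1alpha1
kind: PodCliqueSet
metadata:
  name: disagg-inference
spec:
  replicas: 1
  template:
    cliques:
      - name: prefill
        annotations:
          # Pods created from this PodClique claim ComputeDomain "prefill-nvl"
          nvidia.com/computedomain: prefill-nvl
        spec:
          roleName: prefill
          replicas: 4
          podSpec:
            containers:
              - name: worker
                image: example.com/worker:latest
      - name: frontend
        # No annotation: pods from this PodClique do not join any ComputeDomain
        spec:
          roleName: frontend
          replicas: 1
          podSpec:
            containers:
              - name: frontend
                image: example.com/frontend:latest
```

In this sketch, only pods from the `prefill` PodClique would participate in the `prefill-nvl` MNNVL domain. Pods from `frontend` are left untouched.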

This provides a **minimal and explicit contract** for MNNVL participation without requiring users to define full `ComputeDomain` specifications, while allowing users to opt into MNNVL on a per-PodClique basis even when broader automation is undesirable or unavailable.
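For comparison, the sketch below approximates what a user would otherwise author by hand: a `ComputeDomain` plus a resource claim on every participating pod.

- It is modeled on the NVIDIA DRA driver's ComputeDomain examples.
- Verify the following against the driver version installed in the cluster:
  - the `resource.nvidia.com/v1beta1` API version;
  - the `numNodes` and `channel.resourceClaimTemplate` fields.
- The pod-level `resourceClaimTemplateName` field assumes a recent Kubernetes release with DRA enabled.
- All names are hypothetical.
- This proposal does not specify how Grove would actually wire the claim into pods.

```yaml
# Hedged sketch: field names modeled on NVIDIA DRA driver examples; names are hypothetical.
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: prefill-nvl
spec:
  numNodes: 4                  # assumes one participating pod per node
  channel:
    resourceClaimTemplate:
      name: prefill-nvl-channel
---
# Every participating pod must then reference the generated ResourceClaimTemplate
apiVersion: v1
kind: Pod
metadata:
  name: prefill-worker-0
spec:
  containers:
    - name: worker
      image: example.com/worker:latest
      resources:
        claims:
          - name: compute-domain-channel
  resourceClaims:
    - name: compute-domain-channel
      resourceClaimTemplateName: prefill-nvl-channel
```

With the annotation, all of this collapses to a single line on the `PodClique`. The operator then creates the `ComputeDomain` if it does not already exist.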

---

## Defaults and Precedence

### Default Availability

- Annotation-based ComputeDomain injection **must be enabled by default**
- It must be available regardless of whether cluster-level automatic MNNVL is enabled

This ensures that Grove provides **baseline ComputeDomain injection behavior** out of the box, while still allowing users to selectively opt into MNNVL only where appropriate.

---

## Interaction with Automatic MNNVL Configuration

### Cluster Admin Has *Not* Opted In to Automatic MNNVL

- Users may freely use the `nvidia.com/computedomain` annotation
- Annotation-based ComputeDomain injection is honored unconditionally

This enables MNNVL usage in clusters where:
- Automatic MNNVL cannot be enabled globally
- NVIDIA DRA drivers are present only on a subset of nodes
- Multiple accelerator vendors coexist within the same cluster

---

### Cluster Admin Has Opted In to Automatic MNNVL

When automatic MNNVL is enabled at the cluster level:

- Automatic ComputeDomain injection applies by default
- The user API allows opting **out of automatic MNNVL injection** at the **PodCliqueSet replica level**

#### Valid Usage

- If a user **opts out** of automatic MNNVL injection for a given `PodCliqueSet` replica:
  - The user may specify the `nvidia.com/computedomain` annotation
  - The annotation must be honored and result in ComputeDomain injection

This allows users to retain **fine-grained control** over MNNVL participation when the default automatic behavior is too coarse for a given workload.
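Below is a sketch of the valid combination, using a placeholder opt-out.

- This proposal does not yet define the shape of the replica-level opt-out API. The `grove.io/auto-mnnvl: disabled` annotation is purely hypothetical.
- The resource names are also hypothetical.
- The Grove field layout is assumed from its samples.

```yaml
# Hypothetical opt-out: "grove.io/auto-mnnvl" is a placeholder, not a defined Grove API.
apiVersion: grove.io/v1alpha1
kind: PodCliqueSet
metadata:
  name: disagg-inference
  annotations:
    grove.io/auto-mnnvl: disabled   # placeholder opt-out, applies to each replica
spec:
  replicas: 1
  template:
    cliques:
      - name: prefill
        annotations:
          # Honored only because automatic injection is opted out above
          nvidia.com/computedomain: prefill-nvl
        spec:
          roleName: prefill
          replicas: 4
          podSpec:
            containers:
              - name: worker
                image: example.com/worker:latest
```

Suppose the placeholder opt-out line is removed while `nvidia.com/computedomain` stays on the `prefill` PodClique. In a cluster with automatic MNNVL enabled, that produces the invalid combination, which must be rejected.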

#### Invalid Usage (Must Error)

- If a user **does not opt out** of automatic MNNVL injection **and** specifies the `nvidia.com/computedomain` annotation:
  - This configuration is invalid
  - The system must reject the workload
  - The error must clearly explain that:
    - Automatic MNNVL injection is enabled for this `PodCliqueSet` replica
    - Explicit `ComputeDomain` annotations require opting out of automatic injection

This enforces a clear separation between **automatic cluster-driven behavior** and **explicit user-driven intent**, and avoids ambiguous or conflicting configuration.

---

## Rationale for the Proposed Vendor-Specific Annotation Key

The `ComputeDomain` resource and its semantics are currently **NVIDIA-specific**, used to support MNNVL / NVLink setup via NVIDIA DRA.

To make ownership explicit and avoid collisions with other accelerator vendors or future abstractions, the annotation used to request ComputeDomain injection must be vendor-scoped:

```yaml
nvidia.com/computedomain: <name>
```

---

## Multi-Vendor Extensibility and Future Support

Although this proposal introduces an **NVIDIA-scoped annotation and implementation**, the design and implementation **must not assume that Grove only supports NVIDIA ComputeDomains**.

Specifically:
- The injection mechanism should be structured so that:
  - Vendor-specific ComputeDomain implementations are **pluggable**
  - Annotation handling can be extended to support equivalent abstractions for other hardware vendors
- The use of a vendor-scoped annotation allows:
  - MNNVL and NVIDIA DRA support where available
  - Coexistence with other accelerator types in heterogeneous clusters

This proposal explicitly **does not preclude** Grove from supporting equivalent domain or interconnect abstractions for other accelerators as they become available.

---

## Goals

- Preserve the **zero-configuration NVLink experience** under automatic MNNVL
- Provide **fine-grained user control** over MNNVL participation at the PodClique level
- Enable **partial and selective MNNVL adoption** within a PodCliqueSet replica
- Support **heterogeneous clusters** where cluster-wide automatic MNNVL is not possible
- Avoid forcing users into verbose `ComputeDomain` definitions
- Make MNNVL participation **obvious, intentional, and composable** with `PodClique` semantics
- Enforce a **clear and debuggable intent hierarchy** between cluster and user configuration
- Ensure Grove remains **extensible across accelerator vendors**

---

## Non-Goals

- This enhancement does **not** deprecate:
  - The current automatic MNNVL implementation
  - Fully specified `ComputeDomain` resources
- This enhancement does **not** define:
  - Scheduling or topology placement policy
- This enhancement does **not** lock Grove into supporting only NVIDIA's `ComputeDomain` CR for scale-up high-bandwidth domains
