What would you like to be added?
We need to define groups of pods within a PodCliqueScalingGroup (PCSG) that share the exact same GPU(s). When scaling the PCSG, all GPU-sharing groups should be replicated together.
Use Case
Within a single PCSG, define multiple "GPU-sharing groups" where pods share GPUs:
Each sharing group has 2+ pods (e.g., primary-shadow pair)
Pods within a sharing group access the same GPU(s)
Different sharing groups use different GPU sets
When scaling the PCSG, all groups are replicated together
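The constraints listed above can be sketched as a small data model with a validation pass. This is a hypothetical shape for discussion only: the names SharingGroup, PCSGTemplate, members, gpu_count and validate are illustrative and are not part of any existing Grove API. A sharing group must contain at least two pods and at least one GPU. No pod may belong to two groups, so different groups cannot end up bridging two GPU sets.

```python
from dataclasses import dataclass, field


@dataclass
class SharingGroup:
    """Hypothetical GPU-sharing group: pods that must see the same GPU(s)."""
    name: str
    members: list[str]   # pod (PodClique) names, e.g. ["primary", "shadow"]
    gpu_count: int       # number of GPUs shared by every member of the group


@dataclass
class PCSGTemplate:
    """Hypothetical per-replica template of a PodCliqueScalingGroup."""
    replicas: int
    sharing_groups: list[SharingGroup] = field(default_factory=list)


def validate(template: PCSGTemplate) -> list[str]:
    """Return a list of problems; an empty list means the template is valid."""
    errors: list[str] = []
    seen_names: set[str] = set()
    seen_members: set[str] = set()
    for group in template.sharing_groups:
        if group.name in seen_names:
            errors.append(f"duplicate sharing group name {group.name!r}")
        seen_names.add(group.name)
        # Each sharing group has 2+ pods (e.g. a primary-shadow pair).
        if len(group.members) < 2:
            errors.append(f"group {group.name!r} needs at least 2 pods")
        if group.gpu_count < 1:
            errors.append(f"group {group.name!r} must share at least 1 GPU")
        # A pod in two groups would tie two GPU sets together.
        for member in group.members:
            if member in seen_members:
                errors.append(f"pod {member!r} is in more than one sharing group")
            seen_members.add(member)
    return errors


template = PCSGTemplate(
    replicas=2,
    sharing_groups=[
        SharingGroup("worker-a", ["primary-a", "shadow-a"], gpu_count=8),
        SharingGroup("worker-b", ["primary-b", "shadow-b"], gpu_count=8),
    ],
)
print(validate(template))  # → []
```

Validation runs against the per-replica template rather than against expanded replicas. This way the sharing structure is declared once and the replica count only controls how many copies are made.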
Example:
PCSG (replicas: 2)
├── Replica 0
│   ├── Sharing Group "worker1": primary-1 + shadow-1 → share GPUs 0-7
│   └── Sharing Group "worker2": primary-2 + shadow-2 → share GPUs 8-15
└── Replica 1
    ├── Sharing Group "worker3": primary-3 + shadow-3 → share GPUs 16-23
    └── Sharing Group "worker4": primary-4 + shadow-4 → share GPUs 24-31
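The replication rule in the example can be sketched as a simple expansion. Each PCSG replica stamps out every sharing group, and each group gets its own contiguous block of GPUs. The sketch makes several assumptions: the example's GPU numbers are treated as a flat, cluster-wide index, and each group has exactly one primary and one shadow pod. The helper names expand and GroupPlacement are hypothetical. A real scheduler would map these blocks onto per-node devices instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupPlacement:
    replica: int             # PCSG replica index
    group: str               # global sharing-group name, e.g. "worker3"
    members: tuple[str, ...] # pods that share the same GPUs
    gpus: range              # GPU indices shared by all members


def expand(replicas: int, groups_per_replica: int, gpus_per_group: int) -> list[GroupPlacement]:
    """Replicate every sharing group once per PCSG replica.

    Assumes (hypothetically) one primary + one shadow pod per group and a
    flat, contiguous GPU numbering, matching the example above.
    """
    placements = []
    for replica in range(replicas):
        for local in range(groups_per_replica):
            g = replica * groups_per_replica + local  # 0-based global group index
            first_gpu = g * gpus_per_group
            n = g + 1                                 # 1-based suffix used in the example
            placements.append(GroupPlacement(
                replica=replica,
                group=f"worker{n}",
                members=(f"primary-{n}", f"shadow-{n}"),
                gpus=range(first_gpu, first_gpu + gpus_per_group),
            ))
    return placements


for p in expand(replicas=2, groups_per_replica=2, gpus_per_group=8):
    print(f"replica {p.replica}: {p.group} {'+'.join(p.members)} -> GPUs {p.gpus.start}-{p.gpus.stop - 1}")
```

Running this with the example's parameters reproduces the tree above, one line per sharing group. Scaling the PCSG to 3 replicas adds worker5 and worker6, which share GPUs 32-39 and 40-47.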
Current Gap
No way to specify which pods within a PCSG should share GPUs
No mechanism to define multiple independent GPU-sharing groups
Traditional GPU requests (e.g., nvidia.com/gpu) allocate a separate, exclusive set of GPUs to each pod, so two pods cannot be guaranteed the same device(s)
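The cost of exclusive allocation in the example can be worked out with simple arithmetic. The sketch assumes that, without sharing, each pod would have to request its group's full GPU set on its own. The function gpus_needed is purely illustrative.

```python
def gpus_needed(replicas: int, groups_per_replica: int, pods_per_group: int,
                gpus_per_group: int, shared: bool) -> int:
    """Total GPUs a PCSG consumes.

    shared=True:  every pod in a group uses the same GPUs, so a group costs gpus_per_group.
    shared=False: exclusive requests give each pod its own copy, so a group costs
                  pods_per_group * gpus_per_group (assumption stated above).
    """
    groups = replicas * groups_per_replica
    per_group = gpus_per_group if shared else pods_per_group * gpus_per_group
    return groups * per_group


print(gpus_needed(2, 2, 2, 8, shared=True))   # → 32
print(gpus_needed(2, 2, 2, 8, shared=False))  # → 64
```

Beyond doubling the GPU count, exclusive allocation defeats the purpose: a primary and its shadow would land on different devices rather than sharing one.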
Why is this needed?
To support the Dynamo GPU memory service.