Pod discovery env var#117
Conversation
Implement Phase 1 of environment variable injection for Grove pods: - GROVE_PGS_NAME from PodGangSet name label - GROVE_PGS_INDEX from PodGangSet replica index label - GROVE_PCLQ_NAME from PodClique name label - GROVE_PCSG_NAME from PodCliqueScalingGroup label (conditional) - POD_NAMESPACE from pod namespace metadata Uses Kubernetes Downward API for automatic population from existing pod labels and metadata. Includes comprehensive unit tests. Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Replace hardcoded label strings with constants in Downward API field paths: - LabelPartOfKey for app.kubernetes.io/part-of - LabelPodGangSetReplicaIndex for grove.io/podgangset-replica-index - LabelPodCliqueName for grove.io/podclique - LabelPodCliqueScalingGroup for grove.io/podcliquescalinggroup Improves maintainability and consistency across the codebase. Signed-off-by: Ron Kahn <rkahn@nvidia.com>
- Add MergeEnvVars utility function to prevent duplicate environment variables - Add GROVE_PCLQ_INDEX and GROVE_PCSG_INDEX environment variables - Add new label constants for replica indices - Add utility functions for extracting replica indices from FQN names - Add comprehensive tests for deduplication and idempotent behavior - Update pod creation to use environment variable merging Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…gGroup Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…ironment variable handling Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…nce pod index assignment logic Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…nd refactor IndexManager documentation Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…ce initialization with max assigned index Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…ssociated tests Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
# Conflicts: # operator/api/core/v1alpha1/constants.go # operator/internal/component/podclique/pod/pod.go # operator/internal/component/podcliquescalinggroup/podclique/podclique.go
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…ethods Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
renormalize
left a comment
There was a problem hiding this comment.
Thanks for the PR @Ronkahn21!
Have a few thoughts that I've posted. I will post another round of reviews once I get a better idea about the index calculation.
…rences Signed-off-by: Ron Kahn <rkahn@nvidia.com>
| func (r _resource) createPods(ctx context.Context, logger logr.Logger, pclq *grovecorev1alpha1.PodClique, podGangName string, numPods int) (int, error) { | ||
| func (r _resource) createPods(ctx context.Context, logger logr.Logger, pclq *grovecorev1alpha1.PodClique, podGangName string, numPods int, existingPods []*corev1.Pod) (int, error) { | ||
| createTasks := make([]utils.Task, 0, numPods) | ||
| indexManager := indexer.InitIndexManger(existingPods, int(pclq.Spec.Replicas), logger) |
There was a problem hiding this comment.
Just to clarify, will this create an index manager for every pod?
There was a problem hiding this comment.
For every reconcile loop when required a adding new pods
There was a problem hiding this comment.
Ok. Could we also simply hand out an index to every task here instead of calling GetIndex inside buildResource?
| pod.Spec = *pclq.Spec.PodSpec.DeepCopy() | ||
|
|
||
| pod.Spec.SchedulingGates = []corev1.PodSchedulingGate{{Name: podGangSchedulingGate}} | ||
| podIndex, err := indexMg.GetIndex(pod) |
There was a problem hiding this comment.
When creating a new pod, is it possible for index manager to contain an existing entry for this pod name?
There was a problem hiding this comment.
Its should not happend but yes,
|
|
||
| func (r _resource) buildResource(pclq *grovecorev1alpha1.PodClique, podGangName string, pod *corev1.Pod) error { | ||
| labels, err := getLabels(pclq.ObjectMeta, podGangName) | ||
| func (r _resource) buildResource(pclq *grovecorev1alpha1.PodClique, podGangName string, pod *corev1.Pod, indexMg *indexer.PodIndexTracker) error { |
There was a problem hiding this comment.
Could you please add some comments that explain what is the role of the index manager in this function?
|
|
||
| // constants for Grove environment variables | ||
| const ( | ||
| envVarGrovePGSName = "GROVE_PGS_NAME" |
There was a problem hiding this comment.
should these be part of the API? When Grove operator sets an environment variable then it effectively states that consumers should look at these and consume it for discovery purpose.
Also Is the variable names need not have Grove in them. The environment variable name should have GROVE in it as it helps disambiguate and also group env vars for the application consuming these env vars.
There was a problem hiding this comment.
We can move these constants to api.
| } | ||
|
|
||
| // addGroveEnvironmentVariables adds Grove-specific environment variables | ||
| func addGroveEnvironmentVariables(pod *corev1.Pod, pgsName string, pgsReplicaIndex, podIndex int) { |
There was a problem hiding this comment.
rename this function to setEnvironmentVariables
| }, | ||
| { | ||
| Name: envVarGrovePGSIndex, | ||
| ValueFrom: &corev1.EnvVarSource{ |
There was a problem hiding this comment.
Same as above, you are already passing PGS replica index. Please use that directly.
| { | ||
| Name: envVarGrovePCLQName, | ||
| ValueFrom: &corev1.EnvVarSource{ | ||
| FieldRef: &corev1.ObjectFieldSelector{ |
There was a problem hiding this comment.
you already have the information available in the calling function. Just pass the value.
| ) | ||
|
|
||
| const ( | ||
| envVarGrovePCSGName = "GROVE_PCSG_NAME" |
There was a problem hiding this comment.
why are these env vars required on PCLQ?
Also how is this linked to discovery at the Pod level?
There was a problem hiding this comment.
I think the goal is to propagate these env vars downwards to be set in pods by setting labels on the PodClique.
| return nil | ||
| } | ||
|
|
||
| func (r _resource) getPCSGroupReplicaTotalPodSize(pgs *grovecorev1alpha1.PodGangSet, pcsg *grovecorev1alpha1.PodCliqueScalingGroup) int { |
There was a problem hiding this comment.
can you please revert this code and not include this as part of the PR? The scope of this PR is limited. We can discuss why you need to put env vars and then take a call in a later PR. I currently do not see any usage of these env vars. So instead of starting another discussion as part of discovery functionality and thereby stopping the merge of this PR, I would prefer that we delink this change from this PR and address that concern later.
There was a problem hiding this comment.
@unmarshall I might be misunderstanding here but are you saying you're not sure about why we need PCSGGroupReplicaTotalPodSize (I believe the actual env var name is something simpler) but we definitely need to have an env var with this value in order to know our world size when starting up inference frameworks.
There was a problem hiding this comment.
Can we call the env var to be GROVE_PCSG_TEMPLATE_NUM_PODS, and then rename the function here to getPCSGTemplateNumPods and variable to pcsgTemplateNumPods?
|
|
||
| // PodIndexTracker manages pod indices for PodCliques. | ||
| // Prioritizes filling lower indices first by maintaining a continuous sequence from 0. | ||
| type PodIndexTracker struct { |
There was a problem hiding this comment.
Is this really required?
All we need to do is keep track of available indexes and nothing else.
I also confirmed with @sanjaychatterjee today morning that there is no use case that the replacement pod should preserve the same index as the one that it replaces.
There was a problem hiding this comment.
I agree that the tracker is not required.
| } | ||
|
|
||
| // validatePodUniqueness checks if the pod name is already indexed | ||
| func validatePodUniqueness(podName string, hostname string, indexedPods map[string]int) error { |
There was a problem hiding this comment.
I don't think this validation is required.
| continue | ||
| } | ||
| index, err := extractIndexFromHostname(pod.Spec.Hostname) | ||
| if validationErr := validateHostNameIndex(err, pod, index, indexedPods, usedIndices); validationErr != nil { |
There was a problem hiding this comment.
We should simplify by moving the validations into extractIndexFromHostname.
| usedIndices.Insert(index) | ||
| } | ||
|
|
||
| return NewPodIndexTracker(indexedPods, maxAssigned) |
There was a problem hiding this comment.
I am not sure we need this tracker. We need to identify used indices as above to figure out the first available index. Also, if we are constructing separate trackers for each pod, then there would be a data-race when identifying the index to be used for the pod. We need to protect that logic in a critical section.
| } | ||
|
|
||
| // assignAvailableIndex assigns the next available index to a pod. | ||
| func (im *PodIndexTracker) assignAvailableIndex(podName string) int { |
There was a problem hiding this comment.
Can we protect this code from data-races when multiple pods are processed concurrently?
|
|
||
| // PodIndexTracker manages pod indices for PodCliques. | ||
| // Prioritizes filling lower indices first by maintaining a continuous sequence from 0. | ||
| type PodIndexTracker struct { |
There was a problem hiding this comment.
I agree that the tracker is not required.
|
|
||
| // constants for Grove environment variables | ||
| const ( | ||
| envVarGrovePGSName = "GROVE_PGS_NAME" |
There was a problem hiding this comment.
We can move these constants to api.
| } | ||
|
|
||
| // GetIndex returns the index for the given pod, assigning a new one if needed. | ||
| func (im *PodIndexTracker) GetIndex(pod *corev1.Pod) (int, error) { |
There was a problem hiding this comment.
GetIndex should be protected from data-races when multiple pods are processed concurrently.
| func (r _resource) createPods(ctx context.Context, logger logr.Logger, pclq *grovecorev1alpha1.PodClique, podGangName string, numPods int) (int, error) { | ||
| func (r _resource) createPods(ctx context.Context, logger logr.Logger, pclq *grovecorev1alpha1.PodClique, podGangName string, numPods int, existingPods []*corev1.Pod) (int, error) { | ||
| createTasks := make([]utils.Task, 0, numPods) | ||
| indexManager := indexer.InitIndexManger(existingPods, int(pclq.Spec.Replicas), logger) |
There was a problem hiding this comment.
Ok. Could we also simply hand out an index to every task here instead of calling GetIndex inside buildResource?
- Add Grove environment variable constants for better consistency - Simplify environment variable handling by removing mergeEnvVars function - Replace PodIndexTracker with stateless functions for better design - Remove unused environment variables and related references - Update related test files to reflect tracker changes Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…addition Signed-off-by: Ron Kahn <rkahn@nvidia.com>
sanjaychatterjee
left a comment
There was a problem hiding this comment.
Looks a lot better. Thanks for simplifying the PR.
| const ( | ||
| envVarGrovePCSGName = "GROVE_PCSG_NAME" | ||
| envVarGrovePCSGIndex = "GROVE_PCSG_INDEX" | ||
| evnVarGrovePCSGReplicaTotalPodSize = "GROVE_PCSG_TEMPLATE_PODS_PER_REPLICA" |
There was a problem hiding this comment.
Can we move these to api constants as well?
| ) | ||
|
|
||
| const ( | ||
| envVarGrovePCSGName = "GROVE_PCSG_NAME" |
There was a problem hiding this comment.
I think the goal is to propagate these env vars downwards to be set in pods by setting labels on the PodClique.
| return nil | ||
| } | ||
|
|
||
| func (r _resource) getPCSGroupReplicaTotalPodSize(pgs *grovecorev1alpha1.PodGangSet, pcsg *grovecorev1alpha1.PodCliqueScalingGroup) int { |
There was a problem hiding this comment.
Can we call the env var to be GROVE_PCSG_TEMPLATE_NUM_PODS, and then rename the function here to getPCSGTemplateNumPods and variable to pcsgTemplateNumPods?
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
| for foundIndexCounter < requiredIndicesCount { | ||
| if !usedIndices.Has(currentIndex) { | ||
| availableIndices[foundIndexCounter] = currentIndex | ||
| usedIndices.Insert(currentIndex) |
There was a problem hiding this comment.
I think this is not required as you are already setting the available index in availableIndices map and also incrementing foundIndexCounter as well.
| if err != nil { | ||
| logger.Error(err, "failed to extract index from hostname", | ||
| "pod", pod.Name, "hostname", pod.Spec.Hostname) | ||
| continue |
There was a problem hiding this comment.
if there is an error extracting the index from host name then this should be returned and you should exit early.
| } | ||
|
|
||
| if usedIndices.Has(index) { | ||
| logger.Error(fmt.Errorf("duplicate index %d", index), |
There was a problem hiding this comment.
if there is an error extracting the index from host name then this should be returned and you should exit early. This should ideally never happen and if it does then it indicates a coding error which should never be silently logged and forgotten. Please return the error.
| usedIndices := sets.New[int]() | ||
|
|
||
| for _, pod := range existingPods { | ||
| if pod == nil { |
There was a problem hiding this comment.
i think there is no need to check if it is nil as it can never happen. Even if it happens you insist to check it then you should error out instead of ignoring it completely. This is again a programming error.
| } | ||
|
|
||
| // isValidIndex checks if an index is valid (non-negative). | ||
| func isValidIndex(index int) bool { |
There was a problem hiding this comment.
this function is only used in unit tests, can you please move this function to test file?
| // prevent duplicate environment variables, when the updating podclique | ||
| // is being created by the PodCliqueScalingGroup controller, as the | ||
| // environment variables are already defined in the PodCliqueTemplateSpec. | ||
| podTemplate.Containers[i].Env = utils.MergeEnvVars(podTemplate.Containers[i].Env, pcsgEnvVars) |
There was a problem hiding this comment.
It is not correct to change the template spec as that is defined by the consumer. I would recommend we remove this function from here.
There was a problem hiding this comment.
It does not change the env in the pgs api, only the create PCQL by the scaling group
I think it's the right place to inject stuff it as the PCSG holds the info
| // limitations under the License. | ||
| // */ | ||
|
|
||
| package indexer |
There was a problem hiding this comment.
Can you rename indexer to index
…ions Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
| }, | ||
| } | ||
|
|
||
| // Add Grove environment variables to all containers in the pod |
There was a problem hiding this comment.
Should we add the env vars to init containers as well?
| }, | ||
| }, | ||
| } | ||
| pcqlObjPodSpec := &pclq.Spec.PodSpec |
There was a problem hiding this comment.
Should we add the env vars to init containers as well?
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…for containers and init containers Signed-off-by: Ron Kahn <rkahn@nvidia.com>
existing PodCliques properly. * Simplified findAvailableIndices function further. * replaced AddEnvToAllContainers and AddEnvToAllInitContainer functions with componentutils.AddEnvVarsToContainers function. Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
Signed-off-by: madhav bhargava <madhav.bhargava@sap.com>
This pull request introduces several enhancements and refactorings to improve the handling of PodClique and PodCliqueScalingGroup resources in the Grove operator. Key changes include the addition of environment variables for better configurability, new utility methods for managing pod indices, and updates to service discovery and labeling logic.
Enhancements to PodClique and PodCliqueScalingGroup:
Environment Variable Support:
EnvVarPGSName,EnvVarPCSGName, and others, to improve configurability (operator/api/core/v1alpha1/constants.go).operator/internal/component/podclique/pod/pod.go,operator/internal/component/podcliquescalinggroup/podclique/podclique.go) [1] [2].Service Discovery Enhancements:
GenerateHeadlessServiceAddressto generate fully qualified headless service addresses for Pods (operator/api/core/v1alpha1/namegen.go).operator/internal/component/podclique/pod/pod.go).Utility Improvements:
indexpackage to calculate the next available pod indices, ensuring no conflicts during pod creation (operator/internal/index/tracker.go).createPodsmethod to leverage the new index utility for assigning unique pod indices (operator/internal/component/podclique/pod/syncflow.go).Refactorings and Code Simplifications:
Labeling Logic:
operator/internal/component/podclique/pod/pod.go,operator/internal/component/podcliquescalinggroup/podclique/podclique.go) [1] [2].GeneratePodName, to streamline the codebase (operator/api/core/v1alpha1/namegen.go).Error Handling:
buildResourceandcreatePods, to provide more descriptive error messages (operator/internal/component/podclique/pod/pod.go,operator/internal/component/podclique/pod/syncflow.go) [1] [2].These changes collectively enhance the robustness, configurability, and maintainability of the Grove operator, particularly in managing complex PodClique and PodCliqueScalingGroup setups.
Example: