feat(recipes): add GKE COS inference and Dynamo overlay recipes #414
Merged
yuanchen8911 merged 1 commit into NVIDIA:main on Mar 16, 2026
mchmarny approved these changes on Mar 16, 2026
Add GKE inference overlay chain mirroring the EKS inference pattern:
gke-cos → gke-cos-inference → h100-gke-cos-inference → h100-gke-cos-inference-dynamo
- gke-cos-inference: GKE COS inference base with kgateway components
- h100-gke-cos-inference: H100 GKE inference with skyhook tuning and CDI enabled
- h100-gke-cos-inference-dynamo: Dynamo platform with standard-rwo storage, DRA, full deployment + conformance validation
Also adds:
- demos/cuj2-gke.md: CUJ2 inference walkthrough for GKE
- Conformance test for h100-gke-cos-inference-dynamo overlay selection
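As a rough illustration of what an overlay-selection test might exercise, the Go sketch below picks the most specific overlay whose criteria all match a query, treating empty criteria as wildcards. The `Criteria` fields (`Service`, `OS`, `Accelerator`, `Intent`, `Platform`), the most-constrained-wins rule, and the helper names are all assumptions for illustration; they are not taken from the aicr selection logic or test code.

```go
package main

import "fmt"

// Criteria is a hypothetical set of selection keys for an overlay; an empty
// field acts as a wildcard. The field names are illustrative, not aicr's schema.
type Criteria struct {
	Service, OS, Accelerator, Intent, Platform string
}

func (c Criteria) fields() []string {
	return []string{c.Service, c.OS, c.Accelerator, c.Intent, c.Platform}
}

// specificity reports whether every non-empty field of overlay equals the
// corresponding field of query, and how many fields were constrained.
func specificity(overlay, query Criteria) (bool, int) {
	o, q := overlay.fields(), query.fields()
	n := 0
	for i := range o {
		if o[i] == "" {
			continue
		}
		if o[i] != q[i] {
			return false, 0
		}
		n++
	}
	return true, n
}

// selectOverlay returns the matching overlay with the most constrained fields,
// breaking ties by name so the result is deterministic.
func selectOverlay(candidates map[string]Criteria, query Criteria) string {
	best, bestN := "", -1
	for name, c := range candidates {
		ok, n := specificity(c, query)
		if ok && (n > bestN || (n == bestN && name < best)) {
			best, bestN = name, n
		}
	}
	return best
}

func main() {
	candidates := map[string]Criteria{
		"gke-cos":                       {Service: "gke", OS: "cos"},
		"gke-cos-inference":             {Service: "gke", OS: "cos", Intent: "inference"},
		"h100-gke-cos-inference":        {Service: "gke", OS: "cos", Accelerator: "h100", Intent: "inference"},
		"h100-gke-cos-inference-dynamo": {Service: "gke", OS: "cos", Accelerator: "h100", Intent: "inference", Platform: "dynamo"},
	}
	q := Criteria{Service: "gke", OS: "cos", Accelerator: "h100", Intent: "inference", Platform: "dynamo"}
	fmt.Println(selectOverlay(candidates, q)) // h100-gke-cos-inference-dynamo

	q.Platform = "" // without a platform, the Dynamo overlay no longer matches
	fmt.Println(selectOverlay(candidates, q)) // h100-gke-cos-inference
}
```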
Validated on GKE aicr-demo2 (2x a3-megagpu-8g, COS, K8s 1.35):
- 14 components deployed, all pods healthy
- CNCF conformance: 10/11 passed (cluster-autoscaling skipped)
- CNCF submission evidence: 8/8 passed
- Dynamo vLLM inference workload serving Qwen3-0.6B
Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
Branch updated from 44b8b03 to 43e9073
janetkuo reviewed on Mar 16, 2026 and left a comment:
Could https://github.com/NVIDIA/aicr/blob/main/recipes/README.md be updated as well, to reflect this addition?
xdu31 pushed a commit to xdu31/aicr that referenced this pull request on Mar 24, 2026:
…IA#414) Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
Summary
Add GKE inference overlay recipes and CUJ2 walkthrough for Dynamo inference on GKE.
Motivation / Context
GKE lacked inference overlay recipes; only training overlays existed. This PR adds the full inference chain, including Dynamo platform support, mirroring the existing EKS inference pattern.
Fixes: N/A
Related: CUJ2 for GKE
Type of Change
Component(s) Affected
- pkg/recipe
- docs/, examples/
Implementation Notes
Overlay inheritance chain:
- gke-cos-inference: kgateway components (same as EKS inference)
- h100-gke-cos-inference: H100 skyhook tuning, CDI enabled
- h100-gke-cos-inference-dynamo: Dynamo platform with standard-rwo storage (GKE PD CSI), DRA, full deployment + conformance validation
- demos/cuj2-gke.md: End-to-end inference walkthrough for GKE
Testing
Validated on live GKE cluster (aicr-demo2, 2x a3-megagpu-8g, COS, K8s 1.35):
Risk Assessment
Rollout notes: N/A. These are additive recipe overlays with no impact on existing recipes.
Checklist
- make test (with -race)
- make lint
- git commit -S