
feat(recipes): add GKE COS inference and Dynamo overlay recipes #414

Merged
yuanchen8911 merged 1 commit into NVIDIA:main from yuanchen8911:feat/gke-inference-dynamo-overlays on Mar 16, 2026


@yuanchen8911 (Contributor)

Summary

Add GKE inference overlay recipes and CUJ2 walkthrough for Dynamo inference on GKE.

Motivation / Context

GKE lacked inference overlay recipes; only training overlays existed. This PR adds the full inference overlay chain, including Dynamo platform support, mirroring the existing EKS inference pattern.

Fixes: N/A
Related: CUJ2 for GKE

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Documentation update

Component(s) Affected

  • Recipe engine / data (pkg/recipe)
  • Docs/examples (docs/, examples/)

Implementation Notes

Overlay inheritance chain:

gke-cos → gke-cos-inference → h100-gke-cos-inference → h100-gke-cos-inference-dynamo
  • gke-cos-inference: kgateway components (same as EKS inference)
  • h100-gke-cos-inference: H100 skyhook tuning, CDI enabled
  • h100-gke-cos-inference-dynamo: Dynamo platform with standard-rwo storage (GKE PD CSI), DRA, full deployment + conformance validation
  • demos/cuj2-gke.md: End-to-end inference walkthrough for GKE
  • Conformance test added to guard against regressions in overlay selection

Testing

make test  # Recipe conformance test passes

Validated on a live GKE cluster (aicr-demo2, 2x a3-megagpu-8g, COS, K8s 1.35):

  • 14 components deployed, all pods healthy
  • Conformance: 10/11 passed (cluster-autoscaling skipped; no Karpenter on GKE)
  • CNCF submission evidence: 8/8 passed
  • Dynamo vLLM workload (Qwen3-0.6B) serving via inference gateway
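The recipe conformance test run by `make test` is described as preventing overlay-selection regressions, but its code is not part of this page. The Go sketch below shows one plausible selection rule under stated assumptions: an overlay matches when every criterion it declares equals the query's value, and the match with the most criteria wins (ties broken by name). The `candidate` type, `selectOverlay`, the criteria keys, and the `gke-cos-training` entry are all hypothetical.

```go
package main

import "fmt"

// candidate is a hypothetical overlay together with the criteria it requires.
type candidate struct {
	name     string
	criteria map[string]string
}

// selectOverlay returns the most specific candidate whose criteria are all
// satisfied by the query. Ties on criteria count are broken by name so the
// result is deterministic. The bool is false when nothing matches.
func selectOverlay(cands []candidate, query map[string]string) (string, bool) {
	best, bestN := "", -1
	for _, c := range cands {
		matches := true
		for k, v := range c.criteria {
			if query[k] != v {
				matches = false
				break
			}
		}
		if !matches {
			continue
		}
		n := len(c.criteria)
		if n > bestN || (n == bestN && c.name < best) {
			best, bestN = c.name, n
		}
	}
	return best, bestN >= 0
}

// exampleCandidates loosely mirrors the GKE overlays; criteria are illustrative.
func exampleCandidates() []candidate {
	return []candidate{
		{"gke-cos", map[string]string{"service": "gke", "os": "cos"}},
		{"gke-cos-training", map[string]string{"service": "gke", "os": "cos", "intent": "training"}},
		{"gke-cos-inference", map[string]string{"service": "gke", "os": "cos", "intent": "inference"}},
		{"h100-gke-cos-inference", map[string]string{
			"service": "gke", "os": "cos", "intent": "inference", "accelerator": "h100"}},
		{"h100-gke-cos-inference-dynamo", map[string]string{
			"service": "gke", "os": "cos", "intent": "inference", "accelerator": "h100", "platform": "dynamo"}},
	}
}

func main() {
	query := map[string]string{
		"service": "gke", "os": "cos", "intent": "inference",
		"accelerator": "h100", "platform": "dynamo",
	}
	name, ok := selectOverlay(exampleCandidates(), query)
	fmt.Println(name, ok)
}
```

A regression test in this style pins the expected overlay for a given query, so adding a new, more generic overlay later cannot silently change which recipe a Dynamo-on-H100 GKE cluster receives.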

Risk Assessment

  • Low: isolated change, well tested, easy to revert

Rollout notes: N/A; additive recipe overlays with no impact on existing recipes.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

@yuanchen8911 yuanchen8911 requested review from a team as code owners March 16, 2026 18:59
@yuanchen8911 yuanchen8911 added enhancement New feature or request area/recipes labels Mar 16, 2026
@yuanchen8911 yuanchen8911 requested a review from mchmarny March 16, 2026 19:22
Add GKE inference overlay chain mirroring the EKS inference pattern:

  gke-cos → gke-cos-inference → h100-gke-cos-inference
    → h100-gke-cos-inference-dynamo

- gke-cos-inference: GKE COS inference base with kgateway components
- h100-gke-cos-inference: H100 GKE inference with skyhook tuning, CDI
- h100-gke-cos-inference-dynamo: Dynamo platform with standard-rwo
  storage, DRA, full deployment + conformance validation

Also adds:
- demos/cuj2-gke.md: CUJ2 inference walkthrough for GKE
- Conformance test for h100-gke-cos-inference-dynamo overlay selection

Validated on GKE aicr-demo2 (2x a3-megagpu-8g, COS, K8s 1.35):
- 14 components deployed, all pods healthy
- CNCF conformance: 10/11 passed (cluster-autoscaling skipped)
- CNCF submission evidence: 8/8 passed
- Dynamo vLLM inference workload serving on Qwen3-0.6B

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
@yuanchen8911 yuanchen8911 force-pushed the feat/gke-inference-dynamo-overlays branch from 44b8b03 to 43e9073 March 16, 2026 21:54
@yuanchen8911 yuanchen8911 merged commit 500b561 into NVIDIA:main Mar 16, 2026
49 checks passed

@janetkuo janetkuo left a comment


Could https://github.com/NVIDIA/aicr/blob/main/recipes/README.md be updated as well, to reflect this addition?

xdu31 pushed a commit to xdu31/aicr that referenced this pull request Mar 24, 2026