feat(gke): update skyhook to not be runtimeRequired #380

Merged
yuanchen8911 merged 1 commit into feat/gke-cos-training-overlays from feat/gke-cos-training-overlays-skyhookUpdate
Mar 12, 2026

@ayuskauskas
Contributor

Also bump nvidia-tuning-gke to 0.1.1 to better support GB200.

Summary

Motivation / Context

Fixes:
Related:

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: ____________

Implementation Notes

Testing

# Commands run (prefer `make qualify` for non-trivial changes)
make qualify

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert
  • Medium — Touches multiple components or has broader impact
  • High — Breaking change, affects critical paths, or complex rollout

Rollout notes:

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

@ayuskauskas ayuskauskas requested a review from a team as a code owner March 12, 2026 16:35
@yuanchen8911 yuanchen8911 force-pushed the feat/gke-cos-training-overlays branch 3 times, most recently from 837dda2 to d059741 Compare March 12, 2026 16:39
@yuanchen8911 yuanchen8911 requested a review from a team as a code owner March 12, 2026 16:39
@ayuskauskas ayuskauskas force-pushed the feat/gke-cos-training-overlays-skyhookUpdate branch from 47bd023 to d3fcc18 Compare March 12, 2026 16:42
Contributor

@yuanchen8911 yuanchen8911 left a comment

PR removes runtimeRequired from GKE tuning Skyhook spec (tuning-gke.yaml#L38 (https://github.com/NVIDIA/aicr/blob/feat/gke-cos-training-overlays-skyhookUpdate/recipes/components/skyhook-customizations/manifests/tuning-gke.yaml#L38)), so runtime-required gating is no longer enabled for this path.
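
For readers who have not opened the manifest, the change described above might look roughly like the sketch below. This is not copied from tuning-gke.yaml: the API group/version, the metadata name, the overall field layout, and the image location are assumptions made for illustration. Only the runtimeRequired field, the nvidia-tuning-gke package name, and the 0.1.1 version come from this PR.

```yaml
# Illustrative sketch only; not the actual contents of tuning-gke.yaml.
apiVersion: skyhook.nvidia.com/v1alpha1   # assumed API group/version
kind: Skyhook
metadata:
  name: tuning-gke                        # hypothetical resource name
spec:
  # runtimeRequired: true                 # removed by this PR for the GKE path
  packages:
    nvidia-tuning-gke:
      version: 0.1.1                      # bumped in this PR for GB200 support
      image: registry.example.com/nvidia-tuning-gke   # hypothetical image location
```

With the field absent, the GKE tuning package would still be applied, but, per the review comment, runtime-required gating would no longer be enabled for this path.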

 - Also change to 0.1.1 for nvidia-tuning-gke to better support gb200
 - Add documentation about the differences between gke and other services
@ayuskauskas ayuskauskas force-pushed the feat/gke-cos-training-overlays-skyhookUpdate branch from d3fcc18 to a2e37ab Compare March 12, 2026 17:00
@yuanchen8911 yuanchen8911 merged commit 929b390 into feat/gke-cos-training-overlays Mar 12, 2026
2 checks passed
@yuanchen8911 yuanchen8911 deleted the feat/gke-cos-training-overlays-skyhookUpdate branch March 12, 2026 17:06
3 participants