Releases: NVIDIA/aicr
v0.11.1
Immutable release. Only release title and notes can be modified.
Changelog
New Features
- 76d27c7: feat(recipes): bump kai-scheduler to v0.13.0, fix DRA gang scheduling (#450) (@yuanchen8911)
Bug Fixes
- 0d267c9: fix(api): add b200 accelerator to OpenAPI spec enum (#455) (@nvidiajeff)
- cdc9bf4: fix(cli): replace broken shell completion with full flag+alias support (#454) (@nvidiajeff)
- 692bbf0: fix(validator): templatize EKS NCCL runtime for dynamic EFA and instance type discovery (#447) (@xdu31)
v0.11.0
Changelog
New Features
- 500b561: feat(recipes): add GKE COS inference and Dynamo overlay recipes (#414) (@yuanchen8911)
- 3e46e47: feat(snapshot): add --runtime-class flag for CDI environments (#434) (@atif1996)
- d3fd483: feat(validator): add EKS/GKE cluster autoscaling fallback (#438) (@yuanchen8911)
- 87fd28f: feat: Add AKS (Azure Kubernetes Service) H100 recipe overlays (#415) (@Jont828)
- 0866ef0: feat: add B200 accelerator type support (#437) (@atif1996)
- 46736f8: feat: add query command for hydrated recipe value extraction (#445) (@mchmarny)
Bug Fixes
- 7c377c1: fix(bundler): clean up orphaned KAI and Kubeflow Trainer CRDs on undeploy (#416) (@yuanchen8911)
- 437126c: fix(gke): remove CAP_ prefix from capability names in TCPXO manifests (#428) (@yuanchen8911)
- f2ec6b2: fix(gke): update TCPXO to NRI profile without hostNetwork (#420) (@yuanchen8911)
- 8a65335: fix(validator): add retry for ai-service-metrics Prometheus query (#393) (@yuanchen8911)
- d99235e: fix(validator): remove hostNetwork and privileged from GKE NCCL runtime, use NRI device injection (#427) (@xdu31)
- e15a3c6: fix(validator): source NCCL env from host profile instead of hardcoding (#422) (@xdu31)
- 70efe82: fix: ArgoCD deployer generates valid YAML, add structural validation (#410) (#413) (@lockwobr)
Other Tasks
- 84f3c4c: chore: bump nvsentinel from v0.10.x to v1.1.0 (#423) (@mchmarny)
- 75092d8: chore: deps: bump github.com/in-toto/attestation from 1.1.2 to 1.2.0 (#431) (@dependabot[bot])
- ea19bdf: chore: deps: bump github/codeql-action from 4.32.6 to 4.33.0 (#418) (@dependabot[bot])
- a10d4b3: chore: deps: bump google.golang.org/grpc from 1.79.2 to 1.79.3 (#430) (@dependabot[bot])
- 9e81d69: chore: deps: bump the kubernetes group with 3 updates (#446) (@dependabot[bot])
- f23ade5: chore: ignore movies (@mchmarny)
- d4e818f: ci(kwok): implement tiered testing strategy per ADR-003 (#432) (@mchmarny)
- 9101d29: ci: build and publish validator images on merge to main (#412) (@yuanchen8911)
- ff9c66d: docs(conformance): update CNCF evidence for multi-platform and training (#425) (@yuanchen8911)
- 5d4aa7c: docs(validator): add custom image testing and private registry guide (#417) (@xdu31)
v0.10.16
v0.10.15
v0.10.14
Changelog
Bug Fixes
- 23f2a02: fix(brew): escape backslashes in caveats for proper multiline display (#402) (@mchmarny)
- 7d79830: fix(bundler): clean up kai-resource-reservation namespace on undeploy (#394) (@yuanchen8911)
- 87cb118: fix(evidence): track check results at runtime instead of scanning directory (#396) (@yuanchen8911)
Other Tasks
- d3ff136: chore: deps: bump actions/stale from 10.1.1 to 10.2.0 (#400) (@dependabot[bot])
- 220ed15: chore: deps: bump actions/upload-pages-artifact from 3.0.1 to 4.0.0 (#399) (@dependabot[bot])
- e44b763: chore: deps: bump sigstore/cosign-installer from 4.0.0 to 4.1.0 (#398) (@dependabot[bot])
- c06950e: site: eliminate docs duplication with build-time sync (#385) (@tabern)
v0.10.13
Changelog
New Features
- d992630: feat(recipes): add GKE COS training overlays for H100 (#383) (@yuanchen8911)
Bug Fixes
- a5d501b: fix(bundler): skip components with overrides.enabled: false (#382) (@xdu31)
- 8550939: fix(install): cosign version grep fails silently due to pipefail (#384) (@lockwobr)
- d802b3d: fix(test): update offline e2e to skip disabled aws-ebs-csi-driver (@mchmarny)
- 9bb2c7b: fix(validator): remove helm-values check (Helm values stored in secrets, never available in snapshot) (#388) (@xdu31)
v0.10.12
v0.10.11
Changelog
New Features
- 4267972: feat(bundler): add pre-flight checks to deploy.sh and post-flight to undeploy.sh (#364) (@yuanchen8911)
- 8312960: feat(validator): add Kubeflow Trainer to robust-controller and skip inference-gateway on training clusters (#349) (@yuanchen8911)
Bug Fixes
- 662809b: fix(ci): use root directory for github-actions dependabot scanning (@mchmarny)
- ca0551d: fix(recipe): bump NCCL all-reduce bandwidth threshold to 300 Gbps (#350) (@xdu31)
- 48e878b: fix(test): eliminate dead tests, non-deterministic skips, and flaky sleeps (@mchmarny)
- a9162f0: fix(validator): truncate long stdout lines to prevent oversized reports (#363) (@xdu31)
- 103f5b0: fix: replace magic duration literals with named constants from pkg/defaults (@mchmarny)
- 8945569: fix: wrap bare errors and check writable Close() returns (@mchmarny)
Other Tasks
- bb53543: chore(ci): bump actions/cache to v5.0.3 and goreleaser-action to v7.0.0 (@mchmarny)
- 4ea330f: chore: dep update (@mchmarny)
- 99a96f9: chore: deps: bump actions/download-artifact from 4.1.8 to 8.0.1 (#370) (@dependabot[bot])
- 208b836: chore: deps: bump actions/github-script from 7.0.1 to 8.0.0 (#376) (@dependabot[bot])
- a2364eb: chore: deps: bump actions/setup-go from 6.2.0 to 6.3.0 (#368) (@dependabot[bot])
- 204aafa: chore: deps: bump actions/setup-node from 4.4.0 to 6.3.0 (#372) (@dependabot[bot])
- 1d5a104: chore: deps: bump actions/upload-artifact from 6.0.0 to 7.0.0 (#369) (@dependabot[bot])
- 9481db5: chore: deps: bump aquasecurity/trivy-action from 0.34.1 to 0.35.0 (#367) (@dependabot[bot])
- 9057566: chore: deps: bump aws-actions/configure-aws-credentials from 5.1.1 to 6.0.0 (#371) (@dependabot[bot])
- 24e41ad: chore: deps: bump docker/build-push-action from 6.15.0 to 7.0.0 (#373) (@dependabot[bot])
- 091e497: chore: deps: bump docker/setup-buildx-action from 3.10.0 to 4.0.0 (#375) (@dependabot[bot])
- 77aade3: chore: deps: bump github/codeql-action from 4.32.0 to 4.32.6 (#374) (@dependabot[bot])
- 15584cd: chore: deps: update hashicorp/aws requirement from ~> 5.0 to ~> 6.36 in /infra/uat-aws-account (#366) (@dependabot[bot])
- 413b808: chore: ignore GHSA-67mh-4wv8-2f99 (esbuild) in grype scan (@mchmarny)
- eab65d9: core: image update (@mchmarny)
- 9c8b1af: docs(api): add missing bundle params and document CLI-only gaps (@mchmarny)
- c5a9115: docs(install): add Homebrew installation option (#357) (@mchmarny)
- 42e8ff5: docs(site): align Go version requirements to 1.26 (#362) (@yuanchen8911)
- 647d10b: site: migrate from Hugo/Docsy to VitePress (#360) (@tabern)
v0.10.10
v0.10.9