add GPU DRA Support #138

Merged
eliranw merged 40 commits into main from eliranw/RUN-34332-add-dra
Dec 16, 2025
@eliranw eliranw commented Dec 7, 2025

  1. Added a DRA GPU plugin component based on https://github.com/NVIDIA/k8s-dra-driver-gpu/
  2. Added new Helm chart resources, plus enable/disable toggles for all components
  3. Added unit and integration tests (currently only for dra-driver-gpu), along with setup/teardown scripts
  4. Updated CI/CD, Docker, and Makefiles for the new component
  5. Updated the nvidia-smi binary to try the local environment for topology before falling back to HTTP

…gration setup

- Added integration test suite with setup and teardown scripts for Kubernetes cluster.
- Introduced new Makefile targets for integration testing: `setup-integration`, `test-integration`, and `teardown-integration`.
- Created `values.yaml` for configuration of GPU resources and device plugins.
- Updated device class name in deployment manifests to align with new naming conventions.
- Enhanced health check configurations for the DRA plugin.
- Removed deprecated deployment example file.
@eliranw eliranw changed the title from "add DRA Support" to "add GPU DRA Support" Dec 9, 2025
eliranw and others added 18 commits December 9, 2025 14:43
… directory and initialize checkpoints. Added checkpoint.json for managing prepared claims.
- Introduced a new 'integration' job in the CI workflow to run integration tests.
- Set up Go and Docker Buildx for the integration environment.
- Installed 'kind' for Kubernetes cluster management.
- Updated the release-docker job dependencies to include the new integration tests.
…app runner

- Simplified the main application logic by creating a new app runner in dra-plugin-gpu.
- Moved configuration and initialization logic into a dedicated DraPluginGpuApp struct.
- Enhanced environment variable handling for NODE_NAME in CDI specifications.
- Improved directory validation and creation processes.
- Updated tests to ensure NODE_NAME is correctly set in the CDI spec files.
…andard log package

- Removed usage of klog in favor of the standard log package for logging.
- Updated Shutdown methods to eliminate logger parameters.
- Adjusted logging statements throughout the codebase for consistency and clarity.
- Cleaned up go.mod and go.sum by removing unused dependencies.
@eliranw eliranw requested review from gshaibi and noaamran December 10, 2025 11:54
…ty. Updated comments to indicate that error scenarios are covered by integration tests and noted the need for dependency injection in certain tests.
… dependency on kubeletplugin.Helper and note that this is covered by integration tests.
@eliranw eliranw merged commit ce5d79c into main Dec 16, 2025
6 of 7 checks passed
@eliranw eliranw deleted the eliranw/RUN-34332-add-dra branch December 16, 2025 09:05
eliranw added a commit that referenced this pull request Dec 16, 2025

2 participants