gb200: update dockerfile to latest kernel #9522

Merged: zhyncs merged 6 commits into main from ishan/gb200a on Sep 9, 2025
Conversation

@ishandhanani (Collaborator) commented Aug 22, 2025

fp8 disagg working

@gemini-code-assist (Bot, Contributor) left a comment

Summary of Changes

Hello @ishandhanani, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the maintainability and flexibility of the GB200 Docker image by updating its Dockerfile. The primary goal is to streamline the process of updating the SGL kernel, ensuring that the build environment can easily adapt to newer kernel releases without manual code changes. This improves the overall agility of the build system for the GB200 environment.

Highlights

  • New Build Argument: A new build argument, SGL_KERNEL_VERSION, has been introduced in the Dockerfile. This allows for the SGL kernel version to be specified and managed more easily.
  • Dynamic SGL Kernel Versioning: The sgl_kernel installation command within the Dockerfile has been updated to utilize the newly introduced SGL_KERNEL_VERSION argument. This change replaces a hardcoded version with a dynamic one, simplifying future updates to the SGL kernel.
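
To illustrate what the new argument changes, here is a minimal shell sketch of how the wheel URL is composed from a version string. The URL pattern is copied from docker/Dockerfile.gb200 as quoted in this PR; the helper name sgl_kernel_wheel_url and the 0.3.4 fallback (the previously hardcoded version) are assumptions for illustration, not part of the Dockerfile.

```shell
#!/bin/sh
# Hypothetical helper (not in the Dockerfile): build the sgl_kernel wheel URL
# from a version string, following the URL pattern quoted in this PR.
sgl_kernel_wheel_url() {
    version="$1"                # e.g. the value of SGL_KERNEL_VERSION
    arch="${2:-$(uname -m)}"    # defaults to the build host architecture
    printf 'https://github.com/sgl-project/whl/releases/download/v%s/sgl_kernel-%s+cu129-cp310-abi3-manylinux2014_%s.whl\n' \
        "$version" "$version" "$arch"
}

# 0.3.4 is the previously hardcoded version; it is used here only as an
# illustrative fallback, not as the version this PR pins.
SGL_KERNEL_VERSION="${SGL_KERNEL_VERSION:-0.3.4}"
sgl_kernel_wheel_url "$SGL_KERNEL_VERSION" aarch64
```

With the argument in place, a different kernel release can presumably be selected at build time with docker's --build-arg flag (for example --build-arg SGL_KERNEL_VERSION=<version>) rather than by editing the URL in the Dockerfile.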

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request updates the Dockerfile to use a parameterized version for sgl_kernel and adjusts the installation order. The changes are a good improvement for maintainability. I've included one suggestion to simplify the shell logic within the RUN command by removing a redundant conditional check, which should improve readability.

Comment thread docker/Dockerfile.gb200
Comment on lines 65 to +68
      && if [ "$CUDA_VERSION" = "12.9.1" ]; then \
         python3 -m pip install --no-cache-dir nvidia-nccl-cu12==2.27.6 --force-reinstall --no-deps ; \
-        python3 -m pip install --no-cache-dir https://github.com/sgl-project/whl/releases/download/v0.3.4/sgl_kernel-0.3.4+cu129-cp310-abi3-manylinux2014_$(uname -m).whl --force-reinstall --no-deps ; \
-     fi
+        python3 -m pip install --no-cache-dir https://github.com/sgl-project/whl/releases/download/v${SGL_KERNEL_VERSION}/sgl_kernel-${SGL_KERNEL_VERSION}+cu129-cp310-abi3-manylinux2014_$(uname -m).whl --force-reinstall --no-deps ; \
+     fi \

medium

The case statement on lines 61-64 already ensures that the script will exit if $CUDA_VERSION is not 12.9.1. This makes the if condition on this line redundant. You can simplify the script by removing the if statement and promoting the commands inside it.

&& python3 -m pip install --no-cache-dir nvidia-nccl-cu12==2.27.6 --force-reinstall --no-deps \
&& python3 -m pip install --no-cache-dir https://github.com/sgl-project/whl/releases/download/v${SGL_KERNEL_VERSION}/sgl_kernel-${SGL_KERNEL_VERSION}+cu129-cp310-abi3-manylinux2014_$(uname -m).whl --force-reinstall --no-deps \
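
The redundancy argument can be sketched in plain shell. The real case statement on lines 61-64 is not shown in this thread, so check_cuda_version below is a hypothetical stand-in for it, and plan_install is a dry run that prints the steps instead of calling pip.

```shell
#!/bin/sh
# Hypothetical stand-in for the guard the review refers to; the actual case
# statement in docker/Dockerfile.gb200 is not quoted in this thread.
check_cuda_version() {
    case "$1" in
        12.9.1) return 0 ;;
        *) echo "Unsupported CUDA version: $1" >&2; return 1 ;;
    esac
}

# Dry-run installer: prints the planned steps rather than running pip.
plan_install() {
    cuda_version="$1"
    kernel_version="$2"
    # The guard rejects every version except 12.9.1 ...
    check_cuda_version "$cuda_version" || return 1
    # ... so no inner `if [ "$CUDA_VERSION" = "12.9.1" ]` is needed below.
    echo "install nvidia-nccl-cu12==2.27.6" \
        && echo "install sgl_kernel $kernel_version"
}

plan_install 12.9.1 0.3.4
```

One side effect of the suggested form: chaining the two installs with && instead of ; means a failed first install stops the chain, whereas with ; the second command still runs (unless the shell is running with set -e).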

@yuhsuan-t (Contributor) commented

@ishandhanani is this planned to be merged soon?

@nWEIdia mentioned this pull request Sep 3, 2025
@zhyncs merged commit 148022f into main Sep 9, 2025 (19 checks passed)
@zhyncs deleted the ishan/gb200a branch September 9, 2025 00:32