[CI] Fix AMD CI by inlining dummy_grok config #18044
Kangyan-Zhou merged 1 commit into sgl-project:main from
Summary of Changes

Hello @sunxxuns, I'm Gemini Code Assist. Here is a summary to help you and other reviewers get up to speed.

This pull request addresses a recurring issue in the AMD continuous integration pipeline: jobs were failing because a required configuration file hosted on Azure blob storage could not be downloaded. The change stabilizes the CI by embedding the configuration directly in the installation script, which removes a brittle external dependency so that AMD CI jobs run reliably.
Code Review
This pull request resolves a CI failure in AMD jobs by inlining a dummy grok configuration file that was previously failing to download. The approach of using a heredoc is a direct and effective fix. I have one suggestion to refine the implementation by piping the configuration directly into the Docker container, which would make the script slightly cleaner by avoiding the creation of temporary files on the host.
```bash
mkdir -p dummy-grok
cat > dummy-grok/config.json << 'EOF'
{
  "architectures": [
    "Grok1ModelForCausalLM"
  ],
  "embedding_multiplier_scale": 78.38367176906169,
  "output_multiplier_scale": 0.5773502691896257,
  "vocab_size": 131072,
  "hidden_size": 6144,
  "intermediate_size": 32768,
  "max_position_embeddings": 8192,
  "num_experts_per_tok": 2,
  "num_local_experts": 8,
  "num_attention_heads": 48,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "head_dim": 128,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "model_type": "mixtral",
  "torch_dtype": "bfloat16"
}
EOF
```
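The diff quotes the heredoc delimiter ('EOF'). That choice matters for inlined configs: with a quoted delimiter the shell writes the body verbatim, while an unquoted delimiter would expand any `$...` or backtick sequences inside it. The config above contains none, so both forms would produce the same file today; the quoted form keeps it that way if the JSON later changes. A minimal sketch, using a hypothetical `VALUE` variable that is not part of the CI script:

```bash
# Sketch: how quoting the heredoc delimiter changes what gets written.
# VALUE is a hypothetical variable used only for this demonstration.
VALUE="expanded"

# Quoted delimiter ('EOF'), as in the PR: the body is passed through literally.
quoted=$(cat << 'EOF'
{"note": "$VALUE"}
EOF
)

# Unquoted delimiter (EOF): the shell expands $VALUE inside the body.
unquoted=$(cat << EOF
{"note": "$VALUE"}
EOF
)

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

The quoted form prints `{"note": "$VALUE"}` unchanged, while the unquoted form prints `{"note": "expanded"}`.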
Instead of creating a temporary directory and file on the host and then copying them to the container, you can pipe the configuration directly into a file inside the container. This approach is cleaner as it avoids creating temporary artifacts on the host.
Important: If you apply this suggestion, you must also manually remove the docker cp ./dummy-grok ci_sglang:/ command on line 152, as it will no longer be needed and will cause an error.
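The piped form relies on `docker exec -i` keeping stdin open so the heredoc reaches the shell inside the container. The sketch below reproduces the same pattern locally, assuming `sh -c` as a stand-in for `docker exec -i ci_sglang sh -c` and a temporary directory as a stand-in for the container's root (both hypothetical, for illustration only). It also shows why the target directory must exist: once `docker cp ./dummy-grok ci_sglang:/` is removed, nothing else creates `/dummy-grok`, so a bare `cat > /dummy-grok/config.json` would fail.

```bash
# Hypothetical stand-in for the container's filesystem; not part of the CI script.
root=$(mktemp -d)

# First attempt without mkdir -p: the redirection fails because
# "$root/dummy-grok" does not exist yet (stderr suppressed).
if printf '{}\n' | sh -c 'cat > "$1/dummy-grok/config.json"' sh "$root" 2>/dev/null; then
  first_attempt=succeeded
else
  first_attempt=failed
fi

# Same shape as the suggestion, with `sh -c` in place of
# `docker exec -i ci_sglang sh -c`: the pipe delivers the heredoc on stdin,
# and mkdir -p creates the target directory before cat writes the file.
# The JSON body is trimmed to two fields for brevity.
cat << 'EOF' | sh -c 'mkdir -p "$1/dummy-grok" && cat > "$1/dummy-grok/config.json"' sh "$root"
{
  "architectures": ["Grok1ModelForCausalLM"],
  "model_type": "mixtral"
}
EOF

printf 'first attempt: %s\n' "$first_attempt"
cat "$root/dummy-grok/config.json"
```

The same reasoning applies to the real script: if the `docker cp` line is dropped, the command run inside the container has to create `/dummy-grok` itself.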
```bash
cat << 'EOF' | docker exec -i ci_sglang sh -c 'mkdir -p /dummy-grok && cat > /dummy-grok/config.json'
{
  "architectures": [
    "Grok1ModelForCausalLM"
  ],
  "embedding_multiplier_scale": 78.38367176906169,
  "output_multiplier_scale": 0.5773502691896257,
  "vocab_size": 131072,
  "hidden_size": 6144,
  "intermediate_size": 32768,
  "max_position_embeddings": 8192,
  "num_experts_per_tok": 2,
  "num_local_experts": 8,
  "num_attention_heads": 48,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "head_dim": 128,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "model_type": "mixtral",
  "torch_dtype": "bfloat16"
}
EOF
```

The Azure blob storage endpoint (sharkpublic.blob.core.windows.net) is returning 403 Forbidden errors, causing all AMD CI jobs to fail at the "Install dependencies" step. This fix removes the wget download and creates the config.json inline, eliminating the dependency on external blob storage.

Co-authored-by: Cursor <cursoragent@cursor.com>
Branch updated from 7bfc49d to 7e0db9d.
Co-authored-by: root <root@mi300x8-005.atl1.do.cpe.ice.amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Motivation
The Azure blob storage endpoint (sharkpublic.blob.core.windows.net) is returning 403 Forbidden errors, causing all AMD CI jobs to fail at the "Install dependencies" step. This is affecting multiple PRs, including #18026.
Modifications
- Removed the wget download from Azure blob storage
- Created config.json inline using a heredoc
- 3rdparty/amd/profiling/PROFILING.md)

Checklist
Made with Cursor