Skip to content

feat: introduce GitHub Actions workflow for PR testing#2326

Merged
yzh119 merged 24 commits intoflashinfer-ai:mainfrom
yongwww:public_cicd
Jan 18, 2026
Merged

feat: introduce GitHub Actions workflow for PR testing#2326
yzh119 merged 24 commits intoflashinfer-ai:mainfrom
yongwww:public_cicd

Conversation

@yongwww
Copy link
Copy Markdown
Member

@yongwww yongwww commented Jan 10, 2026

📌 Description

#2355

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Co-Authored-By: Kevin Vincent kvincent@nvidia.com

Reviewer Notes

Summary by CodeRabbit

  • Chores
    • Added a PR pipeline that runs multi-architecture AOT build/imports and segmented GPU/CPU test shards, with conditional skips for doc/config-only changes or explicit skip inputs; ensures always-run cleanup and diagnostics and enforces overall CI status.
  • Tests
    • Added matrixed AOT and GPU test runs across architectures and CUDA versions with sharded GPU job execution and consolidated result aggregation into a single test summary.

✏️ Tip: You can customize this high-level summary in your review settings.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 10, 2026

📝 Walkthrough

Walkthrough

Adds a new GitHub Actions workflow .github/workflows/pr-test.yml named "PR Test". Triggers on push/PR/workflow_dispatch, reads ci/docker-tags.yml, computes skip_build, conditionally runs AOT build matrix (x86_64/aarch64 × CUDA versions) and sharded GPU JIT tests (A10G 5 shards, T4 1 shard), then aggregates results in a CI summary.

Changes

Cohort / File(s) Change Summary
GitHub Actions CI Workflow
​.github/workflows/pr-test.yml
New "PR Test" workflow: triggers on push/PR/workflow_dispatch (with skip_aot/skip_gpu inputs); Setup job reads ci/docker-tags.yml, sets docker tag output and skip_build based on changed files/inputs; adds AOT build matrix (x86_64/aarch64 × cu126/cu128/cu129/cu130), GPU JIT tests for A10G (SM86, 5 shards) and T4 (SM75, 1 shard) with runner labels; includes GPU cleanup/diagnostics and a Test Results Summary job that aggregates statuses and fails CI if enforced jobs fail.

Sequence Diagram(s)

sequenceDiagram
  participant Trigger as GitHub Trigger
  participant Setup as Setup Job
  participant AOT as AOT Build Jobs
  participant GPU_A10 as GPU JIT A10G Jobs
  participant GPU_T4 as GPU JIT T4 Job
  participant Summary as CI Summary Job

  Trigger->>Setup: start workflow (push/PR/dispatch)
  Setup->>Setup: read `ci/docker-tags.yml`, compute outputs (`docker_tag`, `skip_build`)
  alt AOT not skipped
    Setup->>AOT: start matrix (x86_64/aarch64 × CUDA versions)
    AOT-->>Setup: return results/artifacts
  end
  alt GPU not skipped
    Setup->>GPU_A10: dispatch 5 shards (A10G, SM86)
    Setup->>GPU_T4: dispatch 1 shard (T4, SM75)
    GPU_A10-->>Setup: shard results
    GPU_T4-->>Setup: result
  end
  Setup->>Summary: provide outputs/status
  AOT->>Summary: report status
  GPU_A10->>Summary: report status
  GPU_T4->>Summary: report status
  Summary->>Summary: aggregate results, write GitHub Step Summary, set final workflow status
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • bkryu
  • jimmyzho
  • nvmbreughe

Poem

🐇 I sniffed a Docker tag so sweet,

spun AOT builds on nimble feet,
five A10G hops and one T4 song,
I pruned the cache and danced along,
CI carrots hop us all along 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is largely incomplete and lacks substantive information about the changes. It only references issue #2355 without explanation and does not describe what the GitHub Actions workflow does or why it's needed. Add a detailed description of the workflow's purpose, the types of tests it runs (AOT builds, GPU tests), the architectures/CUDA versions tested, and why these CI changes are necessary for the project.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check ✅ Passed The title 'feat: introduce GitHub Actions workflow for PR testing' accurately describes the main change - adding a new GitHub Actions workflow file (.github/workflows/pr-test.yml) for PR testing.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@yongwww yongwww changed the title [DO NOT MERGE/REVIEW][WIP] feat: add GitHub Actions CI workflow for AWS self-hosted runners [DO NOT MERGE/REVIEW][WIP] feat: add GitHub Actions CI workflow for self-hosted spot runners Jan 10, 2026
@yongwww yongwww changed the title [DO NOT MERGE/REVIEW][WIP] feat: add GitHub Actions CI workflow for self-hosted spot runners feat: add GitHub Actions CI workflow for self-hosted spot runners Jan 16, 2026
@yongwww yongwww marked this pull request as ready for review January 16, 2026 23:15
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.github/workflows/pr-test.yml:
- Around line 64-65: The TODO indicates .github/ was temporarily removed from
SKIP_PATTERNS; update the SKIP_PATTERNS declaration: either restore the
^\.github/ pattern into the SKIP_PATTERNS value (so that files under .github are
skipped) or remove the TODO comment and add a short inline comment documenting
the chosen final policy for .github/* (state whether CI should include or
exclude it), ensuring you modify the SKIP_PATTERNS variable and the accompanying
comment consistently.
- Around line 227-235: The summary currently treats only "failure" as a failing
job; update the final conditional that checks needs.aot-build-import.result,
needs.gpu-tests-a10g.result, and needs.gpu-tests-t4.result so any result not
equal to "success" (e.g., "cancelled" or "skipped") is considered a failure
unless an explicit skip flag is set; change the checks from == "failure" to !=
"success" and add exceptions for the explicit skip markers (e.g., skip_aot,
skip_gpu, skip_build or equivalent inputs) so that skipped jobs caused by those
flags do not cause the summary to fail.

Comment on lines +64 to +65
# TODO (yongwww): Add back ^\.github/ before merging to main
SKIP_PATTERNS="README.md|^docs/|^docker/|^licenses/|^LICENSE$|^NOTICE$|^version\.txt$"
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Resolve the TODO before merge.
Line 64 notes that .github/ should be re-added to SKIP_PATTERNS. Please either restore it now (if intended) or remove the TODO and document the final policy to avoid forgetting.

🤖 Prompt for AI Agents
In @.github/workflows/pr-test.yml around lines 64 - 65, The TODO indicates
.github/ was temporarily removed from SKIP_PATTERNS; update the SKIP_PATTERNS
declaration: either restore the ^\.github/ pattern into the SKIP_PATTERNS value
(so that files under .github are skipped) or remove the TODO comment and add a
short inline comment documenting the chosen final policy for .github/* (state
whether CI should include or exclude it), ensuring you modify the SKIP_PATTERNS
variable and the accompanying comment consistently.

Copy link
Copy Markdown
Member Author

@yongwww yongwww Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep the todo here, because we are testing the workflow. We can remove it in follow-up prs

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yongwww, sounds good! That makes sense while you're testing the workflow.

Would you like me to create a follow-up issue to track removing the TODO and finalizing the SKIP_PATTERNS policy for .github/ files? This way it won't be forgotten once testing is complete.

Comment thread .github/workflows/pr-test.yml Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/pr-test.yml:
- Around line 48-53: The "Get Docker Tag" step (id: get-tag) currently extracts
TAG from ci/docker-tags.yml but doesn't validate results; update the step to
first check that ci/docker-tags.yml exists and is readable, then run the
grep/cut pipeline to set TAG, and after extraction verify TAG is non-empty and
matches the expected pattern; if the file is missing or TAG is empty/invalid,
print a clear error message (including the file/path and pattern) and exit
non‑zero so the workflow fails fast instead of propagating an empty TAG to
downstream jobs.
♻️ Duplicate comments (1)
.github/workflows/pr-test.yml (1)

64-65: Acknowledged: TODO retained for workflow testing.

Per prior discussion, the TODO is intentionally kept while testing the workflow. Consider tracking this in an issue so it's not forgotten post-testing.

🧹 Nitpick comments (1)
.github/workflows/pr-test.yml (1)

88-92: Custom runner labels are expected for self-hosted runners.

The actionlint warnings about cpu, X64, ARM64 labels are false positives. These are custom labels for your self-hosted runner fleet. Consider adding an actionlint.yaml config to suppress these warnings:

# actionlint.yaml
self-hosted-runner:
  labels:
    - cpu
    - gpu
    - sm75
    - sm86
    - X64
    - ARM64

Comment thread .github/workflows/pr-test.yml
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/pr-test.yml:
- Around line 58-63: Replace the unreliable git diff that uses origin/${{
github.base_ref }}...HEAD when computing CHANGED; update the CHANGED computation
to use the PR event SHAs (github.event.pull_request.base.sha and
github.event.pull_request.head.sha) for the diff range so the workflow uses the
pull_request base and head commit SHAs instead of relying on origin refs, or
alternatively explicitly fetch the base branch before diffing if you need to
compare against the up-to-date base branch; adjust the CHANGED assignment (the
git diff invocation) accordingly to reference these PR SHAs.
♻️ Duplicate comments (1)
.github/workflows/pr-test.yml (1)

48-53: Validate Docker tag extraction to avoid empty image tags.
If the pattern misses or the file is missing, TAG becomes empty and later Docker pulls fail with unclear errors—fail fast with a guard.

🛠️ Suggested hardening
       - name: Get Docker Tag
         id: get-tag
         run: |
-          TAG=$(grep 'flashinfer/flashinfer-ci-cu129:' ci/docker-tags.yml | cut -d':' -f2 | tr -d ' ')
+          if [ ! -f ci/docker-tags.yml ]; then
+            echo "::error::Missing ci/docker-tags.yml"
+            exit 1
+          fi
+          TAG=$(grep 'flashinfer/flashinfer-ci-cu129:' ci/docker-tags.yml | cut -d':' -f2 | tr -d ' ')
+          if [ -z "$TAG" ]; then
+            echo "::error::Failed to extract Docker tag from ci/docker-tags.yml"
+            exit 1
+          fi
           echo "tag=$TAG" >> $GITHUB_OUTPUT
           echo "Docker tag: $TAG"
🧹 Nitpick comments (1)
.github/workflows/pr-test.yml (1)

7-11: Verify self‑hosted runner isolation for PRs.
pull_request executes untrusted code on self‑hosted runners; ensure runners are ephemeral/isolated and have no secrets or broad network access, or restrict PR runs to trusted contributors.

Also applies to: 88-92

Comment thread .github/workflows/pr-test.yml Outdated
@yongwww yongwww changed the title feat: add GitHub Actions CI workflow for self-hosted spot runners feat: introduce GitHub Actions workflow for PR testing Jan 17, 2026
Copy link
Copy Markdown
Collaborator

@yzh119 yzh119 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this feature and let's move forward with github actions, follow up work @yongwww :

  1. update the skip conditions.
  2. pr-test should run after pre-commits passed.
  3. remove jenkins workflow.

HEAD_SHA="${{ github.event.pull_request.head.sha }}"
CHANGED=$(git diff --name-only "$BASE_SHA...$HEAD_SHA")
# TODO (yongwww): Add back ^\.github/ before merging to main
SKIP_PATTERNS="README.md|^docs/|^docker/|^licenses/|^LICENSE$|^NOTICE$|^version\.txt$"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use a script file to determine skip patterns (under scripts/).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants