Add a workflow file for AWS#386
Conversation
📝 Walkthrough""" WalkthroughA new GitHub Actions workflow, "GPU Unit Tests on AWS," is introduced to automate running GPU-accelerated unit tests on AWS EC2 instances. The workflow provisions a GPU-enabled EC2 instance, executes remote tests via AWS SSM, collects artifacts from S3, and ensures resource cleanup by terminating the instance post-execution. Additionally, the pull request template was updated to remove a GPU testing disclaimer and clarify test requirements. Changes
Sequence Diagram(s)sequenceDiagram
participant GitHub Actions
participant AWS EC2
participant AWS SSM
participant AWS S3
GitHub Actions->>AWS EC2: Launch GPU-enabled EC2 instance
GitHub Actions->>AWS SSM: Send remote test script to EC2 instance
AWS SSM->>AWS EC2: Execute test script
AWS EC2->>AWS S3: Upload test artifacts and logs
GitHub Actions->>AWS S3: Download test artifacts and logs
GitHub Actions->>AWS EC2: Terminate EC2 instance (cleanup)
""" 📜 Recent review detailsConfiguration used: .coderabbit.yml 📥 CommitsReviewing files that changed from the base of the PR and between 1927051f0f4d78d428ebf7b3aeabbe4dab02534b and 6f831caef9592fb9b33f7bb66edbd5740873497e. 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
f0eac3e to
6e3c9e9
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
.github/workflows/aws-gpu-tests.yml (1)
238-255: Cleanup job may skip termination if the test job fails early.
needs.run-unit-tests-on-aws.outputs.instance_idis only populated when the whole job succeeds.
If any later step (artifact download, Codecov upload) fails, the output is blank → instance leaks.Safer pattern:
- needs: run-unit-tests-on-aws - if: github.repository == 'newton-physics/newton' && always() + needs: run-unit-tests-on-aws + if: always() # always run, independent of previous job resultand pass the ID via an artefact or workflow-level environment file (
$GITHUB_ENV) written in the launch step and picked up here:- if: needs.run-unit-tests-on-aws.outputs.instance_id != '' + env: + INSTANCE_ID: ${{ needs.run-unit-tests-on-aws.outputs.instance_id || env.INSTANCE_ID }} ... - aws ec2 terminate-instances --instance-ids ${{ needs.run-unit-tests-on-aws.outputs.instance_id }} + aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"Ensures termination under every failure mode.
🧹 Nitpick comments (2)
.github/workflows/aws-gpu-tests.yml (2)
160-178: Trailing spaces break YAML-lint and make reviews noisy.Lines 166 and 174 carry stray spaces →
yamllinterror.
Quick fix:- echo "Current status: $STATUS. Waited $elapsed_time seconds..."␠ + echo "Current status: $STATUS. Waited $elapsed_time seconds..."[to delete the two offending spaces]
136-138:jqmunging a full script into a single array element is brittle.
AWS-RunShellScriptexpects an array of commands. Embedding the entire multi-line script as one string works only if the remote interpreter honours embedded newlines; quoting errors silently turn the script into a single long line.Simpler & safer:
- jq -n --arg script_body "$(cat remote_script.sh)" \ - '{ "commands": [$script_body] }' > ssm_params.json + aws ssm send-command \ + --instance-ids "$INSTANCE_ID" \ + --document-name "AWS-RunShellScript" \ + --comment "Running Newton GPU unit tests" \ + --parameters "commands=$(cat remote_script.sh | base64)" \ + --cli-binary-format raw-in-base64-out \ + ...(or upload the script to S3 and execute
bash /tmp/script.sh). Reduces JSON-escaping headaches.
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 11f31bc and 6e3c9e98eeeab50729e12ed4c36a98d99fd5211a.
📒 Files selected for processing (1)
.github/workflows/aws-gpu-tests.yml(1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/aws-gpu-tests.yml
75-75: "github.head_ref" is potentially untrusted. avoid using it directly in inline scripts. instead, pass it through an environment variable. see https://docs.github.com/en/actions/security-for-github-actions/security-guides/security-hardening-for-github-actions for more details
(expression)
🪛 YAMLlint (1.37.1)
.github/workflows/aws-gpu-tests.yml
[error] 166-166: trailing spaces
(trailing-spaces)
[error] 174-174: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-asv-benchmarks
🔇 Additional comments (1)
.github/workflows/aws-gpu-tests.yml (1)
56-66: Verify g6.xlarge availability inus-east-2& spot-price.
g6.xlarge(Ada-L4) isn’t yet GA in every region;us-east-2often lags.
A failed capacity reservation will abort the workflow after the 75-min timeout yet still incur runner cost.Consider:
- AWS_INSTANCE_TYPE: g6.xlarge + AWS_INSTANCE_TYPE: g5.xlarge # wider regional supportor add a fallback strategy (
ec2 run-instances --instance-type-override).
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
.github/workflows/aws-gpu-tests.yml (2)
10-11: AMI filter pattern is still fragile – refine or pin to a stable AMI.
Wildcard sequence????????is brittle; the comment from the previous review remains valid. Use a build-pattern that will not break on the next AMI release or pin to an explicitowner aliasinstead.
125-129: Shell-injection risk: branch name is still interpolated directly.
BRANCH_NAME="${{ github.head_ref || github.ref_name }}"passes attacker-controlled data straight into the remote shell. The earlier mitigation proposal (use a sanitized env var) hasn’t been applied.- BRANCH_NAME="${{ github.head_ref || github.ref_name }}" + BRANCH_NAME="${BRANCH_NAME_ENV}"and add at the step level:
env: BRANCH_NAME_ENV: ${{ github.event.pull_request.head.ref || github.ref_name }}
🧹 Nitpick comments (1)
.github/workflows/aws-gpu-tests.yml (1)
172-190: Trailing whitespace triggers YAML-lint failures.
Lines 178 and 186 have stray spaces →yamllinterror. Strip them to keep the CI green.
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 6e3c9e98eeeab50729e12ed4c36a98d99fd5211a and 97cf854473c8c4fc56cdfb7d069d9678019212ec.
📒 Files selected for processing (1)
.github/workflows/aws-gpu-tests.yml(1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/aws-gpu-tests.yml
87-87: "github.head_ref" is potentially untrusted. avoid using it directly in inline scripts. instead, pass it through an environment variable. see https://docs.github.com/en/actions/security-for-github-actions/security-guides/security-hardening-for-github-actions for more details
(expression)
🪛 YAMLlint (1.37.1)
.github/workflows/aws-gpu-tests.yml
[error] 178-178: trailing spaces
(trailing-spaces)
[error] 186-186: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-asv-benchmarks
🔇 Additional comments (1)
.github/workflows/aws-gpu-tests.yml (1)
262-267:needs.*.outputs.instance_idmay be undefined when launch fails.
If thelaunchstep aborts before setting its output, the expression will raise a runtime error and block cleanup. Initialiseinstance_idto an empty string on job-level defaults or guard withif: failure() && ….
0610dce to
9fedd24
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
.github/workflows/aws-gpu-tests.yml (2)
154-188:final_statusstill misses edge cases likeCancelledorDeliveryTimedOut.The catch-all branch sets
final_status="$STATUS"and breaks, butSTATUScan be values that are not actionable failures (Cancelled,TimedOut,Undelivered, …).
Recommend enumerating all success statuses and treat everything else as failure, e.g.if [[ "$STATUS" == "Success" || "$STATUS" == "Cancelled" ]]; then : elif [[ "$STATUS" == "InProgress" || "$STATUS" == "Pending" ]]; then ... else final_status="$STATUS" break fi(or invert the condition:
if [[ "$STATUS" != "InProgress" && "$STATUS" != "Pending" ]]; then … fi) to avoid silent masking.
160-179: Trailing whitespace flagged by YAMLlintYAMLlint reports trailing spaces on lines 166 and 179 – minor, but fixing avoids noisy lint failures.
- --output text)␠ + --output text) ... - echo "Current status: $STATUS. Waited $elapsed_time seconds..."␠ + echo "Current status: $STATUS. Waited $elapsed_time seconds..."
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 97cf854473c8c4fc56cdfb7d069d9678019212ec and 9fedd2483846d00ec6279c6012be7b4cb6a57033.
📒 Files selected for processing (1)
.github/workflows/aws-gpu-tests.yml(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/aws-gpu-tests.yml
[error] 166-166: trailing spaces
(trailing-spaces)
[error] 179-179: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-asv-benchmarks
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
.github/workflows/aws-gpu-tests.yml (1)
88-96: Here-doc still unquoted → variables can expand on the runner sideThe delimiter
<< EOFpermits interpolation of all$FOO,$( … )and back-ticks before the file is even copied to the EC2 box. A malicious value that slipped pastsafe_branchor any future variable introduced in the runner’s env would be executed locally with full GitHub-host permissions.Quoting the delimiter eliminates this class of bugs with zero downside:
- cat << EOF > remote_script.sh + cat <<'EOF' > remote_script.shEverything that really must be substituted (e.g. S3 bucket, key) can be injected explicitly after the file is generated or passed via SSM parameters/environment.
Also applies to: 120-123
🧹 Nitpick comments (3)
.github/workflows/aws-gpu-tests.yml (3)
14-17:pull_request_target+ AWS role = high-risk surfaceRunning untrusted PR code with an assumed AWS role (even if scoped) can expose the AWS account. A safer pattern is:
• Trigger on
pull_request(not*_target) so the fork’s workflow file must reference the role – impossible.
• Or keeppull_request_targetbut checkout the merge commit instead of the attacker’s branch and pass--depth 1.Please re-evaluate the threat model before merging.
Also applies to: 30-32
74-76: Possible job timeout before SSM command finishesThe job has a 75-minute timeout but:
- EC2 boot (
wait instance-status-ok) can take >10 min on bursty regions.- The polling loop waits up to 60 min for SSM.
- Tests themselves may run 30 min on a cold GPU AMI.
A narrow margin (<5 min) remains. Either shorten
TIMEOUT, increasetimeout-minutes, or push polling into a separate job so cleanup never races the watchdog.Also applies to: 160-195
172-173: Trailing whitespace – YAMLlint errorLines 172 and 185 contain stray spaces that fail the default
yamllintruleset.- --output text)␠ + --output text) ... - final_status="$STATUS"␠ + final_status="$STATUS"Minor, but it will break pipelines that enforce linting.
Also applies to: 185-186
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 9fedd2483846d00ec6279c6012be7b4cb6a57033 and 335f8fe9ac08ea80251b08fd8eb6ae45c01d1973.
📒 Files selected for processing (1)
.github/workflows/aws-gpu-tests.yml(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/aws-gpu-tests.yml
[error] 172-172: trailing spaces
(trailing-spaces)
[error] 185-185: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-asv-benchmarks
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/aws-gpu-tests.yml (1)
84-120: Remove double-quoting of an alreadyprintf %q-escaped branch name
printf '%q'wraps the value in single-quotes when needed.
Embedding that string again inside double-quotes (BRANCH_NAME="${safe_branch}") makes the quotes part of the value, so the remotegit clone --branchlooks for a ref literally called'feature-xyz'and fails.- safe_branch=$(printf '%q' "${BRANCH_NAME_ENV}") - ... - BRANCH_NAME="${safe_branch}" +safe_branch=$(printf '%q' -- "${BRANCH_NAME_ENV}") # keep existing escaping +# no extra quotes – value is already shell-safe +BRANCH_NAME=${safe_branch}
🧹 Nitpick comments (2)
.github/workflows/aws-gpu-tests.yml (2)
18-24: Cancellation can orphan EC2 instances – put termination in an independentcleanupjobWhen a newer run hits the same
concurrencygroup, GitHub cancels the entire previous job; step-levelif: always()hooks never fire, soaws ec2 terminate-instancesis skipped and the GPU instance leaks.Create a separate
cleanupjob that
- has
if: always()- is not in the concurrency group (omit
needs:or use a different group key)- fetches the
INSTANCE_IDartifact (or tags) and terminates the instance.This guarantees the teardown runs even after a workflow-level cancellation.
Also applies to: 247-255
164-186: Minor: trailing whitespace tripsyamllintStatic analysis already reports trailing spaces on lines 170 & 183.
Strip them to keep the workflow lint-clean.- while [ $elapsed_time -lt $TIMEOUT ]; do␠ + while [ $elapsed_time -lt $TIMEOUT ]; do ... - echo "Current status: $STATUS. Waited $elapsed_time seconds..."␠ + echo "Current status: $STATUS. Waited $elapsed_time seconds..."
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 335f8fe9ac08ea80251b08fd8eb6ae45c01d1973 and cb08cb4152f0737a5abe03b21960588ee8e91733.
📒 Files selected for processing (1)
.github/workflows/aws-gpu-tests.yml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
.github/workflows/aws-gpu-tests.yml (1)
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:86-87
Timestamp: 2025-07-12T17:40:10.078Z
Learning: Git does not allow spaces in branch names. Branch names with spaces will result in "is not a valid branch name" errors.
🪛 YAMLlint (1.37.1)
.github/workflows/aws-gpu-tests.yml
[error] 170-170: trailing spaces
(trailing-spaces)
[error] 183-183: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-asv-benchmarks
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (1)
.github/workflows/aws-gpu-tests.yml (1)
140-142:jqmay not be available on self-hosted runnersUbuntu-latest currently bundles
jq, but this isn’t guaranteed and self-hosted runners often omit it.
Add an explicit install (e.g.,sudo apt-get -y install jq) or useaws ssm send-command --parameters "commands=[\"$(<remote_script.sh | base64 -w0)\"]"to avoid the dependency.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
.github/workflows/aws-gpu-tests.yml (4)
90-96: Redundant re-quoting may leak literal quote characters into$BRANCH_NAME
printf '%q'already returns a shell-escaped token. Wrapping it again in double quotes (line 128) can embed the quotes themselves when no escaping is needed, yieldinggit clone --branch "'feature-x'" …. Consider dropping the extra double quotes:-SAFE_BRANCH=$(printf '%q' "${BRANCH_NAME_ENV}") +SAFE_BRANCH=$(printf '%q' -- "${BRANCH_NAME_ENV}") # -- guards against leading ‘-’ … -BRANCH_NAME="${SAFE_BRANCH}" +BRANCH_NAME=${SAFE_BRANCH}Minor, but removes one layer of quoting and avoids surprises.
50-61: AMI lookup is region-hard-coded – breaks in non-us-east-2regionsThe SSM parameter path is valid only where AWS publishes that specific DLAMI build. If the repository later sets
AWS_REGIONto another region (or contributors fork with a different default),get-parameterreturns ParameterNotFound and the job aborts.Recommend either:
- Pin the AMI ID in the workflow inputs (cheap, deterministic), or
- Drive the parameter path from the region:
PARAM="/aws/service/deeplearning/ami/x86_64/base-oss-nvidia-driver-gpu-ubuntu-24.04/${{ env.AWS_REGION }}/latest/ami-id" LATEST_AMI_ID=$(aws ssm get-parameter --region "${{ env.AWS_REGION }}" \ --name "$PARAM" --query 'Parameter.Value' --output text)
150-152: Risk of exceeding SSM 16 KB payload limit
$(cat remote_script.sh)inlines the entire script into a single JSON string. Once the script grows beyond ~16 KB thesend-commandcall is rejected.Mitigations:
- gzip → base64 the script and decode on the instance, or
- store the script in S3 and reference it via
--document-version/--parameters sourceType=S3 ….Nice-to-have now, but future-proofs the workflow.
173-196: Trailing whitespace flagged by yamllintLines 179 and 192 contain stray spaces that yamllint reports. They’re inside a shell heredoc, so harmless at runtime, but cleaning them avoids noisy CI lint failures.
- echo "Current status: $STATUS. Waited $elapsed_time seconds..."␠ + echo "Current status: $STATUS. Waited $elapsed_time seconds..."
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between cb08cb4152f0737a5abe03b21960588ee8e91733 and 135909831a85af63b768a637d98e443f749e4523.
📒 Files selected for processing (2)
.github/PULL_REQUEST_TEMPLATE.md(1 hunks).github/workflows/aws-gpu-tests.yml(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .github/PULL_REQUEST_TEMPLATE.md
🧰 Additional context used
🧠 Learnings (1)
.github/workflows/aws-gpu-tests.yml (1)
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:86-87
Timestamp: 2025-07-12T17:40:10.078Z
Learning: Git does not allow spaces in branch names. Branch names with spaces will result in "is not a valid branch name" errors.
🪛 YAMLlint (1.37.1)
.github/workflows/aws-gpu-tests.yml
[error] 179-179: trailing spaces
(trailing-spaces)
[error] 192-192: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-asv-benchmarks
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
.github/workflows/aws-gpu-tests.yml (1)
182-195: Trailing whitespace – failsyamllint
yamllintflags lines 182 and 195 for trailing spaces.
They are inside a bash heredoc, so removal is safe and keeps the CI green.- +
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 135909831a85af63b768a637d98e443f749e4523 and 8a29822a92ef09c10c694cb2b50145eab4711285.
📒 Files selected for processing (1)
.github/workflows/aws-gpu-tests.yml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
.github/workflows/aws-gpu-tests.yml (1)
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:86-87
Timestamp: 2025-07-12T17:40:10.078Z
Learning: Git does not allow spaces in branch names. Branch names with spaces will result in "is not a valid branch name" errors.
🪛 YAMLlint (1.37.1)
.github/workflows/aws-gpu-tests.yml
[error] 182-182: trailing spaces
(trailing-spaces)
[error] 195-195: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-asv-benchmarks
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/aws-gpu-tests.yml (1)
126-129: Forked-PR code is never executed – clone URL & ref still point at base repository
REPO_URLandBRANCH_NAMEare hard-wired to the target repo / branch.
When somebody opens a PR from a fork, the branch of the contributor does not exist ingithub.repository; the workflow therefore tests stale code and can report false-green.-REPO_URL="https://github.com/${{ github.repository }}.git" -BRANCH_NAME="${SAFE_BRANCH}" +if [[ "${{ github.event.pull_request.head.repo.full_name }}" != "${{ github.repository }}" ]]; then + # External fork – test the exact commit under review + REPO_URL="${{ github.event.pull_request.head.repo.clone_url }}" + BRANCH_NAME="${{ github.event.pull_request.head.sha }}" +else + REPO_URL="https://github.com/${{ github.repository }}.git" + BRANCH_NAME="${SAFE_BRANCH}" +fiWithout this, GPU tests give a misleading sense of safety for every external contribution.
🧹 Nitpick comments (2)
.github/workflows/aws-gpu-tests.yml (2)
90-96:printf %qadds complexity for little gain
SAFE_BRANCH=$(printf '%q' …)produces shell-escaped fragments like'feature/xyz'.
Because you later embed the value in an assignment (BRANCH_NAME=${SAFE_BRANCH}) the extra quoting is stripped anyway for normal branch names, while exotic names (:~etc.) are already rejected by Git. Dropping this makes the script easier to read:-SAFE_BRANCH=$(printf '%q' "${BRANCH_NAME_ENV}") +SAFE_BRANCH="${BRANCH_NAME_ENV}"
180-195: Trailing whitespace flagged by yamllintLines 181 and 194 contain stray spaces which fail the lint step. Quick tidy-up:
- --output text)␠ + --output text) … - echo "SSM command failed or returned unexpected status: $STATUS"␠ + echo "SSM command failed or returned unexpected status: $STATUS"Pure style, but keeps CI green.
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 8a29822a92ef09c10c694cb2b50145eab4711285 and 2739be88c83fc8c7753d8c2d7ecea6450d02dcf4.
📒 Files selected for processing (1)
.github/workflows/aws-gpu-tests.yml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
.github/workflows/aws-gpu-tests.yml (1)
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:86-87
Timestamp: 2025-07-12T17:40:10.078Z
Learning: Git does not allow spaces in branch names. Branch names with spaces will result in "is not a valid branch name" errors.
🪛 YAMLlint (1.37.1)
.github/workflows/aws-gpu-tests.yml
[error] 181-181: trailing spaces
(trailing-spaces)
[error] 194-194: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-asv-benchmarks
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
.github/workflows/aws-gpu-tests.yml (4)
100-113:printf %q+ double-quoting still injects quotes into the remote variable
SAFE_BRANCHmay already be quoted ('feature/xyz').
Assigning it again inside double quotesBRANCH_NAME="${SAFE_BRANCH}"preserves the literal single quotes and yields a branch called
'feature/xyz'on the remote host.git clone --branchwill fail.Either:
-BRANCH_NAME="${SAFE_BRANCH}" +BRANCH_NAME=${SAFE_BRANCH} # no extra layer of quotingor drop
printf %qaltogether and rely on Git’s own ref rules.
165-167: SSM payload can exceed the 16 KB limit – externalise the script
send-commandembeds the entireremote_script.shas a single JSON element.
SSM RunShellScript rejects payloads > 16 KB withValidationException, and this script is already sizeable.Recommended pattern:
aws s3 cp remote_script.sh s3://$S3_BUCKET/$S3_KEY/remote_script.sh- Send a tiny SSM command:
aws s3 cp … && bash remote_script.shKeeps the payload small and future-proofs the workflow.
275-283: Instance-leak on concurrency cancellation persistsThe termination step lives inside the same job that can be cancelled by
concurrency.cancel-in-progress.
When a new push arrives, the running job is aborted before reaching this
step and the GPU instance keeps running.Move the cleanup to a separate
cleanupjob withif: always()and noneeds
dependency, pass theINSTANCE_IDvia an output or artifact, and terminate the
instance there.
144-148: Run the tests against the PR’s code, not the base repositoryFor PRs coming from forks
https://github.com/${{ github.repository }}.gitdoes not contain the contributor’s commits.
Result: the workflow happily green-lights stale code.Quick fix:
-REPO_URL="https://github.com/${{ github.repository }}.git" -BRANCH_NAME="${SAFE_BRANCH}" +if [[ -n "${{ github.event.pull_request.head.repo.clone_url }}" ]]; then + REPO_URL="${{ github.event.pull_request.head.repo.clone_url }}" + BRANCH_NAME="${{ github.event.pull_request.head.ref }}" +else + REPO_URL="https://github.com/${{ github.repository }}.git" + BRANCH_NAME=${SAFE_BRANCH} +fiThis clones the exact commit under review while still covering internal pushes.
🧹 Nitpick comments (1)
.github/workflows/aws-gpu-tests.yml (1)
73-75: Avoid embedding literal single-quotes in theGitHub-Repositorytag
TAGS="$TAGS,{Key=GitHub-Repository,Value='${{ github.repository }}'}"
The inner single quotes become part of the tag value ('newton-physics/newton').
Drop them:-TAGS="$TAGS,{Key=GitHub-Repository,Value='${{ github.repository }}'}" +TAGS="$TAGS,{Key=GitHub-Repository,Value=${{ github.repository }}}"Minor, but keeps tag values clean and easier to query.
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 2739be88c83fc8c7753d8c2d7ecea6450d02dcf4 and 234f57697bd408609ffcd11d534201fb494a722d.
📒 Files selected for processing (1)
.github/workflows/aws-gpu-tests.yml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
.github/workflows/aws-gpu-tests.yml (1)
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:86-87
Timestamp: 2025-07-12T17:40:10.078Z
Learning: Git does not allow spaces in branch names. Branch names with spaces will result in "is not a valid branch name" errors.
🪛 YAMLlint (1.37.1)
.github/workflows/aws-gpu-tests.yml
[error] 199-199: trailing spaces
(trailing-spaces)
[error] 212-212: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-asv-benchmarks
5f5dea8 to
6032b20
Compare
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
.github/PULL_REQUEST_TEMPLATE.md (1)
25-25: Checklist item is clear – consider explicitly mentioning GPU testsYou removed the outdated “GitHub does not perform GPU testing” disclaimer, which is great.
Since GPU tests are now executed by the new AWS workflow, you might add a short hint such as “GPU tests pass in CI” so contributors are aware this box covers both CPU and GPU paths..github/workflows/aws-gpu-tests.yml (2)
63-72: Tag value contains literal single-quotes
GitHub-Repositoryis assembled as
Value='${{ github.repository }}'which results in the tag value literally including the'characters ('newton-physics/newton').
AWS accepts it, but it bloats searches/filters and looks odd in the console.-TAGS="$TAGS,{Key=GitHub-Repository,Value='${{ github.repository }}'}" +TAGS="$TAGS,{Key=GitHub-Repository,Value=${{ github.repository }}}"
108-112: Harden the remote script withset -u
set -eandset -o pipefailare enabled, but missingset -umeans unset variables silently expand to empty strings (e.g., if$PR_NUMBERor$S3_BUCKETwere ever unset).
Adding it improves safety with negligible risk.-set -e -set -o pipefail +set -euo pipefail
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 11f31bc and 6032b20c1fa1a571684378d7de9028bf39312672.
📒 Files selected for processing (2)
.github/PULL_REQUEST_TEMPLATE.md(1 hunks).github/workflows/aws-gpu-tests.yml(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:144-156
Timestamp: 2025-07-12T19:46:06.538Z
Learning: In the newton-physics/newton repository workflow discussions, the maintainer shi-eric prefers that forked PR testing behavior is intentionally designed to fail rather than test untrusted code from forks on AWS infrastructure.
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:65-82
Timestamp: 2025-07-12T19:46:47.510Z
Learning: In the newton-physics/newton repository AWS GPU workflow, the workflow only triggers on `pull_request_target` and `push` events, not on `pull_request` events. The `pull_request_target` event is used for testing PRs from forks safely on AWS infrastructure.
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:275-283
Timestamp: 2025-07-12T19:45:04.640Z
Learning: In GitHub Actions, the `if: always()` condition runs even when a job is cancelled due to concurrency settings with `cancel-in-progress: true`. This means cleanup steps with `if: always()` will execute properly to prevent resource leaks, even if the main workflow is cancelled by a newer run.
.github/PULL_REQUEST_TEMPLATE.md (2)
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:65-82
Timestamp: 2025-07-12T19:46:47.510Z
Learning: In the newton-physics/newton repository AWS GPU workflow, the workflow only triggers on `pull_request_target` and `push` events, not on `pull_request` events. The `pull_request_target` event is used for testing PRs from forks safely on AWS infrastructure.
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:144-156
Timestamp: 2025-07-12T19:46:06.538Z
Learning: In the newton-physics/newton repository workflow discussions, the maintainer shi-eric prefers that forked PR testing behavior is intentionally designed to fail rather than test untrusted code from forks on AWS infrastructure.
.github/workflows/aws-gpu-tests.yml (6)
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:65-82
Timestamp: 2025-07-12T19:46:47.510Z
Learning: In the newton-physics/newton repository AWS GPU workflow, the workflow only triggers on `pull_request_target` and `push` events, not on `pull_request` events. The `pull_request_target` event is used for testing PRs from forks safely on AWS infrastructure.
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:144-156
Timestamp: 2025-07-12T19:46:06.538Z
Learning: In the newton-physics/newton repository workflow discussions, the maintainer shi-eric prefers that forked PR testing behavior is intentionally designed to fail rather than test untrusted code from forks on AWS infrastructure.
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:275-283
Timestamp: 2025-07-12T19:45:04.640Z
Learning: In GitHub Actions, steps with `if: always()` continue to run even when the workflow is canceled via `concurrency.cancel-in-progress: true`. GitHub Actions re-evaluates if conditions during cancellation, and since `if: always()` always evaluates to true, those steps are not canceled and will continue to execute. This means cleanup steps with `if: always()` will run properly to prevent resource leaks even when workflows are cancelled due to concurrency limits.
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:275-283
Timestamp: 2025-07-12T19:45:04.640Z
Learning: In GitHub Actions, the `if: always()` condition runs even when a job is cancelled due to concurrency settings with `cancel-in-progress: true`. This means cleanup steps with `if: always()` will execute properly to prevent resource leaks, even if the main workflow is cancelled by a newer run.
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:0-0
Timestamp: 2025-07-12T19:46:37.456Z
Learning: For AWS SSM send-command payload limits, always measure the actual script size before suggesting optimizations. The 16KB limit is rarely hit by typical GitHub Actions workflow scripts, which are usually only a few KB in size.
Learnt from: shi-eric
PR: newton-physics/newton#386
File: .github/workflows/aws-gpu-tests.yml:86-87
Timestamp: 2025-07-12T17:40:10.078Z
Learning: Git does not allow spaces in branch names. Branch names with spaces will result in "is not a valid branch name" errors.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
6032b20 to
1927051
Compare
Signed-off-by: Eric Shi <ershi@nvidia.com>
6f831ca to
e7b5535
Compare
Signed-off-by: Eric Shi <ershi@nvidia.com>
Signed-off-by: Eric Shi <ershi@nvidia.com>
# Description Adds license files for additional 3rd party OSS dependencies added to Isaac Lab. ## Checklist - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there <!-- As you go through the checklist above, you can mark something as done by putting an x character in it For example, - [x] I have done this task - [ ] I have not done this task -->
Signed-off-by: Eric Shi <ershi@nvidia.com>
Description
Add a workflow for testing Newton on AWS EC2 instances.
Newton Migration Guide
Please ensure the migration guide for warp.sim users is up-to-date with the changes made in this MR.
docs/migration.rstis up-to dateBefore your PR is "Ready for review"
pre-commit run -aSummary by CodeRabbit