Add Free Disk Space step to E2E workflows #4759

Merged
volcano-sh-bot merged 2 commits into master from copilot/fix-e2e-failure on Nov 28, 2025

Contributor

Copilot AI commented Nov 28, 2025

What type of PR is this?

CI/Build fix

What this PR does / why we need it:

E2E tests (TensorFlow, Ray plugin, etc.) are frequently failing due to disk space exhaustion on GitHub Actions runners. This adds the jlumbroso/free-disk-space action to all E2E workflows that were missing it.

Workflows updated:

  • e2e_sequence.yaml (TensorFlow/Ray tests)
  • e2e_parallel_jobs.yaml
  • e2e_admission.yaml (both jobs)
  • e2e_cronjob.yaml
  • e2e_dra.yml
  • e2e_hypernode.yaml
  • e2e_scheduling_actions.yaml
  • e2e_scheduling_basic.yaml
  • e2e_vcctl.yaml

e2e_spark.yaml already had this step and was used as a reference.
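As an illustration of where the step sits, the sketch below shows a single-job workflow with `Free Disk Space` as its first step. The workflow name, trigger, job ID, runner label, and the checkout step are hypothetical placeholders, and the `with:` values are copied from the issue comment quoted in this PR, not read from e2e_spark.yaml itself.

```yaml
# Hypothetical workflow skeleton; only the Free Disk Space step reflects this PR.
name: E2E Example            # placeholder name

on:
  pull_request:

jobs:
  e2e:                       # placeholder job ID
    runs-on: ubuntu-latest   # assumed runner label
    steps:
      # Runs first so the later, disk-heavy steps start with more free space.
      - name: Free Disk Space
        uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: false  # do not remove the runner's hosted tool cache
          android: true
          dotnet: true
          haskell: true
          large-packages: true
          docker-images: true
          swap-storage: true

      - name: Checkout code
        uses: actions/checkout@v4

      # The real workflow's E2E steps would follow here.
```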

Which issue(s) this PR fixes:

Fixes #4760

Special notes for your reviewer:

Configuration matches existing e2e_spark.yaml implementation.

Does this PR introduce a user-facing change?

NONE
Original prompt

This section details the original issue you should resolve

E2E failure
These tests fail frequently; I have seen this several times in the past few days.

TensorFlow E2E Test Will Start in pending state and goes through other phases to get complete phase
/home/runner/work/volcano/volcano/test/e2e/jobseq/tensorflow.go:35
  STEP: Initializing test context @ 11/26/25 12:00:48.395
  STEP: Creating Queues @ 11/26/25 12:00:48.597
  STEP: Cleaning up test context @ 11/26/25 12:05:52.001
  [FAILED] in [It] - /home/runner/work/volcano/volcano/test/e2e/jobseq/tensorflow.go:125 @ 11/26/25 12:06:34.405
• [FAILED] [346.010 seconds]
TensorFlow E2E Test [It] Will Start in pending state and goes through other phases to get complete phase
/home/runner/work/volcano/volcano/test/e2e/jobseq/tensorflow.go:35

  [FAILED] Unexpected error:
      <*errors.errorString | 0xc0004bfe10>:
      [Wait time out]: expected job 'tensorflow-dist-mnist' phase in Running, actual got Pending
      {
          s: "[Wait time out]: expected job 'tensorflow-dist-mnist' phase in Running, actual got Pending",
      }
  occurred
  In [It] at: /home/runner/work/volcano/volcano/test/e2e/jobseq/tensorflow.go:125 @ 11/26/25 12:06:34.405
------------------------------
TensorFlow Plugin E2E Test Will Start in pending state and goes through other phases to get complete phase
/home/runner/work/volcano/volcano/test/e2e/jobseq/tensorflow_plugin.go:35
  STEP: Initializing test context @ 11/26/25 12:06:34.405
  STEP: Creating Queues @ 11/26/25 12:06:34.606
  STEP: Cleaning up test context @ 11/26/25 12:11:38.012
  [FAILED] in [It] - /home/runner/work/volcano/volcano/test/e2e/jobseq/tensorflow_plugin.go:124 @ 11/26/25 12:12:20.422
• [FAILED] [346.017 seconds]
TensorFlow Plugin E2E Test [It] Will Start in pending state and goes through other phases to get complete phase
/home/runner/work/volcano/volcano/test/e2e/jobseq/tensorflow_plugin.go:35

  [FAILED] Unexpected error:
      <*errors.errorString | 0xc0008f92b0>:
      [Wait time out]: expected job 'tensorflow-dist-mnist' phase in Running, actual got Pending
      {
          s: "[Wait time out]: expected job 'tensorflow-dist-mnist' phase in Running, actual got Pending",
      }
  occurred
  In [It] at: /home/runner/work/volcano/volcano/test/e2e/jobseq/tensorflow_plugin.go:124 @ 11/26/25 12:12:20.422
------------------------------
Ray Plugin E2E Test Will Start in pending state and  get running phase

https://github.com/volcano-sh/volcano/actions/runs/19702531664/job/56442000769?pr=4753

Comments on the Issue (you are @copilot in this section)

@hzxuzhonghu It seems the disk is running out of space. We can free disk space before running the E2E tests with the action below:
```yaml
- name: Free Disk Space
  uses: jlumbroso/free-disk-space@main
  with:
    tool-cache: false
    android: true
    dotnet: true
    haskell: true
    large-packages: true
    docker-images: true
    swap-storage: true
```
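If you want evidence that disk exhaustion is the cause, one option is to print filesystem usage on either side of the cleanup. This is a hedged sketch: the two `df -h` diagnostic steps and their names are assumptions added for illustration and are not part of this PR.

```yaml
# Hypothetical diagnostic steps around the cleanup; step names are placeholders.
steps:
  - name: Disk usage before cleanup
    run: df -h

  - name: Free Disk Space
    uses: jlumbroso/free-disk-space@main
    with:
      tool-cache: false
      android: true
      dotnet: true
      haskell: true
      large-packages: true
      docker-images: true
      swap-storage: true

  - name: Disk usage after cleanup
    run: df -h
```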


@volcano-sh-bot volcano-sh-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 28, 2025
Co-authored-by: hzxuzhonghu <13374016+hzxuzhonghu@users.noreply.github.com>
@volcano-sh-bot volcano-sh-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 28, 2025
Copilot AI changed the title [WIP] Fix E2E test failures in TensorFlow job Add Free Disk Space step to E2E workflows Nov 28, 2025
Copilot AI requested a review from hzxuzhonghu November 28, 2025 01:48
@hzxuzhonghu hzxuzhonghu marked this pull request as ready for review November 28, 2025 01:48
Copilot AI review requested due to automatic review settings November 28, 2025 01:48
@volcano-sh-bot volcano-sh-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 28, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses E2E test failures caused by disk space exhaustion on GitHub Actions runners by adding the jlumbroso/free-disk-space action to all E2E workflows that were previously missing it.

  • Adds Free Disk Space cleanup step to 9 E2E workflow files
  • Uses configuration matching the existing e2e_spark.yaml implementation
  • Places the cleanup step as the first step in each workflow to free disk space before other operations

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| .github/workflows/e2e_vcctl.yaml | Adds disk space cleanup step before Volcano CLI E2E tests |
| .github/workflows/e2e_sequence.yaml | Adds disk space cleanup step before TensorFlow/Ray sequence E2E tests |
| .github/workflows/e2e_scheduling_basic.yaml | Adds disk space cleanup step before basic scheduling E2E tests |
| .github/workflows/e2e_scheduling_actions.yaml | Adds disk space cleanup step before scheduling actions E2E tests |
| .github/workflows/e2e_parallel_jobs.yaml | Adds disk space cleanup step before parallel jobs E2E tests |
| .github/workflows/e2e_hypernode.yaml | Adds disk space cleanup step before hypernode E2E tests |
| .github/workflows/e2e_dra.yml | Adds disk space cleanup step before DRA E2E tests |
| .github/workflows/e2e_cronjob.yaml | Adds disk space cleanup step before cronjob E2E tests |
| .github/workflows/e2e_admission.yaml | Adds disk space cleanup step to both admission policy and webhook E2E test jobs |
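e2e_admission.yaml gets the step twice because each job on a GitHub-hosted runner starts from a fresh runner image, so space freed in one job does not carry over to the other. The sketch assumes two hypothetical job IDs standing in for the admission policy and webhook jobs; the real file may name and configure them differently.

```yaml
# Hypothetical job IDs; the real e2e_admission.yaml may differ.
jobs:
  e2e-admission-policy:
    runs-on: ubuntu-latest   # assumed runner label
    steps:
      - name: Free Disk Space
        uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: false
          android: true
          dotnet: true
          haskell: true
          large-packages: true
          docker-images: true
          swap-storage: true
      # Admission policy E2E steps would follow here.

  e2e-admission-webhook:
    runs-on: ubuntu-latest   # assumed runner label
    steps:
      # Repeated: this job runs on a separate runner with its own disk.
      - name: Free Disk Space
        uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: false
          android: true
          dotnet: true
          haskell: true
          large-packages: true
          docker-images: true
          swap-storage: true
      # Webhook E2E steps would follow here.
```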

@hzxuzhonghu
Member

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 28, 2025
@JesseStutler
Member

/approve

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JesseStutler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 28, 2025
@volcano-sh-bot volcano-sh-bot merged commit 8482edf into master Nov 28, 2025
45 of 46 checks passed
@JesseStutler
Member

/cherry-pick release-1.13

@volcano-sh-bot
Contributor

@JesseStutler: #4759 failed to apply on top of branch "release-1.13":

Patch is empty.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot
Contributor

@JesseStutler: new issue created for failed cherrypick: #4761

In response to this:

/cherry-pick release-1.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copilot AI added a commit that referenced this pull request Nov 28, 2025
Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com>
@JesseStutler
Member

/cherry-pick release-1.12

@volcano-sh-bot
Contributor

@JesseStutler: #4759 failed to apply on top of branch "release-1.12":

Patch is empty.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot
Contributor

@JesseStutler: new issue created for failed cherrypick: #4769

In response to this:

/cherry-pick release-1.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JesseStutler
Member

/cherrypick release-1.12

@volcano-sh-bot
Contributor

@JesseStutler: #4759 failed to apply on top of branch "release-1.12":

Patch is empty.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherrypick release-1.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot
Contributor

@JesseStutler: new issue created for failed cherrypick: #4771

In response to this:

/cherrypick release-1.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

E2E failure

5 participants