Severe slowdown in k8s-novolume hooks due to _temp scan and cross-pod sync with actions/setup-go
Related issues / PRs and discussions
Checks
Controller Version
- Controller: actions-runner-controller (gha-runner-scale-set API group)
- Version: 0.13.0
- Mode: `kubernetes` with `type: novolume`
- Hooks: `k8s-novolume` 0.13.0
- `ACTIONS_RUNNER_CONTAINER_HOOKS=/home/runner/k8s-novolume/index.js`
Deployment Method
- Helm, using the officially provided `gha-runner-scale-set` chart.
Environment
- Controller: actions-runner-controller
- Runner type: `kubernetes` with `type: novolume`
- Container hooks: `ACTIONS_RUNNER_CONTAINER_HOOKS=/home/runner/k8s-novolume/index.js`
- Hooks version: `k8s-novolume` 0.13.0
- Filesystem in pods: `overlay` filesystem, no PVC, ephemeral storage only
- Language/stack: Go, using `actions/checkout@v4`, `actions/setup-go@v5`, `go test` + coverage, `actions/upload-artifact@v4`, `dorny/test-reporter@v2`
There are two pods involved per job:
- Runner pod (GitHub Runner)
- Workflow pod (where the job actually runs, under `/__w/...`)
To Reproduce
- Deploy `gha-runner-scale-set` with:
  - Controller version: 0.13.0
  - Runner type: `kubernetes` with `type: novolume`
  - `ACTIONS_RUNNER_CONTAINER_HOOKS=/home/runner/k8s-novolume/index.js`
  - `k8s-novolume` hooks version: 0.13.0
  - Overlay filesystem, no PVC, only ephemeral storage.
- Create a workflow that:
  - Uses `actions/checkout@v4`
  - Uses `actions/setup-go@v5` with `go-version-file` pointing to `go.mod`
  - Runs `go test` with coverage
  - Uploads artifacts with `actions/upload-artifact@v4`
  - Publishes JUnit results with `dorny/test-reporter@v2`.
- Optionally, wrap the above into a composite action (in my case, a "Go Unit Test" composite action that:
  - Installs some system dependencies,
  - Validates the `go.mod` location,
  - Runs `go test -v ./...` with coverage,
  - Generates `coverage.out`, HTML/text coverage reports, and `reports/junit.xml`,
  - Uploads them via `actions/upload-artifact@v4` and `dorny/test-reporter@v2`).
- Trigger the workflow on a Go repository and observe the job logs. For almost every step, you will see:

  ```
  Run '/home/runner/k8s-novolume/index.js'
  (node:643494) [DEP0005] DeprecationWarning: Buffer() is deprecated ...
  ```
- Around `actions/upload-artifact@v4` and `dorny/test-reporter@v2`, the workflow appears to hang for tens of minutes, even though the actual `go test` execution has already completed.
- While the job is “stuck”, exec into the workflow pod and run `ps -ef`. You should see something like:

  ```
  root 260513 0 0 10:25 ? 00:00:00 sh -c cd /__w/_temp && find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \;
  root 260519 260513 12 10:25 ? 00:00:00 find . -not -path */_runner_hook_responses* -exec stat -c %b %n {} ;
  ```
- In the runner pod, running `ps -ef` shows a similar process:

  ```
  runner 573192 418946 0 10:35 ? 00:00:00 sh -c cd /home/runner/_work/_temp && find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \;
  runner 573193 573192 5 10:35 ? 00:00:01 find . -not -path */_runner_hook_responses* -exec stat -c %b %n {} ;
  ```
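  The same check can also be run non-interactively from outside the pods; `<namespace>` and the pod names below are placeholders for your environment:

  ```bash
  # Look for the hook's find/stat scan in the workflow pod and in the runner pod.
  kubectl -n <namespace> exec <workflow-pod> -- ps -ef | grep -E 'find|stat'
  kubectl -n <namespace> exec <runner-pod>   -- ps -ef | grep -E 'find|stat'
  ```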
- Check the size of `_temp` in the workflow pod:

  ```bash
  cd /__w/_temp
  du -sh .        # ~309M
  find . | wc -l  # ~14026 files
  ```
- Optionally, run a simple I/O benchmark (e.g. `dd`) in the runner pod under `/home/runner/_work` to confirm that raw disk I/O is fast (hundreds of MB/s).
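  A minimal sketch of such a benchmark (file name, size, and flags are arbitrary; `conv=fsync` makes the write figure reflect actual disk writes rather than just the page cache, while the read-back may still be partly cached):

  ```bash
  cd /home/runner/_work
  # Write ~1 GiB, flushing to disk before dd reports throughput.
  dd if=/dev/zero of=ddtest.bin bs=1M count=1024 conv=fsync
  # Read it back.
  dd if=ddtest.bin of=/dev/null bs=1M
  rm -f ddtest.bin
  ```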
- While a step such as `actions/upload-artifact@v4` or `dorny/test-reporter@v2` is slow and the `find`/`stat` processes are running, manually delete the Go-related temp directories under `_temp` in both pods:
  - Runner pod:

    ```bash
    rm -rf /home/runner/_work/_temp/<go-setup-guid-folder>
    ```

  - Workflow pod:

    ```bash
    rm -rf /__w/_temp/<go-setup-guid-folder>
    ```
- After deleting those directories:
  - The currently running step completes within minutes instead of tens of minutes.
  - The remaining steps also complete quickly.
  - Overall job runtime drops dramatically.
This demonstrates that the slowdown is directly linked to the size and contents of _temp in combination with the k8s-novolume hooks behavior.
Describe the bug
When using gha-runner-scale-set (controller version 0.13.0) with:
- Runner type: `kubernetes` + `type: novolume`
- `ACTIONS_RUNNER_CONTAINER_HOOKS=/home/runner/k8s-novolume/index.js`
- `k8s-novolume` hooks version: 0.13.0
we see severe slowdowns whenever _temp becomes non-trivial in size.
For every step, the container hook runs a find + stat over _temp in both pods:
- In the workflow pod:

  ```bash
  cd /__w/_temp && find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \;
  ```

- In the runner pod:

  ```bash
  cd /home/runner/_work/_temp && find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \;
  ```
With Go workflows that use actions/setup-go@v5, _temp grows to ~300MB and ~14k files because the Go toolchain is extracted to /__w/_temp/<guid>/... and then cached.
Once `_temp` reaches that size, each invocation of `index.js` spends a long time scanning the entire `_temp` tree via `find` + `stat` (this can be measured directly, see the timing sketch below). As a result:
- Steps like `actions/upload-artifact@v4` and `dorny/test-reporter@v2` appear to hang for tens of minutes.
- The overall job runtime grows to 50+ minutes, even though the actual `go test` work finishes much earlier.
- The underlying disk is not the bottleneck (tested with `dd`, showing ~700MB/s writes and ~300MB/s reads). The expensive part is the repeated full scan of `_temp` from the hook.
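The scan cost is easy to reproduce in isolation by timing the exact command the hook runs (taken from the `ps -ef` output above):

```bash
cd /__w/_temp
# -exec ... \; forks one stat process per file, so with ~14k files this is ~14k
# process spawns per hook invocation, independent of raw disk throughput.
time find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \; > /dev/null
```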
Cross-pod _temp synchronization
In addition to the scan itself, `novolume` mode seems to synchronize the contents of `_temp` between the runner pod and the workflow pod. Both pods show mirrored contents under:
- `/home/runner/_work/_temp` (runner pod)
- `/__w/_temp` (workflow pod)

This suggests that some kind of pod-to-pod copy over the Kubernetes API is happening; in effect it looks similar to a `kubectl cp` / tar-stream transfer (illustrated below).
When `_temp` contains hundreds of MB and thousands of files, this cross-pod sync likely adds even more overhead on top of the `find`/`stat` scan, further amplifying the slowdown.
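For illustration only (this mirrors how `kubectl cp` works; I have not verified that the hook uses exactly this mechanism), a full-directory transfer of that kind is equivalent to streaming a tar archive through the API server:

```bash
# Placeholders: <namespace>, <workflow-pod>. kubectl cp is itself a tar stream over
# `kubectl exec`, so pulling _temp out of the workflow pod looks roughly like:
kubectl -n <namespace> exec <workflow-pod> -- tar cf - -C /__w/_temp . \
  | tar xf - -C /home/runner/_work/_temp
# With ~300MB across ~14k files, the whole tree is re-streamed every time this runs.
```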
Experimental evidence
If I manually delete the Go-related temp directory under _temp in both pods while a step is slow, the step completes quickly and the rest of the workflow also runs fast. This strongly suggests that:
- The `_temp` disk-usage scan in `k8s-novolume`, and
- The implied copy/synchronization of that directory between the two pods

are the main causes of the slowdown for this scenario.
Describe the expected behavior
I would expect the `k8s-novolume` hooks to:
- Not introduce large overhead relative to the actual job workload.
- Avoid scanning a large `_temp` tree with `find` + `stat` on every hook invocation, especially when `_temp` is populated by common actions like `actions/setup-go@v5`.
- Avoid heavy, full-directory pod-to-pod copies over the Kubernetes API for `_temp` when it contains hundreds of MB and many files.
Ideally, the implementation would:
- Limit the scan/copy to a smaller, dedicated directory, or
- Run such checks less frequently, or
- Provide a way to disable / relax these disk-usage and sync operations when they become too expensive.
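As one low-level illustration of making the existing check cheaper without changing what it scans: the current `-exec stat -c '%b %n' {} \;` forks one `stat` process per file, whereas batching with `+` hands many paths to each `stat` invocation and produces the same output. A sketch of the alternative invocation, not a patch against the actual hook code:

```bash
cd /__w/_temp
# Same filter and output format as the hook's scan, but batched: far fewer process
# forks when the tree contains thousands of files.
find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} +
```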
In practice, for this job I would expect:
- The total runtime to be dominated by `go test`, uploads, and reporting logic.
- No tens-of-minutes hangs around `upload-artifact` or `dorny/test-reporter`.
- No need to manually clean `_temp` inside the pods as a workaround.
- No full `_temp` copy/sync between runner pod and workflow pod on each hook invocation when `_temp` is large.
Additional Context
Workflow snippet (composite action)
For completeness, here is the composite action used to run Go tests and upload results:
```yaml
name: Go Unit Test
description: Run Go unit tests with coverage and JUnit reporting.

inputs:
  go-mod-path:
    description: 'Path to the go.mod file. Default is the root of the repository.'
    required: false
    default: './'
  go-dependencies-to-install:
    description: 'Bash commands to install system dependencies.'
    required: false
    default: ""
  go-test-script:
    description: 'Custom Go test script to run.'
    required: false
    default: 'go test -v ./...'

runs:
  using: "composite"
  steps:
    - name: Install system dependencies
      run: |
        apt-get update && apt-get install -y curl jq git
        curl -L https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -o yq
        chmod +x yq && mv yq /usr/local/bin/yq
      shell: bash

    - name: Mark repository as safe
      run: git config --global --add safe.directory $GITHUB_WORKSPACE
      shell: bash

    - name: Checkout repository
      uses: actions/checkout@v4

    - name: Check go.mod presence and Go version match
      id: check_go_version
      shell: bash
      run: |
        GO_MOD_PATH_INPUT="${{ inputs.go-mod-path }}"
        echo "GO_MOD_PATH input: ${GO_MOD_PATH_INPUT}"
        if [ -f "go.mod" ]; then
          echo "✅ go.mod found at root."
          GO_MOD_PATH=$(pwd)
        elif [ -n "${GO_MOD_PATH_INPUT}" ] && [ -f "${GO_MOD_PATH_INPUT}/go.mod" ]; then
          echo "✅ go.mod found at custom path: ${GO_MOD_PATH_INPUT}"
          cd "${GO_MOD_PATH_INPUT}"
          GO_MOD_PATH=$(pwd)
        else
          echo "❌ go.mod not found."
          exit 1
        fi
        echo "go-mod-dir=${GO_MOD_PATH}" >> $GITHUB_OUTPUT

    - name: Setup Go Dependencies
      if: ${{ inputs.go-dependencies-to-install != '' }}
      uses: ALM/CIHub/.github/actions/setup-dependency@setup-dependency-0.0.1
      with:
        commands: ${{ inputs.go-dependencies-to-install }}

    - name: Set up Go Version
      uses: actions/setup-go@v5
      with:
        go-version-file: ${{ steps.check_go_version.outputs.go-mod-dir }}/go.mod

    - name: Run Go Tests with Coverage
      shell: bash
      run: |
        mkdir -p coverage reports
        go install github.com/jstemmer/go-junit-report@latest
        export PATH="$HOME/go/bin:$PATH"
        read -ra GO_TEST_CMD <<< "${{ inputs.go-test-script }}"
        GO_TEST_CMD+=("-cover" "-coverprofile=coverage.out")
        echo "▶️ Running: ${GO_TEST_CMD[@]}"
        # Capture the exit code directly from the pipeline (a trailing `|| true`
        # would reset PIPESTATUS before it is read).
        "${GO_TEST_CMD[@]}" 2>&1 | tee test_output.txt && TEST_EXIT_CODE=0 || TEST_EXIT_CODE=${PIPESTATUS[0]}
        go-junit-report < test_output.txt > reports/junit.xml
        go tool cover -html=coverage.out -o coverage/coverage.html
        go tool cover -func=coverage.out > coverage/coverage.txt
        exit $TEST_EXIT_CODE

    - name: Upload Go test results
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: go-test-results
        path: |
          coverage/coverage.html
          coverage/coverage.txt
          coverage.out
          reports/junit.xml

    - name: Upload JUnit test results
      if: success()
      uses: dorny/test-reporter@v2
      with:
        name: Go Unit Tests
        path: reports/junit.xml
        reporter: java-junit
```
Cross-pod sync hypothesis:
Both the runner pod and the workflow pod show the same `_temp` contents under `/home/runner/_work/_temp` and `/__w/_temp`. This suggests that `novolume` may be copying/syncing the entire directory between pods (via the Kubernetes API, similar in effect to `kubectl cp`). With ~300MB and ~14k files, this likely adds significant overhead on top of the `find`/`stat` scan.

Workaround:
Manually delete the Go temp directory created by `actions/setup-go` under `_temp` in BOTH the runner pod and the workflow pod while a step is slow. This immediately speeds up the current step and the remaining workflow. Downside: manual and not sustainable (a sketch of how this could be automated is included after the notes below).

Notes:
- Happy to provide full values.yaml and more detailed logs if needed.
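To avoid doing the cleanup by hand, the same idea could in principle become a step placed immediately after `actions/setup-go`. This is only a sketch: it assumes `RUNNER_TEMP` resolves to `/__w/_temp` in the workflow pod, that the GUID folder contains the extracted `go/` tree, and that later steps no longer need those extracted files (which the manual experiment above suggests). It also only cleans the workflow-pod side:

```bash
# Hypothetical cleanup (shell: bash) right after the actions/setup-go step.
# The GUID directory name is not fixed, so remove any temp folder that holds an
# extracted Go toolchain. Adjust the layout check to what you actually see in _temp.
for d in "${RUNNER_TEMP:-/__w/_temp}"/*/; do
  if [ -x "${d}go/bin/go" ]; then
    echo "Removing extracted Go toolchain from ${d}"
    rm -rf "${d}"
  fi
done
```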
---

## Questions / Requests

1. Is it **expected** that the `k8s-novolume` hooks (v0.13.0) run:

   ```bash
   cd /__w/_temp   # or /home/runner/_work/_temp
   find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \;
   ```

   on every hook invocation?

2. Is there any configuration / environment variable to:
   - disable or relax this disk-usage scanning, or
   - limit it to a smaller directory, or
   - reduce its frequency?

3. Since `actions/setup-go` and other actions naturally populate `_temp` with many files, this scan becomes a major bottleneck in `novolume` mode. Is there a recommended way to:
   - point the hooks to a different directory than `_temp`, or
   - split `_temp` and `_work` onto different mounts/paths specifically for the hook logic?

4. If `novolume` is indeed copying/syncing `_temp` between the runner pod and workflow pod via the Kubernetes API (`kubectl cp` / tar-stream style), is there any plan or option to:
   - make this incremental,
   - limit it to a smaller directory,
   - or disable it when `_temp` is large?