Severe slowdown in k8s-novolume hooks due to _temp scan and cross-pod sync with actions/setup-go
Related issues / PRs and discussions
Checks
Controller Version
- Controller: actions-runner-controller (gha-runner-scale-set API group)
- Version: 0.13.0
- Mode: `kubernetes` with `type: novolume`
- Hooks: `k8s-novolume` 0.13.0
- `ACTIONS_RUNNER_CONTAINER_HOOKS=/home/runner/k8s-novolume/index.js`
Deployment Method
- Helm, using the officially provided `gha-runner-scale-set` chart.
Environment
- Controller: actions-runner-controller
- Runner type: `kubernetes` with `type: novolume`
- Container hooks: `ACTIONS_RUNNER_CONTAINER_HOOKS=/home/runner/k8s-novolume/index.js`
- Hooks version: `k8s-novolume` 0.13.0
- Filesystem in pods: `overlay` filesystem, no PVC, ephemeral storage only
- Language/stack: Go, using `actions/checkout@v4`, `actions/setup-go@v5`, `go test` + coverage, `actions/upload-artifact@v4`, `dorny/test-reporter@v2`
There are two pods involved per job:
- Runner pod (GitHub Runner)
- Workflow pod (where the job actually runs, under `/__w/...`)
To Reproduce
- Deploy `gha-runner-scale-set` with:
  - Controller version: 0.13.0
  - Runner type: `kubernetes` with `type: novolume`
  - `ACTIONS_RUNNER_CONTAINER_HOOKS=/home/runner/k8s-novolume/index.js`
  - `k8s-novolume` hooks version: 0.13.0
  - Overlay filesystem, no PVC, only ephemeral storage.
- Create a workflow that:
  - Uses `actions/checkout@v4`
  - Uses `actions/setup-go@v5` with `go-version-file` pointing to `go.mod`
  - Runs `go test` with coverage
  - Uploads artifacts with `actions/upload-artifact@v4`
  - Publishes JUnit results with `dorny/test-reporter@v2`.
- Optionally, wrap the above into a composite action (in my case, a "Go Unit Test" composite action that:
  - Installs some system dependencies,
  - Validates the `go.mod` location,
  - Runs `go test -v ./...` with coverage,
  - Generates `coverage.out`, HTML/text coverage reports, and `reports/junit.xml`,
  - Uploads them via `actions/upload-artifact@v4` and `dorny/test-reporter@v2`).
- Trigger the workflow on a Go repository and observe the job logs. For almost every step, you will see:

  ```
  Run '/home/runner/k8s-novolume/index.js'
  (node:643494) [DEP0005] DeprecationWarning: Buffer() is deprecated ...
  ```
- Around `actions/upload-artifact@v4` and `dorny/test-reporter@v2`, the workflow appears to hang for tens of minutes, even though the actual `go test` execution has already completed.
- While the job is “stuck”, exec into the workflow pod and run `ps -ef`. You should see something like:

  ```
  root 260513 0 0 10:25 ? 00:00:00 sh -c cd /__w/_temp && find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \;
  root 260519 260513 12 10:25 ? 00:00:00 find . -not -path */_runner_hook_responses* -exec stat -c %b %n {} ;
  ```
- In the runner pod, running `ps -ef` shows a similar process:

  ```
  runner 573192 418946 0 10:35 ? 00:00:00 sh -c cd /home/runner/_work/_temp && find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \;
  runner 573193 573192 5 10:35 ? 00:00:01 find . -not -path */_runner_hook_responses* -exec stat -c %b %n {} ;
  ```
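  The same check can also be run non-interactively from outside the pods; `<namespace>` and the pod names below are placeholders for your environment:

  ```bash
  # Look for the hook's find/stat scan in the workflow pod and in the runner pod.
  kubectl -n <namespace> exec <workflow-pod> -- ps -ef | grep -E 'find|stat'
  kubectl -n <namespace> exec <runner-pod>   -- ps -ef | grep -E 'find|stat'
  ```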
- Check the size of `_temp` in the workflow pod:

  ```bash
  cd /__w/_temp
  du -sh .        # ~309M
  find . | wc -l  # ~14026 files
  ```
- Optionally, run a simple I/O benchmark (e.g. `dd`) in the runner pod under `/home/runner/_work` to confirm that raw disk I/O is fast (hundreds of MB/s).
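  A minimal sketch of such a benchmark (file name, size, and flags are arbitrary; `conv=fsync` makes the write figure reflect actual disk writes rather than just the page cache, while the read-back may still be partly cached):

  ```bash
  cd /home/runner/_work
  # Write ~1 GiB, flushing to disk before dd reports throughput.
  dd if=/dev/zero of=ddtest.bin bs=1M count=1024 conv=fsync
  # Read it back.
  dd if=ddtest.bin of=/dev/null bs=1M
  rm -f ddtest.bin
  ```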
- While a step such as `actions/upload-artifact@v4` or `dorny/test-reporter@v2` is slow and the `find`/`stat` processes are running, manually delete the Go-related temp directories under `_temp` in both pods:
  - Runner pod:

    ```bash
    rm -rf /home/runner/_work/_temp/<go-setup-guid-folder>
    ```

  - Workflow pod:

    ```bash
    rm -rf /__w/_temp/<go-setup-guid-folder>
    ```
- After deleting those directories:
  - The currently running step completes within minutes instead of tens of minutes.
  - The remaining steps also complete quickly.
  - Overall job runtime drops dramatically.
This demonstrates that the slowdown is directly linked to the size and contents of _temp in combination with the k8s-novolume hooks behavior.
Describe the bug
When using gha-runner-scale-set (controller version 0.13.0) with:
- Runner type: `kubernetes` + `type: novolume`
- `ACTIONS_RUNNER_CONTAINER_HOOKS=/home/runner/k8s-novolume/index.js`
- `k8s-novolume` hooks version: 0.13.0
we see severe slowdowns whenever _temp becomes non-trivial in size.
For every step, the container hook runs a find + stat over _temp in both pods:
- In the workflow pod:

  ```bash
  cd /__w/_temp && find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \;
  ```

- In the runner pod:

  ```bash
  cd /home/runner/_work/_temp && find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \;
  ```
With Go workflows that use actions/setup-go@v5, _temp grows to ~300MB and ~14k files because the Go toolchain is extracted to /__w/_temp/<guid>/... and then cached.
Once `_temp` reaches that size, each invocation of `index.js` spends a long time scanning the entire `_temp` tree via `find` + `stat` (this can be measured directly, see the timing sketch below). As a result:
- Steps like `actions/upload-artifact@v4` and `dorny/test-reporter@v2` appear to hang for tens of minutes.
- The overall job runtime grows to 50+ minutes, even though the actual `go test` work finishes much earlier.
- The underlying disk is not the bottleneck (tested with `dd`, showing ~700MB/s writes and ~300MB/s reads). The expensive part is the repeated full scan of `_temp` from the hook.
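The scan cost is easy to reproduce in isolation by timing the exact command the hook runs (taken from the `ps -ef` output above):

```bash
cd /__w/_temp
# -exec ... \; forks one stat process per file, so with ~14k files this is ~14k
# process spawns per hook invocation, independent of raw disk throughput.
time find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \; > /dev/null
```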
Cross-pod _temp synchronization
In addition to the scan itself, `novolume` mode seems to synchronize the contents of `_temp` between the runner pod and the workflow pod. Both pods show mirrored contents under:
- `/home/runner/_work/_temp` (runner pod)
- `/__w/_temp` (workflow pod)

This suggests that some kind of pod-to-pod copy over the Kubernetes API is happening; in effect it looks similar to a `kubectl cp` / tar-stream transfer (illustrated below).
When `_temp` contains hundreds of MB and thousands of files, this cross-pod sync likely adds even more overhead on top of the `find`/`stat` scan, further amplifying the slowdown.
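For illustration only (this mirrors how `kubectl cp` works; I have not verified that the hook uses exactly this mechanism), a full-directory transfer of that kind is equivalent to streaming a tar archive through the API server:

```bash
# Placeholders: <namespace>, <workflow-pod>. kubectl cp is itself a tar stream over
# `kubectl exec`, so pulling _temp out of the workflow pod looks roughly like:
kubectl -n <namespace> exec <workflow-pod> -- tar cf - -C /__w/_temp . \
  | tar xf - -C /home/runner/_work/_temp
# With ~300MB across ~14k files, the whole tree is re-streamed every time this runs.
```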
Experimental evidence
If I manually delete the Go-related temp directory under _temp in both pods while a step is slow, the step completes quickly and the rest of the workflow also runs fast. This strongly suggests that:
- The `_temp` disk-usage scan in `k8s-novolume`, and
- The implied copy/synchronization of that directory between the two pods

are the main causes of the slowdown for this scenario.
Describe the expected behavior
I would expect the `k8s-novolume` hooks to:
- Not introduce large overhead relative to the actual job workload.
- Avoid scanning a large `_temp` tree with `find` + `stat` on every hook invocation, especially when `_temp` is populated by common actions like `actions/setup-go@v5`.
- Avoid heavy, full-directory pod-to-pod copies over the Kubernetes API for `_temp` when it contains hundreds of MB and many files.
Ideally, the implementation would:
- Limit the scan/copy to a smaller, dedicated directory, or
- Run such checks less frequently, or
- Provide a way to disable / relax these disk-usage and sync operations when they become too expensive.
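As one low-level illustration of making the existing check cheaper without changing what it scans: the current `-exec stat -c '%b %n' {} \;` forks one `stat` process per file, whereas batching with `+` hands many paths to each `stat` invocation and produces the same output. A sketch of the alternative invocation, not a patch against the actual hook code:

```bash
cd /__w/_temp
# Same filter and output format as the hook's scan, but batched: far fewer process
# forks when the tree contains thousands of files.
find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} +
```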
In practice, for this job I would expect:
- The total runtime to be dominated by `go test`, uploads, and reporting logic.
- No tens-of-minutes hangs around `upload-artifact` or `dorny/test-reporter`.
- No need to manually clean `_temp` inside the pods as a workaround.
- No full `_temp` copy/sync between runner pod and workflow pod on each hook invocation when `_temp` is large.
Additional Context
Workflow snippet (composite action)
For completeness, here is the composite action used to run Go tests and upload results:
```yaml
name: Go Unit Test
description: Run Go unit tests with coverage and JUnit reporting.

inputs:
  go-mod-path:
    description: 'Path to the go.mod file. Default is the root of the repository.'
    required: false
    default: './'
  go-dependencies-to-install:
    description: 'Bash commands to install system dependencies.'
    required: false
    default: ""
  go-test-script:
    description: 'Custom Go test script to run.'
    required: false
    default: 'go test -v ./...'

runs:
  using: "composite"
  steps:
    - name: Install system dependencies
      run: |
        apt-get update && apt-get install -y curl jq git
        curl -L https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -o yq
        chmod +x yq && mv yq /usr/local/bin/yq
      shell: bash

    - name: Mark repository as safe
      run: git config --global --add safe.directory $GITHUB_WORKSPACE
      shell: bash

    - name: Checkout repository
      uses: actions/checkout@v4

    - name: Check go.mod presence and Go version match
      id: check_go_version
      shell: bash
      run: |
        GO_MOD_PATH_INPUT="${{ inputs.go-mod-path }}"
        echo "GO_MOD_PATH input: ${GO_MOD_PATH_INPUT}"
        if [ -f "go.mod" ]; then
          echo "✅ go.mod found at root."
          GO_MOD_PATH=$(pwd)
        elif [ -n "${GO_MOD_PATH_INPUT}" ] && [ -f "${GO_MOD_PATH_INPUT}/go.mod" ]; then
          echo "✅ go.mod found at custom path: ${GO_MOD_PATH_INPUT}"
          cd "${GO_MOD_PATH_INPUT}"
          GO_MOD_PATH=$(pwd)
        else
          echo "❌ go.mod not found."
          exit 1
        fi
        echo "go-mod-dir=${GO_MOD_PATH}" >> $GITHUB_OUTPUT

    - name: Setup Go Dependencies
      if: ${{ inputs.go-dependencies-to-install != '' }}
      uses: ALM/CIHub/.github/actions/setup-dependency@setup-dependency-0.0.1
      with:
        commands: ${{ inputs.go-dependencies-to-install }}

    - name: Set up Go Version
      uses: actions/setup-go@v5
      with:
        go-version-file: ${{ steps.check_go_version.outputs.go-mod-dir }}/go.mod

    - name: Run Go Tests with Coverage
      shell: bash
      run: |
        mkdir -p coverage reports
        go install github.com/jstemmer/go-junit-report@latest
        export PATH="$HOME/go/bin:$PATH"
        read -ra GO_TEST_CMD <<< "${{ inputs.go-test-script }}"
        GO_TEST_CMD+=("-cover" "-coverprofile=coverage.out")
        echo "▶️ Running: ${GO_TEST_CMD[@]}"
        # Capture the exit code directly from the pipeline (a trailing `|| true`
        # would reset PIPESTATUS before it is read).
        "${GO_TEST_CMD[@]}" 2>&1 | tee test_output.txt && TEST_EXIT_CODE=0 || TEST_EXIT_CODE=${PIPESTATUS[0]}
        go-junit-report < test_output.txt > reports/junit.xml
        go tool cover -html=coverage.out -o coverage/coverage.html
        go tool cover -func=coverage.out > coverage/coverage.txt
        exit $TEST_EXIT_CODE

    - name: Upload Go test results
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: go-test-results
        path: |
          coverage/coverage.html
          coverage/coverage.txt
          coverage.out
          reports/junit.xml

    - name: Upload JUnit test results
      if: success()
      uses: dorny/test-reporter@v2
      with:
        name: Go Unit Tests
        path: reports/junit.xml
        reporter: java-junit
```
Cross-pod sync hypothesis:
Both the runner pod and the workflow pod show the same `_temp` contents under `/home/runner/_work/_temp` and `/__w/_temp`. This suggests that `novolume` may be copying/syncing the entire directory between pods (via the Kubernetes API, similar in effect to `kubectl cp`). With ~300MB and ~14k files, this likely adds significant overhead on top of the `find`/`stat` scan.

Workaround:
Manually delete the Go temp directory created by `actions/setup-go` under `_temp` in BOTH the runner pod and the workflow pod while a step is slow. This immediately speeds up the current step and the remaining workflow. Downside: manual and not sustainable (a sketch of how this could be automated is included after the notes below).

Notes:
- Happy to provide full values.yaml and more detailed logs if needed.
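To avoid doing the cleanup by hand, the same idea could in principle become a step placed immediately after `actions/setup-go`. This is only a sketch: it assumes `RUNNER_TEMP` resolves to `/__w/_temp` in the workflow pod, that the GUID folder contains the extracted `go/` tree, and that later steps no longer need those extracted files (which the manual experiment above suggests). It also only cleans the workflow-pod side:

```bash
# Hypothetical cleanup (shell: bash) right after the actions/setup-go step.
# The GUID directory name is not fixed, so remove any temp folder that holds an
# extracted Go toolchain. Adjust the layout check to what you actually see in _temp.
for d in "${RUNNER_TEMP:-/__w/_temp}"/*/; do
  if [ -x "${d}go/bin/go" ]; then
    echo "Removing extracted Go toolchain from ${d}"
    rm -rf "${d}"
  fi
done
```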
---

## Questions / Requests

1. Is it **expected** that the `k8s-novolume` hooks (v0.13.0) run:

   ```bash
   cd /__w/_temp   # or /home/runner/_work/_temp
   find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \;
   ```

   on every hook invocation?

2. Is there any configuration / environment variable to:
   - disable or relax this disk-usage scanning, or
   - limit it to a smaller directory, or
   - reduce its frequency?

3. Since `actions/setup-go` and other actions naturally populate `_temp` with many files, this scan becomes a major bottleneck in `novolume` mode. Is there a recommended way to:
   - point the hooks to a different directory than `_temp`, or
   - split `_temp` and `_work` onto different mounts/paths specifically for the hook logic?

4. If `novolume` is indeed copying/syncing `_temp` between the runner pod and workflow pod via the Kubernetes API (`kubectl cp` / tar-stream style), is there any plan or option to:
   - make this incremental,
   - limit it to a smaller directory,
   - or disable it when `_temp` is large?