
Ephemeral (single use) runner registrations #510

@MatisseHack


Describe the bug
When starting a self-hosted runner with ./run.cmd --once, the runner sometimes accepts a second job before shutting down. That second job then fails with the message:

The runner: [runner-name] lost communication with the server. Verify the machine is running and has a healthy network connection.

This looks like the same issue recently fixed here: microsoft/azure-pipelines-agent#2728

To Reproduce
Steps to reproduce the behavior:

  1. Create a repo, enable GitHub Actions, and add a new workflow

  2. Configure a new runner on your machine

  3. Run the runner with ./run.cmd --once

  4. Queue two runs of your workflow

  5. The first job will run and the runner will go offline

  6. (Optionally) configure and start a second runner

  7. The second job will time out after several minutes with the message:

    The runner: [runner-name] lost communication with the server. Verify the machine is running and has a healthy network connection.
    

    (where [runner-name] is the name of the first runner)

  8. Also, trying to remove the first runner with ./config.cmd remove --token [token] fails with the following error until the second job times out:

    Failed: Removing runner from the server
    Runner "[runner-name]" is running a job for pool "Default"
    

Expected behavior
The second job should wait for, and then run on, any new runner that comes online, rather than being assigned as a second job to the original runner, which is now offline.
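
To make the expected behavior concrete, here is a minimal toy model in Python. It is only a sketch of the dispatch rule I would expect, not how the Actions service or the runner is actually implemented; `Runner`, `can_accept`, and `dispatch` are hypothetical names made up for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Runner:
    # Hypothetical model of a runner started with --once.
    name: str
    single_use: bool = True
    online: bool = True
    jobs_taken: int = 0

    def can_accept(self) -> bool:
        # A single-use runner is eligible for exactly one job, ever.
        if not self.online:
            return False
        return not (self.single_use and self.jobs_taken >= 1)


def dispatch(queue: deque, runners: list) -> dict:
    """Assign at most one queued job per eligible runner; leave the rest queued."""
    assignments = {}
    for runner in runners:
        if queue and runner.can_accept():
            job = queue.popleft()
            runner.jobs_taken += 1
            assignments[job] = runner.name
            if runner.single_use:
                # The runner exits after its one job (assumed behavior of --once).
                runner.online = False
    return assignments


queue = deque(["job-1", "job-2"])
first = Runner("runner-a")
print(dispatch(queue, [first]))          # {'job-1': 'runner-a'}
print(list(queue))                       # ['job-2'] stays queued
second = Runner("runner-b")
print(dispatch(queue, [first, second]))  # {'job-2': 'runner-b'}
```

In this model the second job is never handed to the first runner; it stays in the queue until another eligible runner appears, which is the behavior described above.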

Runner Version and Platform

2.262.1 on Windows

Runner and Worker's Diagnostic Logs

_diag.zip
