[🐛 Bug]: Workers not handling SIGSEGV signal properly #10720

@nextlevelbeard

Description

Have you read the Contributing Guidelines on issues?

WebdriverIO Version

latest

Node.js Version

20

Mode

Standalone Mode

Which capabilities are you using?

No response

What happened?

Sometimes the machine running the tests is momentarily starved of resources.
When that happens, a worker can hit a segmentation fault during initialization: it is killed by a SIGSEGV signal and exits with a null exit code.

I noticed this behavior when running a large number of tests in parallel.
Workers are constantly created and destroyed and normally run their tests successfully, but occasionally one of them crashes this way.

When a WDIO worker is killed by a SIGSEGV signal:

  • absolutely no relevant logs are generated, either during execution or after it has ended
  • the opened browser/driver is not terminated; an empty browser is left running long after WDIO has stopped

What is your expected behavior?

Handle the SIGSEGV signal in the same way as the other signals that are already handled (SIGTERM, SIGINT, etc.):

  • Logs should be produced (at the runner level, since the worker process has crashed) and an error message should be displayed in the console. Currently only [0-0] FAILED in browser - file:///myspecfile is displayed.
  • Logging should stop assuming there is an exit code and handle the case where only a signal is known.
  • The opened browser/driver should be terminated.
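
The second point could be sketched roughly as follows. This is only an illustration: `WorkerExit` and `describeWorkerExit` are hypothetical names, not existing WebdriverIO code, and the message wording is an assumption.

```typescript
// Hypothetical helper, not part of WebdriverIO: builds a log message for a
// worker exit where either the exit code or the signal may be null.
type WorkerExit = {
    cid: string;
    exitCode: number | null;
    signal: string | null;
};

function describeWorkerExit({ cid, exitCode, signal }: WorkerExit): string {
    // A worker killed by a signal (e.g. SIGSEGV) has no exit code.
    if (signal !== null) {
        return `Runner ${cid} was terminated by signal ${signal}`;
    }
    if (exitCode !== null) {
        return `Runner ${cid} finished with exit code ${exitCode}`;
    }
    return `Runner ${cid} exited without an exit code or signal`;
}

console.log(describeWorkerExit({ cid: '0-0', exitCode: null, signal: 'SIGSEGV' }));
console.log(describeWorkerExit({ cid: '0-1', exitCode: 0, signal: null }));
```

Checking the signal first means a crashed worker is never reported as "finished with exit code null".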

Retries:
Since a SIGSEGV can be a momentary hiccup of the machine running the tests, a couple of retries could let the worker recover and complete its task instead of ruining what would otherwise be a perfectly green run.

Retries already happen when the user enables them, but the crashed browsers/drivers are not closed/terminated between retries.

A retry mechanism could be applied even if the user has not explicitly enabled retries, since this type of error is often experienced at worker initialization/startup, while the OS is allocating memory for the worker.

Add a config option (such as retrySegmentationFaults) so that even when specFileRetries is zero or not set, the worker is still re-run a limited number of times, perhaps 2.
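
One possible reading of that option, as a standalone sketch. `retrySegmentationFaults` is the option proposed here and does not exist in WebdriverIO today; `RetryConfig`, `shouldRetry` and the default of 2 are assumptions made for illustration.

```typescript
// Hypothetical config shape: specFileRetries is the existing option,
// retrySegmentationFaults is the option proposed in this issue.
interface RetryConfig {
    specFileRetries?: number;
    retrySegmentationFaults?: number;
}

// Assumed default, taken from the "perhaps 2" suggestion above.
const DEFAULT_SEGFAULT_RETRIES = 2;

/**
 * Decides whether a spec should be re-run.
 * `attempt` is the number of retries already performed for this spec.
 */
function shouldRetry(config: RetryConfig, signal: string | null, attempt: number): boolean {
    let budget = config.specFileRetries ?? 0;
    if (signal === 'SIGSEGV') {
        // A segfault gets at least the segfault budget, even if
        // specFileRetries is zero or not set.
        budget = Math.max(budget, config.retrySegmentationFaults ?? DEFAULT_SEGFAULT_RETRIES);
    }
    return attempt < budget;
}

console.log(shouldRetry({}, 'SIGSEGV', 0)); // retried although specFileRetries is unset
console.log(shouldRetry({}, null, 0));      // ordinary failure, no retries configured
```

Setting `retrySegmentationFaults` to 0 in this sketch would switch the behavior off for users who do not want implicit retries.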

Suggestions:

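    private _handleExit (exitCode: number | null) {
        const { cid, childProcess, specs, retries } = this;

        /**
         * read the signal before the process reference is dropped; when a
         * worker is killed by a signal (e.g. SIGSEGV) exitCode is null
         */
        const signal = childProcess?.signalCode ?? null;

        /**
         * delete process of worker
         */
        delete this.childProcess;
        this.isBusy = false;
        this.isKilled = true;
        log.debug(
            signal
                ? `Runner ${cid} was terminated by signal ${signal}`
                : `Runner ${cid} finished with exit code ${exitCode}`
        );
        this.emit('exit', { cid, exitCode, specs, retries, signal });
        if (childProcess) {
            childProcess.kill('SIGTERM');
        }
    }

    async _endHandler({ cid: rid, exitCode, specs, retries, signal }) {

        const passed = this._isWatchModeHalted() || exitCode === 0;

        const config = this.configParser.getConfig();

        // `??` keeps any value set by an earlier call, so this is only
        // evaluated the first time a worker exits
        this.segfaultRetry = this.segfaultRetry ?? (signal === 'SIGSEGV' && !config.specFileRetries);

        // grant a fixed retry budget for a segfault when none is configured
        if (this.segfaultRetry && !retries) {
            retries = 2;
            this.segfaultRetry = false;
        }
        ...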

How to reproduce the bug.

Use the segfault-handler package to trigger a segmentation fault inside a test:

import SegfaultHandler from 'segfault-handler';

// Optionally specify a callback function for custom logging. This feature is currently only supported for Node.js >= v0.12 running on Linux.
SegfaultHandler.registerHandler("crash.log", function(signal, address, stack) {
	// Do what you want with the signal, address, or stack (array)
	// This callback will execute before the signal is forwarded on.
});

SegfaultHandler.causeSegfault(); // simulates a buggy native module that dereferences NULL
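
The null exit code can also be observed without a native module. The following is a minimal sketch that assumes a POSIX system (Linux or macOS): the child process sends SIGSEGV to itself, which delivers the signal rather than causing a real memory fault, and the parent then inspects the result of Node's `spawnSync`.

```typescript
import { spawnSync } from 'node:child_process';

// Spawn a child Node.js process that kills itself with SIGSEGV.
// Assumption: POSIX signal semantics; this does not apply on Windows.
const result = spawnSync(process.execPath, [
    '-e',
    "process.kill(process.pid, 'SIGSEGV')",
]);

// For a process terminated by a signal, `status` (the exit code) is null
// and `signal` holds the name of the signal.
console.log('exit code:', result.status);
console.log('signal:', result.signal);
```

This is the same shape of result the runner sees for a crashed worker: a signal name and no exit code.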

Relevant log output

Little to no log output is produced: the worker process crashes and nothing is printed to the console.

Code of Conduct

  • I agree to follow this project's Code of Conduct

Is there an existing issue for this?

  • I have searched the existing issues
