Failure Threshold Shutdown Miscalculation #185

@eric-tramel

Describe the bug

I ran a job that hit a handful of errors, and the whole creation process was shut down early. Reviewing the logs, the threshold math seems wrong.

  |-- Data generation was terminated early due to error rate exceeding threshold.
  |-- The summary of encountered errors is:
{
    "failure_threshold": 0.5,
    "completed_count": 512,
    "success_count": 491,
    "early_shutdown": true,
    "error_count": 21,
    "task_errors": {
        "ModelAPIError": 21
    }
}

21 failures out of 512 should not trigger a shutdown.

Steps/Code to reproduce bug

Ran a job that produced some ModelAPIError failures (it ran many image-context threads, which probably exceeded the input context limit or something similar).

Expected behavior

A failure threshold of 0.5 should mean the job shuts down only once the failure rate exceeds 50%. In the run above, the failure rate was 21/512, or about 4.1%.
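
To make the arithmetic concrete, here is a minimal sketch of the check I would expect, using the numbers from the summary above. The function names `failure_rate` and `should_shut_down` are hypothetical and only illustrate the expected behavior; they are not taken from the project's code, and the actual shutdown logic may compute the rate differently (which may be where the bug is).

```python
import json

# Summary copied from the log output in this issue.
summary = json.loads("""
{
    "failure_threshold": 0.5,
    "completed_count": 512,
    "success_count": 491,
    "early_shutdown": true,
    "error_count": 21,
    "task_errors": {"ModelAPIError": 21}
}
""")


def failure_rate(error_count: int, completed_count: int) -> float:
    """Fraction of completed tasks that failed (0.0 if nothing has completed)."""
    if completed_count == 0:
        return 0.0
    return error_count / completed_count


def should_shut_down(error_count: int, completed_count: int,
                     failure_threshold: float) -> bool:
    """Hypothetical check: shut down only when the rate exceeds the threshold."""
    return failure_rate(error_count, completed_count) > failure_threshold


rate = failure_rate(summary["error_count"], summary["completed_count"])
expected = should_shut_down(
    summary["error_count"],
    summary["completed_count"],
    summary["failure_threshold"],
)

print(f"failure rate: {rate:.1%}")
print(f"expected early_shutdown: {expected}, reported: {summary['early_shutdown']}")
```

Under this reading of the threshold, the run should not have been terminated: the computed rate is far below 0.5, yet the log reports `"early_shutdown": true`.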

Labels: bug (Something isn't working)
