Formal states for test items #960

Description

@mattias-p

At any given time a test item is in exactly one of several states. Today these states don't have formal names and it is a non-trivial operation to determine what state a test item is in. This issue describes a way to improve this situation.

Here is the list of possible states and the current criteria for each of them:

  • waiting — progress == 0
  • running — 0 < progress < 100
  • completed — progress == 100 and test_results does not contain { "tag": "UNABLE_TO_FINISH_TEST", ... }
  • cancelled — progress == 100 and test_results contains { "tag": "UNABLE_TO_FINISH_TEST", ... } and test_end_time ≈ test_start_time + max_zonemaster_execution_time
  • crashed — progress == 100 and test_results contains { "tag": "UNABLE_TO_FINISH_TEST", ... } and logfile contains "Test died: {test_id}"
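The current criteria above could be sketched roughly as follows. This is only an illustration in Python, not the backend's actual code: the `ItemRecord` type, its field names, the `tolerance` parameter used to interpret "≈", and the choice to check the logfile before the timing are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ItemRecord:
    """Hypothetical snapshot of the data the criteria refer to."""
    progress: int
    result_tags: List[str]        # "tag" values found in test_results
    start_time: Optional[float]   # test_start_time, seconds since epoch
    end_time: Optional[float]     # test_end_time, seconds since epoch
    died_in_log: bool             # logfile contains "Test died: {test_id}"


def classify(rec: ItemRecord, max_execution_time: float,
             tolerance: float = 5.0) -> str:
    """Derive the state name from the current, informal criteria."""
    if rec.progress == 0:
        return "waiting"
    if 0 < rec.progress < 100:
        return "running"
    if rec.progress != 100:
        raise ValueError(f"progress out of range: {rec.progress}")
    if "UNABLE_TO_FINISH_TEST" not in rec.result_tags:
        return "completed"
    # Assumption: the logfile check takes precedence over the timing check.
    if rec.died_in_log:
        return "crashed"
    # Assumption: "≈" means "within `tolerance` seconds".
    if rec.start_time is not None and rec.end_time is not None:
        elapsed = rec.end_time - rec.start_time
        if abs(elapsed - max_execution_time) <= tolerance:
            return "cancelled"
    raise ValueError("record matches none of the listed criteria")
```

Note how telling cancelled from crashed needs three separate data sources (results, timestamps and the logfile), which is the non-trivial part this issue wants to replace with a formal state.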

The procedure job_status should return an object with the following keys:

{
    "state": state,          // in all states
    "created_at": timestamp, // in all states
    "started_at": timestamp, // only in states "running", "completed", "cancelled" and "crashed"
    "ended_at": timestamp,   // only in states "completed", "cancelled" and "crashed"
    "progress": int          // only in states "running", "completed", "cancelled" and "crashed"
}
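As a rough sketch of the per-state key rules above, the response object could be assembled like this. The helper name `build_job_status` and the use of ISO 8601 strings for timestamps are assumptions for the example; the proposal does not specify a timestamp format.

```python
from typing import Any, Dict, Optional

# States in which the test has been started, and in which it has ended,
# taken from the comments in the proposed object.
STARTED_STATES = {"running", "completed", "cancelled", "crashed"}
ENDED_STATES = {"completed", "cancelled", "crashed"}
ALL_STATES = {"waiting"} | STARTED_STATES


def build_job_status(state: str,
                     created_at: str,
                     started_at: Optional[str] = None,
                     ended_at: Optional[str] = None,
                     progress: Optional[int] = None) -> Dict[str, Any]:
    """Return a job_status-like dict holding only the keys valid for `state`."""
    if state not in ALL_STATES:
        raise ValueError(f"unknown state: {state}")
    status: Dict[str, Any] = {"state": state, "created_at": created_at}
    if state in STARTED_STATES:
        status["started_at"] = started_at
        status["progress"] = progress
    if state in ENDED_STATES:
        status["ended_at"] = ended_at
    return status
```

For example, `build_job_status("waiting", "2024-01-01T00:00:00Z")` yields only the `state` and `created_at` keys, while a `"running"` item additionally carries `started_at` and `progress`.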

The lifecycle.t test file should be updated to:

  • Cover all state transitions.
  • When the intention is to check the state of a test, check the state name instead of proxies like the progress field.
  • Check the timestamps and progress where it makes sense.

Alternatives

  • In an earlier version the waiting state was called created. A few arguments have been made against created and in favor of waiting.
    • All tests could be said to be created, even the completed ones.
    • The term waiting makes more sense than created should the test agent's dispatcher ever want to reclaim a test from a worker, e.g. because of inactivity.
  • In an MVP implementation we could let the completed state also cover the cancelled and crashed states above, and the API could be extended to return the distinct cancelled and crashed states as a follow-up. If we declare in the RPCAPI documentation that the set of reported states will be extended in the future, the follow-up would constitute a minor version bump and not a major version bump. This limited feature would be simpler to implement, as it can be implemented efficiently without updating the database schema. In fact we already have the Zonemaster::Backend::DB::test_state() method detecting the required states.
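To illustrate why the MVP variant needs no schema change: with cancelled and crashed folded into completed, the state follows from the existing progress value alone. The function below is a hypothetical sketch and does not reproduce the actual Zonemaster::Backend::DB::test_state() method.

```python
def mvp_state(progress: int) -> str:
    """Map progress to the reduced MVP set of states."""
    if progress == 0:
        return "waiting"
    if 0 < progress < 100:
        return "running"
    if progress == 100:
        # Also covers what would later be reported as cancelled or crashed.
        return "completed"
    raise ValueError(f"progress out of range: {progress}")
```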

Stabilization

Tracking issue

Affected interfaces

Incompatibly changed:

  • job_status

Dependencies

None

Overlapping proposals

None

Metadata

Labels

T-FeatureType: New feature in software or test case description
