It's difficult and error-prone to get a test run status

## Description

It is hard to tell whether a job succeeded.
For example, this CI action in the Rust SDK looks like it captures whether the test succeeds, but it does not:

https://github.com/testground/sdk-rust/blob/5c50ba4f63d2aff6b815cfbfcb0c8f9e4d73809e/.github/workflows/ci.yml#L106-L114

The CLI exits with code 0, so the job looks successful and the action completes, but the actual testground job outcome is "failed".

## How to actually get a test result

There is a difference between the `run finished successfully` output in the log and the actual status shown on the dashboard (when you visit http://localhost:8042 with the daemon running locally).

If you run a test with the `--wait` flag, the CLI exits with code 0 even when the test fails.
This is why the action above doesn't catch failing tests.

Here is an example of the "correct" way to get the outcome. It takes about 40 lines:

https://github.com/galargh/testground-github-action/blob/7dbfde22f7158acfdeb1b22352a14992f41b0310/entrypoint.sh#L46-L83

After polling for status, you have to run:

```sh
testground status --task c9p3s72el22n9gatgq70 --extended 
```

Which outputs something like:

```
› testground status --task c9p3s72el22n9gatgq70 --extended
May  4 08:54:21.573004  INFO    using home directory: /Users/laurent/testground
May  4 08:54:21.573192  INFO    .env.toml loaded from: /Users/laurent/testground/.env.toml
May  4 08:54:21.573203  INFO    testground client initialized   {"addr": "http://localhost:8043"}

>>> Result:

ID:             c9p3s72el22n9gatgq70
Priority:       1
Created:        2022-05-04 08:49:32.426292982 +0000 UTC
Type:           run
Status:         complete
Last update:    2022-05-04 08:51:15.295936558 +0000 UTC

Input:
{"Sources":{"base_dir":"/home/laurent/testground/data/work/requests/5950d816","extra_dir":"","plan_dir":"/home/laurent/testground/data/work/requests/5950d816/plan","sdk_dir":""},"build_groups":[0],"composition":{"global":{"build":null,"build_config":{"enabled":true},"builder":"docker:generic","case":"example","disable_metrics":false,"plan":"sdk-rust","run":null,"run_config":{"enabled":true},"runner":"local:docker","total_instances":1},"groups":[{"build":{"dependencies":[],"selectors":null},"id":"single","instances":{"count":1,"percentage":0},"resources":{"cpu":"","memory":""},"run":{"artifact":"51db1e46f370","profiles":null,"test_params":{}}}],"metadata":{"author":"","name":""}},"created_by":{},"manifest":{"Builders":{"docker:generic":{"enabled":true}},"ExtraSources":null,"Name":"sdk-rust","Runners":{"local:docker":{"enabled":true}},"TestCases":[{"Instances":{"Maximum":1,"Minimum":1},"Name":"example","Parameters":null}]},"priority":1}

Result:
{"journal":{"events":{},"pods_statuses":{}},"outcome":"failure","outcomes":{"single":{"ok":0,"total":1}}}
```

Note the `Status: complete` near the top.
The real outcome is only visible with the `--extended` parameter, as `"outcome":"failure"` in the JSON on the last line.
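As a minimal sketch of how a script could pull that field out today, the helper below reads the extended status output on stdin and prints the top-level `outcome` value. The function name `extract_outcome` is made up for illustration, and the pattern assumes the Result JSON is printed on a single line, as in the output above.

```sh
# extract_outcome: hypothetical helper, not part of the testground CLI.
# Reads `testground status --task <id> --extended` output on stdin and
# prints the value of the top-level "outcome" key from the Result JSON.
extract_outcome() {
  # The pattern requires a colon right after "outcome", so the neighbouring
  # "outcomes" object is not matched. tail keeps only the last match.
  sed -n 's/.*"outcome":[[:space:]]*"\([^"]*\)".*/\1/p' | tail -n 1
}

# Usage (TASK_ID is assumed to hold the id of a finished task):
#   outcome=$(testground status --task "$TASK_ID" --extended | extract_outcome)
#   echo "outcome: $outcome"
```

With the output shown above, this prints `failure`. Matching on text is brittle, which is part of why the outcome should be exposed more directly.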

## What defines this endeavour to be complete?

The CLI should be more explicit about failures. For example, it should exit with a non-zero code when a job run with `--wait` fails, and likewise when the `status` command is called on a failed job.

Note that we have to deal with two kinds of errors: errors between the CLI and the running job (the job upload failed, the cluster is down, etc.), and the test run itself failing.
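One way to keep the two kinds of errors apart is to give each its own exit code. The sketch below shows a possible convention only; the function name and the specific codes are assumptions, not current CLI behaviour, and it assumes a passing run reports the outcome `success`.

```sh
# outcome_exit_code: hypothetical mapping from an outcome string to an exit code.
#   0 = the test run succeeded (assumes the outcome string is "success")
#   1 = the test run finished but did not succeed (e.g. "failure")
#   2 = no outcome could be obtained (upload failed, daemon unreachable, ...)
outcome_exit_code() {
  case "$1" in
    success) return 0 ;;
    "")      return 2 ;;
    *)       return 1 ;;
  esac
}
```

A CI script could then treat code 2 as "retry or alert" and code 1 as "the test is red".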

- [x] List the operations we regularly use in the CLI, especially the ones that require a lot of scripting.
- [x] Expose a cleaner UI for these.

## Sub tasks

- [x] #1344
  - The daemon might log "run successfully" even though the job failed, or log a "write error" or "incomplete" even though the job succeeded.
  - It should print the outcome explicitly (the same one shown on the dashboard).
- [x] #1346
  - See the example above: the outcome is hidden in the last line of the extended status output.
- [ ] #1351
  - Every CI job has to implement some variation of [this script](https://github.com/testground/testground-github-action/blob/master/entrypoint.sh#L34-L91). It should be easy to get a job run ID, get a job run status, and wait for a job to complete.
  - Something like `testground wait -t TASK_ID --timeout TIMEOUT_SECONDS` should do it.
- [x] #1349
  - [x] Verify this information
  - [x] then fix
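Until something like the `testground wait` command proposed in #1351 exists, the waiting step can be approximated with a polling loop. This is a rough sketch: `fetch_status` and `wait_for_task` are hypothetical names, and the loop only recognises the `complete` status seen in the output above, so any other terminal status would run into the timeout.

```sh
# fetch_status: prints the extended status of a task. Kept separate so it
# can be replaced, for example by a stub when testing the loop.
fetch_status() {
  testground status --task "$1" --extended
}

# wait_for_task TASK_ID TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Returns 0 once the "Status:" line reads "complete", 1 on timeout.
wait_for_task() {
  task_id=$1
  timeout=$2
  interval=${3:-5}
  elapsed=0
  while :; do
    # Take the second field of the first line whose first field is "Status:".
    status=$(fetch_status "$task_id" | awk '$1 == "Status:" { print $2; exit }')
    [ "$status" = "complete" ] && return 0
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Usage (TASK_ID is assumed to be set):
#   wait_for_task "$TASK_ID" 600 || echo "timed out waiting for $TASK_ID"
```

Note that a zero return only means the task completed. The outcome still has to be read from the extended status afterwards.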

