Description
It is hard to tell whether a job succeeded or not.
For example, this CI action in the Rust SDK looks like it captures whether the test succeeds, but it doesn't:
The CLI exits with code 0, so the job looks successful and the action completes, but the actual testground job outcome is "failed".
How to actually get a test result
There is a difference between the "run finished successfully" output in the log and the actual status shown on the dashboard (when you visit http://localhost:8042 with the daemon running locally).
If you run a test with the --wait flag, the CLI will exit with code 0 even when the test fails.
This is why the action above doesn't catch failing tests.
Here is an example of the "correct" way to get the outcome. It's 40 lines:
After polling for status, you have to run:
testground status --task c9p3s72el22n9gatgq70 --extended
Which outputs something like:
› testground status --task c9p3s72el22n9gatgq70 --extended
May 4 08:54:21.573004 INFO using home directory: /Users/laurent/testground
May 4 08:54:21.573192 INFO .env.toml loaded from: /Users/laurent/testground/.env.toml
May 4 08:54:21.573203 INFO testground client initialized {"addr": "http://localhost:8043"}
>>> Result:
ID: c9p3s72el22n9gatgq70
Priority: 1
Created: 2022-05-04 08:49:32.426292982 +0000 UTC
Type: run
Status: complete
Last update: 2022-05-04 08:51:15.295936558 +0000 UTC
Input:
{"Sources":{"base_dir":"/home/laurent/testground/data/work/requests/5950d816","extra_dir":"","plan_dir":"/home/laurent/testground/data/work/requests/5950d816/plan","sdk_dir":""},"build_groups":[0],"composition":{"global":{"build":null,"build_config":{"enabled":true},"builder":"docker:generic","case":"example","disable_metrics":false,"plan":"sdk-rust","run":null,"run_config":{"enabled":true},"runner":"local:docker","total_instances":1},"groups":[{"build":{"dependencies":[],"selectors":null},"id":"single","instances":{"count":1,"percentage":0},"resources":{"cpu":"","memory":""},"run":{"artifact":"51db1e46f370","profiles":null,"test_params":{}}}],"metadata":{"author":"","name":""}},"created_by":{},"manifest":{"Builders":{"docker:generic":{"enabled":true}},"ExtraSources":null,"Name":"sdk-rust","Runners":{"local:docker":{"enabled":true}},"TestCases":[{"Instances":{"Maximum":1,"Minimum":1},"Name":"example","Parameters":null}]},"priority":1}
Result:
{"journal":{"events":{},"pods_statuses":{}},"outcome":"failure","outcomes":{"single":{"ok":0,"total":1}}}
Note the Status: complete at the beginning,
BUT if you use the --extended parameter, you see "outcome":"failure" in the JSON on the last line of the output.
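As a rough illustration of what a CI script currently has to do, here is a minimal Python sketch that pulls the outcome out of the extended status output. It assumes the output layout matches the sample above (a bare "Result:" line followed by one line of JSON); `parse_outcome` is a hypothetical helper name, not part of testground.

```python
import json


def parse_outcome(status_output: str):
    """Return the "outcome" field from the JSON line that follows "Result:".

    Assumption: the layout matches the sample in this issue, i.e. a bare
    "Result:" header followed by a single line of JSON. Returns None when
    no such line is found or the JSON cannot be parsed.
    """
    lines = status_output.splitlines()
    for i, line in enumerate(lines):
        # Match the bare "Result:" header, not the ">>> Result:" banner.
        if line.strip() != "Result:":
            continue
        for candidate in lines[i + 1:]:
            text = candidate.strip()
            if not text:
                continue
            try:
                payload = json.loads(text)
            except json.JSONDecodeError:
                return None
            return payload.get("outcome") if isinstance(payload, dict) else None
    return None


# Trimmed copy of the output shown above.
sample = """>>> Result:
ID: c9p3s72el22n9gatgq70
Status: complete
Result:
{"journal":{"events":{},"pods_statuses":{}},"outcome":"failure","outcomes":{"single":{"ok":0,"total":1}}}
"""

print(parse_outcome(sample))  # prints: failure
```

The point of the sketch is that "Status: complete" alone says nothing about the outcome; the script has to read the JSON to find it.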
What defines this endeavour to be complete?
The CLI should be more explicit about failures. For example, it should use non-zero exit codes when we run a job with --wait or call the status command and the job has failed.
Note that we have to deal with 2 kinds of errors: errors between the CLI and the running job (the job upload failed, the cluster is down, etc.), which are different from the test run itself failing.
- List the operations we regularly use in the CLI, especially the ones that require a lot of scripting.
- Expose a cleaner UI for these.
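To make the two kinds of errors concrete, here is one possible mapping to exit codes, sketched in Python. The specific codes (1 and 2), the function name `exit_code_for`, and the assumption that a passing job reports the outcome "success" are all illustrative; only "failure" appears in the output above.

```python
# Hypothetical exit codes; the real CLI may choose different values.
EXIT_OK = 0
EXIT_TEST_FAILED = 1   # the job ran, but its outcome was not a success
EXIT_INFRA_ERROR = 2   # the CLI could not run the job or read its outcome


def exit_code_for(cli_returncode: int, outcome):
    """Map the CLI's own return code and the job outcome to one exit code.

    Assumption: a passing job reports the outcome "success". Anything else,
    including the "failure" value seen in this issue, counts as a test failure.
    """
    if cli_returncode != 0:
        # Upload failed, cluster down, daemon unreachable, etc.
        return EXIT_INFRA_ERROR
    if outcome is None:
        # The command ran, but no outcome could be read from its output.
        return EXIT_INFRA_ERROR
    return EXIT_OK if outcome == "success" else EXIT_TEST_FAILED


print(exit_code_for(0, "failure"))  # prints: 1
```

Keeping infrastructure errors on a separate code lets a CI job retry on a flaky cluster without retrying a test that genuinely failed.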
Sub tasks
- Job outcome is not clear when you read the daemon's stdout. #1344
- The daemon might log "run successfully" even though the job failed, or log a "write error" or "incomplete" even though the job succeeded.
- It should print out the outcome (the same one shown on the dashboard) explicitly.
- Job outcome is hard to retrieve from the CLI #1346
- See the example above: the outcome is hidden in the last line of the extended status command output.
- It's hard to "long poll" for a job status #1351
- Any CI job has to implement some variation of this script. It should be easy to get a job run ID, get a job run status, and wait for a job's completion.
- Something like
testground wait -t TASK_ID --timeout TIMEOUT_SECONDS should do it
- A testground run might end with a status code = 0 despite the job failing (hence the need for the entrypoint.sh script) #1349
- Verify this information
- then fix
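For the "long poll" sub task, the script that CI jobs write today is roughly the loop below, sketched in Python. It assumes that `testground status --task TASK_ID` prints a "Status:" line as in the output above and that "complete" is the terminal status; other status values are not listed in this issue. The names `wait_for_completion`, `parse_status` and `fetch_status_via_cli` are hypothetical.

```python
import subprocess
import time


def parse_status(status_output: str):
    """Return the value of the first "Status:" line, or None if absent."""
    for line in status_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Status:"):
            return stripped[len("Status:"):].strip()
    return None


def fetch_status_via_cli(task_id: str):
    """Ask the CLI for the task status (requires testground on PATH)."""
    proc = subprocess.run(
        ["testground", "status", "--task", task_id],
        capture_output=True,
        text=True,
    )
    # Assumption: the status may be printed on either stream.
    return parse_status(proc.stdout + proc.stderr)


def wait_for_completion(fetch_status, timeout_seconds, poll_interval=5.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it returns "complete" or the timeout expires.

    Returns True if the task completed in time, False on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        if fetch_status() == "complete":
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

A CI step could then call `wait_for_completion(lambda: fetch_status_via_cli(task_id), timeout_seconds=600)` and only afterwards read the outcome from the extended status. Completion alone still does not mean the job passed.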