Prototype 'skip_role_on_error: True' in 4 roles {lokole, moodle, mongodb, sugarizer} by holta · Pull Request #3255 · iiab/iiab

holta · 2022-06-17T15:13:51Z

It's not just that humanity is impatient (the world is melting after all) but also that some prefer to race through all roles and then look at the individual errors later.

So this PR prototypes an unblocking mechanism, allowing Ansible to continue to the next role(s) even despite serious errors in individual roles.

It uses Ansible's block/rescue technique. All this really does is indent most all of ROLE/tasks/main.yml — and then tack this on the bottom:

  rescue:

  - name: 'SEE ERROR ABOVE (skip_role_on_error: {{ skip_role_on_error }})'
    fail:
      msg: ""
    when: not skip_role_on_error

This means that if you override default_vars.yml to set skip_role_on_error: True you will get a rescued=2 count of all bypassed (rescued) errors in the bottom-right — to notify you of how many error(s) need to be addressed — as in this example output:

TASK [mongodb : SOME SERIOUS ERROR TRYING TO INSTALL MongoDB] **************************************************************************************************************************************************
fatal: [127.0.0.1]: FAILED! => {"changed": false, "msg": "DEEP FAIL 1! (MongoDB)"}

TASK [mongodb : SEE ERROR ABOVE (skip_role_on_error: False)] ***********************************************************************************************************
fatal: [127.0.0.1]: FAILED! => {"changed": false, "msg": ""}

TASK [sugarizer : SEE ERROR ABOVE (skip_role_on_error: False)] *********************************************************************************************************
fatal: [127.0.0.1]: FAILED! => {"changed": false, "msg": ""}

PLAY RECAP *************************************************************************************************************************************************************
127.0.0.1                  : ok=42   changed=7    unreachable=0    failed=1    skipped=5    rescued=2    ignored=0

As you can see above, it immediately "pops to the top of the stack" if encountering an error in a deeply nested role when skip_role_on_error: False

On the other hand, those who choose to override default_vars.yml to set skip_role_on_error: True later need to take responsibility for all the errors they intentionally chose to bypass! One way to do this is:

cd /opt/iiab/iiab
grep -B1 'SEE ERROR ABOVE' *.log

The -B1 flag is the essential part, as it includes the actual error on the previous line of the log file(s), for every error.

PS This technique should very likely be expanded to most all/other user-facing IIAB apps (i.e. their roles) in the coming days, if it proves to work well!

holta · 2022-06-17T15:23:46Z

Explanation above revised to make it more clear.
iiab-diagnostics might later include the following command, to surface disparate Ansible errors that folks have (intentionally or unintentionally) swept under the carpet:
```
grep -B1 'SEE ERROR ABOVE' /opt/iiab/iiab/*.log
```

holta · 2022-06-17T15:30:24Z

This PR was heavily tested on Ubuntu. It should probably be merged shortly, for broader community evaluation and testing.

jvonau · 2022-06-17T16:54:44Z

If this is expanded to include the roles that rely on mariadb this technique could be very useful for image creation.

holta · 2022-06-17T17:14:05Z

Recap:

Some IIAB implementers/testers/devs want to see errors immediately (e.g. when running ./runrole, ./iiab-install, etc) — whereas others want to bypass those errors (e.g. Admin Console, ./runroles, etc) "until later."

(So this PR tries to tease apart and acknowledge... some of these very different needs... expected by some IIAB implementers/testers/developers but not others.)

tim-moody · 2022-06-17T17:20:40Z

Some IIAB implementers/testers/devs want to see errors immediately (e.g. when running ./runrole, ./iiab-install, etc) — whereas others want to bypass those errors (e.g. Admin Console, ./runroles, etc) "until later."

I wrote runroles and admin console and I have no wish to bypass errors until later. I'm not even sure what that means.

All I want is for a role that knows it will fail due to lack of support for a particular OS to do so gracefully.

tim-moody · 2022-06-17T17:22:30Z

I also don't know what the reference to the readme is for; there's no mention of errors later.

jvonau · 2022-06-17T17:39:34Z

This RP doesn't directly solve 3242, but I can see a use for this functionally with automated testing.

All I want is for a role that knows it will fail due to lack of support for a particular OS to do so gracefully.

Can you expand on 'fail' and 'gracefully'? Stop and display error message as I proposed in 3244?

jvonau · 2022-06-17T17:44:34Z

Think #3242 (comment) is causing some confusion.

tim-moody · 2022-06-17T18:09:17Z

All I'm saying is that if a particular role can't support a particular environment it should say so and do nothing.

jvonau · 2022-06-17T18:59:31Z

All I'm saying is that if a particular role can't support a particular environment it should say so and do nothing.

Before trying to install the role? as a precautionary check? We know in advance where the failure is going to occur 'with a particular environment'. Are you suggesting just to ignore the *_install variable for that role and silently skip the role without relying on 'skip_role_on_error' being 'True' trying to install the role, failing then silently proceeding? The summary is a nice touch thou.

holta · 2022-06-17T19:09:39Z

There is no crystal ball or magic wand that will say whether a role will complete or not — and there never will be. Oddmakers are useful (and entertaining!) but in end that's all they are, oddsmakers.

Counting balls and strikes may be interesting (and yes necessary, to monitor and improve reliability!) but the higher-level need here is the people who want to (1) run long Ansible jobs largely uninterrupted, to install/enable many roles (i.e. many IIAB apps/services) (2) and then address the consequences later.

Some of these roles will succeed and some will fail, for innumerable and different reasons (e.g. an apt update that was applied last night fixing/breaking the OS and/or any upstream CI app fixing/breaking last night — and countless other reasons such as transient root causes, quasi-permanent root causes, networking/mirror failures, hardware failures, ETC, etc).

This important need has been overlooked for too long. So this PR works towards allowing everything to be batched up in service to this growing population of people (who want to batch everything up, then deal with the consequences later, indeed whether they have time later or not...)

Regardless how that batch process of installing many roles (in a more uninterrupted way) is invoked. Just one example is those installing a preset / learning bouquet that's trying to install many such roles/apps — whether from Admin Console, or from a script like ./iiab-install or ./runroles or any other means.

holta · 2022-06-17T19:50:59Z

I can see a use for this functionally with automated testing

Indeed!

QA (community testing) is one of several demographics that will contribute far more effectively with a batch process working through the installing/enabling of many roles in a best effort fashion (rather than hurry-up-and-wait, constantly watching the kettle boil...)

Reason: Being randomly interrupted may be "addicting" according to many psychological studies (and extremely mentally taxing, ask almost anyone in on-call / retail service industries) but such "ADHD" workflows can also be the worst way to get serious work accomplished. 😱

holta · 2022-06-18T02:26:32Z

I ran another successful test (LARGE-sized IIAB install on Ubuntu 22.04 not 20.04 this time) using skip_role_on_error: True to override default_vars.yml

Output confirms the 3 expected errors are trapped — with the error count (rescued=3 in this case) clear to anyone who chooses to look:

root@box:/opt/iiab/iiab# grep rescue *log
2022-06-17 22:15:15,859 p=4267 u=root n=ansible | 127.0.0.1                  : ok=793  changed=450  unreachable=0    failed=0    skipped=221  rescued=3    ignored=5

With error detail (for all 3 errors in this case) reviewable here:

root@box:/opt/iiab/iiab# grep -B1 'SEE ERROR ABOVE' *log
2022-06-17 22:08:06,133 p=4267 u=root n=ansible | fatal: [127.0.0.1]: FAILED! => {"changed": true, "cmd": "/opt/iiab/moodle/moodle_installer", "delta": "0:00:00.106486", "end": "2022-06-17 22:08:06.113942", "msg": "non-zero return code", "rc": 255, "start": "2022-06-17 22:08:06.007456", "stderr": "+ sudo -u www-data /usr/bin/php /opt/iiab/moodle/admin/cli/install.php --wwwroot=http://box.lan/moodle --dataroot=/library/moodle --dbtype=pgsql --dbname=moodle --dbuser=Admin --dbpass=changeme --fullname=Your_School --shortname=School --adminuser=admin --adminpass=changeme --non-interactive --agree-license --allow-unstable\nPHP Deprecated:  Return type of moodle_recordset::rewind() should either be compatible with Iterator::rewind(): void, or the #[\\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /opt/iiab/moodle/lib/dml/moodle_recordset.php on line 61\nPHP Fatal error:  Type of xml_format_exception::$line must be int (as in class Exception) in /opt/iiab/moodle/lib/xmlize.php on line 42", "stderr_lines": ["+ sudo -u www-data /usr/bin/php /opt/iiab/moodle/admin/cli/install.php --wwwroot=http://box.lan/moodle --dataroot=/library/moodle --dbtype=pgsql --dbname=moodle --dbuser=Admin --dbpass=changeme --fullname=Your_School --shortname=School --adminuser=admin --adminpass=changeme --non-interactive --agree-license --allow-unstable", "PHP Deprecated:  Return type of moodle_recordset::rewind() should either be compatible with Iterator::rewind(): void, or the #[\\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /opt/iiab/moodle/lib/dml/moodle_recordset.php on line 61", "PHP Fatal error:  Type of xml_format_exception::$line must be int (as in class Exception) in /opt/iiab/moodle/lib/xmlize.php on line 42"], "stdout": "\nDeprecated: Return type of moodle_recordset::rewind() should either be compatible with Iterator::rewind(): void, or the #[\\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /opt/iiab/moodle/lib/dml/moodle_recordset.php on line 61\n\nFatal error: Type of xml_format_exception::$line must be int (as in class Exception) in /opt/iiab/moodle/lib/xmlize.php on line 42", "stdout_lines": ["", "Deprecated: Return type of moodle_recordset::rewind() should either be compatible with Iterator::rewind(): void, or the #[\\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /opt/iiab/moodle/lib/dml/moodle_recordset.php on line 61", "", "Fatal error: Type of xml_format_exception::$line must be int (as in class Exception) in /opt/iiab/moodle/lib/xmlize.php on line 42"]}
2022-06-17 22:08:06,146 p=4267 u=root n=ansible | TASK [moodle : SEE ERROR ABOVE (skip_role_on_error: True)] *************************************************************************************************************
--
2022-06-17 22:11:35,032 p=4267 u=root n=ansible | fatal: [127.0.0.1]: FAILED! => {"cache_update_time": 1655518292, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"       install 'mongodb-org=4.4.14' 'mongodb-org-server=4.4.14'' failed: E: Unable to correct problems, you have held broken packages.\n", "rc": 100, "stderr": "E: Unable to correct problems, you have held broken packages.\n", "stderr_lines": ["E: Unable to correct problems, you have held broken packages."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nSome packages could not be installed. This may mean that you have\nrequested an impossible situation or if you are using the unstable\ndistribution that some required packages have not yet been created\nor been moved out of Incoming.\nThe following information may help to resolve the situation:\n\nThe following packages have unmet dependencies:\n mongodb-org-mongos : Depends: libssl1.1 (>= 1.1.0) but it is not installable\n mongodb-org-server : Depends: libssl1.1 (>= 1.1.0) but it is not installable\n mongodb-org-shell : Depends: libssl1.1 (>= 1.1.0) but it is not installable\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Some packages could not be installed. This may mean that you have", "requested an impossible situation or if you are using the unstable", "distribution that some required packages have not yet been created", "or been moved out of Incoming.", "The following information may help to resolve the situation:", "", "The following packages have unmet dependencies:", " mongodb-org-mongos : Depends: libssl1.1 (>= 1.1.0) but it is not installable", " mongodb-org-server : Depends: libssl1.1 (>= 1.1.0) but it is not installable", " mongodb-org-shell : Depends: libssl1.1 (>= 1.1.0) but it is not installable"]}
2022-06-17 22:11:35,048 p=4267 u=root n=ansible | TASK [mongodb : SEE ERROR ABOVE (skip_role_on_error: True)] ************************************************************************************************************
--
2022-06-17 22:11:35,109 p=4267 u=root n=ansible | fatal: [127.0.0.1]: FAILED! => {"changed": false, "msg": "MongoDB INSTALLATION FAILED, perhaps because MongoDB doesn't yet support Ubuntu 22.04 with libssl3?"}
2022-06-17 22:11:35,124 p=4267 u=root n=ansible | TASK [sugarizer : SEE ERROR ABOVE (skip_role_on_error: True)] **********************************************************************************************************

holta · 2022-06-18T11:10:01Z

@deldesir thanks for having reviewed this quickly overnight!

FYI he is eager to install arbitrary collections of IIAB
roles/apps/services in a more "uninterrupted" way, using this PR's approach.

ASIDE: This PR does not time out individual roles that "get stuck" for example installing too slowly (e.g. due to Internet problems from Haiti/India/Etc) but that use case might be considered in future — if it proves to be a common problem.

tim-moody · 2022-06-18T14:17:41Z

Now that I see more of the details I think I misunderstood this PR. I thought it was a wrapper approach, which I oppose, rather than causing a role internally to fail gracefully, which is what I have been asking for. So, sorry if I got it wrong.

If I now understand correctly, let me ask when and why would I ever want skip_role_on_error to be false, if I want ansible to terminate immediately rather than completing the run and installing other roles because I am only interested in testing one role? If that is the case perhaps True should be the default, not False.

I also suggest that skip_role_on_error might be skip_roles_on_error as it is global.

he is eager to install arbitrary collections of IIAB roles/apps/services

easily done with Admin Console.

holta · 2022-06-18T15:18:26Z

why would I ever want skip_role_on_error to be false

"Traditional craftsmanship" is the reason arguably. For those who feel industrial efficiency is too often impersonal (which it is!)

So we should keep this legacy option to help Perfectionists (which are not at all uncommon in the QA community, and in fact often incredibly insightful people as a result of their OCD attention to fine detail, often the only people who catch critical underlying flaws...)

So I guess the answer is to support these very important people who sometimes/often want to watch every little step of Ansible's output (deliberately, as this can be a learning experience) and then rerun things like sudo iiab on each error — to build up a test/field IIAB progressively/linearly in a more organic/patient way.

I also suggest that skip_role_on_error might be skip_roles_on_error as it is global.

I'm sympathetic.

But also torn trying to convey that just 1 role will be skipped per error. (I didn't want to accidentally convey that multiple roles would be skipped upon hitting a single error...)

holta · 2022-06-18T15:47:45Z

It's been more than 24h so I'm going to merge this to encourage many more people to bang on it and provide feedback around next steps.

Changes in strategy/naming/etc can and should certainly be made down the road, as we learn more.

tim-moody · 2022-06-18T16:26:33Z

override default_vars.yml to set skip_role_on_error: True

I assume you mean in local vars, Please don't encourage anyone to modify default vars

holta · 2022-06-18T16:39:02Z

I assume you mean in local vars

Yes!

To clarify, most everyone should add/modify IIAB Variables[*] within /etc/iiab/local_vars.yml — prior to IIAB installation if possible.

[*] Used by Ansible and other things too.

Prototype 'skip_role_on_error: True' in 4 roles {lokole, moodle, mong…

645a3e2

…odb, sugarizer}

holta added the enhancement label Jun 17, 2022

holta added this to the 8.0 milestone Jun 17, 2022

holta merged commit 83ea6ce into iiab:master Jun 18, 2022

holta mentioned this pull request Jul 7, 2022

IIAB install froze at "pip install 7 packages into virtual environment: /opt/iiab/jupyterhub (~217 MB)" [on 64-bit RasPiOS — whereas Ubuntu 22.04 on x86_64 is not affected] #3283

Closed

holta added a commit to cwivagg/iiab that referenced this pull request Jul 28, 2022

Modernize matomo/tasks/main.yml for skip_role_on_error iiab#3255

cd63585

holta mentioned this pull request Jul 29, 2022

Cwivagg/matomo 20220618 #3304

Merged

holta mentioned this pull request Aug 31, 2022

azuracast/tasks/main.yml: skip_role_on_error #3355

Merged

holta mentioned this pull request Nov 28, 2022

Install new Moodle 4.1 LTS, or 4.2dev if OS PHP >= 8.2 #3430

Merged

Conversation

holta commented Jun 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

holta commented Jun 17, 2022

Uh oh!

holta commented Jun 17, 2022

Uh oh!

jvonau commented Jun 17, 2022

Uh oh!

holta commented Jun 17, 2022

Uh oh!

tim-moody commented Jun 17, 2022

Uh oh!

tim-moody commented Jun 17, 2022

Uh oh!

jvonau commented Jun 17, 2022

Uh oh!

jvonau commented Jun 17, 2022

Uh oh!

tim-moody commented Jun 17, 2022

Uh oh!

jvonau commented Jun 17, 2022

Uh oh!

holta commented Jun 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

holta commented Jun 17, 2022

Uh oh!

holta commented Jun 18, 2022

Uh oh!

holta commented Jun 18, 2022

Uh oh!

tim-moody commented Jun 18, 2022

Uh oh!

holta commented Jun 18, 2022

Uh oh!

holta commented Jun 18, 2022

Uh oh!

tim-moody commented Jun 18, 2022

Uh oh!

holta commented Jun 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

holta commented Jun 17, 2022 •

edited

Loading

holta commented Jun 17, 2022 •

edited

Loading

holta commented Jun 18, 2022 •

edited

Loading