chore(common): adds retry mechanism for build script npm ci calls#11451
chore(common): adds retry mechanism for build script npm ci calls#11451
Conversation
User Test ResultsTest specification and instructions User tests are not required Test Artifacts
|
mcdurdin
left a comment
There was a problem hiding this comment.
This is looking good. Can we add function documentation at the front, similar to that found in builder.inc.sh, e.g.
keyman/resources/builder.inc.sh
Lines 227 to 243 in 6c00656
Co-authored-by: Marc Durdin <marc@durdin.net>
|
Just discovered https://docs.npmjs.com/cli/v6/using-npm/config#fetch-retries and its friends: fetch-retries fetch-retry-factor fetch-retry-mintimeout fetch-retry-maxtimeout maxsockets We might wish to consider these because they'll be much cleaner than retrying the whole |
|
Also noting that the problem is not resolved with this PR, see https://build.palaso.org/buildConfiguration/Keyman_Common_KPAPI_TestPullRequests_macOS/463600 That last line is suspicious ... is the retry code working correctly? |
mcdurdin
left a comment
There was a problem hiding this comment.
With my previous comment -- I don't think this is working correctly
Following up on the quoted comment... viewing its link shows the following:
Based on that, I believe that those settings affect
|
|
Inspecting that build log, I don't think npm's retry module even kicked in. That, or there are custom defaults set on the agent that provide less retries than npm's default.
Note the elapsed time: approximately 30 seconds in total. The retry parameterization quoted above raises questions.
Shouldn't there be at least 110 seconds between initial |
No. |
|
https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Windows/466721 is another example where npm retry may help -- in this case, not network related, but local fs related (possibly security software?): and https://build.palaso.org/buildConfiguration/Keyman_Developer_Test/466684 also |
|
Trying it out locally with a temporary script... function inner_test ( ) {
npm install @keymanapp/totally-not-a-package-that-is-distributed-so-it-should-make-an-error
}
try_multiple_times inner_testI get the following if I disconnect from the internet mid-install: It's not an Also of note: the retry script totally worked with npm's error outputs in this case - on both my Windows and macOS machines. |
|
Ah, inserting the invalid |
|
Well, the good news is... the retry mechanism definitely works now. Might need a spot of cleanup after failed attempts, though. And... wait, what's this about a web/ specific |
|
Noted in discussion: it appears that our Linux BAs may be hitting out-of-memory due to VM restrictions that are then triggering the active This was noticed via cross-reference with https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Linux/467596?buildTab=log&linesState=86&logView=flowAware&focusLine=102, which reported an error code of 137, which is indicative of memory issues. |
mcdurdin
left a comment
There was a problem hiding this comment.
Noted in discussion: it appears that our Linux BAs may be hitting out-of-memory due to VM restrictions that are then triggering the active npm ci call to be killed. That could easily leave the file-system updates halfway, which would then lead to the ENOTEMPTY error that followed in the prior log.
The exit code for OOM kill is 137 (https://stackoverflow.com/questions/53245385/npm-gets-killed-or-errno-137) so if we captured that we could do a cleanup of node_modules by wrapping the npm ci into yet another function.
resources/shellHelperFunctions.sh
Outdated
| sleep $wait_length | ||
| fi | ||
|
|
||
| if ! "$@"; then |
There was a problem hiding this comment.
| if ! "$@"; then | |
| if ! "$@"; then | |
| builder_echo "Command failed with error $?" |
If would be really helpful to capture the error code here and report it. (Need to test that the exit code is not already lost; if it is then we'll need a slightly more complicated solution).
There was a problem hiding this comment.
I should've checked it locally first:
[web/src/app/browser] Command failed with error 0
Co-authored-by: Marc Durdin <marc@durdin.net>
|
Changes in this pull request will be available for download in Keyman version 18.0.50-alpha |
Fixes #10350.
I've tested the new
reattempt_if_failingutility function on its own with the following two calls:reattempt_if_failing echo "this should pass"reattempt_if_failing cd "not-a-directory"@keymanapp-test-bot skip