chore(common): adds retry mechanism for build script npm ci calls by jahorton · Pull Request #11451 · keymanapp/keyman

jahorton · 2024-05-15T05:14:55Z

Fixes #10350.

I've tested the new reattempt_if_failing utility function on its own with the following two calls:

reattempt_if_failing echo "this should pass"
- passes immediately, no retries
reattempt_if_failing cd "not-a-directory"
- fails every time (cd: invalid path) as intended, properly delays between attempts

@keymanapp-test-bot skip

keymanapp-test-bot · 2024-05-15T05:14:58Z

User Test Results

Test specification and instructions

User tests are not required

Test Artifacts

Android
Developer
iOS
- Keyman for iOS (simulator image)
- FirstVoices Keyboards for iOS (simulator image)
- TestFlight internal PR build version - 18.0.45 (0.11451.11200)
Keyboards
- Test Keyboards
Linux
macOS
- Keyman for macOS
Web
- KeymanWeb Test Home
Windows

mcdurdin

This is looking good. Can we add function documentation at the front, similar to that found in builder.inc.sh, e.g.

keyman/resources/builder.inc.sh

Lines 227 to 243 in 6c00656

    
           # 
        
           # Returns `0` if first parameter is in the array passed as second parameter, 
        
           # where the array may contain globs. 
        
           # 
        
           # ### Parameters 
        
           # 
        
           # * 1: `item`       item to search for in array 
        
           # * 2: `array`      bash array, e.g. `array=(one two three)` 
        
           # 
        
           # ### Example 
        
           # 
        
           # ```bash 
        
           #   array=(foo bar it*) 
        
           #   if _builder_item_in_glob_array "item" "${array[@]}"; then ...; fi 
        
           # ``` 
        
           # 
        
           _builder_item_in_glob_array() {

resources/shellHelperFunctions.sh

Co-authored-by: Marc Durdin <marc@durdin.net>

mcdurdin · 2024-05-17T05:48:29Z

Just discovered https://docs.npmjs.com/cli/v6/using-npm/config#fetch-retries and its friends:

fetch-retries
Default: 2
Type: Number
The "retries" config for the retry module to use when fetching packages from the registry.

fetch-retry-factor
Default: 10
Type: Number
The "factor" config for the retry module to use when fetching packages.

fetch-retry-mintimeout
Default: 10000 (10 seconds)
Type: Number
The "minTimeout" config for the retry module to use when fetching packages.

fetch-retry-maxtimeout
Default: 60000 (1 minute)
Type: Number
The "maxTimeout" config for the retry module to use when fetching packages.

maxsockets
Default: 50
Type: Number
The maximum number of connections to use per origin (protocol/host/port combination). Passed to the http Agent used to make the request.

We might wish to consider these because they'll be much cleaner than retrying the whole npm ci?

mcdurdin · 2024-05-17T05:49:51Z

Also noting that the problem is not resolved with this PR, see https://build.palaso.org/buildConfiguration/Keyman_Common_KPAPI_TestPullRequests_macOS/463600

17:42:47   [common/tools/hextobin] ## configure starting...
17:42:48   npm WARN EBADENGINE Unsupported engine {
17:42:48   npm WARN EBADENGINE   package: undefined,
17:42:48   npm WARN EBADENGINE   required: { node: '^18.x' },
17:42:48   npm WARN EBADENGINE   current: { node: 'v21.7.3', npm: '10.5.1' }
17:42:48   npm WARN EBADENGINE }
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-node-xml2js.git
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
17:42:49   npm WARN deprecated @npmcli/move-file@2.0.1: This functionality has been moved to @npmcli/fs
17:43:17   npm ERR! code 1
17:43:17   npm ERR! git dep preparation failed
17:43:17   npm ERR! command /opt/homebrew/Cellar/node/21.7.3/bin/node /opt/homebrew/lib/node_modules/npm/bin/npm-cli.js install --force --cache=/Users/marc_durdin/.npm --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
17:43:17   npm ERR! npm WARN using --force Recommended protections disabled.
17:43:17   npm ERR! npm ERR! code ECONNRESET
17:43:17   npm ERR! npm ERR! errno ECONNRESET
17:43:17   npm ERR! npm ERR! network Invalid response body while trying to fetch https://registry.npmjs.org/@parcel%2fgraph: aborted
17:43:17   npm ERR! npm ERR! network This is a problem related to network connectivity.
17:43:17   npm ERR! npm ERR! network In most cases you are behind a proxy or have bad network settings.
17:43:17   npm ERR! npm ERR! network
17:43:17   npm ERR! npm ERR! network If you are behind a proxy, please make sure that the
17:43:17   npm ERR! npm ERR! network 'proxy' config is set properly.  See: 'npm help config'
17:43:17   npm ERR!
17:43:17   npm ERR! npm ERR! A complete log of this run can be found in: /Users/marc_durdin/.npm/_logs/2024-05-16T10_42_51_434Z-debug-0.log
17:43:17   
17:43:17   npm ERR! A complete log of this run can be found in: /Users/marc_durdin/.npm/_logs/2024-05-16T10_42_47_808Z-debug-0.log
17:43:17   [common/tools/hextobin] Expected output: 'node_modules'.

That last line is suspicious ... is the retry code working correctly?

mcdurdin

With my previous comment -- I don't think this is working correctly

jahorton · 2024-05-29T08:51:17Z

Just discovered https://docs.npmjs.com/cli/v6/using-npm/config#fetch-retries and its friends:

[...]

Following up on the quoted comment... viewing its link shows the following:

The "retries" config for the retry module to use when fetching packages from the registry.

Based on that, I believe that those settings affect npm's use of the retry package as seen at https://www.npmjs.com/package/retry.

factor: The exponential factor to use. Default is 2.

[...]

The formula used to calculate the individual timeouts is:
Math.min(random * minTimeout * Math.pow(factor, attempt), maxTimeout)

jahorton · 2024-05-29T09:08:29Z

Inspecting that build log, I don't think npm's retry module even kicked in. That, or there are custom defaults set on the agent that provide less retries than npm's default.

17:42:47   [common/tools/hextobin] ## configure starting...
17:42:48   npm WARN EBADENGINE Unsupported engine {
17:42:48   npm WARN EBADENGINE   package: undefined,
17:42:48   npm WARN EBADENGINE   required: { node: '^18.x' },
17:42:48   npm WARN EBADENGINE   current: { node: 'v21.7.3', npm: '10.5.1' }
17:42:48   npm WARN EBADENGINE }
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-node-xml2js.git
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
17:42:48   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
17:42:49   npm WARN deprecated @npmcli/move-file@2.0.1: This functionality has been moved to @npmcli/fs
17:43:17   npm ERR! code 1
17:43:17   npm ERR! git dep preparation failed
17:43:17   npm ERR! command /opt/homebrew/Cellar/node/21.7.3/bin/node /opt/homebrew/lib/node_modules/npm/bin/npm-cli.js install --force --cache=/Users/marc_durdin/.npm --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
17:43:17   npm ERR! npm WARN using --force Recommended protections disabled.
17:43:17   npm ERR! npm ERR! code ECONNRESET
17:43:17   npm ERR! npm ERR! errno ECONNRESET
17:43:17   npm ERR! npm ERR! network Invalid response body while trying to fetch https://registry.npmjs.org/@parcel%2fgraph: aborted
[...]

Note the elapsed time: approximately 30 seconds in total.

The retry parameterization quoted above raises questions.

Default retry-factor of 2: asserts that after an initial failure, two retry attempts were made.
fetch-retry-factor of 10: asserts that the second retry's timeouts should be multiplied by 10.
fetch-retry-mintimeout of 10000 (10 seconds)

Shouldn't there be at least 110 seconds between initial npm call and the final, reported failure? mintimeout x factor seem to suggest this. retry's documentation says that the "min" and "max" timeouts are supposed to be a delay between tries, rather than limits for the retry's elapsed time. So... why did it it take only about 30 seconds to fail? Is there a custom .npmrc already on the build agents that's set more aggressively?

mcdurdin · 2024-05-29T23:24:31Z

Is there a custom .npmrc already on the build agents that's set more aggressively?

No.

mcdurdin · 2024-05-30T01:17:48Z

https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Windows/466721 is another example where npm retry may help -- in this case, not network related, but local fs related (possibly security software?):

13:39:39   npm WARN cleanup     [Error: EPERM: operation not permitted, rmdir 'C:\BuildAgent\work\99b311828f4ee7c\keyman\node_modules\@microsoft\tsdoc-config\lib\__tests__'] {
13:39:39   npm WARN cleanup       errno: -4048,
13:39:39   npm WARN cleanup       code: 'EPERM',
13:39:39   npm WARN cleanup       syscall: 'rmdir',
13:39:39   npm WARN cleanup       path: 'C:\\BuildAgent\\work\\99b311828f4ee7c\\keyman\\node_modules\\@microsoft\\tsdoc-config\\lib\\__tests__'
13:39:39   npm WARN cleanup     }
13:39:39   npm WARN cleanup   ]
13:39:39   npm WARN cleanup ]
13:39:39   npm ERR! code 1
13:39:39   npm ERR! git dep preparation failed
13:39:39   npm ERR! command C:\Program Files\nodejs\node.exe C:\Users\bob\global_node\node_modules\npm\bin\npm-cli.js install --force --cache=C:\Users\bob\AppData\Local\npm-cache --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
13:39:39   npm ERR! npm WARN using --force Recommended protections disabled.
13:39:39   npm ERR! npm ERR! code EBUSY
13:39:39   npm ERR! npm ERR! syscall open
13:39:39   npm ERR! npm ERR! path C:\Users\bob\AppData\Local\npm-cache\_cacache\index-v5\77\5e\7e30208cca13688147d134de1109e20935222521f11e1ceef8001daae2ed
13:39:39   npm ERR! npm ERR! errno EBUSY
13:39:39   npm ERR! npm ERR! Invalid response body while trying to fetch https://registry.npmjs.org/ansi-colors: EBUSY: resource busy or locked, open 'C:\Users\bob\AppData\Local\npm-cache\_cacache\index-v5\77\5e\7e30208cca13688147d134de1109e20935222521f11e1ceef8001daae2ed'
13:39:39   npm ERR!
13:39:39   npm ERR! npm ERR! A complete log of this run can be found in: C:\Users\bob\AppData\Local\npm-cache\_logs\2024-05-29T06_36_26_705Z-debug-0.log

and https://build.palaso.org/buildConfiguration/Keyman_Developer_Test/466684 also

resources/shellHelperFunctions.sh

jahorton · 2024-05-31T07:26:21Z

Trying it out locally with a temporary script...

function inner_test ( ) {
  npm install @keymanapp/totally-not-a-package-that-is-distributed-so-it-should-make-an-error
}

try_multiple_times inner_test

I get the following if I disconnect from the internet mid-install:

$ ./temp.sh
npm ERR! code E404
npm ERR! 404 Not Found - GET https://registry.npmjs.org/@keymanapp%2ftotally-not-a-package-that-is-distributed-so-it-should-make-an-error - Not found
npm ERR! 404
npm ERR! 404  '@keymanapp/totally-not-a-package-that-is-distributed-so-it-should-make-an-error@*' is not in this registry.
npm ERR! 404
npm ERR! 404 Note that you can also install from a
npm ERR! 404 tarball, folder, http url, or git url.

npm ERR! A complete log of this run can be found in: C:\Users\User\AppData\Local\npm-cache\_logs\2024-05-31T07_17_09_049Z-debug-0.log
Delaying 39 seconds before attempt 2: `inner_test`
npm ERR! code ENOTFOUND
npm ERR! syscall getaddrinfo
npm ERR! errno ENOTFOUND
npm ERR! network request to https://registry.npmjs.org/@keymanapp%2ftotally-not-a-package-that-is-distributed-so-it-should-make-an-error failed, reason: getaddrinfo ENOTFOUND registry.npmjs.org
npm ERR! network This is a problem related to network connectivity.
npm ERR! network In most cases you are behind a proxy or have bad network settings.
npm ERR! network
npm ERR! network If you are behind a proxy, please make sure that the
npm ERR! network 'proxy' config is set properly.  See: 'npm help config'

npm ERR! A complete log of this run can be found in: C:\Users\User\AppData\Local\npm-cache\_logs\2024-05-31T07_17_50_553Z-debug-0.log
Delaying 77 seconds before attempt 3: `inner_test`
[...]

It's not an ECONNRESET, but we are getting a similar message. Of course, disconnecting from the internet would be "bad network settings".

Also of note: the retry script totally worked with npm's error outputs in this case - on both my Windows and macOS machines.

jahorton · 2024-05-31T08:07:01Z

Ah, inserting the invalid npm install command in place of the npm ci call did the trick - it appears that the surrounding pushd/popd were affecting things. Come to think of it, if the npm ci call failed, it wouldn't have gotten to do the popd afterward.

jahorton · 2024-05-31T08:19:37Z

Well, the good news is... the retry mechanism definitely works now.

https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Web/467970?buildTab=log&focusLine=170&logView=linear&linesState=102

Might need a spot of cleanup after failed attempts, though.

15:12:49   [common/web/keyman-version] Delaying 52 seconds before attempt 2: `npm ci`
15:13:43   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-node-xml2js.git
15:13:43   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
15:13:43   npm WARN skipping integrity check for git dependency ssh://git@github.com/keymanapp/dependency-restructure.git
15:13:50   npm WARN deprecated @npmcli/move-file@2.0.1: This functionality has been moved to @npmcli/fs
15:14:54   [common/web/keyman-version] Delaying 55 seconds before attempt 3: `npm ci`
15:14:54   /var/lib/TeamCity/work/99b311828f4ee7c/keyman/resources/shellHelperFunctions.sh: line 196: 3974199 Killed                  "$@"
15:15:51   npm ERR! code ENOTEMPTY
15:15:51   npm ERR! syscall rename
15:15:51   npm ERR! path /var/lib/TeamCity/work/99b311828f4ee7c/keyman/web/node_modules/typescript
15:15:51   npm ERR! dest /var/lib/TeamCity/work/99b311828f4ee7c/keyman/web/node_modules/.typescript-FshlhPnn
15:15:51   npm ERR! errno -39
15:15:51   npm ERR! ENOTEMPTY: directory not empty, rename '/var/lib/TeamCity/work/99b311828f4ee7c/keyman/web/node_modules/typescript' -> '/var/lib/TeamCity/work/99b311828f4ee7c/keyman/web/node_modules/.typescript-FshlhPnn'
15:15:51   
15:15:51   npm ERR! A complete log of this run can be found in: /home/bob/.npm/_logs/2024-05-31T08_15_49_718Z-debug-0.log

And... wait, what's this about a web/ specific node_modules/typescript? Sure enough, the package.json for web/ is oddly showing the older version that we dropped. Will fix that in the 🔩 PR set (new commit on #11464).

jahorton · 2024-06-03T07:00:47Z

Noted in discussion: it appears that our Linux BAs may be hitting out-of-memory due to VM restrictions that are then triggering the active npm ci call to be killed. That could easily leave the file-system updates halfway, which would then lead to the ENOTEMPTY error that followed in the prior log.

This was noticed via cross-reference with https://build.palaso.org/buildConfiguration/Keyman_Test_Common_Linux/467596?buildTab=log&linesState=86&logView=flowAware&focusLine=102, which reported an error code of 137, which is indicative of memory issues.

mcdurdin

Noted in discussion: it appears that our Linux BAs may be hitting out-of-memory due to VM restrictions that are then triggering the active npm ci call to be killed. That could easily leave the file-system updates halfway, which would then lead to the ENOTEMPTY error that followed in the prior log.

The exit code for OOM kill is 137 (https://stackoverflow.com/questions/53245385/npm-gets-killed-or-errno-137) so if we captured that we could do a cleanup of node_modules by wrapping the npm ci into yet another function.

mcdurdin · 2024-06-03T07:02:43Z

resources/shellHelperFunctions.sh

+    sleep $wait_length
+  fi
+
+  if ! "$@"; then


Suggested change

if ! "$@"; then

if ! "$@"; then

builder_echo "Command failed with error $?"

If would be really helpful to capture the error code here and report it. (Need to test that the exit code is not already lost; if it is then we'll need a slightly more complicated solution).

I should've checked it locally first:

[web/src/app/browser] Command failed with error 0

Co-authored-by: Marc Durdin <marc@durdin.net>

mcdurdin

LGTM

keyman-server · 2024-06-06T18:06:31Z

Changes in this pull request will be available for download in Keyman version 18.0.50-alpha

chore(web): adds retry mechanism for build script npm ci calls

5a0e66c

jahorton requested a review from mcdurdin as a code owner May 15, 2024 05:14

keymanapp-test-bot bot added this to the A18S2 milestone May 15, 2024

github-actions bot added common/ common/resources/ Build infrastructure chore labels May 15, 2024

mcdurdin requested changes May 15, 2024

View reviewed changes

chore(common): adjustments per code review

4705889

jahorton requested a review from mcdurdin May 15, 2024 07:28

mcdurdin approved these changes May 16, 2024

View reviewed changes

resources/shellHelperFunctions.sh Outdated Show resolved Hide resolved

resources/shellHelperFunctions.sh Outdated Show resolved Hide resolved

chore(common): Apply suggestions from code review

e382ddd

Co-authored-by: Marc Durdin <marc@durdin.net>

mcdurdin modified the milestones: A18S2, A18S3 May 24, 2024

mcdurdin requested changes May 28, 2024

View reviewed changes

mcdurdin reviewed May 30, 2024

View reviewed changes

resources/shellHelperFunctions.sh Outdated Show resolved Hide resolved

Joshua Horton added 3 commits May 31, 2024 14:52

fix(common): only retry the npm ci call itself

6251faa

Merge branch 'master' into chore/common/retry-npm-ci

27a959d

change(common): simplifies retry command-execution syntax

b6a06c6

fix(common): revert accidental test-code inclusion

66bdc29

mcdurdin reviewed Jun 3, 2024

View reviewed changes

jahorton requested a review from mcdurdin June 3, 2024 08:19

jahorton and others added 2 commits June 5, 2024 12:46

change(common): emit error-code on retry fail

e2e8ed3

Co-authored-by: Marc Durdin <marc@durdin.net>

fix(common): better error-code capture

c91ffb9

mcdurdin approved these changes Jun 5, 2024

View reviewed changes

jahorton merged commit e9ccd6f into master Jun 6, 2024

jahorton deleted the chore/common/retry-npm-ci branch June 6, 2024 01:10

jahorton mentioned this pull request Jun 10, 2024

chore(web): CI builds are currently extremely unreliable #10494

Closed

	#
	# Returns `0` if first parameter is in the array passed as second parameter,
	# where the array may contain globs.
	#
	# ### Parameters
	#
	# * 1: `item` item to search for in array
	# * 2: `array` bash array, e.g. `array=(one two three)`
	#
	# ### Example
	#
	# ```bash
	# array=(foo bar it*)
	# if _builder_item_in_glob_array "item" "${array[@]}"; then ...; fi
	# ```
	#
	_builder_item_in_glob_array() {

	if ! "$@"; then
	if ! "$@"; then
	builder_echo "Command failed with error $?"

Uh oh!

Conversation

jahorton commented May 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

keymanapp-test-bot bot commented May 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User Test Results

Test Artifacts

Uh oh!

mcdurdin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mcdurdin commented May 17, 2024

Uh oh!

mcdurdin commented May 17, 2024

Uh oh!

mcdurdin left a comment

Choose a reason for hiding this comment

Uh oh!

jahorton commented May 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jahorton commented May 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mcdurdin commented May 29, 2024

Uh oh!

mcdurdin commented May 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

jahorton commented May 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jahorton commented May 31, 2024

Uh oh!

jahorton commented May 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jahorton commented Jun 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mcdurdin left a comment

Choose a reason for hiding this comment

Uh oh!

mcdurdin Jun 3, 2024

Choose a reason for hiding this comment

Uh oh!

jahorton Jun 5, 2024

Choose a reason for hiding this comment

Uh oh!

mcdurdin left a comment

Choose a reason for hiding this comment

Uh oh!

keyman-server commented Jun 6, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jahorton commented May 15, 2024 •

edited

Loading

keymanapp-test-bot bot commented May 15, 2024 •

edited

Loading

jahorton commented May 29, 2024 •

edited

Loading

jahorton commented May 29, 2024 •

edited

Loading

mcdurdin commented May 30, 2024 •

edited

Loading

jahorton commented May 31, 2024 •

edited

Loading

jahorton commented May 31, 2024 •

edited

Loading

jahorton commented Jun 3, 2024 •

edited

Loading