ci(INFRA-3593): Phase 1 — Namespace cache for Linux CI trial #29716

Merged
alucardzom merged 19 commits into main from phase1-namespace-linux-cache on May 7, 2026

Conversation

@alucardzom

@alucardzom alucardzom commented May 5, 2026

Contributor

Description

INFRA-3593 Phase 1 — adds Namespace Cache Volumes integration to the Linux CI jobs on the namespace-runner-trial branch.

When runner_provider: namespace is dispatched, the 8 Linux CI jobs that install dependencies now:

  1. Mount a Namespace Cache Volume via nscloud-cache-action covering ~/.cache/yarn, .metamask, node_modules, .yarn/cache, and .yarn/install-state.gz
  2. Skip actions/setup-node Yarn caching (set to empty string) to avoid duplicate network-backed cache traffic
  3. Skip actions/cache for node_modules in component-view-tests and merge-unit-and-component-view-tests (Namespace cache already covers it)

When runner_provider: current (the default on every existing trigger), all ternaries collapse to their prior values and behavior is byte-identical to the base branch.

No job is renamed. No default is changed. This is an additive, opt-in change activated only via manual workflow_dispatch with runner_provider: namespace.

Changelog

CHANGELOG entry: null

Related issues

Fixes: INFRA-3593 (parent epic INFRA-3511)
Refs: INFRA-3592 (Phase 0, PR #29557)

Manual testing steps

Feature: Namespace Cache Volumes on Linux CI

  Scenario: dispatch with namespace provider — cache volumes active
    Given the branch phase1-namespace-linux-cache
    When user runs `gh workflow run ci.yml --ref phase1-namespace-linux-cache -f runner_provider=namespace`
    Then all 8 Linux CI jobs with dependencies use nscloud-cache-action
    And actions/setup-node Yarn caching is disabled (no duplicate cache traffic)
    And all jobs pass across every matrix shard (unit-tests x10, component-view-tests x2, scripts x6)

  Scenario: dispatch with current provider — byte-identical to base
    Given the branch phase1-namespace-linux-cache
    When user runs `gh workflow run ci.yml --ref phase1-namespace-linux-cache -f runner_provider=current`
    Then nscloud-cache-action steps are skipped (if condition is false)
    And actions/setup-node uses cache: yarn as before
    And actions/cache for node_modules runs as before
    And all jobs pass on GitHub-hosted runners

  Scenario: implicit current via PR/push trigger
    Given a push or pull_request event (no workflow_dispatch)
    Then inputs.runner_provider is undefined/empty
    And all ternaries collapse to existing behavior

Screenshots/Recordings

Before

N/A

After

N/A — CI infrastructure PR, no UI surface.

Pre-merge author checklist

Performance checks (if applicable)

N/A — workflow YAML only, no app code.

Pre-merge reviewer checklist

  • I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
  • I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence, such as recordings and/or screenshots.

Made with Cursor


Note

Medium Risk
Touches many GitHub Actions workflows to optionally switch runner labels and caching behavior, which could break CI execution or cause cache-related flakiness when enabled. Default behavior remains unchanged unless runner_provider=namespace is explicitly selected.

Overview
Introduces a runner_provider input (with workflow_dispatch choices where relevant) to route jobs between existing runners and new namespace-profile-* runner labels across ci.yml, reusable build workflows, and E2E smoke/regression workflows.

When runner_provider=namespace, Linux CI jobs that install dependencies mount Namespace cache volumes via namespacelabs/nscloud-cache-action, disable actions/setup-node Yarn caching, and skip actions/cache-based node_modules restores in coverage-merge/component-view jobs; Node memory limits are also reduced on Namespace runners.

Updates actionlint configuration to allow the new Namespace runner profile labels.

Reviewed by Cursor Bugbot for commit 3a52111.

jluque0101 and others added 14 commits April 30, 2026 17:17
  Add metamask-ci-linux profile label, a placeholder for the canonical
  Namespace Linux label (to be replaced before the trial dispatch with
  runner_provider: namespace), and the common nscloud-ubuntu-* inline
  labels so Phase 2 can pick any of them without a follow-up config edit.

  Phase 0 of INFRA-3592. No workflow references these labels yet.
…-4 entry points

Adds the choice input current|namespace (default current) to the five
Phase 1-4 entry-point workflows. No runs-on or job behavior changes
yet — caller forwarding and runs-on ternary land in a follow-up commit.

Phase 0 of INFRA-3592.
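
As a rough illustration of the input described in this commit, a choice input on an entry-point workflow could look like the fragment below. The `description` text and the surrounding `pull_request` trigger are assumptions for the sketch, not copied from the repository.

```yaml
# Sketch of an entry-point workflow trigger block.
# Only the input name, its two options, and the default come from the commit message.
on:
  pull_request:            # assumed existing trigger; has no inputs
  workflow_dispatch:
    inputs:
      runner_provider:
        description: Runner provider to use for this run   # hypothetical wording
        type: choice
        options:
          - current
          - namespace
        default: current
```

On triggers other than `workflow_dispatch`, the input is simply absent, so any comparison against `'namespace'` evaluates to false.
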
…eusables

Adds the optional string input runner_provider (default current) to the
seven Phase 1-4 reusable workflows. Phase 7 reusables (runway-*, nightly,
testflight, etc.) are intentionally not modified — they continue to call
without forwarding, and the default keeps behavior byte-identical.

Phase 0 of INFRA-3592.
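
A minimal sketch of the reusable-workflow side, assuming the input is declared under `workflow_call`. Reusable workflows accept only `boolean`, `number`, and `string` input types, which is presumably why this one is a string rather than a choice. The `description` text is hypothetical.

```yaml
# Sketch of a reusable workflow's input declaration.
on:
  workflow_call:
    inputs:
      runner_provider:
        description: Runner provider forwarded by the caller   # hypothetical wording
        type: string
        required: false
        default: current   # callers that do not forward the input keep prior behavior
```
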
Adds with: runner_provider: ${{ inputs.runner_provider }} at every
in-scope caller site (55 sites across 7 caller workflows). Two iOS
build-ios-e2e.yml call sites had no with: block; a new minimal one
is added for them.

Phase 7 caller sites are intentionally not modified — push-eas-update,
nightly-build, runway-*, build-and-upload-to-testflight, build-rc-auto
continue to call without forwarding, the callee defaults to current,
and behavior is byte-identical.

Behavior is unchanged at this point: no runs-on consumes runner_provider
yet — that lands in I.3b.

Phase 0 of INFRA-3592.
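
The forwarding described above might look like the following at a single caller site. The job name is hypothetical, and `build.yml` is used only because it is one of the workflows named in this PR.

```yaml
# Sketch of one caller site forwarding the input to a reusable workflow.
jobs:
  build:                                   # hypothetical job name
    uses: ./.github/workflows/build.yml
    with:
      # Forward the value unchanged. On push/pull_request the inputs context
      # has no runner_provider, so the value is never 'namespace' there.
      runner_provider: ${{ inputs.runner_provider }}
```
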
Replaces every runs-on line in the in-scope Phase 1-4 workflows with the
additive ternary:

  runs-on: ${{ inputs.runner_provider == 'namespace' && 'nscloud-PLACEHOLDER-CONFIRM-LABEL' || <existing> }}

Where <existing> is the previous literal label or expression. Three sites
already had a ${{ ... }} platform ternary (build.yml setup-dependencies,
run-e2e-workflow.yml test-e2e-mobile, setup-node-modules.yml setup); for
those, the existing expression is preserved verbatim as the fallback
operand after the || of the runner_provider == 'namespace' check.

29 sites across 10 workflows. With runner_provider: current (the default
on every existing trigger), each ternary collapses to its prior literal
and behavior is byte-identical. The 'namespace' branch points at the
PLACEHOLDER label by design — replacement happens before any
runner_provider: namespace dispatch (see .phase0/namespace-artifacts.md).

Phase 0 of INFRA-3592.
…abels

Resolves Q1 of INFRA-3592 Phase 0. The four profile labels confirmed
live in the metamask Namespace workspace (format: namespace-profile-<name>):

  - namespace-profile-metamask-ci-linux       (Linux CI — Phase 1)
  - namespace-profile-metamask-android-build  (Android — Phase 3)
  - namespace-profile-metamask-ios-build      (iOS build / xl — Phase 4)
  - namespace-profile-metamask-ios-e2e        (iOS E2E test — Phase 4)

Each runs-on ternary now points at the profile that matches the existing
runner class (ubuntu-latest → ci-linux; macos-latest → ios-build; Cirrus
ubuntu-runner-amd64 → android-build; Cirrus macos-runner:tahoe-xl →
ios-build; Cirrus macos-runner:tahoe → ios-e2e). The three pre-existing
platform-driven dynamic expressions are preserved in both branches of
the ternary so Namespace dispatch follows the same iOS/Android branching
as the current runner choice.

actionlint.yaml drops the speculative nscloud-* and metamask-ci-linux
labels (never used) and registers the four canonical labels above.

Behavior on runner_provider: current is unchanged (every ternary still
collapses to its prior literal/expression).

Phase 0 of INFRA-3592.
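
For reference, registering the four labels with actionlint could look like the fragment below. This assumes the standard actionlint config layout (`self-hosted-runner.labels`). Any labels already registered in the repository's file, such as the Cirrus ones, are omitted here.

```yaml
# Sketch of actionlint.yaml after this commit (other existing labels omitted).
self-hosted-runner:
  labels:
    - namespace-profile-metamask-ci-linux
    - namespace-profile-metamask-android-build
    - namespace-profile-metamask-ios-build
    - namespace-profile-metamask-ios-e2e
```

Without these entries, actionlint would flag the `namespace-profile-*` values in `runs-on` as unknown runner labels.
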
…g for Phase 1 Linux CI trial

Add nscloud-cache-action before dependency installation in 8 Linux CI
jobs when runner_provider is 'namespace', and disable actions/setup-node
Yarn caching on the Namespace path to avoid duplicate network-backed
cache traffic.

Jobs with both nscloud-cache-action and conditional cache: yarn:
  dedupe, git-safe-dependencies, scripts, js-bundle-size-check,
  unit-tests, merge-unit-and-component-view-tests, component-view-tests

Jobs with nscloud-cache-action only (no cache: yarn to modify):
  sonar-cloud

Jobs with actions/cache for node_modules (component-view-tests,
merge-unit-and-component-view-tests) are gated to skip on Namespace
since nscloud-cache-action already covers node_modules.

Phase 1 of INFRA-3593 / parent epic INFRA-3511.

Co-authored-by: Cursor <cursoragent@cursor.com>
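
A sketch of how the `actions/cache` restore for node_modules can be gated so it only runs off the Namespace path. The step name, action version, and cache key are assumptions for illustration; only the condition reflects the commit message.

```yaml
# Sketch of the gated node_modules cache step.
steps:
  - name: Restore node_modules cache        # hypothetical step name
    # Skipped on Namespace, where the cache volume already covers node_modules.
    if: ${{ inputs.runner_provider != 'namespace' }}
    uses: actions/cache@v4                  # version is an assumption
    with:
      path: node_modules
      key: node-modules-${{ runner.os }}-${{ hashFiles('yarn.lock') }}   # hypothetical key
```
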
@alucardzom alucardzom self-assigned this May 5, 2026
@metamaskbotv2 metamaskbotv2 Bot added the team-dev-ops DevOps team label May 5, 2026
@github-actions

github-actions Bot commented May 5, 2026

Contributor

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

@alucardzom alucardzom changed the base branch from namespace-runner-trial to main May 5, 2026 08:44
@github-actions github-actions Bot added the size-S label May 5, 2026
@alucardzom alucardzom added the skip-smart-e2e-selection Skip Smart E2E selection, i.e. select all E2E tests to run label May 5, 2026
alucardzom and others added 2 commits May 5, 2026 11:42
…scloud cache paths

nscloud-cache-action mounts each path as a directory, so listing
.yarn/install-state.gz as a path creates a directory mount where Yarn
expects a file, causing EISDIR errors on yarn install.

Replace .yarn/cache and .yarn/install-state.gz with .yarn in all 8
nscloud-cache-action steps. This only affects the Namespace path
(guarded by inputs.runner_provider == 'namespace'); the GitHub-hosted
runner path with actions/cache is unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
…tted files

Mounting .yarn as a cache volume hides committed subdirectories
(.yarn/releases/, .yarn/patches/, .yarn/plugins/) that are checked into
git, causing "Cannot find module yarn-4.10.3.cjs" errors.

Use .yarn/cache (runtime-generated package tarballs) instead. The
install-state.gz file is not cached on Namespace — Yarn regenerates
it during install with negligible overhead.

Co-authored-by: Cursor <cursoragent@cursor.com>
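
Putting the two path fixes together, the Namespace cache step would end up roughly as below. The step name and the `@v1` tag are assumptions. The path list is the one from the PR description with `.yarn/install-state.gz` dropped and `.yarn/cache` kept.

```yaml
# Sketch of the cache-volume step after both path fixes.
steps:
  - name: Mount Namespace cache volume      # hypothetical step name
    if: ${{ inputs.runner_provider == 'namespace' }}
    uses: namespacelabs/nscloud-cache-action@v1   # version tag is an assumption
    with:
      # Every entry is mounted as a directory, so only directories are listed.
      # .yarn itself is not mounted because it contains files committed to git.
      path: |
        ~/.cache/yarn
        .metamask
        node_modules
        .yarn/cache
```
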
@alucardzom alucardzom marked this pull request as ready for review May 5, 2026 11:08
@alucardzom alucardzom requested review from a team as code owners May 5, 2026 11:08
Comment thread .github/workflows/ci.yml
Resolve conflict in ci.yml: main removed the all-jobs-pass job and
refactored check-all-jobs-pass (PR #29619). Keep main's refactored
structure and apply the runner_provider ternary to the new
check-all-jobs-pass runs-on.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions github-actions Bot added size-M and removed size-S labels May 5, 2026

@cursor cursor Bot left a comment

Contributor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit aeaba63.

Comment thread .github/workflows/ci.yml Outdated
alucardzom and others added 2 commits May 5, 2026 13:21
…ng on Namespace

The expression `inputs.runner_provider == 'namespace' && '' || 'yarn'`
always evaluates to 'yarn' because empty string is falsy in GitHub
Actions expressions: true && '' produces '', then '' || 'yarn' falls
through to 'yarn'.

Invert to `inputs.runner_provider != 'namespace' && 'yarn' || ''` so
the cache is correctly disabled when runner_provider is namespace.

Co-authored-by: Cursor <cursoragent@cursor.com>
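
The corrected expression, shown in context as a sketch. The action version and the `node-version-file` value are assumptions. The `cache` expression is the one quoted in the commit message.

```yaml
# Sketch of the setup-node step with the inverted expression.
steps:
  - uses: actions/setup-node@v4             # version is an assumption
    with:
      node-version-file: .nvmrc             # assumption, not taken from the PR
      # runner_provider == 'namespace':  false && 'yarn' -> false, false || '' -> ''
      # any other value (or unset):      true  && 'yarn' -> 'yarn'
      # The truthy operand ('yarn') sits in the && position, so the
      # falsy empty string can only ever be the final fallback.
      cache: ${{ inputs.runner_provider != 'namespace' && 'yarn' || '' }}
```

The general rule for the `cond && a || b` idiom in GitHub Actions expressions is that `a` must be truthy. If it is not, the expression always falls through to `b`.
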
…vent OOM

The Namespace metamask-ci-linux profile (8x16, 16GB RAM) has no swap space,
unlike GitHub-hosted ubuntu-latest runners. With NODE_OPTIONS set to
max_old_space_size=20480 (20GB), the heap limit exceeds physical memory
and Jest workers are OOM-killed (SIGKILL).

Conditionally lower to 12288 (12GB) when runner_provider is namespace.
The current path retains 20480 unchanged.

Affects: unit-tests (10 shards), component-view-tests (2 shards).
Co-authored-by: Cursor <cursoragent@cursor.com>
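
A sketch of the conditional heap limit, assuming it is set through a job-level or step-level `env` block. The exact placement in the workflow is not shown in this PR page. Both operands are non-empty strings, so the falsy-operand pitfall fixed in the previous commit does not apply here.

```yaml
# Sketch of the conditional NODE_OPTIONS value.
env:
  # 12288 MB on Namespace (16GB RAM, no swap); 20480 MB on the current path.
  NODE_OPTIONS: ${{ inputs.runner_provider == 'namespace' && '--max_old_space_size=12288' || '--max_old_space_size=20480' }}
```
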
@github-actions

github-actions Bot commented May 5, 2026

Contributor

🔍 Smart E2E Test Selection

⏭️ Smart E2E selection skipped - skip-smart-e2e-selection label found

All E2E tests pre-selected.

View GitHub Actions results

@sonarqubecloud

sonarqubecloud Bot commented May 5, 2026


@github-project-automation github-project-automation Bot moved this to Needs dev review in PR review queue May 5, 2026
Comment thread .github/workflows/build.yml
Comment thread .github/workflows/ci.yml
@github-project-automation github-project-automation Bot moved this from Needs dev review to Review finalised - Ready to be merged in PR review queue May 7, 2026
@alucardzom alucardzom added this pull request to the merge queue May 7, 2026
Merged via the queue into main with commit 967d357 May 7, 2026
371 of 401 checks passed
@alucardzom alucardzom deleted the phase1-namespace-linux-cache branch May 7, 2026 11:39
@github-actions github-actions Bot locked and limited conversation to collaborators May 7, 2026
@metamaskbotv2 metamaskbotv2 Bot added the release-7.77.0 Issue or pull request that will be included in release 7.77.0 label May 7, 2026

Labels

release-7.77.0 Issue or pull request that will be included in release 7.77.0 size-M skip-smart-e2e-selection Skip Smart E2E selection, i.e. select all E2E tests to run team-dev-ops DevOps team

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants