
feat: Auto-failover to Quicknode when Infura is down #14139

Merged
mcmire merged 53 commits into main from add-failovers-for-infura-networks
Apr 9, 2025

@mcmire mcmire commented Mar 20, 2025

Description

When Infura goes down, we want to automatically and invisibly redirect requests from Infura RPC endpoints to Quicknode endpoints.

  • There is not one central Quicknode RPC endpoint that all Infura requests will be forwarded to; rather, each Infura-supported chain gets its own Quicknode endpoint. These URLs contain an API key, so we need to add an environment variable for each endpoint.
  • All of the Infura RPC endpoints that have an associated Quicknode endpoint are:
    • Ethereum Mainnet
    • Linea Mainnet
    • Arbitrum Mainnet
    • Avalanche
    • Optimism (OP) Mainnet
    • Polygon Mainnet
    • Base Mainnet
  • The failover behavior itself is already present in @metamask/network-controller, where a new failoverRpcUrls property has been added to the RpcEndpoint type. This commit bumps @metamask/network-controller to bring in those changes and then does three things: it adds a migration that goes through each RPC endpoint in NetworkController state and populates the failoverRpcUrls property; it updates the code responsible for the "additional networks" feature so that Infura RPC endpoints are added with failoverRpcUrls set correctly; and it updates every other place that adds a network to pass failoverRpcUrls as an empty array (since it's a required configuration property).
  • Finally, this commit adds some minimal UI to allow users to see that Infura endpoints have a failover URL assigned. Since all of the Quicknode URLs are long, the displayed URL gets cut off, which is acceptable.
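
As a rough illustration of the migration described above, the sketch below populates failoverRpcUrls for every endpoint in a simplified state object. The type shapes, the function name populateFailoverRpcUrls, and the chain-ID-to-URL map are assumptions made for this example; they are not taken from the actual migration or from the real NetworkController types.

```typescript
// Simplified, hypothetical shapes; the real NetworkController state has more fields.
type RpcEndpoint = {
  type: 'infura' | 'custom';
  url: string;
  failoverRpcUrls?: string[];
};

type NetworkConfiguration = {
  chainId: string;
  rpcEndpoints: RpcEndpoint[];
};

type NetworkState = {
  networkConfigurationsByChainId: Record<string, NetworkConfiguration>;
};

// Returns a copy of the state in which every endpoint has failoverRpcUrls set:
// Infura endpoints get the failover URL configured for their chain (if any),
// and all other endpoints get an empty array.
function populateFailoverRpcUrls(
  state: NetworkState,
  failoverUrlsByChainId: Record<string, string | undefined>,
): NetworkState {
  const updated: Record<string, NetworkConfiguration> = {};
  for (const [chainId, config] of Object.entries(
    state.networkConfigurationsByChainId,
  )) {
    const failoverUrl = failoverUrlsByChainId[chainId];
    updated[chainId] = {
      ...config,
      rpcEndpoints: config.rpcEndpoints.map((endpoint) => ({
        ...endpoint,
        failoverRpcUrls:
          endpoint.type === 'infura' && failoverUrl ? [failoverUrl] : [],
      })),
    };
  }
  return { ...state, networkConfigurationsByChainId: updated };
}
```

Passing the URL map in as an argument keeps the sketch independent of how the environment variables are actually read, and returning a new object leaves the input state untouched.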

Related issues

Fixes #14120.

Manual testing steps

There are 7 environment variables that you need to have set in .js.env before you begin. Please contact me in Slack and I will give them to you.
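
Before starting, it may help to confirm that all seven variables are actually set. The sketch below is one way to check; the variable names in EXAMPLE_FAILOVER_ENV_VARS are placeholders invented for this example (the real names are not listed here), and findMissingEnvVars is a hypothetical helper.

```typescript
// Placeholder names only; substitute the real variable names once you have them.
const EXAMPLE_FAILOVER_ENV_VARS = [
  'EXAMPLE_QUICKNODE_MAINNET_URL',
  'EXAMPLE_QUICKNODE_LINEA_MAINNET_URL',
  'EXAMPLE_QUICKNODE_ARBITRUM_URL',
  'EXAMPLE_QUICKNODE_AVALANCHE_URL',
  'EXAMPLE_QUICKNODE_OPTIMISM_URL',
  'EXAMPLE_QUICKNODE_POLYGON_URL',
  'EXAMPLE_QUICKNODE_BASE_URL',
] as const;

// Returns the names that are unset or contain only whitespace.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
  names: readonly string[],
): string[] {
  return names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}
```

In a Node script this could be called as findMissingEnvVars(process.env, EXAMPLE_FAILOVER_ENV_VARS); an empty result means every listed variable has a value.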

Testing the UI changes

  1. Run yarn setup / yarn setup:expo and then run yarn watch.
  2. Open the app.
  3. Set up an account if necessary
  4. Open the network switcher
  5. Click on "Edit" next to "Ethereum Mainnet"
  6. Under RPC Url, you should see a truncated version of the Quicknode URL below mainnet.infura.io
  7. Open the network switcher again
  8. Add "Avalanche C-Chain" (which is an Infura-supported "featured" network)
  9. Under RPC Url, you should see a truncated version of the Quicknode URL below the Avalanche Infura URL
  10. Open the network switcher again
  11. Tap "Add a custom network"
  12. Add a new network, such as "Flare Mainnet": https://chainlist.wtf/chain/14/. Type in a network name, add a new RPC URL, then type in the chain ID and the symbol.
    • Make sure to put the RPC URL first and then the RPC name, or else you won't be able to save.
  13. Try switching to the new network. You should not see any errors doing so.
  14. Click on the network switcher and edit Flare. Under Default RPC URL you should not see a failover RPC URL listed like you did for Ethereum Mainnet or Avalanche.
  15. Open the browser, go to docs.metamask.io, tap the hamburger menu, go to Wallet API, tap the menu again, go to JSON-RPC API, tap the menu again, go to wallet_addEthereumChain.
  16. Click on "Connect MetaMask".
  17. Scroll down and run the request (which should add Gnosis). Accept the approval.
  18. You should get a modal saying that Gnosis has been switched to.
  19. Tap back on the wallet view and open the network switcher.
  20. Go to edit Gnosis. Under RPC Url you should not see a failover RPC URL listed like you did for Ethereum Mainnet or Avalanche.
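
The checks in steps 6, 9, 14, and 20 come down to whether a shortened failover URL is shown under the primary RPC URL. The sketch below is one possible way to produce such display text, assuming the URL is reduced to its host; hostForDisplay and describeEndpoint are hypothetical names, and this is not necessarily how the app shortens the URL.

```typescript
// Reduce an RPC URL to its host for display. Dropping the path also keeps any
// API key embedded in it out of the UI. Falls back to the raw string when the
// input is not a valid absolute URL.
function hostForDisplay(rpcUrl: string): string {
  try {
    return new URL(rpcUrl).host;
  } catch {
    return rpcUrl;
  }
}

// Build the lines shown for one endpoint: the primary host, plus a failover
// line only when at least one failover URL is assigned.
function describeEndpoint(endpoint: {
  url: string;
  failoverRpcUrls?: string[];
}): string[] {
  const lines = [hostForDisplay(endpoint.url)];
  const failover = endpoint.failoverRpcUrls?.[0];
  if (failover !== undefined) {
    lines.push(`Failover: ${hostForDisplay(failover)}`);
  }
  return lines;
}
```

With this shape, an Infura endpoint yields two lines, while a custom endpoint with an empty failoverRpcUrls array yields only the primary host, which matches what the steps above ask you to verify.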

Testing the failover logic

  1. Open this branch in your editor.
  2. We need to lower the time that an unavailable endpoint is ignored while the failover is active. Open app/core/Engine/Engine.ts. Look for const networkControllerOpts = and make the following changes:
      const networkControllerOpts = {
        infuraProjectId: process.env.MM_INFURA_PROJECT_ID || NON_EMPTY,
        state: initialNetworkControllerState,
        messenger: networkControllerMessenger,
        getRpcServiceOptions: () => ({
          fetch,
          btoa,
    +     policyOptions: {
    +       maxRetries: 1,
    +       maxConsecutiveFailures: 10,
    +       circuitBreakDuration: 10000,
    +     },
        }),
      };
  3. Open node_modules/@metamask/network-controller/dist/rpc-service/rpc-service.cjs.
  4. We now need to know what URL we are requesting and also force Infura to be "down" for a little while. Find exports.RpcService = RpcService. Make the following changes:
      exports.RpcService = RpcService;
    + const startDate = new Date().getTime();
      _RpcService_fetch = new WeakMap(), _RpcService_fetchOptions = new WeakMap(), _RpcService_failoverService = new WeakMap(), _RpcService_policy = new WeakMap(), _RpcService_instances = new WeakSet(), _RpcService_getDefaultFetchOptions = function _RpcService_getDefaultFetchOptions(endpointUrl, fetchOptions, givenBtoa) {
          if (endpointUrl.username && endpointUrl.password) {
              const authString = `${endpointUrl.username}:${endpointUrl.password}`;
              const encodedCredentials = givenBtoa(authString);
              return (0, deepmerge_1.default)(fetchOptions, {
                  headers: { Authorization: `Basic ${encodedCredentials}` },
              });
          }
          return fetchOptions;
      }, _RpcService_getCompleteFetchOptions = function _RpcService_getCompleteFetchOptions(jsonRpcRequest, fetchOptions) {
          const defaultOptions = {
              method: 'POST',
              headers: {
                  Accept: 'application/json',
                  'Content-Type': 'application/json',
              },
          };
          const mergedOptions = (0, deepmerge_1.default)(defaultOptions, (0, deepmerge_1.default)(__classPrivateFieldGet(this, _RpcService_fetchOptions, "f"), fetchOptions));
          const { id, jsonrpc, method, params } = jsonRpcRequest;
          const body = JSON.stringify({
              id,
              jsonrpc,
              method,
              params,
          });
          return { ...mergedOptions, body };
      }, _RpcService_executePolicy =
      /**
       * Makes the request using the Cockatiel policy that this service creates.
       *
       * @param jsonRpcRequest - The JSON-RPC request to send to the endpoint.
       * @param fetchOptions - The options for `fetch`; will be combined with the
       * fetch options passed to the constructor
       * @returns The decoded JSON-RPC response from the endpoint.
       * @throws A "method not found" error if the response status is 405.
       * @throws A rate limiting error if the response HTTP status is 429.
       * @throws A timeout error if the response HTTP status is 503 or 504.
       * @throws A generic error if the response HTTP status is not 2xx but also not
       * 405, 429, 503, or 504.
       */
      async function _RpcService_executePolicy(jsonRpcRequest, fetchOptions) {
          return await __classPrivateFieldGet(this, _RpcService_policy, "f").execute(async () => {
              const response = await __classPrivateFieldGet(this, _RpcService_fetch, "f").call(this, this.endpointUrl, fetchOptions);
    +         console.log(`=== Fetching ${this.endpointUrl.toString()} ===`);
    +         const now = new Date().getTime();
    +         if (this.endpointUrl.host.includes('infura.io') && (now - startDate) < 120000) {
    +           throw new Error("Infura is down");
    +         }
              if (response.status === 405) {
  5. Shake the device, reload the app, and wait for the dev server to rebundle.
  6. In your terminal, press Cmd-F and start a search for === Fetching.
  7. You'll see a bunch of messages flying by. Some of these are Sentry logging messages, which you can ignore. You'll start seeing === Fetching <URL> === messages where it's an Infura URL, followed by the error we just created ("Infura is down").
  8. After about 10-20 seconds, you'll start seeing === Fetching <URL> === messages where instead of an Infura URL, it's a Quicknode URL (https://<descriptive-random-string>.quicknode.pro/<api-key>). You have to look closely, because right after that you will still continue to see the "Infura is down" message.
  9. After about 10-20 more seconds you'll start to see errors like "Execution prevented because the circuit breaker is open".
  10. Almost immediately after that, Infura should "recover" and you should stop seeing errors.
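
To make the sequence in steps 7-10 easier to reason about, here is a minimal sketch of a consecutive-failure circuit breaker with a failover target. It is not the RpcService or Cockatiel implementation: FailoverSketch, SendFn, and the injected clock are hypothetical, retries are left out, and send is synchronous only to keep the example short. The option names mirror maxConsecutiveFailures and circuitBreakDuration from the policyOptions shown in step 2.

```typescript
type SendFn = (url: string) => string;

type BreakerOptions = {
  maxConsecutiveFailures: number;
  circuitBreakDuration: number; // milliseconds
};

class FailoverSketch {
  private readonly primaryUrl: string;
  private readonly failoverUrl: string;
  private readonly send: SendFn;
  private readonly options: BreakerOptions;
  private readonly now: () => number;
  private consecutiveFailures = 0;
  private circuitOpenedAt: number | null = null;

  constructor(
    primaryUrl: string,
    failoverUrl: string,
    send: SendFn,
    options: BreakerOptions,
    now: () => number = Date.now,
  ) {
    this.primaryUrl = primaryUrl;
    this.failoverUrl = failoverUrl;
    this.send = send;
    this.options = options;
    this.now = now;
  }

  request(): string {
    // While the circuit is open, skip the primary and go straight to the failover.
    if (this.isCircuitOpen()) {
      return this.send(this.failoverUrl);
    }
    try {
      const result = this.send(this.primaryUrl);
      // A success closes the circuit and resets the failure count.
      this.consecutiveFailures = 0;
      this.circuitOpenedAt = null;
      return result;
    } catch (error) {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.options.maxConsecutiveFailures) {
        // Too many failures in a row: open the circuit and divert this request.
        this.circuitOpenedAt = this.now();
        return this.send(this.failoverUrl);
      }
      throw error;
    }
  }

  private isCircuitOpen(): boolean {
    if (this.circuitOpenedAt === null) {
      return false;
    }
    // Once the break duration has elapsed, the primary is tried again.
    return this.now() - this.circuitOpenedAt < this.options.circuitBreakDuration;
  }
}
```

In this sketch, early failures surface to the caller, the failure that reaches the threshold opens the circuit and is served by the failover, and the primary is retried only after circuitBreakDuration has passed. Lowering circuitBreakDuration, as step 2 does, shortens the wait before the primary is tried again.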

Screenshots/Recordings

Before

After

This is what it now looks like when editing an Infura network:

Screenshot_1743790496

Screenshot_1743790510

This is what it looks like when editing a custom network:

Screenshot_1743790531

Here is a video that goes through all Quicknode-supported Infura networks and then shows selecting a custom RPC endpoint vs. an Infura RPC endpoint:

mobile.demo.mp4

Pre-merge author checklist

Pre-merge reviewer checklist

  • I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
  • I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and/or screenshots.

@github-actions

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

@metamaskbot metamaskbot added the team-wallet-framework-deprecated DEPRECATED: please use "team-core-platform" instead label Mar 20, 2025
@mcmire mcmire changed the title from "Auto-failover to Quicknode when Infura is down" to "feat: Auto-failover to Quicknode when Infura is down" Mar 20, 2025

github-actions bot commented Mar 20, 2025

https://bitrise.io/ Bitrise

❌❌❌ pr_smoke_e2e_pipeline failed on Bitrise! ❌❌❌

Commit hash: 536b007
Build link: https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/e8888685-ab0f-43cf-a01b-3ef69d30f8fc

Note

  • You can kick off another pr_smoke_e2e_pipeline on Bitrise by removing and re-applying the Run Smoke E2E label on the pull request

Tip

  • Check the documentation if you have any doubts on how to understand the failure on bitrise


github-actions bot commented Mar 21, 2025

https://bitrise.io/ Bitrise

❌❌❌ pr_smoke_e2e_pipeline failed on Bitrise! ❌❌❌

Commit hash: 420f8be
Build link: https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/dadbbd5e-0308-4742-a9a4-791ed8ec3f7f

Note

  • You can kick off another pr_smoke_e2e_pipeline on Bitrise by removing and re-applying the Run Smoke E2E label on the pull request

Tip

  • Check the documentation if you have any doubts on how to understand the failure on bitrise

@mcmire mcmire mentioned this pull request Apr 8, 2025
9 tasks
@codecov-commenter

Codecov Report

Attention: Patch coverage is 82.05128% with 28 lines in your changes missing coverage. Please review.

Project coverage is 66.62%. Comparing base (44bb493) to head (75d32d2).
Report is 52 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...Settings/NetworksSettings/NetworkSettings/index.js | 69.23% | 8 Missing ⚠️ |
| app/core/Engine/Engine.ts | 60.00% | 7 Missing and 1 partial ⚠️ |
| ...rs/network-controller/messenger-action-handlers.ts | 54.54% | 5 Missing ⚠️ |
| app/util/networks/customNetworks.tsx | 83.33% | 1 Missing and 1 partial ⚠️ |
| app/util/onlyKeepHost.ts | 66.66% | 1 Missing and 1 partial ⚠️ |
| ...nts-temp/CellSelectWithMenu/CellSelectWithMenu.tsx | 83.33% | 0 Missing and 1 partial ⚠️ |
| app/components/UI/NetworkModal/index.tsx | 50.00% | 1 Missing ⚠️ |
| .../components/UI/Ramp/hooks/useRampNetworksDetail.ts | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14139      +/-   ##
==========================================
+ Coverage   66.29%   66.62%   +0.32%     
==========================================
  Files        2248     2263      +15     
  Lines       48037    48397     +360     
  Branches     6766     6853      +87     
==========================================
+ Hits        31848    32246     +398     
+ Misses      14172    14115      -57     
- Partials     2017     2036      +19     


@sonarqubecloud

sonarqubecloud bot commented Apr 8, 2025

@Prithpal-Sooriya Prithpal-Sooriya left a comment

LGTM from an Assets CO approval
(it seems some of our tests had some minor changes)

@jiexi jiexi left a comment

approving for wallet_addEthereumChain handler changes

@Cal-L Cal-L left a comment

LGTM

@georgeweiler

Ramp changes LGTM 🚀

@cryptodev-2s cryptodev-2s left a comment

LGTM!

@mcmire mcmire enabled auto-merge April 9, 2025 15:59
@mcmire mcmire added this pull request to the merge queue Apr 9, 2025
sethkfman pushed a commit that referenced this pull request Apr 9, 2025
…14408)

## **Description**

These upgrades support the RPC failover and MegaETH initiatives.

The patch for `network-controller` is no longer needed because the
changes there have been integrated into `network-controller`.

## **Related issues**

Unblocks #14139, and
other initiatives.

## **Manual testing steps**

This PR should have no functional changes, everything should work the
same way.

1. Check out this branch and go through the setup steps to load the app
onto your device or emulator.
2. Create an account if necessary.
3. Open the wallet to go to the home screen. You should see no errors in
your local console.
4. Try to switch the network. You should see no errors.
5. Try to send a transaction. You should see no errors.
6. Open the browser, go to `docs.metamask.io`, tap the hamburger menu,
go to Wallet API, tap the menu again, go to JSON-RPC API, tap the menu
again, go to `wallet_addEthereumChain`.
7. Click on "Connect MetaMask".
8. Scroll down and run the request (which should add Gnosis). Accept the
approval.
9. You should get a modal saying that Gnosis has been switched to. You
should not see any errors in your local terminal.

## **Screenshots/Recordings**

(No screenshots/recordings, as everything should work the same way.)

### **Before**

### **After**

## **Pre-merge author checklist**

- [x] I’ve followed [MetaMask Contributor
Docs](https://github.com/MetaMask/contributor-docs) and [MetaMask Mobile
Coding
Standards](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/CODING_GUIDELINES.md).
- [x] I've completed the PR template to the best of my ability
- [x] I’ve included tests if applicable
- [x] I’ve documented my code using [JSDoc](https://jsdoc.app/) format
if applicable
- [x] I’ve applied the right labels on the PR (see [labeling
guidelines](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/LABELING_GUIDELINES.md)).
Not required for external contributors.

## **Pre-merge reviewer checklist**

- [ ] I've manually tested the PR (e.g. pull and build branch, run the
app, test code being changed).
- [ ] I confirm that this PR addresses all acceptance criteria described
in the ticket it closes and includes the necessary testing evidence such
as recordings and/or screenshots.
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Apr 9, 2025
@mcmire mcmire added this pull request to the merge queue Apr 9, 2025
@sethkfman sethkfman removed this pull request from the merge queue due to the queue being cleared Apr 9, 2025
@mcmire mcmire added this pull request to the merge queue Apr 9, 2025
Merged via the queue into main with commit 420add2 Apr 9, 2025
39 checks passed
@mcmire mcmire deleted the add-failovers-for-infura-networks branch April 9, 2025 21:03
@github-actions github-actions bot locked and limited conversation to collaborators Apr 9, 2025
@metamaskbot metamaskbot added the release-7.45.0 Issue or pull request that will be included in release 7.45.0 label Apr 9, 2025

Labels

release-7.45.0 Issue or pull request that will be included in release 7.45.0 team-wallet-framework-deprecated DEPRECATED: please use "team-core-platform" instead

Development

Successfully merging this pull request may close these issues.

Auto-failover to Quicknode when Infura is down