Feat: local dev perf tests using e2e by NicolasMassart · Pull Request #16806 · MetaMask/metamask-mobile

NicolasMassart · 2025-06-30T16:09:47Z

Description

This PR implements a comprehensive E2E performance testing system that enables developers to measure, track, and compare performance metrics across test runs during local development.

What is the reason for the change?

Developers need a reliable way to measure performance improvements and regressions when iterating on code changes locally. Traditional manual performance testing is inconsistent and doesn't provide the controlled, repeatable environment needed for accurate performance analysis. There was no systematic way to track performance metrics across E2E test runs or compare results between different code changes during local development.

What is the improvement/solution?

The E2E performance system provides a complete workflow for local performance testing:

Controlled Performance Measurement: Leverages existing Redux performance tracking to collect precise timing data during E2E test execution
Automated Data Collection: Automatically captures performance metrics from the fixture server during test runs
Persistent Storage: Saves performance results to timestamped JSON files for historical analysis
Comparison Analysis: Provides tools to compare performance across different test runs and identify improvements/regressions
Developer Workflow Integration: Enables developers to iterate on performance improvements with consistent, repeatable measurements
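The persistent-storage step above can be sketched as follows. This is only an illustration: the `PerfMetric` and `PerfRunFile` shapes and the `resultFileName` and `serializeRun` helpers are hypothetical names, since the PR description does not show the actual file schema. Only the `e2e-performance-results/` directory and the use of timestamped JSON files come from the description.

```typescript
// Hypothetical shapes: the real schema used by the PR is not shown.
interface PerfMetric {
  eventName: string; // the interaction or operation being timed
  durationMs: number; // elapsed time in milliseconds
}

interface PerfRunFile {
  testName: string;
  recordedAt: string; // ISO 8601 timestamp of the run
  metrics: PerfMetric[];
}

// Build a per-run file name inside e2e-performance-results/.
// Colons and dots in the ISO timestamp are replaced with dashes so the
// name is valid on common filesystems.
function resultFileName(testName: string, recordedAt: Date): string {
  const stamp = recordedAt.toISOString().replace(/[:.]/g, '-');
  return `e2e-performance-results/${testName}-${stamp}.json`;
}

// Produce the path and JSON body for one test run. Writing the file to
// disk is left to the caller.
function serializeRun(
  testName: string,
  metrics: PerfMetric[],
  recordedAt: Date,
): { path: string; body: string } {
  const file: PerfRunFile = {
    testName,
    recordedAt: recordedAt.toISOString(),
    metrics,
  };
  return {
    path: resultFileName(testName, recordedAt),
    body: JSON.stringify(file, null, 2),
  };
}
```

Because every run gets its own timestamped file, earlier results are never overwritten and can later serve as baselines.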

Key Features:

Performance Tracking: Collects precise timing data for user interactions and operations during E2E tests
Test Isolation: Each test suite generates its own performance file with unique timestamps
Historical Analysis: Maintains performance history for trend analysis and regression detection
Comparison Tools: Automated comparison between baseline and current performance results
Interactive Interface: User-friendly CLI for exploring and analyzing performance data

Related issues

Fixes: Need for systematic performance testing and comparison in local E2E development workflow

Manual testing steps

Run E2E tests to generate performance data: yarn test:e2e:ios -- --testNamePattern="SampleFeature"
Check generated performance files in e2e-performance-results/ directory
Run performance comparison: npx ts-node scripts/compare-e2e-performance.ts SampleFeature-sample-feature-counter
Test interactive mode: npx ts-node scripts/compare-e2e-performance.ts list
Compare all available tests: npx ts-node scripts/compare-e2e-performance.ts
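The comparison step can be pictured with a minimal sketch like the one below. It is not the code of `scripts/compare-e2e-performance.ts`, which is not shown here; `compareRuns`, the `Comparison` shape, and the 5% tolerance are assumptions chosen for illustration.

```typescript
type Verdict = 'improved' | 'regressed' | 'unchanged';

interface Comparison {
  eventName: string;
  baselineMs: number;
  currentMs: number;
  deltaPercent: number; // positive means the current run is slower
  verdict: Verdict;
}

// Compare two runs keyed by metric name. tolerancePercent (an assumed
// default of 5) absorbs normal run-to-run noise.
function compareRuns(
  baseline: Record<string, number>,
  current: Record<string, number>,
  tolerancePercent = 5,
): Comparison[] {
  const results: Comparison[] = [];
  for (const [eventName, baselineMs] of Object.entries(baseline)) {
    const currentMs = current[eventName];
    // Skip metrics missing from the current run or with an unusable baseline.
    if (currentMs === undefined || baselineMs <= 0) continue;
    const deltaPercent = ((currentMs - baselineMs) / baselineMs) * 100;
    let verdict: Verdict = 'unchanged';
    if (deltaPercent > tolerancePercent) verdict = 'regressed';
    else if (deltaPercent < -tolerancePercent) verdict = 'improved';
    results.push({ eventName, baselineMs, currentMs, deltaPercent, verdict });
  }
  return results;
}
```

A tolerance band is used because single E2E timings vary between runs; without it, nearly every comparison would be reported as a change.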

Screenshots/Recordings

Before

N/A

After

E2E test run logs indicating performance tracking:

The comparison app displaying the results using yarn test:e2e:performance:compare list:

Pre-merge author checklist

I've followed MetaMask Contributor Docs and MetaMask Mobile Coding Standards.
I've completed the PR template to the best of my ability
I've included tests if applicable
I've documented my code using JSDoc format if applicable
I've applied the right labels on the PR (see labeling guidelines). Not required for external contributors.

Pre-merge reviewer checklist

I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and or screenshots.

- Remove address and name tracking from Sentry traces for privacy protection
- Keep chainId and petNamesCount tracking (safe public/aggregate data)
- Update tests to reflect privacy-focused trace data structure
- Update README documentation to clarify privacy approach
- Maintain performance monitoring capabilities while protecting user data

This ensures user privacy is protected while still providing valuable performance monitoring and debugging capabilities.
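The privacy rule described in this note (keep `chainId` and `petNamesCount`, drop address and name) could be expressed as an allow-list, as in the sketch below. `PetnameTraceInput`, `SafeTraceData`, and `toSafeTraceData` are hypothetical names; the actual trace code in the branch is not shown here.

```typescript
// Hypothetical input: the fields a petname operation might have available.
interface PetnameTraceInput {
  chainId: string;
  address: string; // sensitive: must not reach the trace
  name: string; // sensitive: must not reach the trace
  petNamesCount: number;
}

// Only public or aggregate fields are allowed in the trace payload.
interface SafeTraceData {
  chainId: string;
  petNamesCount: number;
}

function toSafeTraceData(input: PetnameTraceInput): SafeTraceData {
  // Build the payload from an explicit allow-list instead of deleting
  // fields, so any field added to the input later is excluded by default.
  return { chainId: input.chainId, petNamesCount: input.petNamesCount };
}
```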

github-actions · 2025-06-30T16:09:56Z

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

## **Description**

# 🎯 Performance Quality Gates

This document outlines the performance thresholds (quality gates) for MetaMask Mobile's critical user flows. These thresholds ensure optimal user experience across different platforms and scenarios.

## 📱 Platform Overview

- **Android**: Generally has higher thresholds due to platform constraints
- **iOS**: Lower thresholds leveraging platform optimizations
- **Quality Gates**: Hard limits that cause test failures if exceeded

## 📊 Test Reporting

Performance tests automatically generate detailed reports using the `PerformanceTestReporter` utility:

- **JSON Reports**: Structured performance data for analysis
- **Test Results**: Include timing metrics, thresholds, and pass/fail status
- **User Profile Testing**: Tests run across different user states (CORE_USER, POWER_USER)

---

## 🏠 Account List Performance Tests

### Test: `render account list efficiently with multiple accounts and networks`

**Configuration**: Multiple accounts, popular networks, profile syncing enabled

| Platform | Total Max Time | Notes |
|----------|----------------|-------|
| **Android** | 5,000ms | 5 seconds maximum |
| **iOS** | 7,500ms | 7.5 seconds maximum |

---

### Test: `handle account list performance with heavy token load`

**Configuration**: Multiple accounts, popular networks, 10 tokens for stress testing

| Platform | Total Max Time | Notes |
|----------|----------------|-------|
| **Android** | 5,000ms | 5 seconds maximum |
| **iOS** | 7,500ms | 7.5 seconds maximum |

---

### Test: `benchmark account list with minimal load`

**Configuration**: Minimal accounts, default network, 2 tokens (baseline measurement)

| Platform | Total Max Time | Notes |
|----------|----------------|-------|
| **Android** | 45,000ms | 45 seconds maximum |
| **iOS** | 15,000ms | 15 seconds maximum |

---

### Test: `benchmark switching accounts from the account list`

**Configuration**: Account switching/dismissal performance

| Platform | Dismissal Max Time | Notes |
|----------|--------------------|-------|
| **Android** | 5,000ms | 5 seconds maximum |
| **iOS** | 4,000ms | 4 seconds maximum |

---

## 🌐 Network List Performance Tests

### Test: `render network list efficiently with multiple accounts and all popular networks`

**Configuration**: Multiple accounts, all popular networks

| Platform | Total Max Time | Notes |
|----------|----------------|-------|
| **Android** | 17,500ms | 17.5 seconds maximum |
| **iOS** | 6,500ms | 6.5 seconds maximum |

---

### Test: `handle network list performance with heavy token load on all popular networks`

**Configuration**: Multiple accounts, popular networks, 10 tokens for stress testing

| Platform | Total Max Time | Notes |
|----------|----------------|-------|
| **Android** | 17,500ms | 17.5 seconds maximum |
| **iOS** | 6,500ms | 6.5 seconds maximum |

---

### Test: `benchmark network list with minimal load`

**Configuration**: Minimal tokens, popular networks (baseline measurement)

| Platform | Render Max Time | Notes |
|----------|-----------------|-------|
| **Android** | 2,500ms | 2.5 seconds maximum |
| **iOS** | 1,500ms | 1.5 seconds maximum |

---

### Test: `benchmark switching networks from the network list`

**Configuration**: Network switching/dismissal performance

| Platform | Dismissal Max Time | Notes |
|----------|--------------------|-------|
| **Android** | 2,500ms | 2.5 seconds maximum |
| **iOS** | 1,500ms | 1.5 seconds maximum |

---

## 📊 Test Summary

### Account List Tests (4 tests)

- ✅ **Standard Load**: Multiple accounts, popular networks
- ✅ **Heavy Load**: Multiple accounts, 10 tokens, popular networks
- ✅ **Baseline Test**: Minimal accounts, 2 tokens, default network
- ✅ **Dismissal Test**: Account switching performance

### Network List Tests (4 tests)

- ✅ **Standard Load**: Multiple accounts, popular networks
- ✅ **Heavy Load**: Multiple accounts, 10 tokens, popular networks
- ✅ **Baseline Test**: Minimal tokens, popular networks
- ✅ **Dismissal Test**: Network switching performance

**Total**: 8 performance tests across critical user flows

---

## 🚨 Quality Gate Rules

### Failure Criteria

Tests fail immediately when total time exceeds the maximum acceptable time for any scenario.

### Performance Patterns

- **Heavy Token Load**: Increases render times but maintains same thresholds as standard load
- **Platform Differences**: iOS consistently performs better than Android
- **Baseline Tests**: Should complete quickly but have generous thresholds for stability
- **Dismissal Tests**: Focus on UI responsiveness during state transitions

### User Profile Testing

Tests run across different user states with varying account complexity:

#### User Profile Definitions

- **CASUAL_USER**: 2 EVM accounts from 1 SRP *(currently not used in tests)*
- **CORE_USER**: 5 EVM accounts and 5 Solana accounts from 1 SRP
- **POWER_USER**: 15 EVM accounts from 2 SRPs + 5 Solana accounts

#### Current Test Coverage

- **CORE_USER**: Standard user configuration for baseline performance
- **POWER_USER**: Enhanced user configuration with maximum account complexity

---

## 📈 Reporting Features

### Automated Reports

- **JSON Output**: Structured performance data for CI/CD integration
- **Test Metrics**: Total time measurements and performance thresholds
- **Threshold Tracking**: Pass/fail status with actual vs. expected performance
- **User Profile Results**: Separate results for different user configurations

### Report Usage

Performance reports can be used for:

- Continuous integration quality gates
- Performance regression detection
- Platform-specific optimization insights
- User experience benchmarking

---

**What can we do to improve it**

* Instead of measuring the action once, perform it multiple times and record both the average and the top (maximum) duration. Each of the two needs its own baseline.
* To get more granular per-function reports, we can reuse this sample work to measure several functions within one flow. For example, if adding an account is split into 2 functions, we can measure each function separately instead of taking one big measurement, and learn more precisely where we can improve (#16806).

**Bitrise Runs**

* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/05c14c4e-4868-43b0-9642-0e5d69ebd82b?tab=workflows
* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/9b63202e-8944-46b5-8867-ab94f748f37f
* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/98ca0cf1-edfa-4888-9e38-ef64422d6833
* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/7857fee5-3f27-41a6-90a9-0332e1a81435

Table of results from 8 Bitrise runs

| **Test Name** | **Platform** | **Total Time (Runs)** | **Min** | **Max** | **Delta** | **Average** |
|---------------|--------------|-----------------------|---------|---------|-----------|-------------|
| render account list efficiently with multiple accounts and networks | IOS | 2.54, 2.61, 3.65, 2.46, 2.88, 3.1, 2.94, 2.87 | 2.46 | 3.65 | 1.19 | 2.88 |
| render account list efficiently with multiple accounts and networks | ANDROID | 2.83, 2.64, 2.92, 2.28, 2.62, 2.76, 1.95, 2.37 | 1.95 | 2.92 | 0.97 | 2.55 |
| handle account list performance with heavy token load | IOS | 4.01, 5.72, 4.02, 5.3, 2.49, 2.48, 2.67, 2.78 | 2.48 | 5.72 | 3.24 | 3.93 |
| handle account list performance with heavy token load | ANDROID | 2.15, 2.51, 1.97, 2.78, 2.13, 2.29, 2.23, 2.41 | 1.97 | 2.78 | 0.81 | 2.29 |
| benchmark account list with minimal load | IOS | 2.86, 3.48, 2.87, 3.32, 2.81, 3.35, 2.85, 3.13 | 2.81 | 3.48 | 0.67 | 3.08 |
| benchmark account list with minimal load | ANDROID | 2.08, 2.17, 2.59, 2.28, 2.07, 2.19, 2.13, 2.49 | 2.07 | 2.59 | 0.52 | 2.25 |
| benchmark switching networks from the network list | IOS | 6.61, 8.11, 7.65, 8.08, 7.64, 6.53, 6.49, 6.55 | 6.49 | 8.11 | 1.62 | 7.34 |
| benchmark switching networks from the network list | ANDROID | 7.68, 4.26, 4.11, 5.6, 4.04, 4.32, 4.11, 4.28 | 4.04 | 7.68 | 3.64 | 4.92 |
| benchmark switching accounts from the account list | IOS | 5.99, 4.37, 3.14, 4.55, 3.36, 4.33 | 3.14 | 5.99 | 2.85 | 4.29 |
| benchmark switching accounts from the account list | ANDROID | 2.01, 2.28, 1.71, 2.05, 1.77, 2.28, 1.75, 2.39 | 1.71 | 2.39 | 0.68 | 2.06 |

## **Related issues**

Fixes:

## **Manual testing steps**

1. Go to this page...
2.
3.

## **Screenshots/Recordings**

### **Before**

### **After**

## **Pre-merge author checklist**

- [ ] I’ve followed [MetaMask Contributor Docs](https://github.com/MetaMask/contributor-docs) and [MetaMask Mobile Coding Standards](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/CODING_GUIDELINES.md).
- [ ] I've completed the PR template to the best of my ability
- [ ] I’ve included tests if applicable
- [ ] I’ve documented my code using [JSDoc](https://jsdoc.app/) format if applicable
- [ ] I’ve applied the right labels on the PR (see [labeling guidelines](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/LABELING_GUIDELINES.md)). Not required for external contributors.

## **Pre-merge reviewer checklist**

- [ ] I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
- [ ] I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and or screenshots.

---------

Co-authored-by: Curtis David <Curtis.David7@gmail.com>
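The first improvement idea above (repeat each action, then hold both the average and the top duration to their own limits) can be sketched as follows. This is an illustration only: `summarizeRuns`, `GateLimits`, and `passesGate` are hypothetical names rather than part of `PerformanceTestReporter`, and the sketch assumes the run times in the table are in seconds while thresholds are in milliseconds.

```typescript
interface RunSummary {
  min: number;
  max: number;
  delta: number; // max - min, the spread between runs
  average: number;
}

// Reduce repeated measurements (assumed to be in seconds) to the same
// columns as the Bitrise table: min, max, delta, average.
function summarizeRuns(runsSeconds: number[]): RunSummary {
  if (runsSeconds.length === 0) {
    throw new Error('at least one run is required');
  }
  const min = Math.min(...runsSeconds);
  const max = Math.max(...runsSeconds);
  const total = runsSeconds.reduce((sum, run) => sum + run, 0);
  return { min, max, delta: max - min, average: total / runsSeconds.length };
}

// Hypothetical limits: one for the average and one for the slowest run.
interface GateLimits {
  averageMaxMs: number;
  topMaxMs: number;
}

// The gate passes only if both the average and the slowest run stay
// within their respective limits.
function passesGate(summary: RunSummary, limits: GateLimits): boolean {
  const averageMs = summary.average * 1000;
  const topMs = summary.max * 1000;
  return averageMs <= limits.averageMaxMs && topMs <= limits.topMaxMs;
}
```

For example, `summarizeRuns([2.54, 2.61, 3.65, 2.46, 2.88, 3.1, 2.94, 2.87])` summarizes the first iOS row of the table, and the result can then be passed to `passesGate` with whatever limits are chosen for that test.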

NicolasMassart · 2025-11-03T09:40:50Z

PoC PR, not expected to be merged

NicolasMassart and others added 30 commits June 12, 2025 15:04

reapply all sample feature changes on freshly reset branch

c97ba0b

Otherwise impossible to build when trying to simply update with main

Merge branch 'main' into feat/4428_sample_feature

14d1a23

fix unit test

59d101e

update test and add cursor rule for testing

9dc93c3

test(sample-feature): update test snapshots and test files for sample…

3626acb

… feature implementation

chore: add commit rules documentation

ab5538d

Refactor SampleFeature to use Redux only

8ac8b13

update doc

9ffa4ab

feat: implement SamplePetnamesController using @metamask/sample-contr…

71d4d16

…ollers

Merge branch 'main' into feat/4428_sample_feature

e22fed4

# Conflicts: # app/core/Engine/Engine.ts

Merge branch 'main' into feat/4428_sample_feature

d7dd9cf

Merge branch 'main' into feat/4428_sample_feature

561ec6b

fix unit tests

8103845

format

61dc4c3

lint fix

56fa053

fix tests

0a6e4f1

remove unwanted change

d862550

fix sonar

c6fe3bf

Merge branch 'main' into feat/4428_sample_feature

3c360f5

Merge branch 'main' into feat/4428_sample_feature

ff1337f

rename snapshot tests

e2c5c45

update snapshots

ef06d13

fix unit test

dbee2e6

readonly props

df2c112

updated snapshot

1a68ff8

fix path

0d7c521

update to use controller propely

44640a9

was still using the addressbook controller... and added metrics

update tests

caf3c01

format

7b4c8ac

move to top to prevent always conflicting

e26b8bd

NicolasMassart and others added 18 commits June 26, 2025 12:02

update snapshot

3525cbf

exclude e2e from coverage

6338f1a

add sentry performance tracing

ecd08be

remove sensitive info from tracking

0accadb

Merge branch 'main' into feat/4428_sample_feature

892a47b

Merge branch 'main' into feat/4428_sample_feature

6e34f9b

fix type for engine

0a8d476

for this specific one, we have to use interface as asked by lint, despite consistency

remote feature flag

c584799

lint fix

e200956

fix tsc issue

ccde054

Merge branch 'main' into feat/4428_sample_feature

7a3b8da

exclude e2e from sonar coverage check

8d2595a

remove cursor rules

2475e1b

should be in another PR

do not check dup on tests

b58b16e

initial perf test system

2b39c85

remove redundant doc

aaf81e3

use redux perf metrics

3da4f4e

NicolasMassart added DO-NOT-MERGE Pull requests that should not be merged team-mobile-platform Mobile Platform team labels Jun 30, 2025

NicolasMassart self-assigned this Jun 30, 2025

tommasini mentioned this pull request Jul 15, 2025

chore: test loading multi accounts #16732

Merged

7 tasks

NicolasMassart force-pushed the feat/4428_sample_feature branch from 14b4243 to d5169b6 Compare September 2, 2025 13:03

NicolasMassart force-pushed the feat/4428_sample_feature branch from 223c084 to 946bde3 Compare October 22, 2025 10:26

Base automatically changed from feat/4428_sample_feature to main October 22, 2025 18:33

NicolasMassart closed this Nov 3, 2025

github-actions Bot locked and limited conversation to collaborators Nov 3, 2025

Feat: local dev perf tests using e2e #16806
NicolasMassart wants to merge 67 commits into main from feat/4428_sample_feature_e2e_perf_tests
1 participant