
Feat: local dev perf tests using e2e #16806

Closed
NicolasMassart wants to merge 67 commits into main from feat/4428_sample_feature_e2e_perf_tests


@NicolasMassart NicolasMassart commented Jun 30, 2025


Description

This PR implements a comprehensive E2E performance testing system that enables developers to measure, track, and compare performance metrics across test runs during local development.

What is the reason for the change?

Developers need a reliable way to measure performance improvements and regressions when iterating on code changes locally. Traditional manual performance testing is inconsistent and doesn't provide the controlled, repeatable environment needed for accurate performance analysis. There was no systematic way to track performance metrics across E2E test runs or compare results between different code changes during local development.

What is the improvement/solution?

The E2E performance system provides a complete workflow for local performance testing:

  1. Controlled Performance Measurement: Leverages existing Redux performance tracking to collect precise timing data during E2E test execution
  2. Automated Data Collection: Automatically captures performance metrics from the fixture server during test runs
  3. Persistent Storage: Saves performance results to timestamped JSON files for historical analysis
  4. Comparison Analysis: Provides tools to compare performance across different test runs and identify improvements/regressions
  5. Developer Workflow Integration: Enables developers to iterate on performance improvements with consistent, repeatable measurements
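A minimal TypeScript sketch of the timestamped storage described in step 3. The `PerformanceRun` shape and the `resultFileName`/`serializeRun` helpers are hypothetical names chosen for illustration, not the PR's actual code. Colons and dots in the ISO-8601 timestamp are replaced so the file name is filesystem-safe and still sorts chronologically.

```typescript
// Hypothetical shape of one saved run; the PR's actual schema may differ.
interface PerformanceRun {
  testName: string;
  platform: "ios" | "android";
  timestamp: string; // ISO-8601
  metrics: Record<string, number>; // metric name -> duration in ms
}

// Build a filesystem-safe, sortable file name, e.g.
// "SampleFeature-sample-feature-counter-2025-06-30T17-58-42-000Z.json".
function resultFileName(testName: string, date: Date): string {
  const stamp = date.toISOString().replace(/[:.]/g, "-");
  return `${testName}-${stamp}.json`;
}

// Serialize a run as pretty-printed JSON for the results directory.
function serializeRun(run: PerformanceRun): string {
  return JSON.stringify(run, null, 2);
}

const when = new Date(Date.UTC(2025, 5, 30, 17, 58, 42)); // month 5 = June
const run: PerformanceRun = {
  testName: "SampleFeature-sample-feature-counter",
  platform: "ios",
  timestamp: when.toISOString(),
  metrics: { "counter-increment": 120 },
};
const fileName = resultFileName(run.testName, when);
const body = serializeRun(run);
```

Writing `body` into `e2e-performance-results/` (for example with Node's `fs.writeFileSync`) is left out to keep the sketch dependency-free.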

Key Features:

  • Performance Tracking: Collects precise timing data for user interactions and operations during E2E tests
  • Test Isolation: Each test suite generates its own performance file with unique timestamps
  • Historical Analysis: Maintains performance history for trend analysis and regression detection
  • Comparison Tools: Automated comparison between baseline and current performance results
  • Interactive Interface: User-friendly CLI for exploring and analyzing performance data
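To make the baseline-versus-current comparison concrete, here is a hedged sketch of how metrics from two runs could be classified. The `compareMetrics` function, the `Metrics` shape (metric name to duration in ms), and the 5% default tolerance are assumptions for illustration, not the PR's actual comparison logic.

```typescript
// Metric name -> duration in milliseconds (assumed shape).
type Metrics = Record<string, number>;

interface MetricDelta {
  metric: string;
  baselineMs: number;
  currentMs: number;
  changePct: number; // positive means slower than baseline
  status: "regression" | "improvement" | "unchanged";
}

// Classify each metric present in both runs; changes within
// +/- tolerancePct are treated as noise ("unchanged").
function compareMetrics(baseline: Metrics, current: Metrics, tolerancePct = 5): MetricDelta[] {
  const deltas: MetricDelta[] = [];
  for (const metric of Object.keys(current)) {
    const baselineMs = baseline[metric];
    if (baselineMs === undefined || baselineMs === 0) continue; // no usable baseline
    const currentMs = current[metric];
    const changePct = ((currentMs - baselineMs) / baselineMs) * 100;
    const status: MetricDelta["status"] =
      changePct > tolerancePct ? "regression" : changePct < -tolerancePct ? "improvement" : "unchanged";
    deltas.push({ metric, baselineMs, currentMs, changePct, status });
  }
  return deltas;
}

const deltas = compareMetrics({ render: 1000, dismiss: 400 }, { render: 1100, dismiss: 300 });
```

Here `render` is reported as a regression (+10%) and `dismiss` as an improvement (-25%). The tolerance band keeps ordinary run-to-run noise from being flagged.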

Related issues

Fixes: Need for systematic performance testing and comparison in local E2E development workflow

Manual testing steps

  1. Run E2E tests to generate performance data: yarn test:e2e:ios -- --testNamePattern="SampleFeature"
  2. Check generated performance files in e2e-performance-results/ directory
  3. Run performance comparison: npx ts-node scripts/compare-e2e-performance.ts SampleFeature-sample-feature-counter
  4. Test interactive mode: npx ts-node scripts/compare-e2e-performance.ts list
  5. Compare all available tests: npx ts-node scripts/compare-e2e-performance.ts
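Comparing a single test (step 3) requires picking the right result files. A minimal sketch of that selection, assuming (hypothetically) that files are named `<testName>-<timestamp>.json` with a sortable ISO-8601 timestamp whose colons and dots were replaced by dashes; `latestPair` is an illustrative name, not the script's actual API.

```typescript
// Escape characters that have special meaning in regular expressions.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the second-newest (baseline) and newest (current) result files
// for one test, or null if fewer than two runs exist.
function latestPair(
  fileNames: string[],
  testName: string,
): { baseline: string; current: string } | null {
  const pattern = new RegExp(
    `^${escapeRegExp(testName)}-\\d{4}-\\d{2}-\\d{2}T\\d{2}-\\d{2}-\\d{2}-\\d{3}Z\\.json$`,
  );
  // Same prefix + ISO timestamps, so lexicographic order is chronological.
  const matches = fileNames.filter((name) => pattern.test(name)).sort();
  if (matches.length < 2) return null;
  return { baseline: matches[matches.length - 2], current: matches[matches.length - 1] };
}

const files = [
  "SampleFeature-sample-feature-counter-2025-06-30T17-58-42-000Z.json",
  "SampleFeature-sample-feature-counter-2025-07-01T09-00-00-000Z.json",
  "SampleFeature-sample-feature-counter-2025-06-29T08-00-00-000Z.json",
  "OtherTest-2025-07-02T00-00-00-000Z.json",
];
const pair = latestPair(files, "SampleFeature-sample-feature-counter");
```

Anchoring the pattern on the full timestamp format keeps a test named `Foo` from matching files that belong to `Foo-bar`.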

Screenshots/Recordings

Before

N/A

After

E2E test run logs showing performance tracking:

[Screenshot: 2025-06-30 at 17:58:42]

The comparison app displaying the results using yarn test:e2e:performance:compare list:

[Screenshot]

Pre-merge author checklist

Pre-merge reviewer checklist

  • I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
  • I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and or screenshots.

NicolasMassart and others added 18 commits June 26, 2025 12:02
- Remove address and name tracking from Sentry traces for privacy protection
- Keep chainId and petNamesCount tracking (safe public/aggregate data)
- Update tests to reflect privacy-focused trace data structure
- Update README documentation to clarify privacy approach
- Maintain performance monitoring capabilities while protecting user data

This ensures user privacy is protected while still providing valuable
performance monitoring and debugging capabilities.
For this specific one, we have to use an interface as required by lint, despite the inconsistency
Should be in another PR
@github-actions


CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

@NicolasMassart NicolasMassart added DO-NOT-MERGE Pull requests that should not be merged team-mobile-platform Mobile Platform team labels Jun 30, 2025
@NicolasMassart NicolasMassart self-assigned this Jun 30, 2025
github-merge-queue Bot pushed a commit that referenced this pull request Jul 23, 2025

## **Description**
# 🎯 Performance Quality Gates

This document outlines the performance thresholds (quality gates) for
MetaMask Mobile's critical user flows. These thresholds ensure optimal
user experience across different platforms and scenarios.

## 📱 Platform Overview

- **Android**: Generally has higher thresholds due to platform
constraints
- **iOS**: Lower thresholds leveraging platform optimizations
- **Quality Gates**: Hard limits that cause test failures if exceeded

## 📊 Test Reporting

Performance tests automatically generate detailed reports using the
`PerformanceTestReporter` utility:
- **JSON Reports**: Structured performance data for analysis
- **Test Results**: Include timing metrics, thresholds, and pass/fail
status
- **User Profile Testing**: Tests run across different user states
(CORE_USER, POWER_USER)

---

## 🏠 Account List Performance Tests

### Test: `render account list efficiently with multiple accounts and networks`
**Configuration**: Multiple accounts, popular networks, profile syncing
enabled

| Platform | Total Max Time | Notes |
|----------|----------------|--------|
| **Android** | 5,000ms | 5 seconds maximum |
| **iOS** | 7,500ms | 7.5 seconds maximum |

---

### Test: `handle account list performance with heavy token load`
**Configuration**: Multiple accounts, popular networks, 10 tokens for
stress testing

| Platform | Total Max Time | Notes |
|----------|----------------|--------|
| **Android** | 5,000ms | 5 seconds maximum |
| **iOS** | 7,500ms | 7.5 seconds maximum |

---

### Test: `benchmark account list with minimal load`
**Configuration**: Minimal accounts, default network, 2 tokens (baseline
measurement)

| Platform | Total Max Time | Notes |
|----------|----------------|--------|
| **Android** | 45,000ms | 45 seconds maximum |
| **iOS** | 15,000ms | 15 seconds maximum |

---

### Test: `benchmark switching accounts from the account list`
**Configuration**: Account switching/dismissal performance

| Platform | Dismissal Max Time | Notes |
|----------|-------------------|--------|
| **Android** | 5,000ms | 5 seconds maximum |
| **iOS** | 4,000ms | 4 seconds maximum |

---

## 🌐 Network List Performance Tests

### Test: `render network list efficiently with multiple accounts and all popular networks`
**Configuration**: Multiple accounts, all popular networks

| Platform | Total Max Time | Notes |
|----------|----------------|--------|
| **Android** | 17,500ms | 17.5 seconds maximum |
| **iOS** | 6,500ms | 6.5 seconds maximum |

---

### Test: `handle network list performance with heavy token load on all popular networks`
**Configuration**: Multiple accounts, popular networks, 10 tokens for
stress testing

| Platform | Total Max Time | Notes |
|----------|----------------|--------|
| **Android** | 17,500ms | 17.5 seconds maximum |
| **iOS** | 6,500ms | 6.5 seconds maximum |

---

### Test: `benchmark network list with minimal load`
**Configuration**: Minimal tokens, popular networks (baseline
measurement)

| Platform | Render Max Time | Notes |
|----------|----------------|--------|
| **Android** | 2,500ms | 2.5 seconds maximum |
| **iOS** | 1,500ms | 1.5 seconds maximum |

---

### Test: `benchmark switching networks from the network list`
**Configuration**: Network switching/dismissal performance

| Platform | Dismissal Max Time | Notes |
|----------|-------------------|--------|
| **Android** | 2,500ms | 2.5 seconds maximum |
| **iOS** | 1,500ms | 1.5 seconds maximum |

---

## 📊 Test Summary

### Account List Tests (4 tests)
- ✅ **Standard Load**: Multiple accounts, popular networks
- ✅ **Heavy Load**: Multiple accounts, 10 tokens, popular networks  
- ✅ **Baseline Test**: Minimal accounts, 2 tokens, default network
- ✅ **Dismissal Test**: Account switching performance

### Network List Tests (4 tests)
- ✅ **Standard Load**: Multiple accounts, popular networks
- ✅ **Heavy Load**: Multiple accounts, 10 tokens, popular networks
- ✅ **Baseline Test**: Minimal tokens, popular networks
- ✅ **Dismissal Test**: Network switching performance

**Total**: 8 performance tests across critical user flows

---

## 🚨 Quality Gate Rules

### Failure Criteria
Tests fail immediately when total time exceeds the maximum acceptable
time for any scenario.
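The failure rule amounts to a per-platform threshold lookup. A minimal sketch using two rows from the threshold tables in this document; the scenario keys and the `checkQualityGate` name are hypothetical, and the real tests may encode their thresholds differently.

```typescript
type Platform = "android" | "ios";

// Maximum acceptable total time in ms, taken from two of the threshold tables.
// The scenario keys are hypothetical.
const THRESHOLDS_MS: Record<string, Record<Platform, number>> = {
  "account-list-render": { android: 5000, ios: 7500 },
  "network-list-render": { android: 17500, ios: 6500 },
};

interface GateResult {
  scenario: string;
  platform: Platform;
  totalMs: number;
  maxMs: number;
  passed: boolean; // false when totalMs exceeds maxMs
}

function checkQualityGate(scenario: string, platform: Platform, totalMs: number): GateResult {
  const limits = THRESHOLDS_MS[scenario];
  if (limits === undefined) throw new Error(`No threshold defined for scenario "${scenario}"`);
  const maxMs = limits[platform];
  return { scenario, platform, totalMs, maxMs, passed: totalMs <= maxMs };
}

const androidResult = checkQualityGate("account-list-render", "android", 5200);
```

A test would then fail as soon as `passed` is false. The returned object also carries the actual and expected values needed for a pass/fail report entry.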

### Performance Patterns
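- **Heavy Token Load**: Increases render times but keeps the same
thresholds as the standard load
- **Platform Differences**: Thresholds are set per platform, and neither
platform is uniformly faster: the account list render thresholds are higher
on iOS, while all network list thresholds are higher on Android. In the
Bitrise runs below, iOS timings were higher than Android's in every scenario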
- **Baseline Tests**: Should complete quickly but have generous
thresholds for stability
- **Dismissal Tests**: Focus on UI responsiveness during state
transitions

### User Profile Testing
Tests run across different user states with varying account complexity:

#### User Profile Definitions
- **CASUAL_USER**: 2 EVM accounts from 1 SRP *(currently not used in
tests)*
- **CORE_USER**: 5 EVM accounts and 5 Solana accounts from 1 SRP
- **POWER_USER**: 15 EVM accounts from 2 SRPs + 5 Solana accounts
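The profile definitions can be expressed as a typed configuration. A sketch with hypothetical names (`USER_PROFILES`, `PROFILES_UNDER_TEST`, `totalAccounts`); the Solana count for `CASUAL_USER` is assumed to be zero because the description lists only EVM accounts for it.

```typescript
interface UserProfile {
  srpCount: number; // number of Secret Recovery Phrases
  evmAccounts: number;
  solanaAccounts: number;
}

type ProfileName = "CASUAL_USER" | "CORE_USER" | "POWER_USER";

const USER_PROFILES: Record<ProfileName, UserProfile> = {
  CASUAL_USER: { srpCount: 1, evmAccounts: 2, solanaAccounts: 0 }, // assumption: no Solana accounts
  CORE_USER: { srpCount: 1, evmAccounts: 5, solanaAccounts: 5 },
  POWER_USER: { srpCount: 2, evmAccounts: 15, solanaAccounts: 5 },
};

// Profiles exercised by the current tests (CASUAL_USER is not used yet).
const PROFILES_UNDER_TEST: ProfileName[] = ["CORE_USER", "POWER_USER"];

function totalAccounts(profile: UserProfile): number {
  return profile.evmAccounts + profile.solanaAccounts;
}
```

Keeping the profiles in one typed object means a threshold report can be keyed by `ProfileName` without typos.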

#### Current Test Coverage
- **CORE_USER**: Standard user configuration for baseline performance
- **POWER_USER**: Enhanced user configuration with maximum account
complexity

---

## 📈 Reporting Features

### Automated Reports
- **JSON Output**: Structured performance data for CI/CD integration
- **Test Metrics**: Total time measurements and performance thresholds
- **Threshold Tracking**: Pass/fail status with actual vs. expected
performance
- **User Profile Results**: Separate results for different user
configurations

### Report Usage
Performance reports can be used for:
- Continuous integration quality gates
- Performance regression detection
- Platform-specific optimization insights
- User experience benchmarking

---


**What can be improved**
* Instead of measuring an action once, perform it several times and record
both the average and the maximum duration; each of these needs its own
baseline.
* To get more granular reports, we can reuse the approach from #16806 to
measure several functions within one flow. For example, if adding an account
is split into two functions, we can measure each function separately instead
of taking one overall measurement, which shows more precisely where to improve.
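Both suggestions can be combined: run a flow several times, time each of its functions separately, and compare the average and the maximum against separate baselines. A minimal sketch under stated assumptions: `measureSteps` and `withinBaseline` are hypothetical helpers, the steps are synchronous for simplicity (real E2E steps are asynchronous and would be awaited), and the clock is injectable so the timing logic can be tested deterministically.

```typescript
type Clock = () => number; // returns a time in milliseconds

interface StepStats {
  averageMs: number;
  maxMs: number;
}

// Run every named step `runs` times and collect per-step average and max.
function measureSteps(
  steps: Record<string, () => void>,
  runs: number,
  now: Clock = () => Date.now(),
): Record<string, StepStats> {
  const samples: Record<string, number[]> = {};
  for (let run = 0; run < runs; run++) {
    for (const [name, step] of Object.entries(steps)) {
      const start = now();
      step();
      const elapsed = now() - start;
      if (!samples[name]) samples[name] = [];
      samples[name].push(elapsed);
    }
  }
  const stats: Record<string, StepStats> = {};
  for (const [name, values] of Object.entries(samples)) {
    const total = values.reduce((sum, v) => sum + v, 0);
    stats[name] = { averageMs: total / values.length, maxMs: Math.max(...values) };
  }
  return stats;
}

// Both statistics must stay within their own baseline.
function withinBaseline(stats: StepStats, baseline: StepStats): boolean {
  return stats.averageMs <= baseline.averageMs && stats.maxMs <= baseline.maxMs;
}
```

For an add-account flow split into two functions, `steps` would hold one entry per function (for example, hypothetical `createAccount` and `selectAccount` steps), so a slowdown in either shows up under its own name.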


**Bitrise Runs**
* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/05c14c4e-4868-43b0-9642-0e5d69ebd82b?tab=workflows
* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/9b63202e-8944-46b5-8867-ab94f748f37f
* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/98ca0cf1-edfa-4888-9e38-ef64422d6833
* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/7857fee5-3f27-41a6-90a9-0332e1a81435


Results from 8 Bitrise runs (times in seconds; the iOS account-switching row has only 6 recorded runs):

| **Test Name** | **Platform** | **Total Time per Run (s)** | **Min** | **Max** | **Delta** | **Average** |
|---------------|--------------|----------------------------|---------|---------|-----------|-------------|
| render account list efficiently with multiple accounts and networks | iOS | 2.54, 2.61, 3.65, 2.46, 2.88, 3.1, 2.94, 2.87 | 2.46 | 3.65 | 1.19 | 2.88 |
| render account list efficiently with multiple accounts and networks | Android | 2.83, 2.64, 2.92, 2.28, 2.62, 2.76, 1.95, 2.37 | 1.95 | 2.92 | 0.97 | 2.55 |
| handle account list performance with heavy token load | iOS | 4.01, 5.72, 4.02, 5.3, 2.49, 2.48, 2.67, 2.78 | 2.48 | 5.72 | 3.24 | 3.68 |
| handle account list performance with heavy token load | Android | 2.15, 2.51, 1.97, 2.78, 2.13, 2.29, 2.23, 2.41 | 1.97 | 2.78 | 0.81 | 2.31 |
| benchmark account list with minimal load | iOS | 2.86, 3.48, 2.87, 3.32, 2.81, 3.35, 2.85, 3.13 | 2.81 | 3.48 | 0.67 | 3.08 |
| benchmark account list with minimal load | Android | 2.08, 2.17, 2.59, 2.28, 2.07, 2.19, 2.13, 2.49 | 2.07 | 2.59 | 0.52 | 2.25 |
| benchmark switching networks from the network list | iOS | 6.61, 8.11, 7.65, 8.08, 7.64, 6.53, 6.49, 6.55 | 6.49 | 8.11 | 1.62 | 7.21 |
| benchmark switching networks from the network list | Android | 7.68, 4.26, 4.11, 5.6, 4.04, 4.32, 4.11, 4.28 | 4.04 | 7.68 | 3.64 | 4.80 |
| benchmark switching accounts from the account list | iOS | 5.99, 4.37, 3.14, 4.55, 3.36, 4.33 | 3.14 | 5.99 | 2.85 | 4.29 |
| benchmark switching accounts from the account list | Android | 2.01, 2.28, 1.71, 2.05, 1.77, 2.28, 1.75, 2.39 | 1.71 | 2.39 | 0.68 | 2.03 |
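The Min, Max, Delta, and Average columns can be derived from the per-run values. A sketch assuming Delta is max minus min and Average is the arithmetic mean of the listed runs; `summarize` and `fmt` are illustrative helpers, not part of the PR.

```typescript
interface RunSummary {
  min: number;
  max: number;
  delta: number; // max - min
  average: number; // arithmetic mean
}

function summarize(runs: number[]): RunSummary {
  if (runs.length === 0) throw new Error("No runs to summarize");
  const min = Math.min(...runs);
  const max = Math.max(...runs);
  const average = runs.reduce((sum, r) => sum + r, 0) / runs.length;
  return { min, max, delta: max - min, average };
}

// Round to two decimals for display, as in the table.
const fmt = (n: number): string => n.toFixed(2);

const heavyIos = summarize([4.01, 5.72, 4.02, 5.3, 2.49, 2.48, 2.67, 2.78]);
const switchingAccountsIos = summarize([5.99, 4.37, 3.14, 4.55, 3.36, 4.33]);
```

For the iOS heavy-token-load runs this gives a mean of 3.68 s. Computing the columns from the raw runs, rather than by hand, also keeps rows with fewer runs (such as iOS account switching) correct.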


## **Pre-merge author checklist**

- [ ] I’ve followed [MetaMask Contributor
Docs](https://github.com/MetaMask/contributor-docs) and [MetaMask Mobile
Coding
Standards](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/CODING_GUIDELINES.md).
- [ ] I've completed the PR template to the best of my ability
- [ ] I’ve included tests if applicable
- [ ] I’ve documented my code using [JSDoc](https://jsdoc.app/) format
if applicable
- [ ] I’ve applied the right labels on the PR (see [labeling
guidelines](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/LABELING_GUIDELINES.md)).
Not required for external contributors.

## **Pre-merge reviewer checklist**

- [ ] I've manually tested the PR (e.g. pull and build branch, run the
app, test code being changed).
- [ ] I confirm that this PR addresses all acceptance criteria described
in the ticket it closes and includes the necessary testing evidence such
as recordings and or screenshots.

---------

Co-authored-by: Curtis David <Curtis.David7@gmail.com>
@NicolasMassart NicolasMassart force-pushed the feat/4428_sample_feature branch from 14b4243 to d5169b6 Compare September 2, 2025 13:03
@NicolasMassart NicolasMassart force-pushed the feat/4428_sample_feature branch from 223c084 to 946bde3 Compare October 22, 2025 10:26
Base automatically changed from feat/4428_sample_feature to main October 22, 2025 18:33
@github-actions github-actions Bot locked and limited conversation to collaborators Nov 3, 2025
@NicolasMassart


PoC PR, not expected to be merged
