Feat: local dev perf tests using e2e #16806
Closed
NicolasMassart wants to merge 67 commits into
Otherwise impossible to build when trying to simply update with main
… feature implementation
# Conflicts: # app/core/Engine/Engine.ts
was still using the addressbook controller... and added metrics
- Remove address and name tracking from Sentry traces for privacy protection
- Keep chainId and petNamesCount tracking (safe public/aggregate data)
- Update tests to reflect the privacy-focused trace data structure
- Update README documentation to clarify the privacy approach
- Maintain performance monitoring capabilities while protecting user data

This ensures user privacy is protected while still providing valuable performance monitoring and debugging capabilities.
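A minimal sketch of the privacy rule described in this commit, assuming trace data is built from a list of pet-name entries: only the public chain ID and an aggregate count are emitted, never addresses or names. The `PetNameEntry` shape and the `buildAddressBookTraceData` helper are hypothetical illustrations, not the actual Engine or Sentry tracing code.

```typescript
// Hypothetical shape of one address-book (pet name) entry; field names are assumptions.
interface PetNameEntry {
  address: string;
  name: string;
  chainId: string;
}

// Trace attributes considered safe: a public chain ID and an aggregate count.
interface AddressBookTraceData {
  chainId: string;
  petNamesCount: number;
}

// Hypothetical helper: counts entries for one chain and returns no address or name.
function buildAddressBookTraceData(
  chainId: string,
  entries: PetNameEntry[],
): AddressBookTraceData {
  const petNamesCount = entries.filter((e) => e.chainId === chainId).length;
  return { chainId, petNamesCount };
}

// Illustrative data only.
const entries: PetNameEntry[] = [
  { address: '0xabc', name: 'Alice', chainId: '0x1' },
  { address: '0xdef', name: 'Bob', chainId: '0x1' },
  { address: '0x123', name: 'Carol', chainId: '0x89' },
];

console.log(buildAddressBookTraceData('0x1', entries));
```

Because the returned object is built from an explicit allow-list of fields, new fields added to `PetNameEntry` cannot leak into traces by accident.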
For this specific one, we have to use an interface, as required by lint, even though it breaks consistency
should be in another PR
CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.
github-merge-queue Bot pushed a commit that referenced this pull request on Jul 23, 2025
## **Description**

# 🎯 Performance Quality Gates

This document outlines the performance thresholds (quality gates) for MetaMask Mobile's critical user flows. These thresholds ensure optimal user experience across different platforms and scenarios.

## 📱 Platform Overview

- **Android**: Generally has higher thresholds due to platform constraints
- **iOS**: Lower thresholds, leveraging platform optimizations
- **Quality Gates**: Hard limits that cause test failures if exceeded

## 📊 Test Reporting

Performance tests automatically generate detailed reports using the `PerformanceTestReporter` utility:

- **JSON Reports**: Structured performance data for analysis
- **Test Results**: Include timing metrics, thresholds, and pass/fail status
- **User Profile Testing**: Tests run across different user states (CORE_USER, POWER_USER)

---

## 🏠 Account List Performance Tests

### Test: `render account list efficiently with multiple accounts and networks`

**Configuration**: Multiple accounts, popular networks, profile syncing enabled

| Platform | Total Max Time | Notes |
|----------|----------------|-------|
| **Android** | 5,000ms | 5 seconds maximum |
| **iOS** | 7,500ms | 7.5 seconds maximum |

---

### Test: `handle account list performance with heavy token load`

**Configuration**: Multiple accounts, popular networks, 10 tokens for stress testing

| Platform | Total Max Time | Notes |
|----------|----------------|-------|
| **Android** | 5,000ms | 5 seconds maximum |
| **iOS** | 7,500ms | 7.5 seconds maximum |

---

### Test: `benchmark account list with minimal load`

**Configuration**: Minimal accounts, default network, 2 tokens (baseline measurement)

| Platform | Total Max Time | Notes |
|----------|----------------|-------|
| **Android** | 45,000ms | 45 seconds maximum |
| **iOS** | 15,000ms | 15 seconds maximum |

---

### Test: `benchmark switching accounts from the account list`

**Configuration**: Account switching/dismissal performance

| Platform | Dismissal Max Time | Notes |
|----------|--------------------|-------|
| **Android** | 5,000ms | 5 seconds maximum |
| **iOS** | 4,000ms | 4 seconds maximum |

---

## 🌐 Network List Performance Tests

### Test: `render network list efficiently with multiple accounts and all popular networks`

**Configuration**: Multiple accounts, all popular networks

| Platform | Total Max Time | Notes |
|----------|----------------|-------|
| **Android** | 17,500ms | 17.5 seconds maximum |
| **iOS** | 6,500ms | 6.5 seconds maximum |

---

### Test: `handle network list performance with heavy token load on all popular networks`

**Configuration**: Multiple accounts, popular networks, 10 tokens for stress testing

| Platform | Total Max Time | Notes |
|----------|----------------|-------|
| **Android** | 17,500ms | 17.5 seconds maximum |
| **iOS** | 6,500ms | 6.5 seconds maximum |

---

### Test: `benchmark network list with minimal load`

**Configuration**: Minimal tokens, popular networks (baseline measurement)

| Platform | Render Max Time | Notes |
|----------|-----------------|-------|
| **Android** | 2,500ms | 2.5 seconds maximum |
| **iOS** | 1,500ms | 1.5 seconds maximum |

---

### Test: `benchmark switching networks from the network list`

**Configuration**: Network switching/dismissal performance

| Platform | Dismissal Max Time | Notes |
|----------|--------------------|-------|
| **Android** | 2,500ms | 2.5 seconds maximum |
| **iOS** | 1,500ms | 1.5 seconds maximum |

---

## 📊 Test Summary

### Account List Tests (4 tests)

- ✅ **Standard Load**: Multiple accounts, popular networks
- ✅ **Heavy Load**: Multiple accounts, 10 tokens, popular networks
- ✅ **Baseline Test**: Minimal accounts, 2 tokens, default network
- ✅ **Dismissal Test**: Account switching performance

### Network List Tests (4 tests)

- ✅ **Standard Load**: Multiple accounts, popular networks
- ✅ **Heavy Load**: Multiple accounts, 10 tokens, popular networks
- ✅ **Baseline Test**: Minimal tokens, popular networks
- ✅ **Dismissal Test**: Network switching performance

**Total**: 8 performance tests across critical user flows

---

## 🚨 Quality Gate Rules

### Failure Criteria

Tests fail immediately when the total time exceeds the maximum acceptable time for the scenario.

### Performance Patterns

- **Heavy Token Load**: Increases render times but keeps the same thresholds as the standard load
- **Platform Differences**: iOS consistently performs better than Android
- **Baseline Tests**: Should complete quickly but have generous thresholds for stability
- **Dismissal Tests**: Focus on UI responsiveness during state transitions

### User Profile Testing

Tests run across different user states with varying account complexity:

#### User Profile Definitions

- **CASUAL_USER**: 2 EVM accounts from 1 SRP *(currently not used in tests)*
- **CORE_USER**: 5 EVM accounts and 5 Solana accounts from 1 SRP
- **POWER_USER**: 15 EVM accounts from 2 SRPs + 5 Solana accounts

#### Current Test Coverage

- **CORE_USER**: Standard user configuration for baseline performance
- **POWER_USER**: Enhanced user configuration with maximum account complexity

---

## 📈 Reporting Features

### Automated Reports

- **JSON Output**: Structured performance data for CI/CD integration
- **Test Metrics**: Total time measurements and performance thresholds
- **Threshold Tracking**: Pass/fail status with actual vs. expected performance
- **User Profile Results**: Separate results for different user configurations

### Report Usage

Performance reports can be used for:

- Continuous integration quality gates
- Performance regression detection
- Platform-specific optimization insights
- User experience benchmarking

---

**What can we do to improve it**

* Instead of measuring each action once, perform it multiple times and record both the average and the maximum duration; each of these needs its own baseline.
* For more granular function reports, we can reuse the sample work in #16806, which measures multiple functions within one flow. For example, if adding an account is split into 2 functions, we can measure each function separately instead of taking one large measurement, and get a deeper understanding of where we can improve.

**Bitrise Runs**

* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/05c14c4e-4868-43b0-9642-0e5d69ebd82b?tab=workflows
* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/9b63202e-8944-46b5-8867-ab94f748f37f
* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/98ca0cf1-edfa-4888-9e38-ef64422d6833
* https://app.bitrise.io/app/be69d4368ee7e86d/pipelines/7857fee5-3f27-41a6-90a9-0332e1a81435

Results table from 8 Bitrise runs

| **Test Name** | **Platform** | **Total Time (Runs)** | **Min** | **Max** | **Delta** | **Average** |
|---------------|--------------|-----------------------|---------|---------|-----------|-------------|
| render account list efficiently with multiple accounts and networks | iOS | 2.54, 2.61, 3.65, 2.46, 2.88, 3.1, 2.94, 2.87 | 2.46 | 3.65 | 1.19 | 2.88 |
| render account list efficiently with multiple accounts and networks | Android | 2.83, 2.64, 2.92, 2.28, 2.62, 2.76, 1.95, 2.37 | 1.95 | 2.92 | 0.97 | 2.55 |
| handle account list performance with heavy token load | iOS | 4.01, 5.72, 4.02, 5.3, 2.49, 2.48, 2.67, 2.78 | 2.48 | 5.72 | 3.24 | 3.68 |
| handle account list performance with heavy token load | Android | 2.15, 2.51, 1.97, 2.78, 2.13, 2.29, 2.23, 2.41 | 1.97 | 2.78 | 0.81 | 2.31 |
| benchmark account list with minimal load | iOS | 2.86, 3.48, 2.87, 3.32, 2.81, 3.35, 2.85, 3.13 | 2.81 | 3.48 | 0.67 | 3.08 |
| benchmark account list with minimal load | Android | 2.08, 2.17, 2.59, 2.28, 2.07, 2.19, 2.13, 2.49 | 2.07 | 2.59 | 0.52 | 2.25 |
| benchmark switching networks from the network list | iOS | 6.61, 8.11, 7.65, 8.08, 7.64, 6.53, 6.49, 6.55 | 6.49 | 8.11 | 1.62 | 7.21 |
| benchmark switching networks from the network list | Android | 7.68, 4.26, 4.11, 5.6, 4.04, 4.32, 4.11, 4.28 | 4.04 | 7.68 | 3.64 | 4.80 |
| benchmark switching accounts from the account list | iOS | 5.99, 4.37, 3.14, 4.55, 3.36, 4.33 | 3.14 | 5.99 | 2.85 | 4.29 |
| benchmark switching accounts from the account list | Android | 2.01, 2.28, 1.71, 2.05, 1.77, 2.28, 1.75, 2.39 | 1.71 | 2.39 | 0.68 | 2.03 |

## **Related issues**

Fixes:

## **Manual testing steps**

1. Go to this page...
2.
3.

## **Screenshots/Recordings**

### **Before**

### **After**

## **Pre-merge author checklist**

- [ ] I’ve followed [MetaMask Contributor Docs](https://github.com/MetaMask/contributor-docs) and [MetaMask Mobile Coding Standards](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/CODING_GUIDELINES.md).
- [ ] I've completed the PR template to the best of my ability
- [ ] I’ve included tests if applicable
- [ ] I’ve documented my code using [JSDoc](https://jsdoc.app/) format if applicable
- [ ] I’ve applied the right labels on the PR (see [labeling guidelines](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/LABELING_GUIDELINES.md)). Not required for external contributors.

## **Pre-merge reviewer checklist**

- [ ] I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
- [ ] I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and or screenshots.

---------

Co-authored-by: Curtis David <Curtis.David7@gmail.com>
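The failure criteria and the suggestion to repeat each action can be combined in a small sketch: repeated runs are summarized into min, max, delta, and average (the same columns as the results table), and the gate fails when the slowest run exceeds the platform threshold. `summarizeRuns`, `checkQualityGate`, and `THRESHOLDS_MS` are hypothetical names, not the `PerformanceTestReporter` API. The sketch assumes the run values are in seconds and that gating on the maximum run is the desired rule; both are assumptions.

```typescript
type Platform = 'android' | 'ios';

// Summary of repeated runs of one scenario (same units as the input values).
interface RunSummary {
  min: number;
  max: number;
  delta: number;
  average: number;
}

// Round to two decimals, the precision used in the results table.
const round2 = (n: number): number => Math.round(n * 100) / 100;

function summarizeRuns(runs: number[]): RunSummary {
  if (runs.length === 0) {
    throw new Error('At least one run is required');
  }
  const min = Math.min(...runs);
  const max = Math.max(...runs);
  const average = runs.reduce((sum, r) => sum + r, 0) / runs.length;
  return { min, max, delta: round2(max - min), average: round2(average) };
}

// Hypothetical lookup table: two rows copied from the threshold tables (milliseconds).
const THRESHOLDS_MS: Partial<Record<string, Record<Platform, number>>> = {
  'benchmark switching accounts from the account list': { android: 5000, ios: 4000 },
  'benchmark network list with minimal load': { android: 2500, ios: 1500 },
};

// Passes only if the slowest run (assumed to be in seconds) is within the limit.
function checkQualityGate(
  testName: string,
  platform: Platform,
  runsSeconds: number[],
): boolean {
  const limitMs = THRESHOLDS_MS[testName]?.[platform];
  if (limitMs === undefined) {
    throw new Error(`No threshold defined for "${testName}" on ${platform}`);
  }
  const { max } = summarizeRuns(runsSeconds);
  return max * 1000 <= limitMs;
}

// Android runs of "benchmark switching accounts from the account list" (results table).
const androidSwitchRuns = [2.01, 2.28, 1.71, 2.05, 1.77, 2.28, 1.75, 2.39];
console.log(summarizeRuns(androidSwitchRuns));
console.log(
  checkQualityGate('benchmark switching accounts from the account list', 'android', androidSwitchRuns),
);
```

Gating on the maximum is stricter than gating on the average; keeping a separate baseline for each, as the improvement notes suggest, would let a regression in either one be caught.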
14b4243 to d5169b6
223c084 to 946bde3
Author comment: PoC PR, not expected to be merged.
Description
This PR implements a comprehensive E2E performance testing system that enables developers to measure, track, and compare performance metrics across test runs during local development.
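Comparing metrics between two runs could look roughly like the sketch below, which reports the percentage change of each metric against a baseline run. The `PerformanceResult` shape, the metric names, and `compareResults` are assumptions for illustration; the actual JSON written to `e2e-performance-results/` and the logic in `scripts/compare-e2e-performance.ts` may differ.

```typescript
// Hypothetical shape of one saved run; the real JSON format may differ.
interface PerformanceResult {
  testName: string;
  timestamp: string;
  metrics: Record<string, number>; // metric name -> duration in milliseconds
}

interface MetricComparison {
  metric: string;
  baselineMs: number;
  currentMs: number;
  changePercent: number; // positive means slower than the baseline
}

// Compare metrics present in both runs with a non-zero baseline; others are skipped.
function compareResults(
  baseline: PerformanceResult,
  current: PerformanceResult,
): MetricComparison[] {
  const comparisons: MetricComparison[] = [];
  for (const [metric, baselineMs] of Object.entries(baseline.metrics)) {
    if (!(metric in current.metrics) || baselineMs === 0) {
      continue;
    }
    const currentMs = current.metrics[metric];
    // Percentage change rounded to one decimal place.
    const changePercent =
      Math.round(((currentMs - baselineMs) / baselineMs) * 1000) / 10;
    comparisons.push({ metric, baselineMs, currentMs, changePercent });
  }
  return comparisons;
}

// Hypothetical metric names and values, for illustration only.
const baselineRun: PerformanceResult = {
  testName: 'SampleFeature-sample-feature-counter',
  timestamp: '2025-07-01T10:00:00Z',
  metrics: { renderCounter: 2500, incrementCounter: 1000 },
};
const currentRun: PerformanceResult = {
  testName: 'SampleFeature-sample-feature-counter',
  timestamp: '2025-07-02T10:00:00Z',
  metrics: { renderCounter: 2000, incrementCounter: 1100 },
};

for (const c of compareResults(baselineRun, currentRun)) {
  console.log(`${c.metric}: ${c.baselineMs}ms -> ${c.currentMs}ms (${c.changePercent}%)`);
}
```

Expressing each difference as a percentage of the baseline keeps fast and slow metrics comparable in a single report.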
What is the reason for the change?
Developers need a reliable way to measure performance improvements and regressions when iterating on code changes locally. Traditional manual performance testing is inconsistent and doesn't provide the controlled, repeatable environment needed for accurate performance analysis. There was no systematic way to track performance metrics across E2E test runs or compare results between different code changes during local development.
What is the improvement/solution?
The E2E performance system provides a complete workflow for local performance testing:
Key Features:
Related issues
Fixes: Need for systematic performance testing and comparison in local E2E development workflow
Manual testing steps
1. `yarn test:e2e:ios -- --testNamePattern="SampleFeature"`
2. `e2e-performance-results/` directory
3. `npx ts-node scripts/compare-e2e-performance.ts SampleFeature-sample-feature-counter`
4. `npx ts-node scripts/compare-e2e-performance.ts list`
5. `npx ts-node scripts/compare-e2e-performance.ts`

Screenshots/Recordings
Before
N/A
After
E2E test run logs showing performance tracking:
The comparison app displaying the results using `yarn test:e2e:performance:compare list`:

Pre-merge author checklist
Pre-merge reviewer checklist