Skip to content

fix(perps): optimize connection speed and consolidate architecture - manual cp into 7.57.0#21245

Merged
Cal-L merged 1 commit into
release/7.57.0from
cherry-pick-7-57-0-6b32d28
Oct 16, 2025
Merged

fix(perps): optimize connection speed and consolidate architecture - manual cp into 7.57.0#21245
Cal-L merged 1 commit into
release/7.57.0from
cherry-pick-7-57-0-6b32d28

Conversation

@Cal-L

@Cal-L Cal-L commented Oct 16, 2025

Copy link
Copy Markdown
Contributor

…-7.57.0 (#20946)

Description

Second part of Perps connection optimizations building on PR #20918. This PR delivers performance improvements, reliability fixes, and architectural cleanup for the Perps connection system.

Performance (TAT-1576): Faster HyperLiquid Connection

Replaced blocking getAccountState() API calls with lightweight WebSocket health checks:

  • Initial connection: Removed blocking HTTP call + stale connection check
  • Account switches: Removed redundant API call after provider reinitialization
  • New validation: Added provider.ping() WebSocket health check (5s timeout) instead of full API calls

Account data is now efficiently fetched via WebSocket subscriptions during preload, eliminating blocking HTTP requests and retry logic.

Reliability Improvements

Connection timeout protection (TAT-1576):

  • Added 30-second timeout for connection attempts with new CONNECTION_TIMEOUT error code
  • Prevents indefinite hanging on network issues
  • Clear user feedback when connection fails

Background reconnection fix (TAT-1559):

  • Added 300ms stabilization delay when app returns from background to prevent race conditions
  • Eliminates "Perps is temporarily offline" errors on app wake-up

Architecture: Error Logging & Code Consolidation

  • Standardized error logging: Added getErrorContext() helper and comprehensive Sentry logging to 20+ catch blocks with consistent context (feature, context, provider, network)
  • Performance metrics refactoring: Changed naming from perps_x_y to perps.x.y for hierarchical grouping and easier filtering in Sentry dashboards
  • Removed code duplication: Eliminated Controller.reconnectWithNewContext() - Manager calls initializeProviders() directly
  • Fixed retry button: Now uses reconnectWithNewContext({ force: true }) for proper WebSocket disconnect
  • Cleaner API: Extracted usePerpsConnection hook to separate file, added ReconnectOptions interface
  • Comprehensive docs: Created PERPS_CONNECTION_ARCHITECTURE.md with Mermaid diagrams explaining connection lifecycle and race condition handling

Impact:

  • Faster: Connection and reconnection complete faster by eliminating blocking HTTP calls
  • 🛡️ Reliable: Background reconnection works consistently; timeout protection prevents hanging
  • 🐛 Debuggable: Standardized error logging with consistent context helps diagnose production issues in Sentry
  • 🧹 Maintainable: Cleaner code with reduced duplication and clearer separation of concerns

Changelog

CHANGELOG entry: null

Related issues

Depends on: #20918
Fixes: TAT-1576, TAT-1559

Manual testing steps

Feature: Perps connection performance and reliability

  Scenario: faster initial connection
    Given Perps has never been connected in this session

    When user opens Perps tab
    Then connection establishes without blocking HTTP calls
    And loading time is noticeably faster than before
    And positions/orders appear without delay

  Scenario: faster account switching
    Given user is connected to Perps with Account A
    And user has positions open

    When user switches to Account B
    Then reconnection completes faster without getAccountState() call
    And Account B's positions load via WebSocket subscriptions
    And no Account A data leaks through

  Scenario: retry button performs full reconnection
    Given Perps fails to connect with error message

    When user taps "Retry Connection" button
    Then connection fully reinitializes including WebSocket disconnect
    And error is logged to Sentry with proper context

  Scenario: rapid account switches are serialized
    Given user is connected to Perps with Account A

    When user switches to Account B
    And immediately switches to Account C before B connection completes
    Then reconnections are serialized (no race conditions)
    And final connection uses Account C's address

  Scenario: app wake-up after inactivity (TAT-1559)
    Given app has been in background for extended period
    And user had Perps connected before backgrounding

    When user returns to app and opens Perps tab
    Then connection establishes after 300ms stabilization delay
    And "Perps is temporarily offline" error does not appear
    And positions/orders load successfully

Screenshots/Recordings

Before

  • Initial connection: ~3 blocking operations (stale check + HTTP API call + provider init)
  • Account switches: Provider init + HTTP API call with retry logic
  • No timeout protection (could hang indefinitely)
  • Background reconnection failures on app wake-up

After

  • Initial connection: 1 operation (provider init + WebSocket ping health check)
  • Account switches: 1 operation (provider init + WebSocket ping health check)
  • 30-second connection timeout with user feedback
  • Reliable background reconnection with 300ms stabilization delay
  • Account data fetched via WebSocket subscriptions during preload

Pre-merge author checklist

Pre-merge reviewer checklist

  • I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
  • I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and or screenshots.

Notes for Reviewers

Key Changes

Performance (TAT-1576):

  • PerpsConnectionManager.ts: Removed blocking getAccountState() HTTP calls and retry logic
  • Replaced with provider.ping() WebSocket health check (5s timeout)
  • Account data now fetched via streamManager.account.prewarm() during preload subscriptions

Reliability:

  • Added connection timeout protection: 30-second limit with CONNECTION_TIMEOUT error code
  • Background reconnection delay: 300ms stabilization period when app returns from background (fixes TAT-1559)
  • All WebSocket operations now have timeout protection

Error Logging:

  • New getErrorContext() helper for consistent Sentry context across 20+ catch blocks
  • Performance metric names refactored to perps.{category}.{metric} format for better Sentry dashboards
  • Example: perps_websocket_connectionperps.websocket.connection_establishment

Architecture:

  • NEW: hooks/usePerpsConnection.ts - Extracted hook from provider
  • NEW: docs/perps/PERPS_CONNECTION_ARCHITECTURE.md - Architecture guide with Mermaid diagrams
  • NEW: types/index.ts - Added ReconnectOptions interface
  • REMOVED: Controller.reconnectWithNewContext() - Manager now calls initializeProviders() directly
  • UPDATED: Retry button now uses reconnectWithNewContext({ force: true })

Testing Focus Areas

  1. Connection speed: Initial Perps tab open should be noticeably faster without blocking HTTP calls
  2. Account switching: Rapid switches (3+ accounts) should serialize correctly without data leaks
  3. Background reconnection: App wake-up after extended background time should not show "temporarily offline" error
  4. Connection timeout: Simulate network issues - should timeout after 30s with clear error message
  5. Retry button: Should perform full reconnection with WebSocket disconnect

Note

Replaces blocking HTTP checks with WebSocket health ping, adds 30s connection timeout and foreground reconnection delay, standardizes error logging/metrics, and extracts connection hook with minor API changes and tests/docs updates.

  • Performance & Reliability:
    • Replace blocking getAccountState() checks with provider.ping() WebSocket health check; remove stale-connection HTTP retries.
    • Add 30s connection attempt timeout with new CONNECTION_TIMEOUT error code and i18n.
    • Add small reconnection delay on app foreground to prevent race conditions.
  • Architecture & Logging:
    • Standardize error logging via ensureError() and getErrorContext() across controllers/services.
    • Rename performance metrics to hierarchical perps.{category}.{metric}.
    • Extract usePerpsConnection hook; Provider now delegates retry to reconnectWithNewContext({ force: true }).
    • Remove PerpsController.reconnectWithNewContext; Manager calls initializeProviders() directly.
  • API & Types:
    • Add ping(timeoutMs?) to IPerpsProvider; implement in HyperLiquidProvider.
    • Add ReconnectOptions and FEATURE_NAME in perpsConfig.
  • UI/Hook Updates:
    • Switch imports to ../hooks/usePerpsConnection; update PerpsLoadingSkeleton to pass { force: true }.
  • Tests:
    • New/updated tests for connection lifecycle, provider ping, open orders/positions hooks, controller fee discount and logging changes.
  • Docs:
    • Add perps-connection-architecture.md; update Sentry reference guidelines.

Written by Cursor Bugbot for commit 3de4e33. This will update automatically on new commits. Configure here.


Description

Changelog

CHANGELOG entry:

Related issues

Fixes:

Manual testing steps

Feature: my feature name

  Scenario: user [verb for user action]
    Given [describe expected initial app state]

    When user [verb for user action]
    Then [describe expected outcome]

Screenshots/Recordings

Before

After

Pre-merge author checklist

Pre-merge reviewer checklist

  • I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
  • I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and or screenshots.

…-7.57.0 (#20946)

## **Description**

Second part of Perps connection optimizations building on PR #20918.
This PR delivers performance improvements, reliability fixes, and
architectural cleanup for the Perps connection system.

**Performance (TAT-1576): Faster HyperLiquid Connection**

Replaced blocking `getAccountState()` API calls with lightweight
WebSocket health checks:
- **Initial connection**: Removed blocking HTTP call + stale connection
check
- **Account switches**: Removed redundant API call after provider
reinitialization
- **New validation**: Added `provider.ping()` WebSocket health check (5s
timeout) instead of full API calls

Account data is now efficiently fetched via WebSocket subscriptions
during preload, eliminating blocking HTTP requests and retry logic.

**Reliability Improvements**

Connection timeout protection (TAT-1576):
- Added 30-second timeout for connection attempts with new
`CONNECTION_TIMEOUT` error code
- Prevents indefinite hanging on network issues
- Clear user feedback when connection fails

Background reconnection fix (TAT-1559):
- Added 300ms stabilization delay when app returns from background to
prevent race conditions
- Eliminates "Perps is temporarily offline" errors on app wake-up

**Architecture: Error Logging & Code Consolidation**

- **Standardized error logging**: Added `getErrorContext()` helper and
comprehensive Sentry logging to 20+ catch blocks with consistent context
(feature, context, provider, network)
- **Performance metrics refactoring**: Changed naming from `perps_x_y`
to `perps.x.y` for hierarchical grouping and easier filtering in Sentry
dashboards
- **Removed code duplication**: Eliminated
`Controller.reconnectWithNewContext()` - Manager calls
`initializeProviders()` directly
- **Fixed retry button**: Now uses `reconnectWithNewContext({ force:
true })` for proper WebSocket disconnect
- **Cleaner API**: Extracted `usePerpsConnection` hook to separate file,
added `ReconnectOptions` interface
- **Comprehensive docs**: Created `PERPS_CONNECTION_ARCHITECTURE.md`
with Mermaid diagrams explaining connection lifecycle and race condition
handling

**Impact:**

- ⚡ **Faster**: Connection and reconnection complete faster by
eliminating blocking HTTP calls
- 🛡️ **Reliable**: Background reconnection works consistently; timeout
protection prevents hanging
- 🐛 **Debuggable**: Standardized error logging with consistent context
helps diagnose production issues in Sentry
- 🧹 **Maintainable**: Cleaner code with reduced duplication and clearer
separation of concerns

## **Changelog**

CHANGELOG entry: null

## **Related issues**

Depends on: #20918
Fixes: TAT-1576, TAT-1559

## **Manual testing steps**

```gherkin
Feature: Perps connection performance and reliability

  Scenario: faster initial connection
    Given Perps has never been connected in this session

    When user opens Perps tab
    Then connection establishes without blocking HTTP calls
    And loading time is noticeably faster than before
    And positions/orders appear without delay

  Scenario: faster account switching
    Given user is connected to Perps with Account A
    And user has positions open

    When user switches to Account B
    Then reconnection completes faster without getAccountState() call
    And Account B's positions load via WebSocket subscriptions
    And no Account A data leaks through

  Scenario: retry button performs full reconnection
    Given Perps fails to connect with error message

    When user taps "Retry Connection" button
    Then connection fully reinitializes including WebSocket disconnect
    And error is logged to Sentry with proper context

  Scenario: rapid account switches are serialized
    Given user is connected to Perps with Account A

    When user switches to Account B
    And immediately switches to Account C before B connection completes
    Then reconnections are serialized (no race conditions)
    And final connection uses Account C's address

  Scenario: app wake-up after inactivity (TAT-1559)
    Given app has been in background for extended period
    And user had Perps connected before backgrounding

    When user returns to app and opens Perps tab
    Then connection establishes after 300ms stabilization delay
    And "Perps is temporarily offline" error does not appear
    And positions/orders load successfully
```

## **Screenshots/Recordings**

### **Before**
- Initial connection: ~3 blocking operations (stale check + HTTP API
call + provider init)
- Account switches: Provider init + HTTP API call with retry logic
- No timeout protection (could hang indefinitely)
- Background reconnection failures on app wake-up

### **After**
- Initial connection: 1 operation (provider init + WebSocket ping health
check)
- Account switches: 1 operation (provider init + WebSocket ping health
check)
- 30-second connection timeout with user feedback
- Reliable background reconnection with 300ms stabilization delay
- Account data fetched via WebSocket subscriptions during preload

## **Pre-merge author checklist**

- [x] I've followed [MetaMask Contributor
Docs](https://github.com/MetaMask/contributor-docs) and [MetaMask Mobile
Coding
Standards](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/CODING_GUIDELINES.md).
- [x] I've completed the PR template to the best of my ability
- [x] I've included tests if applicable
- [x] I've documented my code using [JSDoc](https://jsdoc.app/) format
if applicable
- [x] I've applied the right labels on the PR (see [labeling
guidelines](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/LABELING_GUIDELINES.md)).
Not required for external contributors.

## **Pre-merge reviewer checklist**

- [ ] I've manually tested the PR (e.g. pull and build branch, run the
app, test code being changed).
- [ ] I confirm that this PR addresses all acceptance criteria described
in the ticket it closes and includes the necessary testing evidence such
as recordings and or screenshots.

## **Notes for Reviewers**

### Key Changes

**Performance (TAT-1576):**
- `PerpsConnectionManager.ts`: Removed blocking `getAccountState()` HTTP
calls and retry logic
- Replaced with `provider.ping()` WebSocket health check (5s timeout)
- Account data now fetched via `streamManager.account.prewarm()` during
preload subscriptions

**Reliability:**
- Added connection timeout protection: 30-second limit with
`CONNECTION_TIMEOUT` error code
- Background reconnection delay: 300ms stabilization period when app
returns from background (fixes TAT-1559)
- All WebSocket operations now have timeout protection

**Error Logging:**
- New `getErrorContext()` helper for consistent Sentry context across
20+ catch blocks
- Performance metric names refactored to `perps.{category}.{metric}`
format for better Sentry dashboards
- Example: `perps_websocket_connection` →
`perps.websocket.connection_establishment`

**Architecture:**
- **NEW**: `hooks/usePerpsConnection.ts` - Extracted hook from provider
- **NEW**: `docs/perps/PERPS_CONNECTION_ARCHITECTURE.md` - Architecture
guide with Mermaid diagrams
- **NEW**: `types/index.ts` - Added `ReconnectOptions` interface
- **REMOVED**: `Controller.reconnectWithNewContext()` - Manager now
calls `initializeProviders()` directly
- **UPDATED**: Retry button now uses `reconnectWithNewContext({ force:
true })`

### Testing Focus Areas

1. **Connection speed**: Initial Perps tab open should be noticeably
faster without blocking HTTP calls
2. **Account switching**: Rapid switches (3+ accounts) should serialize
correctly without data leaks
3. **Background reconnection**: App wake-up after extended background
time should not show "temporarily offline" error
4. **Connection timeout**: Simulate network issues - should timeout
after 30s with clear error message
5. **Retry button**: Should perform full reconnection with WebSocket
disconnect

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Replaces blocking HTTP checks with a WS ping for faster connects, adds
30s connection timeout and 300ms foreground delay, consolidates error
logging/metrics, and refactors connection APIs/hooks.
> 
> - **Performance & Reliability**:
> - Replace `getAccountState()` health checks with `provider.ping()` WS
health check; remove stale-connection HTTP checks.
> - Add 30s connection timeout (`CONNECTION_TIMEOUT`) and 300ms
post-foreground delay; preload account data via subscriptions.
> - **Architecture & APIs**:
> - Remove `PerpsController.reconnectWithNewContext`; Manager now calls
`initializeProviders()`; retry uses `reconnectWithNewContext({ force:
true })`.
> - Extract `usePerpsConnection` hook; add `ReconnectOptions`; switch
imports from provider to hook.
> - **Logging & Metrics**:
> - Standardize Sentry logging with `ensureError()` and
controller/provider `getErrorContext()`; update 20+ catch blocks.
> - Rename metrics to hierarchical `perps.*` (e.g., `perps.websocket.*`,
`perps.screen.*`).
> - **Provider & Services**:
> - Implement `ping(timeoutMs?)` in `HyperLiquidProvider`; add WS client
typing; improve subscription error handling.
> - Misc fixes in `PerpsStreamManager` and subscription services (error
logging, cache/prewarm cleanup).
> - **Config & i18n**:
> - Update `PERPS_CONSTANTS` (timeouts, ping); add `connectionTimeout`
i18n string.
> - **Docs & Tests**:
> - Add connection architecture docs; update/add extensive tests across
hooks, controller, provider, services.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
db00eee. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Nicholas Smith <nick.smith@consensys.net>
@Cal-L Cal-L requested a review from a team as a code owner October 16, 2025 00:35
@github-actions

Copy link
Copy Markdown
Contributor

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

@metamaskbot metamaskbot added the team-mobile-platform Mobile Platform team label Oct 16, 2025
}
}, PERPS_CONSTANTS.RECONNECTION_DELAY_ANDROID_MS);
// Store timer to clean up if component unmounts
return () => clearTimeout(timer);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: App State Tracking Bug Causes Memory Leak

The return () => clearTimeout(timer); in handleAppStateChange causes an early exit, preventing lastAppState.current from updating and breaking app state tracking. Since event listeners ignore return values, the setTimeout timer is also never cleared, leading to a memory leak.

Fix in Cursor Fix in Web

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abretonc7s Let's follow up with this if it's an issue

@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
59.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@Cal-L Cal-L added the skip-sonar-cloud Only used for bypassing sonar cloud when failures are not relevant to the changes. label Oct 16, 2025
@Cal-L Cal-L merged commit a3958d8 into release/7.57.0 Oct 16, 2025
113 of 120 checks passed
@Cal-L Cal-L deleted the cherry-pick-7-57-0-6b32d28 branch October 16, 2025 01:08
@github-actions github-actions Bot locked and limited conversation to collaborators Oct 16, 2025
@metamaskbot metamaskbot added the release-7.57.0 Issue or pull request that will be included in release 7.57.0 label Oct 22, 2025
@metamaskbot

Copy link
Copy Markdown
Collaborator

No release label on PR. Adding release label release-7.57.0 on PR, as PR was added to branch 7.57.0 when release was cut.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

release-7.57.0 Issue or pull request that will be included in release 7.57.0 size-XL skip-sonar-cloud Only used for bypassing sonar cloud when failures are not relevant to the changes. team-mobile-platform Mobile Platform team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants