This repository was archived by the owner on Apr 26, 2024. It is now read-only.
/keys/claim is surprisingly slow #16554
Open
Labels
- A-E2EE: End-to-end encryption for Matrix clients
- A-Performance: Performance, both client-facing and admin-facing
- O-Frequent: Affects or can be seen by most users regularly or impacts most users' first experience
- S-Minor: Blocks non-critical functionality, workarounds exist.
- T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Description
/keys/claim requests often take multiple seconds when requesting keys for hundreds of devices.
Out of interest I looked at the anatomy of a slow /keys/claim request (https://jaeger.proxy.matrix.org/trace/62603ae20c639720). The request took 6.2 seconds altogether.
In this case, we were just attempting to claim keys for devices for which we had previously failed to get one. (Due to matrix-org/matrix-rust-sdk#281, we do this a bit too often.) Anyway, the point is that pretty much all of the devices in this request have run out of OTKs, but I think it is still instructive.
What I see is:
- 321 calls to db.claim_e2e_one_time_keys. This is presumably one for each device for matrix.org users. These take us to about 1.8 seconds.
- 321 calls to db._get_fallback_key. Again, one for each matrix.org device. Another 2.1 seconds, bringing us to 4.0 seconds.
- 21 calls to claim_client_keys. One per federated destination. These all happen in parallel, so the critical path is the slowest homeserver to respond. The pathological case here is servers that respond within the timeout (so we don't back off from them) but slowly, and then the device doesn't have any keys, so we have to do it again. In this case the slowest server took 2.1 seconds.
I see some easy performance improvements here. In particular:
- Doing remote and local claims in parallel would roughly halve the time.
- up for grabs!
- Doing more than one local device per db request would mean much less DB scheduling overhead.
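To make the two suggestions concrete, here is a rough sketch of the shape they could take. It uses asyncio purely for illustration (Synapse itself is built on Twisted), and every name in it (claim_local_batch, claim_remote, claim_keys, the example destinations and the sleep durations) is a hypothetical stand-in rather than Synapse's real API. The local claim is modelled as a single batched call covering all devices, and it is started alongside the federated requests instead of before them.

```python
import asyncio
import time

# All names below are hypothetical stand-ins, not Synapse's real API.


async def claim_local_batch(devices):
    """Pretend to claim keys for many local devices in one DB round trip."""
    await asyncio.sleep(0.2)  # simulated database latency
    return {device: None for device in devices}  # None: no OTK left


async def claim_remote(destination, delay):
    """Pretend to send one federated claim request."""
    await asyncio.sleep(delay)  # simulated network latency
    return destination, {}


async def claim_keys(local_devices, remote_delays):
    # Start the local batch and every remote request together, so the total
    # time is bounded by the slowest single task rather than by the sum.
    local_task = claim_local_batch(local_devices)
    remote_tasks = [claim_remote(d, t) for d, t in remote_delays.items()]
    local_result, *remote_results = await asyncio.gather(local_task, *remote_tasks)
    return local_result, dict(remote_results)


def run_demo():
    start = time.monotonic()
    local, remote = asyncio.run(
        claim_keys(["DEV1", "DEV2", "DEV3"], {"a.example": 0.1, "b.example": 0.3})
    )
    elapsed = time.monotonic() - start
    return local, remote, elapsed
```

With these made-up latencies, run_demo() should finish in roughly the time of the slowest task, not the sum of all three. How much the real request would gain depends on how the local and remote costs actually compare.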