This repository was archived by the owner on Apr 26, 2024. It is now read-only.

State resolution on big rooms can get very CPU-hungry #6943

@babolivier

Description

Sometime between 1.9 and 1.10 (I think), abolivier.bzh started becoming very hungry for the box's CPU.

I used to host it on a VPS that had 1 vCore and it was perfectly happy with it, but lately it became so unresponsive so often (clearly due to being CPU-bound) that last week I had to move it to a new box with 2 vCores.

However, even though Synapse now responds in a more timely manner and doesn't seem as CPU-bound as it used to be, it still spends around half of the time using up 80% of the CPU resources, and messages stay gray in Riot for maybe 5-15s (despite Grafana saying that the event took less than a second to send).

From the graphs it looks like these spikes are due to the persist_events background job, and the persist_events and state._resolve_events indexes.

image

image

It looks like these have matching spikes in the "DB time" graphs, though I'm not sure what could have happened since I don't remember updating PostgreSQL or its config on that box around that time.

I'm happy to pair with another member of the backend team to look at what might be going wrong here, if that can help.

Metadata

Labels

A-Performance: Performance, both client-facing and admin-facing
