State resolution on big rooms can get very CPU-hungry #6943

Sometime between Synapse 1.9 and 1.10 (I think), `abolivier.bzh` started consuming a lot of the box's CPU.

I used to host it on a VPS with 1 vCore, and it ran fine there. Lately, though, it became unresponsive so often (clearly because it was CPU-bound) that last week I had to move it to a new box with 2 vCores.

Synapse now responds more quickly and seems less CPU-bound than before. However, it still spends around half of its time using about 80% of the CPU, and sent messages stay gray in Riot for roughly 5-15s, even though Grafana reports that the event took less than a second to send.

From the graphs, these spikes appear to come from the `persist_events` background job and from the `persist_events` and `state._resolve_events` indexes.

![image](https://user-images.githubusercontent.com/5547783/74758065-bbde2700-526e-11ea-8d71-d07e39aafb78.png)

![image](https://user-images.githubusercontent.com/5547783/74758122-d0222400-526e-11ea-868a-386889db164d.png)

These have matching spikes in the "DB time" graphs. I'm not sure what could have changed, though, since I don't remember updating PostgreSQL or its config on that box around that time.

If it helps, I'm happy to pair with another member of the backend team to look into what might be going wrong here.
