[BUG] Database deadlocks #220
5 participants
Reference
forgejo/forgejo#220
This one is not exactly new to Gitea (it was present at least during the 1.17 cycle), but I'm now in the situation that I cannot drop a big testing repo (lots of activity, and thus many database relations).
A POST to https://codeberg.org/fnetX/wikitest/settings with the delete-repo action returns an error 500.
Log excerpt:
Is this a transient failure? Or can it be reproduced?
I tried 10 times. Retrying now:
But my browser received "504 Gateway Time-out: The server didn't respond in time." (from the reverse proxy, I suppose).
So yes, seems to be fully reproducible.
What also happens in the log (not sure for which repos):
Both issues combined:
The repo is gone now, so the last error 500 / timeout was apparently a success after all.
Also see Codeberg/Community#632 (was "resolved" by the locks that are now the problem).
Looks like it's going to be one of those race conditions that take a very long time to figure out. Could you save as much of the current logs as you can somewhere, for forensic analysis? Once there is more evidence of the same problem it will help cross-reference and figure out the root cause. There are so many possible race conditions in this codepath that could lead to a deadlock that I'm not sure where to begin with the information we have right now. But it will resurface, I'm sure.
I'm not sure this is a race condition. It rather seems to me that the transaction is trying to lock too many tables and that MySQL is simply saying "too busy, we cannot lock some of these tables, let's return a deadlock". So we rather want to know which queries are being run roughly at the same time?
Can MySQL return deadlock when there is no deadlock?
I tend to believe that when a DB engine said it was a deadlock, it really was.
The underlying problem is probably related to the transaction isolation level chosen (or left at the default)1,2.
I don't know where the isolation level chosen by Gitea/Forgejo is defined, either for their DB model as a whole or for particular transactions, but it has a huge impact on concurrency performance in heavy-load environments.
Here is one possible deadlock scenario.
In that particular case this code is run by one goroutine, which locks one of the many tables (call it T1). While holding the lock on T1 it also tries to acquire a lock on another table, T2. But another goroutine holds the lock on T2, so it must wait. Unfortunately (and here is the race), that other goroutine is waiting for the lock on T1 to be released before it releases its own lock on T2.
There is no architectural design in the codebase to prevent that kind of race, therefore it is bound to happen. And the odds of it happening increase when the system is under heavy load.
I'm by no means an expert at (scaling / high-load) databases, but I would assume there's some strategy to avoid these deadlocks from happening? Maybe changing the transaction isolation level that @fsologureng mentioned?
Deadlocks are the result of bad code design. In order to figure out the root cause of a given deadlock, all you need is a stack trace of both goroutines and a dose of patience to analyze the associated codepath.
At present there is only a location in the code (web/repo/setting.go:757) and no stack trace. That's why I'm saying that collecting more evidence from future similar occurrences of a deadlock is necessary.
Hmm, why specifically a stack trace? Wouldn't it be more interesting to have the SQL queries that are being run roughly around the same time? Either way, I can brew up custom patches for Codeberg to collect such information.
The SQL queries alone won't tell you much (unless they are super specific) about the two goroutines that ended up deadlocking.
Yes, but it's not totally code dependent; the consistency model chosen (part of the design) can be supported by the DB engine too. Not all consistency models need a serialization of all changes; that extreme case scales very badly. Instead, it is possible to relax some conditions. For example, if the code goes to delete a repo from its dedicated table (and all the related ones) while at the same time another thread is counting the repos, it is possible that the count returns while the delete is occurring (or even after), reporting a former state of the DB (because it started with shared access to the table). But the deleting thread could not report a bad state of the DB after (or inside) its transaction. This prevents inconsistency in the scope of each thread, but not necessarily at the global scope. Those details about consistency have a huge impact on performance because they define when a lock (begin transaction) can be acquired.
Looking at the documentation of the ORM used, I suppose the transaction isolation level is not defined per transaction, mainly because the syntax for doing so is very engine dependent. Furthermore, I can't find a definition at the engine level in the code either, so apparently the defaults of each installation are being used. I suppose autocommit is being used too.
Obtaining the whole SQL transaction allows analyzing the deadlock, because it occurs purely at the DB level.
I have experience with this kind of debugging in PostgreSQL. There are ways to log errors together with the SQL command involved. In MariaDB I suppose it's very possible too, without code modification.
I think we're misunderstanding each other here; AFAIK there isn't another SQL query that's deadlocking at the same time.
There's an option in app.ini to enable SQL logging.
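If I remember the configuration correctly (worth double-checking against the config cheat sheet), that is the `LOG_SQL` switch in the `[database]` section:

```ini
; Sketch of the relevant app.ini fragment; verify key names against the
; Gitea/Forgejo configuration cheat sheet before deploying.
[database]
LOG_SQL = true
```

With it enabled, every statement the ORM issues is written to the log, which should make it possible to see which queries ran around the time of a deadlock.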
I don't get how a single SQL statement can deadlock, could you give me an example?
I didn't mean that; I meant that there is no other SQL statement deadlocking at the same time. There's only one goroutine executing a SQL query that receives a deadlock error from MySQL. Your comment implied to me that two goroutines receive the deadlock error at the same time.
Thanks for explaining, I understand what you meant now.
For Codeberg I imagine that is a nightmare. Are you planning to replicate the case on the Codeberg test instance?
PS: sorry, I pinned notifications for this issue and still didn't receive any notifications 🤦
Hi all, this issue is still super serious for Codeberg. Basically, you cannot remove two repositories or issues simultaneously, and the first users have even started to complain that this violates privacy guidelines. (Users can still ask us to delete things for them, but it's a terrible situation nevertheless.)
Further, we just now got a report about even posting issue comments resulting in deadlocks here: Codeberg/Community#1092
From a scaling point of view, this might be the most serious bug Codeberg is currently facing (database deadlocks).
I continued in the Matrix channel at https://matrix.to/#/%23forgejo-development%3Amatrix.org/%24FqVOejU_MoGYi_mP3e1DjAKlO5Dxg2cqRsYwLQW3gt0?via=matrix.tu-berlin.de&via=matrix.org&via=aria-net.org&via=exozy.me because I wasn't able to post further comments to this thread.
Original comment:
The content I am trying to add:
Funnily enough, while trying to add the "scaling" label, I caused a db deadlock.
Server log from the past 30 minutes:
Label deadlocks also have some history (see ...), although they were "fixed" multiple times already. It sounds like there is a fundamental flaw somewhere?
Changed title from "[BUG] Database deadlock (error 500) when removing repo" to "[BUG] Database deadlocks".
@fnetX could you add more logs that you collected in the past few hours? They would provide more clues for finding a workaround.
A proper fix is unlikely, as explained above, because the codebase lacks the proper mechanisms to avoid the race conditions that lead to deadlocks under heavy load.
The most likely candidates for increasing the odds of a race are complex changes that involve modifying a number of tables such as deleting repositories, issues or pull requests.
A possible fix would be to have locks at the repository level, in a middleware for web / api routes, blocking all operations while expensive operations such as deletion of issues or pull requests are in progress because they are the most likely candidates for creating the conditions for a deadlock.
A new one appeared:
for
I took a look; that's probably because it uses some commit status database lookups, which are prone to deadlocks, as seen in the previously shared deadlock logs.
Some activity around this topic, https://github.com/go-gitea/gitea/pull/26055
When was the last "Deadlock found when trying to get lock" in the logs?
About one minute ago.
Here are some that seem to occur often (picked at "random")
very very often and apparently batched.
I also have
and
The last one was
Not exactly resolved, but mostly worked around.