[BUG] Database deadlocks #220

Closed
opened 2023-01-08 19:21:55 +01:00 by fnetX · 31 comments
Owner

This one is not exactly new for Gitea (it was present at least during the 1.17 cycle), but I'm currently in the situation that I cannot drop a big testing repo (lots of activity, and thus many database relations).

A POST to https://codeberg.org/fnetX/wikitest/settings with the delete repo action returns an error 500.

Log excerpt:

~~~
2023/01/08 18:19:16 .../web/repo/setting.go:757:SettingsPost() [E] [63bb091e-64] DeleteRepository: Error 1213: Deadlock found when trying to get lock; try restarting transaction
~~~

Is this a transient failure? Or can it be reproduced?

Author
Owner

I tried 10 times. Retrying now:

~~~
Jan 08 20:12:18 gitea-production gitea[3729473]: 2023/01/08 20:12:18 [63bb2265-71] router: completed POST /fnetX/wikitest/settings for 10.0.3.1:42244, 500 Internal Server Error in 316901.9ms @ repo/setting.go:108(repo.SettingsPost)
~~~

But my browser received "504 Gateway Time-out: The server didn't respond in time." (from the reverse proxy, I suppose).

So yes, seems to be fully reproducible.

What also happens in the log (not sure for which repos):

~~~
2023/01/08 20:12:18 .../web/repo/setting.go:757:SettingsPost() [E] [63bb2265-71] DeleteRepository: context canceled
~~~

Both issues combined:

~~~
grep DeleteRepository /data/git/log/gitea.log | wc -l
54
~~~
Author
Owner

The repo is gone now, so the last error 500 / timeout was actually a success, it seems.

Also see Codeberg/Community#632 (https://codeberg.org/Codeberg/Community/issues/632), which was "resolved" by the locks that are now the problem.


Looks like it's going to be one of those race conditions that take a very long time to figure out. Could you save as much of the current logs as you can somewhere, for forensic analysis? Once there is more evidence of the same problem, it will help to cross-reference and figure out the root cause. There are so many possible race conditions in this codepath that could lead to a deadlock that I'm not sure where to begin with what there is right now. But it will resurface, I'm sure.

Owner

I'm not sure if this is a race condition. This rather seems to me that the transaction is trying to lock too many tables (https://codeberg.org/forgejo/forgejo/src/commit/a459fa530fcfbc4657b258ea503d163fd0102884/models/repo.go#L131-L154) and that MySQL is simply saying "too busy, we cannot lock some of these tables, let's return deadlock". So we rather want to know which queries are being run "roughly" at the same time?


Can MySQL return deadlock when there is no deadlock?

Contributor

> I'm not sure if this is a race condition. This rather seems to me that the transaction is trying to lock too many tables and that MySQL is simply saying "too busy, we cannot lock some of these tables, let's return deadlock". So we rather want to know which queries are being run "roughly" at the same time?

I tend to believe that when a DB engine says there was a deadlock, there really was one.

The underlying problem is probably related to the transaction isolation level that was chosen (or left at its default); see https://mariadb.com/kb/en/set-transaction/ and https://www.postgresql.org/docs/current/transaction-iso.html.

I really don't know where the isolation level chosen by Gitea/Forgejo is defined, either for their DB model as a whole or for particular transactions, but it has a huge impact on concurrency performance in heavy-load environments.


Here is one possible deadlock scenario.

  • Goroutine A locks table T1
  • Goroutine B locks table T2
  • Goroutine A waits on table T2
  • Goroutine B waits on table T1

In that particular case this code (https://codeberg.org/forgejo/forgejo/src/commit/a459fa530fcfbc4657b258ea503d163fd0102884/models/repo.go#L131-L154) is run by one goroutine and locks one of the many tables. While it is holding this lock on T1, it also tries to get a lock on another table T2. But another goroutine holds the lock on T2, so it must wait. Unfortunately (and here is the race), this other goroutine will wait for the lock on T1 to be released before it releases the lock on T2.

There is no architectural design in the codebase to prevent that kind of race, therefore it is bound to happen. And the odds of it happening increase when the system is under heavy load.

Owner

> There is no architectural design in the codebase to prevent that kind of race, therefore it is bound to happen. And the odds of it happening increase when the system is under heavy load.

I'm by no means an expert at (scaling / high-load) databases, but I would assume there's some strategy to prevent these deadlocks from happening? Maybe changing the transaction isolation level that @fsologureng mentioned?


Deadlocks are the result of bad code design. In order to figure out the root cause of a given deadlock, all you need is a stack trace of both goroutines and a dose of patience to analyze the associated codepath.

At present there only is a location in the code (web/repo/setting.go:757) and no stack trace. That's why I'm saying that collecting more evidence from future similar occurrences of a deadlock is necessary.

Owner

> At present there only is a location in the code (web/repo/setting.go:757) and no stack trace. That's why I'm saying that collecting more evidence from future similar occurrences of a deadlock is necessary.

Hmm, why specifically a stack trace? Wouldn't it be more interesting to have the SQL queries that are being run at roughly the same time? Either way, I can brew up custom patches for Codeberg to collect such information.


The SQL queries alone won't tell you much (unless they are super specific) about the two goroutines that ended up deadlocking.

Contributor

> Deadlocks are the result of bad code design.

Yes, but it is not totally code-dependent; the consistency model chosen (part of the design) can be supported by the DB engine too. Not every consistency model requires a serialization of all changes; that extreme case scales very badly. Instead, it is possible to relax some conditions. For example, if the code goes to delete a repo from its dedicated table (and all the related ones) while at the same time another thread is counting the repos, it is possible that the count returns while the delete is occurring (or even after it), reporting a former state of the DB (because it started with shared access to the table). But the deleting thread couldn't report a bad state of the DB after (or inside) its own transaction. This prevents inconsistency in the scope of each thread, but not necessarily at global scope. That kind of detail about consistency has a huge impact on performance, because it defines when a lock (begin transaction) can be acquired.

Looking at the documentation of the ORM used (https://xorm.io/docs/chapter-10/readme/), I suppose that the transaction isolation level is not defined per transaction, mainly because its syntax is very engine-dependent. Furthermore, I can't find a definition at the engine level in the code either, so apparently the defaults of each installation are being used. I suppose autocommit is being used too.

> I can brew up custom patches to Codeberg to collect such information.

Obtaining the whole SQL transaction makes it possible to analyze the deadlock, because it occurs at the pure DB level.

I have experience with this kind of debugging in PG. There are ways to log errors together with the SQL command involved. In MariaDB I suppose it is possible too, without code modification.

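Assuming InnoDB, the server-side knobs I believe MariaDB/MySQL offer for this (variable names worth double-checking against the installed version) are:

```sql
-- Write every detected deadlock to the server error log,
-- not just the most recent one.
SET GLOBAL innodb_print_all_deadlocks = ON;

-- Inspect the most recent deadlock interactively: the output contains a
-- "LATEST DETECTED DEADLOCK" section with both transactions' statements.
SHOW ENGINE INNODB STATUS;
```

Neither requires touching Forgejo code, so they could run on production while waiting for the next occurrence.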
Owner

> The SQL queries alone won't tell you much (unless they are super specific) about the two goroutines that ended up deadlocking.

I think we misunderstand each other here, AFAIK there isn't another SQL query that's deadlocking at the same time.

> Obtaining the whole SQL transaction makes it possible to analyze the deadlock, because it occurs at the pure DB level.
>
> I have experience with this kind of debugging in PG. There are ways to log errors together with the SQL command involved. In MariaDB I suppose it is possible too, without code modification.

There's an option in app.ini to enable SQL logging.

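For reference, that switch lives in the `[database]` section of app.ini; a sketch (key name as I recall it from the config cheat sheet, worth verifying for the deployed version):

```ini
[database]
; Log every executed SQL statement. Very noisy on a busy instance,
; so ideally enabled only while reproducing the deadlock.
LOG_SQL = true
```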

> > The SQL queries alone won't tell you much (unless they are super specific) about the two goroutines that ended up deadlocking.
>
> I think we misunderstand each other here, AFAIK there isn't another SQL query that's deadlocking at the same time.

I don't get how a single SQL statement can deadlock, could you give me an example?

Owner

> I don't get how a single SQL statement can deadlock, could you give me an example?

I didn't mean that; I meant that there isn't other SQL deadlocking at the same time. There's only one goroutine that executes a SQL query and receives a deadlock error from MySQL. Your comment implied to me that you meant there are two goroutines that receive the deadlock error at the same time.


Thanks for explaining, I understand what you meant now.

Contributor

> There's an option in app.ini to enable SQL logging.

On Codeberg I imagine that would be a nightmare. Are you planning to replicate the case on the Codeberg test instance?

PS: sorry, I pinned notifications about this and still didn't receive any notifications 🤦

Author
Owner

Hi all, this issue is still super serious for Codeberg. Basically, you cannot remove two repositories or issues simultaneously, and the first users have even started to complain that this violates privacy guidelines. (Users can still ask us to delete things for them, but it's a terrible situation nevertheless.)

Further, we just now got a report that even posting issue comments results in deadlocks: Codeberg/Community#1092 (https://codeberg.org/Codeberg/Community/issues/1092)

From a scaling point of view, this might be the most serious bug Codeberg is currently facing (database deadlocks).

Author
Owner
I continued in the Matrix channel at https://matrix.to/#/%23forgejo-development%3Amatrix.org/%24FqVOejU_MoGYi_mP3e1DjAKlO5Dxg2cqRsYwLQW3gt0?via=matrix.tu-berlin.de&via=matrix.org&via=aria-net.org&via=exozy.me because I wasn't able to post further comments to this thread.
Author
Owner

Original comment:

The content I am trying to add:

Funnily enough, while trying to add the "scaling" label, I caused a db deadlock.

Server log from the past 30 minutes:

~~~
2023/06/30 00:00:17 ...rs/web/repo/issue.go:2851:NewComment() [E] [649e1b0a-140] CreateIssueComment: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2023/06/30 00:01:09 ...rs/web/repo/issue.go:2851:NewComment() [E] [649e1b42-111] CreateIssueComment: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2023/06/30 00:01:59 ...rs/web/repo/issue.go:2851:NewComment() [E] [649e1b74-192] CreateIssueComment: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2023/06/30 00:02:10 ...rs/web/repo/issue.go:2851:NewComment() [E] [649e1b80-127] CreateIssueComment: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2023/06/30 00:11:01 ...epo/issue_comment.go:367:CreateIssueComment() [E] [649e1d95-141] CreateIssueComment: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2023/06/30 00:12:23 ...rs/web/repo/issue.go:2851:NewComment() [E] [649e1de3-134] CreateIssueComment: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2023/06/30 00:29:53 ...rs/web/repo/issue.go:2851:NewComment() [E] [649e21fd-119] CreateIssueComment: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2023/06/30 00:32:42 .../repo/issue_label.go:212:UpdateIssueLabel() [E] [649e22aa-3] AddLabel: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
~~~

Label deadlocks also have some history (see ...), although they were "fixed" multiple times already. It sounds like there is a fundamental flaw somewhere?

fnetX changed title from [BUG] Database deadlock (error 500) when removing repo to [BUG] Database deadlocks 2023-06-30 02:47:03 +02:00

@fnetX could you add more of the logs that you collected in the past few hours? They will provide more clues for finding a workaround.

A proper fix is unlikely, as explained above, because the codebase lacks the proper mechanisms to avoid the race conditions that lead to deadlocks under heavy load.

The most likely candidates for increasing the odds of a race are complex changes that modify a number of tables, such as deleting repositories, issues or pull requests.


A possible fix would be to take locks at the repository level, in a middleware for web / API routes, blocking all operations while expensive operations such as the deletion of issues or pull requests are in progress, because those are the most likely candidates for creating the conditions for a deadlock.


6/30/2023, 7:37:24 AM - dachary.org: Otto Richter | codeberg.org/fnetX: I'll take a look now. Has this deadlock resurfaced more than what the issue history has?
6/30/2023, 7:38:20 AM - dachary.org: I'll have to re-read the discussion. If there is any more data on the matter, can it be copy/pasted somewhere else to avoid the deadlock?
6/30/2023, 7:51:31 AM - dachary.org: Ok, I'm up to date.
6/30/2023, 7:56:58 AM - dachary.org: <@otto_richter:matrix.tu-berlin.de "If someone wants to look at the ..."> Please share this with me.
6/30/2023, 12:45:08 PM - dachary.org: do you throttle with HAproxy based on the URL? For signups for instance?
6/30/2023, 12:45:12 PM - Otto Richter | codeberg.org/fnetX: <@dachary:matrix.org "Otto Richter | codeberg.org/fnet..."> In the past days, it happened 1 to 10 times per day, and as you can see in the log in the issue, it seems to happen at different places like PR creation, PR comments, label addition, issue deletion, repo deletion
6/30/2023, 12:46:31 PM - Otto Richter | codeberg.org/fnetX: <@dachary:matrix.org "do you throttle with HAproxy bas..."> There is rate limiting, but no real throttling. We limit connections to Forgejo to a few hundred to keep the instance responsive enough to handle them. This likely requires more fine-tuning, but the problem is that certain requests are long-running (e.g. those to the event endpoint for notifications), so if we limit too tightly, there is no headroom for non-background requests.
6/30/2023, 12:47:05 PM - dachary.org: I think the root cause is a long operation (such as deleting a repository) and another operation that happens while this is ongoing, in another goroutine. And they enter a deadlock because there is no guarantee they won't.
6/30/2023, 12:47:33 PM - dachary.org: Rate limiting is what I meant.
6/30/2023, 12:49:10 PM - dachary.org: I think a temporary workaround would be to rate limit aggressively (at most one at a time) operations such as deleting an issue, a pull request or a repository. They will fail with a 429 but I suspect the benefit is that it will dramatically reduce the frequency of the deadlocks.
6/30/2023, 12:52:45 PM - dachary.org: Of course this would only have a positive effect if there currently are more than one delete (pull/issue/repo) request at a time. Which may or may not be the case at the moment. Looking at the logs should give some clarity there.
6/30/2023, 12:54:13 PM - Otto Richter | codeberg.org/fnetX: I do not think that someone was working inside Forgejo or the other affected repo at the same time. The issue persisted for quite a long time. I think the problem is rather a race condition in the request itself, like, incrementing the counter and the other request tries to join the repo table to get some IDs. At least this is how I interpreted the deadlock report by MariaDB.
6/30/2023, 12:54:27 PM - Otto Richter | codeberg.org/fnetX: I'll try to have a look at the access logs, too.
6/30/2023, 1:01:32 PM - dachary.org: (looking at the logs)
6/30/2023, 1:02:10 PM - dachary.org: has there already been discussions regarding the interpretation of these logs somewhere?
6/30/2023, 1:04:10 PM - Otto Richter | codeberg.org/fnetX: Talking about the deadlock MariaDB report: I do not think so, I think only you and me have seen them so far.
6/30/2023, 1:06:33 PM - dachary.org: I'd like to see more because all of these involve UPDATE issue SET num_comments=num_comments+1 WHERE id=? which is strangely specific.
6/30/2023, 1:13:58 PM - Otto Richter | codeberg.org/fnetX: I think MariaDB only tells you those details about the last deadlock that happened. We have three reports, because I obtained the last report three times.
6/30/2023, 1:14:09 PM - Otto Richter | codeberg.org/fnetX: But I guess we need to wait for the next deadlock to get more logs.
6/30/2023, 1:14:17 PM - Otto Richter | codeberg.org/fnetX: Or maybe someone else knows if there is some more backlog about this?
6/30/2023, 1:14:54 PM - dachary.org: These reports are very interesting and may hold the solution.
6/30/2023, 1:16:36 PM - Otto Richter | codeberg.org/fnetX: I only learned about that command yesterday, else I would have shared them earlier :)
6/30/2023, 1:18:26 PM - Otto Richter | codeberg.org/fnetX: Hmm, I can give you a log for the repo deletion deadlock, this is quite reproducible
6/30/2023, 1:34:08 PM - dachary.org: In the last repo deletion log it looks like it is part of a user deletion because there is a DELETE FROM user WHERE id=? in one transaction and a DELETE FROM action WHERE repo_id=? in another transaction.
6/30/2023, 1:35:23 PM - Otto Richter | codeberg.org/fnetX: Hmm, I only removed a couple of repos at the same time.
6/30/2023, 1:35:52 PM - Otto Richter | codeberg.org/fnetX: The action removal makes sense: While a repo is removed, all assigned actions must be removed, too.
6/30/2023, 1:35:59 PM - Otto Richter | codeberg.org/fnetX: The delete from user does not make much sense to me.
6/30/2023, 1:36:55 PM - Otto Richter | codeberg.org/fnetX: Maybe the repo deletion deadlocks because a user is currently removing itself, thus locking some tables?
6/30/2023, 1:37:55 PM - Otto Richter | codeberg.org/fnetX: My repo deletion attempts caused

~~~
2023/06/30 11:17:43 .../api/v1/repo/repo.go:1082:Delete() [E] [649eb9d3-32] DeleteRepository: deleteBeans: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2023/06/30 11:17:58 .../api/v1/repo/repo.go:1082:Delete() [E] [649eb9d3-306] DeleteRepository: deleteBeans: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2023/06/30 11:18:13 .../api/v1/repo/repo.go:1082:Delete() [E] [649eb9d4-119] DeleteRepository: deleteBeans: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2023/06/30 11:18:28 .../api/v1/repo/repo.go:1082:Delete() [E] [649eb9d4-288] DeleteRepository: deleteBeans: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
~~~

6/30/2023, 1:38:07 PM - Otto Richter | codeberg.org/fnetX: I'll again send you the access logs during this time so you can investigate if a user is currently deleting itself.
6/30/2023, 1:52:58 PM - dachary.org: I more or less get the structure of the MariaDB logs now. But interpreting it requires learning more from the documentation. I can't understand what the problem is based on the content of the log. In a nutshell, here is what it says:

  • Transaction (1) UPDATE issue SET num_comments=num_comments+1 WHERE id=? is waiting for a lock on index PRIMARY of table gitea_production.issue and the transaction is rolled back.
  • Transaction (2) is running UPDATE comment SET poster_id = ?, original_author = ?, original_author_id = ? WHERE issue_id IN (SELECT issue.id FROM issue INNER JOIN repository ON issue.repo_id = repository.id WHERE repository.original_service_type=?) AND (comment.original_author_id = ?) and is waiting on index PRIMARY of table gitea_production.comment
  • Transaction (2) is holding a lock on index PRIMARY of table gitea_production.issue.

6/30/2023, 1:54:30 PM - dachary.org: I fail to see why this is a deadlock. I would understand that it is a deadlock if Transaction (1) held a lock on the same thing Transaction (2) is waiting for. But the MariaDB logs do not claim that it does.
6/30/2023, 1:58:35 PM - dachary.org: I'll educate myself this afternoon to try to make sense of these logs.
6/30/2023, 1:59:39 PM - dachary.org: https://mariadb.com/kb/en/mariadb-transactions-and-isolation-levels-for-sql-server-users/#deadlocks
6/30/2023, 2:03:28 PM - dachary.org: https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlocks.html
6/30/2023, 2:06:09 PM - dachary.org: https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlock-example.html
6/30/2023, 2:06:15 PM - dachary.org: excellent example, with a deadlock report that totally makes sense.
6/30/2023, 2:06:19 PM - Olivier: Maybe the transaction (1) contains other statements as well.
6/30/2023, 2:07:08 PM - Olivier: The UPDATE issue is called from updateCommentInfos in models/issues/comment.go
6/30/2023, 2:07:42 PM - Otto Richter | codeberg.org/fnetX: From the article you linked, I understand that the UPDATE transaction cannot be done while a SELECT / read transaction is active in the same table, because the SELECT issue.id FROM issue part needs to scan the whole table for the join, but a resource in the table is locked by the UPDATE statement. I do not understand why the UPDATE statement is locked, but it probably is because there is already a READ lock on the table.

With a quick glance at the code, it might happen like this:

  • transaction 1 is started, adding the comment and thus holding an insert lock on the comment table
  • transaction 2 is started and creating a shared lock on the issue table to query the issue
  • transaction 1 tries to update issue in the same transaction, but fails because the issue table is locked
  • it thus can never release the lock for the issue insert statement
  • transaction 2 can never finish the shared lock, because it waits for the comment table to be released, which is still locked by transaction 1
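
If the interleaving above is right, the two transactions form a cycle in the wait-for graph, which is exactly the condition InnoDB's deadlock detector checks for. A minimal Go sketch of that check (the `txn1`/`txn2` names and the single-edge graph model are illustrative simplifications, not InnoDB internals):

```go
package main

import "fmt"

// waitFor maps each blocked transaction to the transaction holding
// the lock it wants. A deadlock is a cycle in this graph.
func hasCycle(waitFor map[string]string) bool {
	for start := range waitFor {
		seen := map[string]bool{start: true}
		cur := start
		for {
			next, ok := waitFor[cur]
			if !ok {
				break // cur is not waiting on anyone
			}
			if seen[next] {
				return true
			}
			seen[next] = true
			cur = next
		}
	}
	return false
}

func main() {
	// txn1 (the comment insert) waits on the issue lock held by txn2;
	// txn2 (the issue query) waits on the comment lock held by txn1.
	fmt.Println(hasCycle(map[string]string{"txn1": "txn2", "txn2": "txn1"})) // true
	// Without the second edge there is no cycle, hence no deadlock.
	fmt.Println(hasCycle(map[string]string{"txn1": "txn2"})) // false
}
```

When InnoDB finds such a cycle it rolls back one transaction in it (the victim), which is why only one side of the pair surfaces as Error 1213 in the application log.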

6/30/2023, 2:08:12 PM - Olivier: this updateCommentInfos is called from CreateComment in the same file, which might be run inside a transaction
6/30/2023, 2:08:12 PM - dachary.org: <@olivier:pfad.fr "Maybe the transaction (1) contai..."> Well yes but the deadlock report is entirely useless if it does not mention the two locks that cause the deadlock.
6/30/2023, 2:10:13 PM - dachary.org: There is no doubt in my mind that there are zillions of opportunities for deadlocks. And fixing them is a lot of work. I'm hoping that with a proper deadlock report that provides actual evidence of the two locks being held, we can maybe work around one of the more frequent problems. But since the report contains incomplete information, we're out of luck.
6/30/2023, 2:18:21 PM - dachary.org: Could it be a false positive? The detection claims there is a deadlock but is wrong.
6/30/2023, 2:18:41 PM - Otto Richter | codeberg.org/fnetX: You mean the deadlock detection detects a deadlock which in fact isn't one?
6/30/2023, 2:18:59 PM - dachary.org: Yes. That would explain the report that makes no sense.
6/30/2023, 2:19:51 PM - Otto Richter | codeberg.org/fnetX: To be honest, I'd rather trust MariaDB engineers to have an understanding of databases than the Forgejo code, because everything database-related is always claimed to be a mess, and no one dares to fix it. But we have already hit bugs in MariaDB at Codeberg's scale, too.
6/30/2023, 2:20:01 PM - dachary.org: > When deadlock detection is enabled (the default) and a deadlock does occur, InnoDB detects the condition and rolls back one of the transactions (the victim). If deadlock detection is disabled using the innodb_deadlock_detect variable, InnoDB relies on the innodb_lock_wait_timeout setting to roll back transactions in case of a deadlock.

https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlocks.html
6/30/2023, 2:20:36 PM - dachary.org: It will still deadlock because of a timeout.
6/30/2023, 2:23:06 PM - dachary.org: https://dev.mysql.com/blog-archive/innodb-data-locking-part-3-deadlocks/
6/30/2023, 2:23:42 PM - dachary.org: > ... we have almost no false positives ...
6/30/2023, 2:24:23 PM - dachary.org: It is quite possible that the Forgejo code pushes the deadlock detection over the edge.
6/30/2023, 2:26:08 PM - dachary.org: But before jumping to conclusions I would try the example using the exact same version of MariaDB and verify the report is as good as expected. And research more to understand why a report can show only one lock instead of two.
6/30/2023, 2:27:04 PM - Otto Richter | codeberg.org/fnetX: I have to quit here for some hours. We're using Debian 11 on that container.
6/30/2023, 2:27:24 PM - Otto Richter | codeberg.org/fnetX: mariadb Ver 15.1 Distrib 10.5.19-MariaDB, for debian-linux-gnu (x86_64) using EditLine wrapper
6/30/2023, 2:27:44 PM - Otto Richter | codeberg.org/fnetX: We can also try to upgrade and see if it helps :) Upgrades to Debian 12 are due anyway sooner or later ...

Author
Owner

A new one appeared:

2023/07/17 00:11:02 ...ons/runner/runner.go:106:FetchTask() [E] [64b48716-124] pick task failed: CreateTaskForRunner: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

for

Jul 17 00:11:02 gitea-production gitea[684]: 2023/07/17 00:11:02 [64b48716-124] router: completed POST /api/actions/runner.v1.RunnerService/FetchTask for 10.0.3.1:46848, 500 Internal Server Error in 54.0ms @ <autogenerated>:1(http.Handler.ServeHTTP-fm)
Owner

> A new one appeared:
>
> 2023/07/17 00:11:02 ...ons/runner/runner.go:106:FetchTask() [E] [64b48716-124] pick task failed: CreateTaskForRunner: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
>
> for
>
> Jul 17 00:11:02 gitea-production gitea[684]: 2023/07/17 00:11:02 [64b48716-124] router: completed POST /api/actions/runner.v1.RunnerService/FetchTask for 10.0.3.1:46848, 500 Internal Server Error in 54.0ms @ <autogenerated>:1(http.Handler.ServeHTTP-fm)

I took a look, that's probably because it uses some commit status database lookups, which are prone to deadlocks, as seen in the previous shared deadlock logs.
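
Client-side, the standard mitigation is the one the error message itself suggests: restart the transaction. A hedged sketch of such a retry wrapper (the `withRetry` helper and the simulated sentinel error are illustrative; this is not Forgejo's actual code path):

```go
package main

import (
	"errors"
	"fmt"
)

// deadlockErr stands in for MySQL's "Error 1213 (40001): Deadlock found".
var deadlockErr = errors.New("Error 1213 (40001): Deadlock found when trying to get lock")

// withRetry re-runs txn up to maxAttempts times when it fails with a
// deadlock error, since InnoDB rolls back the victim and expects a retry.
func withRetry(maxAttempts int, txn func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		err = txn()
		if err == nil || !errors.Is(err, deadlockErr) {
			return err
		}
	}
	return fmt.Errorf("still deadlocked after %d attempts: %w", maxAttempts, err)
}

func main() {
	attempts := 0
	// Simulate a transaction chosen as deadlock victim twice, then succeeding.
	err := withRetry(3, func() error {
		attempts++
		if attempts < 3 {
			return deadlockErr
		}
		return nil
	})
	fmt.Println(err, attempts) // <nil> 3
}
```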

Owner

Some activity around this topic, https://github.com/go-gitea/gitea/pull/26055

Contributor

When was the last Deadlock found when trying to get lock in the logs?

Author
Owner

About one minute ago.

Here are some that seem to occur often (picked at "random")

/data/git/log/gitea.log.2023-12-26.006.gz:2023/12/26 17:21:20 ...ules/context/repo.go:682:RepoAssignment() [E] SyncRepoBranches: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

This one appears very often, apparently in batches.

I also have

2023/12/25 21:18:35 .../indexer/stats/db.go:73:Index() [E] Unable to update language stats for ID d6cc804e15d5d8aea6af07ce9f09aecccb8359c9 for default branch main in /mnt/ceph-cluster/git/gitea-repositories/owner/repo.git. Error: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

and

2023/12/25 21:18:35 ...dexer/stats/queue.go:24:handler() [E] stats queue indexer.Index(168638) failed: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Author
Owner

The last one was

2023/12/26 21:10:46 ...ons/runner/runner.go:154:FetchTask() [E] pick task failed: CreateTaskForRunner: update run 16008: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Author
Owner

Not exactly resolved, but mostly worked around.

fnetX closed this issue 2024-03-23 19:00:00 +01:00