feat: TagValueIterator holds RLock for too long (#26369) #26372
Merged
devanbenz merged 1 commit into master-1.x (Aug 20, 2025)
Conversation
It looks like we are holding a reader lock and deferring its release while, at the same time, acquiring that same reader lock within the call to `return tk.TagValueIterator()`. According to Go's RWMutex documentation, this is not a good idea: https://pkg.go.dev/sync#RWMutex

> If any goroutine calls [RWMutex.Lock](https://pkg.go.dev/sync#RWMutex.Lock) while the lock is already held by one or more readers, concurrent calls to [RWMutex.RLock](https://pkg.go.dev/sync#RWMutex.RLock) will block until the writer has acquired (and released) the lock, to ensure that the lock eventually becomes available to the writer. Note that this prohibits recursive read-locking.

I believe this is what is causing the deadlock we are seeing in a few different tickets. I ran a reproducer that read from and wrote to a bucket whose retention policy was set to 1 minute and whose shard rollover was set to 30 seconds, and I was able to reproduce the deadlock consistently this way. After adding some logging around the mutexes, I see the following during every reproduction:

```
TagValueIterator is RUnlock()
TagValueIterator is about to RLock()
TagValueIterator is RLock()
TagValueIterator tk about to RLock()
TagValueIterator tk is RLock()
TagValueIterator tk about to RUnlock()
TagValueIterator tk is RUnlock()
TagValueIterator is RUnlock()
TagValueIterator is about to RLock()
TagValueIterator is RLock()
TagValueIterator tk about to RLock()
TagValueIterator tk is RLock()
TagValueIterator tk about to RUnlock()
TagValueIterator tk is RUnlock()
TagValueIterator is RUnlock()
TagValueIterator is about to RLock()
TagValueIterator is RLock()
Lock() is about to be called in AddSeriesList
TagValueIterator tk about to RLock()
```

The `TagValueIterator tk` lock is taken within `return tk.TagValueIterator()`:
https://github.com/influxdata/influxdb/blob/eebe655e4c596b8f78bae30e5729092b6732d1a7/tsdb/index/tsi1/log_file.go#L1388

The `AddSeriesList Lock()` is found here:
https://github.com/influxdata/influxdb/blob/eebe655e4c596b8f78bae30e5729092b6732d1a7/tsdb/index/tsi1/log_file.go#L545

So it appears that we attempt to lock inside both `AddSeriesList` and `TagValueIterator` with different lock types on the same `RWMutex`. After modifying the code, I am no longer seeing the problem. I have a feeling that the recursive call to `RLock()` is the underlying issue.

(cherry picked from commit 2e842ff)
gwossum (Member) approved these changes on May 8, 2025 and left a comment:
Looks like the original.
Note that 1.x does not have the same lock issue that 2.x has, so this does not immediately fix any issues for 1.x. However, #20008 probably needs to be backported to master-1.x, which will make this PR necessary.
devanbenz added a commit that referenced this pull request on Sep 26, 2025.