Fix race condition in StringCache#672
Conversation
|
Thanks! |
|
@KirillOsenkov Thanks for reviewing and merging this: that was fast! |
|
yes will try to get to it today |
|
published 2.1.814, thanks again! |
At least until we figure out the root cause of #673
|
When you were seeing the race conditions initially, did the build you were using contain the lock statements I added in d1c4eff? |
|
If possible, if you could debug that race condition under the VS debugger, and when the exception happens, print the call stacks of threads accessing the dictionary. The Instances property is only used during search, which shouldn't happen during binlog loading time. |
|
I found what was wrong here: It's subtle and not easy to see, so not blaming you at all. We relied on the fact that Dictionary<K,V>.Keys never changes, but for ConcurrentDictionary<K,V>.Keys returns a different read-only collection every time you call it. Makes sense in hindsight. Let's confirm that using a simple Dictionary with locks fixes your race conditions. I made that commit on March 26: whereas the previous release was on March 13: So I think what happened is that you saw the race conditions without the lock that I added on March 26. I will be reluctant to accept another change to ConcurrentDictionary, because now that I think about it, there might be performance implications for both binlog loading time as well as search. We'd need to carefully measure the binlog loading perf for large binlogs for both approaches, as well as search for strings in a large binlog, and see which approach is faster. |
The stack trace reported here (full version of the stack trace at the end of this message) was generated by an execution in our CI infrastructure, using version 2.1.630, which indeed didn't include d1c4eff. We have also reproduced the issue with 2.1.790 (latest released version before 2.1.814), on a binlog file for a different project (nodatime). That version didn't include the lock either. However, I have reproduce a similar race condition locally, on top of 0ab53ca (which does include d1c4eff, with the following test: [Fact]
public void Intern()
{
var cache = new StringCache() { DisableDeduplication = false };
var internerThread = new System.Threading.Thread(() =>
{
for (var i = 0L; i < long.MaxValue; i++)
{
cache.Intern(i.ToString());
//System.Threading.Thread.Sleep(10);
}
});
var instanceAccessThread = new System.Threading.Thread(() =>
{
for (var i = 0L; i < long.MaxValue; i++)
{
System.Threading.Tasks.Parallel.ForEach(cache.Instances, s => _ = s.IndexOf("0"));
//System.Threading.Thread.Sleep(10);
}
});
internerThread.Start();
instanceAccessThread.Start();
internerThread.Join();
instanceAccessThread.Join();
}Running this test in debug from Visual Studio produces This test code, however, is much simpler and different from the So I am going to update our CI to use 2.1.815 and report a new issue in case we can still see a race condition happening. The line in our code, generating the var root = BinaryLog.ReadBuild(filePath);Here is the full stack trace: |
Indeed. I would say that even a simple lock of the That may be an indication of more multithreading issues: we will collect more data and report if we keep observing the behavior and we gain insight on the possible cause of the issue. |
|
Yes, the change to make binlog loading parallel is recent, and I didn't review it very thoroughly. It does seem like it needs to be ironed out. Agreed that your unit-test reproduces the issue. However in real life the Instances collection is only accessed from two places: from the search function and the find in files function. Both happen after the binlog has loaded. We are not accessing the Instances collection (which gets the Keys) during the binlog loading. I'd like to know more about the timeouts you're seeing. It may be that it's a separate bug, I'd be happy to help you investigate those. |
Context
Microsoft.Build.Logging.StructuredLogger.StringCache.Internuses a non-concurrentDictionaryinstance,deduplicationMap, to store strings.Accesses to
deduplicationMapis currently done inlockstatements, in theInternandContainsmethods.However,
StringCache.Instancesstores and publicly exposes theKeysproperty of the dictionary.That occasionally results in race conditions, such as the following one:
Fix
This PR changes
deduplicationMapinto aConcurrentDictionary, usingGetOrAddinstead of locks on the object.A similar fix was done for a different dictionary in #658.