[Impeller] Lock access to descriptor pool map. #56113
Conversation
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption, contact "@test-exemption-reviewer" in the #hackers channel in Discord (don't just cc them here, they won't see it!). If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing. The test exemption team is a small volunteer group, so all reviewers should feel empowered to ask for tests, without delegating that responsibility entirely to the test exemption group.
I think this makes sense? Let me know if I'm wrong :)
jason-simmons
left a comment
Would it be feasible to use thread-local storage similar to CommandPoolMap?
Yeah I can do that
kk we are TLS
Hmm, this will still leak descriptor pools though
Well, unless we call flush correctly on all threads? 🤔
```cpp
using DescriptorPoolMap =
    std::unordered_map<uint64_t, std::shared_ptr<DescriptorPoolVK>>;

static thread_local std::unique_ptr<DescriptorPoolMap> tls_descriptor_pool_map;
```
This holds Vulkan resources that aren't cleaned up until the accessing thread is destroyed, and there is no guarantee that that happens before the Vulkan context is destroyed. DisposeThreadLocalCachedResources will clear them out, but is that getting called for all threads that call CreateCommandBuffer()?
I think so?
Thread-local storage is really tricky to get right, and this has bitten us in the past when it's storing Vulkan resources. It's actually a violation of the C++ style guide, which states that static variables should only be trivially destructed. Can we instead pass the pool in as an argument when creating a command buffer? Then objects on each thread can own a pool with proper scoping.
https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables
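For illustration, here is a minimal sketch of that suggestion using stand-in types (the real Impeller classes are DescriptorPoolVK, CommandBufferVK, and ContextVK; the CreateCommandBuffer signature below is an assumption, not the engine's actual API):

```cpp
#include <memory>
#include <utility>

// Stand-ins for the real Impeller types; shapes are illustrative only.
struct DescriptorPool {};

struct CommandBuffer {
  // The buffer keeps a reference to the pool it allocates descriptors
  // from, so the pool outlives every buffer created against it.
  explicit CommandBuffer(std::shared_ptr<DescriptorPool> pool)
      : pool_(std::move(pool)) {}

 private:
  std::shared_ptr<DescriptorPool> pool_;
};

struct Context {
  // The pool is a required argument rather than hidden thread-local state.
  std::unique_ptr<CommandBuffer> CreateCommandBuffer(
      std::shared_ptr<DescriptorPool> pool) const {
    return std::make_unique<CommandBuffer>(std::move(pool));
  }
};

int main() {
  Context context;
  // A per-thread (or per-frame) owner creates the pool with normal RAII
  // scoping; when it goes out of scope, the resources are released
  // deterministically instead of at thread exit.
  auto pool = std::make_shared<DescriptorPool>();
  auto buffer_a = context.CreateCommandBuffer(pool);
  auto buffer_b = context.CreateCommandBuffer(pool);  // shares the pool
  return 0;
}
```

With this shape, the pool's lifetime is governed by ordinary scoping on the owning thread rather than by thread-exit ordering.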
> Can we instead pass the pool in as an argument when creating a command buffer?
That doesn't really solve the problem this code is trying to solve: we need to share the descriptor pool across many command buffers because pools are expensive to create, but we don't really have any other place to store them besides TLS in the context.

I agree that it's bad. I think a better design would be a single thread-local object that provides access to all thread-specific APIs, but I'm not sure how to make that work at the abstraction layer we have, because it would be Vulkan-only AFAIK.
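A rough sketch of that "one thread-local aggregate" idea, with entirely hypothetical names (nothing below exists in the engine):

```cpp
#include <memory>

// Hypothetical stand-ins. The point: every thread-specific cache hangs
// off one object, so there is exactly one thing per thread to flush.
struct DescriptorPool {};
struct CommandPool {};

struct PerThreadResources {
  std::shared_ptr<DescriptorPool> descriptor_pool;
  std::shared_ptr<CommandPool> command_pool;

  // One call releases everything this thread cached.
  void Flush() {
    descriptor_pool.reset();
    command_pool.reset();
  }
};

PerThreadResources& GetPerThreadResources() {
  static thread_local PerThreadResources resources;
  return resources;
}
```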
I think if you make the pool a required argument, you can percolate it up the call stack until there is a location that is somewhat synonymous with being associated with a thread. There is an absl wrapper for objects that leaks them, which can be used for static variables like this; I can't recall it off the top of my head. At a bare minimum we should do that to avoid weird crashes.
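A minimal sketch of the kind of leak-on-purpose wrapper being alluded to (absl::NoDestructor is the likely candidate, though that name is our guess): the destructor is simply never run, so a static of non-trivially-destructible type cannot race with teardown at process or thread exit.

```cpp
#include <new>
#include <utility>

// The object is constructed in place inside raw storage and its
// destructor is intentionally never invoked.
template <typename T>
class NoDestructor {
 public:
  template <typename... Args>
  explicit NoDestructor(Args&&... args) {
    new (&storage_) T(std::forward<Args>(args)...);
  }
  T& get() { return *reinterpret_cast<T*>(&storage_); }

 private:
  alignas(T) unsigned char storage_[sizeof(T)];
};

// Usage sketch: the map itself is never destructed; the Vulkan objects
// inside it must still be released explicitly while the device is alive.
// static NoDestructor<DescriptorPoolMap> g_pools;
```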
I can go back to the lock on the hashmap in the meantime?
I just looked over that version and it seems reasonable to me.
> I think if you make the pool a required argument, you can percolate it up the call stack until there is a location that is somewhat synonymous with being associated with a thread. There is an absl wrapper for objects that leaks them, which can be used for static variables like this; I can't recall it off the top of my head. At a bare minimum we should do that to avoid weird crashes.
The pool is a Vulkan-specific class. I need to redesign parts of the HAL to let this sort of abstraction exist across platforms. Then you're right, I can just have the rasterizer own it and release it at the end of the frame instead of the context doing it magically.
The thread-local command pools handled this by keeping a global map of every command pool created on behalf of each ContextVK.
When a ContextVK is deleted, it will release the resources held by all of its per-thread pools. (See g_all_pools_map_mutex and CommandPoolRecyclerVK::DestroyThreadLocalPools)
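Roughly, the pattern Jason describes looks like the sketch below. g_all_pools_map_mutex and DestroyThreadLocalPools are real names quoted from the comment above, but the shapes here are assumptions for illustration:

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the engine's per-thread command pool.
struct CommandPool {
  void Destroy() { /* release the vk::CommandPool, etc. */ }
};

// Keyed by context identity, so a dying ContextVK can find every pool
// any thread created on its behalf.
static std::mutex g_all_pools_map_mutex;
static std::unordered_map<uint64_t, std::vector<std::shared_ptr<CommandPool>>>
    g_all_pools_map;

void RegisterThreadLocalPool(uint64_t context_id,
                             std::shared_ptr<CommandPool> pool) {
  std::lock_guard<std::mutex> lock(g_all_pools_map_mutex);
  g_all_pools_map[context_id].push_back(std::move(pool));
}

// Called from the context's destructor: tears down every per-thread pool
// immediately instead of waiting for each thread to exit.
void DestroyThreadLocalPools(uint64_t context_id) {
  std::lock_guard<std::mutex> lock(g_all_pools_map_mutex);
  auto it = g_all_pools_map.find(context_id);
  if (it == g_all_pools_map.end()) {
    return;
  }
  for (auto& pool : it->second) {
    pool->Destroy();
  }
  g_all_pools_map.erase(it);
}
```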
```cpp
}

void ContextVK::DisposeThreadLocalCachedResources() {
  Lock lock(desc_pool_mutex_);
```
Do we need a lock in ~ContextVK, or do we have a guarantee that no one will be calling these from a separate thread while the ContextVK is being deleted?
hmm. I'm not sure.
I think no, because the destructor can only run once there are no more references to the context, so nothing else could be calling anything that references the descriptor pool? (Unless done through a raw or weak pointer during destruction.)
If a ContextVK instance is being deleted, then the current thread is releasing the last reference to that instance and no other thread could be holding any references.
So there is no need to acquire any lock when the cached_descriptor_pool_ member is destructed.
Thank you Jason
gaaclarke
left a comment
lgtm!
```cpp
}

void ContextVK::DisposeThreadLocalCachedResources() {
  Lock lock(desc_pool_mutex_);
```
Minimize the scope where this lock is held:
```cpp
{
  Lock lock(desc_pool_mutex_);
  cached_descriptor_pool_.erase(std::this_thread::get_id());
}
command_pool_recycler_->Dispose();
```
Done
auto label is removed for flutter/engine/56113, due to - The status or check suite Linux local_engine_builds has failed. Please fix the issues identified (or deflake) before re-applying this label.
flutter/engine@43e4d9a...7c5c5fe

2024-10-25 jonahwilliams@google.com [Impeller] Lock access to descriptor pool map. (flutter/engine#56113)

If this roll has caused a breakage, revert this CL and stop the roller using the controls here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC codefu@google.com,zra@google.com on the revert to ensure that a human is aware of the problem.
To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose
To report a problem with the AutoRoller itself, please file a bug: https://issues.skia.org/issues/new?component=1389291&template=1850622
Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
Speculative fix for flutter/flutter#157565, which looks like the kind of error that might happen if we concurrently mutate this hashmap.
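For context, a condensed, self-contained sketch of the locked-map approach this PR lands (shapes inferred from the snippets above, not the engine's exact code): every read or write of the per-thread descriptor-pool map takes desc_pool_mutex_, so a concurrent insertion can no longer rehash the map out from under another thread.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

// Stand-in for DescriptorPoolVK.
struct DescriptorPool {};

class Context {
 public:
  // Every access to the map takes the mutex, so a concurrent insert on
  // another thread can no longer rehash the table mid-read.
  std::shared_ptr<DescriptorPool> GetOrCreateDescriptorPool() {
    std::lock_guard<std::mutex> lock(desc_pool_mutex_);
    auto& pool = cached_descriptor_pool_[std::this_thread::get_id()];
    if (!pool) {
      pool = std::make_shared<DescriptorPool>();
    }
    return pool;
  }

  // Erases only the calling thread's entry; per the review above, the
  // lock scope stays as narrow as possible.
  void DisposeThreadLocalCachedResources() {
    std::lock_guard<std::mutex> lock(desc_pool_mutex_);
    cached_descriptor_pool_.erase(std::this_thread::get_id());
  }

 private:
  std::mutex desc_pool_mutex_;
  std::unordered_map<std::thread::id, std::shared_ptr<DescriptorPool>>
      cached_descriptor_pool_;
};
```

As discussed in the review, the destructor needs no lock: whichever thread releases the last reference to the context is the only thread still touching it.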