[Impeller] Lock access to descriptor pool map. #56113

jonahwilliams · 2024-10-24T23:58:30Z

Speculative fix for flutter/flutter#157565 which looks like the kind of error that might happen if we concurrently mutate this hashmap.
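For readers unfamiliar with the failure mode, here is a minimal sketch of the locking approach this PR takes. `std::unordered_map` is not safe to mutate from several threads at once, so every lookup, insertion, and erase goes through one mutex. The `DescriptorPool` type and the `DescriptorPoolCache` class are hypothetical stand-ins, not Impeller's `ContextVK` code. Keying the map by `std::thread::id` is an assumption based on the `cached_descriptor_pool_.erase(std::this_thread::get_id())` call suggested later in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for a Vulkan descriptor pool wrapper.
struct DescriptorPool {};

// Hypothetical cache holding one pool per calling thread. Every access to the
// map is serialized by `mutex_`, which prevents concurrent mutation.
class DescriptorPoolCache {
 public:
  // Returns the calling thread's pool, creating it on first use.
  std::shared_ptr<DescriptorPool> GetForCurrentThread() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::shared_ptr<DescriptorPool>& slot = pools_[std::this_thread::get_id()];
    if (!slot) {
      slot = std::make_shared<DescriptorPool>();
    }
    return slot;
  }

  // Drops the calling thread's pool, e.g. when per-thread caches are disposed.
  void DisposeCurrentThread() {
    std::lock_guard<std::mutex> lock(mutex_);
    pools_.erase(std::this_thread::get_id());
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return pools_.size();
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::thread::id, std::shared_ptr<DescriptorPool>> pools_;
};
```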

flutter-dashboard · 2024-10-24T23:58:32Z

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption, contact "@test-exemption-reviewer" in the #hackers channel in Discord (don't just cc them here, they won't see it!).

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing. The test exemption team is a small volunteer group, so all reviewers should feel empowered to ask for tests, without delegating that responsibility entirely to the test exemption group.

jonahwilliams · 2024-10-24T23:58:47Z

I think this makes sense? Let me know if I'm wrong :)

jason-simmons

Would it be feasible to use thread-local storage similar to CommandPoolMap?

jonahwilliams · 2024-10-25T00:43:30Z

Yeah I can do that

jonahwilliams · 2024-10-25T05:16:14Z

kk we are TLS

jonahwilliams · 2024-10-25T15:33:14Z

hmm, this will still leak descriptor pools, though.

jonahwilliams · 2024-10-25T15:48:04Z

Well, unless we call flush correctly on all threads? 🤔

gaaclarke · 2024-10-25T17:50:40Z

impeller/renderer/backend/vulkan/context_vk.cc

+using DescriptorPoolMap =
+    std::unordered_map<uint64_t, std::shared_ptr<DescriptorPoolVK>>;
+
+static thread_local std::unique_ptr<DescriptorPoolMap> tls_descriptor_pool_map;


This holds Vulkan resources that aren't cleaned up until the accessing thread is destroyed, and there is no guarantee that this happens before the Vulkan context is destroyed. DisposeThreadLocalCachedResources will clear them out, but is it called for every thread that calls CreateCommandBuffer()?
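The concern can be made concrete with a small experiment. This is a minimal sketch that reuses the `tls_descriptor_pool_map` shape from the diff above. It assumes a hypothetical `DescriptorPool` that counts live instances in place of real Vulkan objects, plus a hypothetical `GetPool` lookup. A pool cached in the `thread_local` map stays alive after the worker has finished using it. It is destroyed only when that thread exits, whether or not the owning context still exists.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <thread>
#include <unordered_map>

// Live-instance counter so the lifetime of cached pools is observable.
std::atomic<int> g_live_pools{0};

// Hypothetical pool; a real one would own Vulkan objects tied to a context.
struct DescriptorPool {
  DescriptorPool() { g_live_pools.fetch_add(1); }
  ~DescriptorPool() { g_live_pools.fetch_sub(1); }
};

using DescriptorPoolMap =
    std::unordered_map<std::uint64_t, std::shared_ptr<DescriptorPool>>;

// Same shape as the variable in the diff: destroyed only at thread exit.
static thread_local std::unique_ptr<DescriptorPoolMap> tls_descriptor_pool_map;

// Hypothetical lookup keyed by a per-context identifier.
std::shared_ptr<DescriptorPool> GetPool(std::uint64_t context_id) {
  if (!tls_descriptor_pool_map) {
    tls_descriptor_pool_map = std::make_unique<DescriptorPoolMap>();
  }
  std::shared_ptr<DescriptorPool>& slot = (*tls_descriptor_pool_map)[context_id];
  if (!slot) {
    slot = std::make_shared<DescriptorPool>();
  }
  return slot;
}

// The worker caches a pool and finishes its work, but the pool is released
// only by the thread-exit destructor of `tls_descriptor_pool_map`.
void Demonstrate() {
  int live_before_exit = -1;
  std::thread worker([&live_before_exit] {
    GetPool(42);
    live_before_exit = g_live_pools.load();  // nothing has cleared the map yet
  });
  worker.join();
  assert(live_before_exit == 1);
  assert(g_live_pools.load() == 0);  // freed when the worker thread exited
}
```

If the context had been torn down while the worker was still running, the pool would have outlived it. That is the window the review is pointing at.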

I think so?

Thread local storage is really tricky to get right, and this has bitten us in the past when it's storing Vulkan resources. It's actually a violation of the C++ style guide, which states that static variables should only be trivially destructible. Can we instead pass in the pool as an argument to creating a command buffer? Then objects on each thread can own a pool with proper scoping.

https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables
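You can check a type against the trivially-destructible requirement cited above with `<type_traits>`. This is a small sketch that again uses a hypothetical `DescriptorPool`. The `std::unique_ptr` holding the map in the diff is not trivially destructible. A plain pointer, as used by the intentionally leaked object idiom, is.

```cpp
#include <cstdint>
#include <memory>
#include <type_traits>
#include <unordered_map>

// Hypothetical pool type; its contents do not matter for these checks.
struct DescriptorPool {};

using DescriptorPoolMap =
    std::unordered_map<std::uint64_t, std::shared_ptr<DescriptorPool>>;

// Types that meet the trivially-destructible requirement.
static_assert(std::is_trivially_destructible_v<int>);
static_assert(std::is_trivially_destructible_v<DescriptorPoolMap*>);

// What the thread_local in the diff actually stores: non-trivial destructors
// that run at thread exit and release whatever the map still owns.
static_assert(!std::is_trivially_destructible_v<std::unique_ptr<DescriptorPoolMap>>);
static_assert(!std::is_trivially_destructible_v<DescriptorPoolMap>);
```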

> Can we instead pass in the pool as an argument to creating a command buffer?

That doesn't really solve the problem this code is trying to solve - we need to share the descriptor pool across many command buffers as they are expensive to create, but we don't really have any other place to store these besides TLS in the context.

I agree that it's bad. I think a better design would be a single thread-local object that provides access to all thread-specific APIs. But I'm not sure how to make that work at the abstraction layer we have, because it would be Vulkan-only AFAIK.

I think if you make the pool a required argument, you can percolate it up the call stack until there is a location that is somewhat synonymous with being associated with a thread. There is an absl wrapper for objects that leaks them, which can be used for static variables for this. I can't recall it off the top of my head. At a bare minimum, we should do that to avoid weird crashes.
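One reading of the suggestion about a wrapper that leaks objects is the common "never destroyed" static. The sketch below shows that idea in plain C++ and assumes a hypothetical `PoolRegistry` type. The object is created on first use and intentionally never freed. As a result, no destructor runs during shutdown, and no code can touch the object after it has been destroyed. Abseil's `absl::NoDestructor` packages the same idea, though whether that is the wrapper the reviewer had in mind is a guess.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical registry. std::unordered_map is not trivially destructible, so
// a plain `static PoolRegistry registry;` would break the style rule.
using PoolRegistry = std::unordered_map<std::string, std::uint64_t>;

// Leaked on purpose: the static holds only a pointer (trivially destructible),
// and the pointee is never deleted, so nothing is torn down at process exit.
// Initialization of a function-local static is thread-safe since C++11, but
// concurrent mutation of the map would still need its own locking.
PoolRegistry& GlobalPoolRegistry() {
  static PoolRegistry* const registry = new PoolRegistry();
  return *registry;
}
```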

I can go back to the lock on the hashmap in the meantime?

I just looked over that version and seems reasonable to me.

> I think if you make the pool a required argument, you can percolate it up the call stack until there is a location that is somewhat synonymous with being associated with a thread.

The pool is a Vulkan-specific class. I need to redesign parts of the HAL to let this sort of abstraction exist across platforms. Then you're right: I can just have the rasterizer own it and release it at the end of the frame, instead of the context doing it magically.
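The alternative discussed here passes the pool explicitly, so that a per-thread owner such as the rasterizer controls its lifetime. It might look roughly like the sketch below. Everything in it is hypothetical: `DescriptorPool`, `CommandBuffer`, `FrameRecorder`, and this `CreateCommandBuffer` signature are not Impeller APIs. The sketch only illustrates one pool shared across many command buffers in a frame and recycled at the end of that frame.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-frame descriptor pool; hands out increasing set indices.
class DescriptorPool {
 public:
  int Allocate() { return next_set_++; }
  void Reset() { next_set_ = 0; }
  int allocated() const { return next_set_; }

 private:
  int next_set_ = 0;
};

// Hypothetical command buffer recording the descriptor sets it used.
struct CommandBuffer {
  std::vector<int> descriptor_sets;
};

// The pool is a required argument, so its owner and lifetime are explicit at
// the call site instead of being hidden in thread-local or context state.
CommandBuffer CreateCommandBuffer(DescriptorPool& pool, int draws) {
  assert(draws >= 0);
  CommandBuffer buffer;
  for (int i = 0; i < draws; ++i) {
    buffer.descriptor_sets.push_back(pool.Allocate());
  }
  return buffer;
}

// Hypothetical per-thread owner (the thread suggests the rasterizer) that
// shares one pool across a frame's command buffers and recycles it per frame.
class FrameRecorder {
 public:
  CommandBuffer Record(int draws) { return CreateCommandBuffer(pool_, draws); }
  void EndFrame() { pool_.Reset(); }
  const DescriptorPool& pool() const { return pool_; }

 private:
  DescriptorPool pool_;
};
```

Because the owner's scope bounds the pool's lifetime, no thread-local cleanup is needed.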

The thread-local command pools handled this by keeping a global map of every command pool created on behalf of each ContextVK.

When a ContextVK is deleted, it will release the resources held by all of its per-thread pools. (See g_all_pools_map_mutex and CommandPoolRecyclerVK::DestroyThreadLocalPools)
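Here is a minimal sketch of the bookkeeping described above. It uses simplified, hypothetical names (`Pool`, `GetThreadPool`, `DestroyPoolsForContext`, and integer context ids) rather than the real `g_all_pools_map_mutex` and `CommandPoolRecyclerVK::DestroyThreadLocalPools`. Each thread caches its own pool per context. Every pool is also recorded in a mutex-guarded global map, so the context can release all of them when it is deleted, including pools cached on threads that are still running. A real implementation also has to stop a thread from reusing a released entry in its cache; this sketch leaves that out.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical pool; Release() stands in for freeing Vulkan objects.
struct Pool {
  bool released = false;
  void Release() { released = true; }
};

// Global record of every pool created for each context, on any thread.
std::mutex g_pools_mutex;
std::unordered_map<std::uint64_t, std::vector<std::shared_ptr<Pool>>> g_pools_by_context;

// Per-thread cache: context id -> this thread's pool. No lock needed here.
thread_local std::unordered_map<std::uint64_t, std::shared_ptr<Pool>> tls_pools;

std::shared_ptr<Pool> GetThreadPool(std::uint64_t context_id) {
  std::shared_ptr<Pool>& slot = tls_pools[context_id];
  if (!slot) {
    slot = std::make_shared<Pool>();
    std::lock_guard<std::mutex> lock(g_pools_mutex);
    g_pools_by_context[context_id].push_back(slot);
  }
  return slot;
}

// Called when a context is destroyed: releases every thread's pool for it.
void DestroyPoolsForContext(std::uint64_t context_id) {
  std::vector<std::shared_ptr<Pool>> pools;
  {
    std::lock_guard<std::mutex> lock(g_pools_mutex);
    auto it = g_pools_by_context.find(context_id);
    if (it == g_pools_by_context.end()) {
      return;
    }
    pools = std::move(it->second);
    g_pools_by_context.erase(it);
  }
  for (const std::shared_ptr<Pool>& pool : pools) {
    pool->Release();
  }
}
```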

gaaclarke · 2024-10-25T18:46:55Z

impeller/renderer/backend/vulkan/context_vk.cc

 }

 void ContextVK::DisposeThreadLocalCachedResources() {
+  Lock lock(desc_pool_mutex_);


Do we need a lock in ~ContextVK, or do we have a guarantee that no one will be calling these from a separate thread while the ContextVK is being deleted?

hmm. I'm not sure.

I think no, because the destructor can only run once there are no more references to the context. So nothing else could be calling anything that references the desc pool? (Unless done through a raw or weak ptr during destruction.)

If a ContextVK instance is being deleted, then the current thread is releasing the last reference to that instance and no other thread could be holding any references.

So there is no need to acquire any lock when the cached_descriptor_pool_ member is destructed.
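You can illustrate the reasoning above with `std::shared_ptr`. The destructor runs only when the last owner releases its reference. At that point no other thread can be inside a member function, so the destructor needs no lock, while ordinary methods do. `Context` here is a hypothetical stand-in with a counter that records destruction, not `ContextVK`. The argument assumes that nothing reaches the object through a raw pointer that outlives its last owning reference.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical context with a mutex-guarded per-thread cache.
class Context {
 public:
  explicit Context(int* destroyed) : destroyed_(destroyed) {}

  // Callable from any thread that holds a reference, so it must lock.
  void Touch() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++cache_[std::this_thread::get_id()];
  }

  void DisposeThreadLocalCachedResources() {
    std::lock_guard<std::mutex> lock(mutex_);
    cache_.erase(std::this_thread::get_id());
  }

  // No lock: this runs only after the last shared_ptr owner has released its
  // reference, so no other thread can still be calling the methods above.
  ~Context() { ++*destroyed_; }

 private:
  int* destroyed_;
  std::mutex mutex_;
  std::unordered_map<std::thread::id, int> cache_;
};
```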

Thank you, Jason.

gaaclarke

lgtm!

jason-simmons · 2024-10-25T18:57:33Z

impeller/renderer/backend/vulkan/context_vk.cc

 }

 void ContextVK::DisposeThreadLocalCachedResources() {
+  Lock lock(desc_pool_mutex_);


Minimize the scope where this lock is held:

{
  Lock lock(desc_pool_mutex_);
  cached_descriptor_pool_.erase(std::this_thread::get_id());
}
command_pool_recycler_->Dispose();
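Below is a runnable version of the suggested shape. It uses `std::lock_guard` in place of Impeller's `Lock` type and a hypothetical counter-based `CommandPoolRecycler`. The mutex protects only the map erase, and the recycler's `Dispose()` runs after the lock is released. The benefit is presumably a shorter critical section and not holding `desc_pool_mutex_` while calling into another component that may take its own locks.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical recycler; Dispose() stands in for work that should not run
// while the descriptor-pool mutex is held.
struct CommandPoolRecycler {
  std::atomic<int> dispose_calls{0};
  void Dispose() { dispose_calls.fetch_add(1); }
};

class Context {
 public:
  void CachePoolForCurrentThread(int pool_id) {
    std::lock_guard<std::mutex> lock(desc_pool_mutex_);
    cached_descriptor_pool_[std::this_thread::get_id()] = pool_id;
  }

  void DisposeThreadLocalCachedResources() {
    {
      // Hold the lock only while touching the shared map...
      std::lock_guard<std::mutex> lock(desc_pool_mutex_);
      cached_descriptor_pool_.erase(std::this_thread::get_id());
    }
    // ...and call into the recycler after the lock has been released.
    command_pool_recycler_.Dispose();
  }

  std::size_t cached_count() {
    std::lock_guard<std::mutex> lock(desc_pool_mutex_);
    return cached_descriptor_pool_.size();
  }

  int dispose_calls() const { return command_pool_recycler_.dispose_calls.load(); }

 private:
  std::mutex desc_pool_mutex_;
  std::unordered_map<std::thread::id, int> cached_descriptor_pool_;
  CommandPoolRecycler command_pool_recycler_;
};
```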

auto-submit · 2024-10-25T19:40:06Z

auto label is removed for flutter/engine/56113, due to - The status or check suite Linux local_engine_builds has failed. Please fix the issues identified (or deflake) before re-applying this label.

auto-submit · 2024-10-25T20:14:19Z

auto label is removed for flutter/engine/56113, due to - The status or check suite Linux local_engine_builds has failed. Please fix the issues identified (or deflake) before re-applying this label.

flutter/engine@43e4d9a...7c5c5fe 2024-10-25 jonahwilliams@google.com [Impeller] Lock access to descriptor pool map. (flutter/engine#56113)

If this roll has caused a breakage, revert this CL and stop the roller using the controls here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC codefu@google.com,zra@google.com on the revert to ensure that a human is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug: https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md

Speculative fix for flutter#157565 which looks like the kind of error that might happen if we concurrently mutate this hashmap.

[Impeller] lock concurrent access to descriptor pool map.

9f348cc

github-actions bot added the e: impeller label Oct 24, 2024

jonahwilliams requested a review from jason-simmons October 24, 2024 23:58

jason-simmons approved these changes Oct 25, 2024

switch to using TLS

cc37761

jonahwilliams changed the title ~~[Impeller] lock concurrent access to descriptor pool map.~~ [Impeller] switch descriptor pool map to using TLS. Oct 25, 2024

++

a629f5c

jonahwilliams requested a review from jason-simmons October 25, 2024 06:01

jonahwilliams requested a review from gaaclarke October 25, 2024 17:32

gaaclarke reviewed Oct 25, 2024

jonahwilliams added 2 commits October 25, 2024 11:36

Revert "++"

7ef2850

This reverts commit a629f5c.

Revert "switch to using TLS"

97d9a85

This reverts commit cc37761.

jonahwilliams requested a review from gaaclarke October 25, 2024 18:42

jonahwilliams changed the title ~~[Impeller] switch descriptor pool map to using TLS.~~ [Impeller] Lock access to descriptor pool map. Oct 25, 2024

gaaclarke reviewed Oct 25, 2024

gaaclarke approved these changes Oct 25, 2024

jason-simmons reviewed Oct 25, 2024

++

ab81dbc

jonahwilliams added the autosubmit label Oct 25, 2024

auto-submit bot removed the autosubmit label Oct 25, 2024

jonahwilliams added the autosubmit label Oct 25, 2024

auto-submit bot removed the autosubmit label Oct 25, 2024

jonahwilliams merged commit 7c5c5fe into flutter:main Oct 25, 2024

jonahwilliams deleted the lock_mpa branch October 25, 2024 20:31

engine-flutter-autoroll mentioned this pull request Oct 25, 2024

Roll Flutter Engine from 43e4d9a30666 to 7c5c5fe5c84d (1 revision) flutter/flutter#157644

Merged

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Oct 25, 2024

7c5c5fe [Impeller] Lock access to descriptor pool map. (flutter/engine#56113)

4aff172

jonahwilliams mentioned this pull request Oct 25, 2024

Mac impeller unittests have started flaking flutter/flutter#157552

Closed

3 participants