Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Conversation

@jonahwilliams
Copy link
Contributor

Speculative fix for flutter/flutter#157565 which looks like the kind of error that might happen if we concurrently mutate this hashmap.

@flutter-dashboard
Copy link

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption, contact "@test-exemption-reviewer" in the #hackers channel in Discord (don't just cc them here, they won't see it!).

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing. The test exemption team is a small volunteer group, so all reviewers should feel empowered to ask for tests, without delegating that responsibility entirely to the test exemption group.

@jonahwilliams
Copy link
Contributor Author

I think this makes sense? Let me know if I'm wrong :)

Copy link
Member

@jason-simmons jason-simmons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be feasible to use thread-local storage similar to CommandPoolMap?

@jonahwilliams
Copy link
Contributor Author

Yeah I can do that

@jonahwilliams jonahwilliams changed the title [Impeller] lock concurrent access to descriptor pool map. [Impeller] switch descriptor pool map to using TLS. Oct 25, 2024
@jonahwilliams
Copy link
Contributor Author

kk we are TLS

@jonahwilliams
Copy link
Contributor Author

hmm, This will still leak descriptor pools though

@jonahwilliams
Copy link
Contributor Author

Well, unless we call flush correctly on all threads? 🤔

using DescriptorPoolMap =
std::unordered_map<uint64_t, std::shared_ptr<DescriptorPoolVK>>;

static thread_local std::unique_ptr<DescriptorPoolMap> tls_descriptor_pool_map;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has vulkan resources that aren't being cleaned up until the accessing thread is destroyed and there is no guarantee that that is happening before the vulkan context is destroyed. DisposeThreadLocalCachedResources will clear them out, but is that getting called for all threads that are calling CreateCommandBuffer()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread local storage is really tricky to get right and this has bitten us in the past when it's storing vulkan resources. It's actually a violation of the c++ style guide that states static variables should only be trivially destructed. Can we instead pass in the pool as an argument to creating a command buffer? Then objects on each thread can own a pool with proper scoping.

https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead pass in the pool as an argument to creating a command buffer?

That doesn't really solve the problem this code is trying to solve - we need to share the descriptor pool across many command buffers as they are expensive to create, but we don't really have any other place to store these besides TLS in the context.

I agree that its bad, I think a better design would be having a single thread local object that provided access to all thread specific APIs. But I'm not sure how to make that work at the abstraction layer we have, because it would be vulkan only AFAIK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you make the pool a required argument you can percolate it up the call stack until there is a location that is somewhat synonymous with being associated with a thread. There is a absl wrapper from objects that leaks them that can be used for static variables for this. I can't recall it off the top of my head. At a bare minimum we should do that to avoid weird crashes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go back to the lock on the hashmap in the meantime?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked over that version and seems reasonable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you make the pool a required argument you can percolate it up the call stack until there is a location that is somewhat synonymous with being associated with a thread. There is a absl wrapper from objects that leaks them that can be used for static variables for this. I can't recall it off the top of my head. At a bare minimum we should do that to avoid weird crashes.

The pool is a vulkan specifc class. I need to redesign parts of the hal to lets this sort of abstraction exist across platforms. Then you're right, I can just have the rasterizer own it and release it at the end of the frame instead of the context doing it magically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thread-local command pools handled this by keeping a global map of every command pool created on behalf of each ContextVK.

When a ContextVK is deleted, it will release the resources held by all of its per-thread pools. (See g_all_pools_map_mutex and CommandPoolRecyclerVK::DestroyThreadLocalPools)

jonahwilliams added 2 commits October 25, 2024 11:36
This reverts commit a629f5c.
This reverts commit cc37761.
@jonahwilliams jonahwilliams changed the title [Impeller] switch descriptor pool map to using TLS. [Impeller] Lock access to descriptor pool map. Oct 25, 2024
}

void ContextVK::DisposeThreadLocalCachedResources() {
Lock lock(desc_pool_mutex_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a lock in ~ContextVK, or do we have a guarentee that no one will be calling these from a separate thread while the contextvk is being deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. I'm not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think no, because the the destructor can only run once there are no more references to the context. So nothing else could be calling anything that references the desc pool? (unless done through a raw or weak ptr during destruction)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a ContextVK instance is being deleted, then the current thread is releasing the last reference to that instance and no other thread could be holding any references.

So there is no need to acquire any lock when the cached_descriptor_pool_ member is destructed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Jason

Copy link
Member

@gaaclarke gaaclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

}

void ContextVK::DisposeThreadLocalCachedResources() {
Lock lock(desc_pool_mutex_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minimize the scope where this lock is held:

{
  Lock lock(desc_pool_mutex_);

 
 cached_descriptor_pool_.erase(std::this_thread::get_id());
}
command_pool_recycler_->Dispose();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jonahwilliams jonahwilliams added the autosubmit Merge PR when tree becomes green via auto submit App label Oct 25, 2024
@auto-submit
Copy link
Contributor

auto-submit bot commented Oct 25, 2024

auto label is removed for flutter/engine/56113, due to - The status or check suite Linux local_engine_builds has failed. Please fix the issues identified (or deflake) before re-applying this label.

@auto-submit auto-submit bot removed the autosubmit Merge PR when tree becomes green via auto submit App label Oct 25, 2024
@jonahwilliams jonahwilliams added the autosubmit Merge PR when tree becomes green via auto submit App label Oct 25, 2024
@auto-submit auto-submit bot removed the autosubmit Merge PR when tree becomes green via auto submit App label Oct 25, 2024
@auto-submit
Copy link
Contributor

auto-submit bot commented Oct 25, 2024

auto label is removed for flutter/engine/56113, due to - The status or check suite Linux local_engine_builds has failed. Please fix the issues identified (or deflake) before re-applying this label.

@jonahwilliams jonahwilliams merged commit 7c5c5fe into flutter:main Oct 25, 2024
@jonahwilliams jonahwilliams deleted the lock_mpa branch October 25, 2024 20:31
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Oct 25, 2024
jonahwilliams pushed a commit to flutter/flutter that referenced this pull request Oct 25, 2024
flutter/engine@43e4d9a...7c5c5fe

2024-10-25 jonahwilliams@google.com [Impeller] Lock access to descriptor
pool map. (flutter/engine#56113)

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC codefu@google.com,zra@google.com on the revert to ensure that
a human
is aware of the problem.

To file a bug in Flutter:
https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
nick9822 pushed a commit to nick9822/flutter that referenced this pull request Dec 18, 2024
Speculative fix for flutter#157565
which looks like the kind of error that might happen if we concurrently
mutate this hashmap.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants