
Remote entity reservation v2#18266

Closed
ElliottjPierce wants to merge 63 commits into bevyengine:main from ElliottjPierce:remote-entity-reservation-v2

Conversation

Contributor

@ElliottjPierce ElliottjPierce commented Mar 11, 2025

fixes #18003

Objective

Long story short: we need to be able to reserve entities from outside &World.

This is an alternative to #18195.

Solution

Thanks to everyone who helped brainstorm on Discord. Ideas are still being tossed around, but here's how this PR works:

All entity meta is kept in a single unlocked Vec<EntityMeta>. We keep a pending list that can be used remotely to reserve entities, etc.

When we allocate an entity directly, we pop from the new owned: Vec<Entity>, and if it's empty, we reserve some more entities in bulk. These entities are not flushed.

To ensure we don't lose entities anywhere, flushing rebalances pending and owned. We can tweak the proportions later, but currently it splits the pending entities evenly between the two.
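The scheme above can be sketched roughly as follows. This is a minimal illustration of the pending/owned split and the even rebalance, not the PR's actual code: the field names, the batch size, and the use of bare u32 ids are all assumptions.

```rust
/// A minimal sketch of the allocator described above (illustrative only:
/// field names, batch size, and bare `u32` ids are assumptions).
struct Entities {
    /// Freed ids that remote callers may reserve from.
    pending: Vec<u32>,
    /// Freed ids kept aside for direct (local) allocation.
    owned: Vec<u32>,
    /// Next brand-new index, used when both free lists run dry.
    next_index: u32,
}

impl Entities {
    /// Direct allocation: pop from `owned`, bulk-reserving fresh ids when empty.
    fn alloc(&mut self) -> u32 {
        if let Some(index) = self.owned.pop() {
            return index;
        }
        const BATCH: u32 = 4;
        self.owned.extend(self.next_index..self.next_index + BATCH);
        self.next_index += BATCH;
        self.owned.pop().expect("BATCH > 0, so owned is non-empty")
    }

    /// Flushing rebalances the free lists; here the pending entities are
    /// split evenly between the two, as the PR currently does.
    fn rebalance(&mut self) {
        let total = self.pending.len() + self.owned.len();
        self.pending.append(&mut self.owned);
        self.owned.extend(self.pending.drain(total / 2..));
    }
}
```

The real implementation layers atomics over `pending` so remote callers can reserve without `&mut World`; this sketch only shows the bookkeeping.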

Practical Change

It used to be that a brand new Entities would yield sequential indices via alloc. This was never a guarantee, but many tests relied on the ordering. The ordering is still deterministic but is no longer sequential or contiguous. In other words, if you allocate 5 entities, you might get back indices [0, 4, 2, 6, 7]. Entity ids are generally opaque, so this should not affect users. However, tests and benchmarks that depended on the previous, never-guaranteed behavior had to be reworked.
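For example, a reworked test can only assert what is actually guaranteed: that the returned ids are distinct and deterministic, not that they equal 0..5. A hedged sketch, with a hypothetical helper name:

```rust
/// Returns true when every id in `indices` is distinct — the only ordering
/// property tests should still rely on (helper name is hypothetical).
fn all_distinct(mut indices: Vec<u32>) -> bool {
    let len = indices.len();
    indices.sort_unstable();
    indices.dedup();
    indices.len() == len
}
```

A test would run `all_distinct` over whatever alloc happens to return, e.g. [0, 4, 2, 6, 7], instead of comparing against a sequential range.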

One practical effect is that systems that iterate entities sequentially will see a different order. This may affect query ordering, observer ordering, etc. None of these orderings were guaranteed before, but the change may expose existing bugs in systems that relied on them.

I suspect this is why some rendering systems broke.

Future Work

Flushing is now completely unnecessary for many of the interactions with Entities. We still need to do it before applying command queues, etc., but we don't need to flush before allocations or freeing. If we go in this direction, we should explore relaxing those requirements for performance.

Testing

Functionality hasn't changed, so no new tests were added. A few old tests were revised because of the practical changes.

However, it makes sense to add tests for remote registration, so if anyone has specific concerns, I'd be happy to write a test for it.

@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events C-Feature A new feature, making something new possible X-Needs-SME This type of work requires an SME to approve it. S-Needs-Review Needs reviewer attention (from anyone!) to move forward M-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide labels Mar 11, 2025
ElliottjPierce and others added 15 commits March 12, 2025 15:09
This makes AtomicPtr use correct in theory.
still has leak to prevent double free.
I don't want to admit how long I was trying to figure this out.

`get_mut` returns &mut *mut T, which coerces to *mut T **AT THE FIRST REF**
Co-authored-by: Brezak <bezak.adam@proton.me>
This was missing from a review suggestion.
I should have caught it there, but instead I'll fix it here.
Contributor

@chescock chescock left a comment

Is the idea here roughly

struct EntityReservations {
    pending: RwLock<Vec<Entity>>,
    next_pending_index: AtomicIdCursor,
}

except that you're combining the atomic for the lock with the atomic for next_pending_index so that instead of three atomic operations (lock, decrement, unlock), you only have to do two (lock-decrement, unlock-decrement)?

I think that could be made to work! It might be easier to reason about if you can encapsulate the lock-plus-counter in a generic lock type, and present a safe API that EntityReservations can use.

But there are still a lot of ways to introduce subtle race conditions, so I'd be wary of building all of this just to go from 3 atomic ops to 2.
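For concreteness, the two-op scheme under discussion can be sketched like this, with illustrative names: subtraction is attempted optimistically and rolled back on failure. The rest of the thread probes exactly this rollback for races, so treat it as the scheme under review, not a vetted implementation.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Optimistically claim `num` pending indices with one fetch_sub, rolling
/// the counter back if there weren't enough. Returns the start of the
/// claimed range on success. Names are illustrative, not the PR's API.
fn reserve(next_pending_index: &AtomicI64, num: i64) -> Option<i64> {
    let old = next_pending_index.fetch_sub(num, Ordering::Relaxed);
    if old >= num {
        Some(old - num) // claimed the range [old - num, old)
    } else {
        // Not enough pending entries: undo the subtraction.
        next_pending_index.fetch_add(num, Ordering::Relaxed);
        None
    }
}
```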

    .fetch_sub(num, Ordering::Relaxed);
    Some(result)
    } else {
    // unsuccessful, so we need to undo the change `next_pending_index_intention` to keep parity.
Contributor

I think you need to use a compare-exchange loop instead of making a change and trying to undo it. Something like

    self.next_pending_index_intention
        .fetch_update(set_order, fetch_order, |index| (index >= num).then_some(index - num))

Otherwise you can get weird values while some operations are in progress. For example, suppose upper_pending_index is 10, and we start operations for num of 20, 1, and 5, and they interleave like this:

  • index is 10, op for 20 does fetch_sub(20)
  • index is -10, op for 1 does fetch_sub(1)
  • index is -11, op for 20 rolls back and does fetch_add(20)
  • index is 9, op for 5 does fetch_sub(5)
  • index is 4, op for 1 rolls back and does fetch_add(1)
  • index is 5

The only successful operation was for 5, but it takes the slice [4..9] instead of [5..10]. So the entity at index 10 is never taken, and the one at index 4 will be taken twice.
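The fetch_update suggestion avoids this because a failed attempt never moves the counter, so no concurrent operation can observe a transient deficit. A sketch under the same illustrative names:

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Claim `num` pending indices with a compare-exchange loop: the counter
/// only ever changes on success, so a too-large request leaves no trace.
/// Returns the counter's previous value on success. Names are illustrative.
fn try_reserve(next_pending_index: &AtomicI64, num: i64) -> Option<i64> {
    next_pending_index
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |index| {
            (index >= num).then_some(index - num)
        })
        .ok()
}
```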

Contributor Author

Maybe I'm missing something. The first fetch_sub(20) isn't added back, it's subtracted from next_pending_index_truth, signifying a successful op. So that first one would get indices [0..10] exclusively. This leaves next_pending_index_intention at -10.

All the other ops just subtract num, see that it's negative, and add it back.
I don't think there's an issue here. Unless I'm not seeing some ordering problem??

Contributor

Oh, sorry, I thought the <= 0 check was on the new value, not the old value.

I think there is still a problem if the next_pending_index_intention goes from negative to positive in between the fetch_sub and fetch_add, though. It looks like that can only happen from pending_scope_mut. So, I think the sequence is

  • pending_scope_mut does a fetch_sub
  • op for 1 does fetch_sub(1) and fails
  • pending_scope_mut finishes and does a fetch_add
  • op for 5 does fetch_sub(5) and succeeds
  • op for 1 rolls back and does fetch_add(1)

And the op for 5 saw a value that was too small because it started before the op for 1 finished rolling back.

Contributor Author

Ok, I'm going to walk through this to help me think through it. Let's start from both the intention and truth values being equal to some medium positive value x, let's say 4.

  • pending_scope_mut does a fetch_sub. Intention: x - u32::MAX, truth: x
  • reserve for 1 starts. Intention: x - u32::MAX - 1, truth: x
  • pending_scope_mut finishes with exclusive access. Intention: x - 1 + y, truth: x + y (where y is the diff for resizes/rebalancing).
  • reserve for 5 starts. Intention: x + y - 1 - 5, truth: x + y.
  • reserve for 1 fails (it was negative). Intention: x + y - 5, truth: x + y. (The reserve then returns a new index. One may have been available, but it was not safe to access.)
  • reserve for 5 succeeds. It uses the slice ending on x + y - 1. Intention: x + y - 5, truth: x + y - 5. Oh no! We skipped the pending entity at x + y!

Yikes! Good catch!

When we base the slice on the intention, and the intention fails and is added to, reservations that happen in the middle skip whatever slice the failed reservation would have used.

I'm going to think of a fix here, but I wanted to go ahead and reply.

    let prev_pending_cap = pending.capacity();

    if next_pending_index == next_pending_index_intention {
    // SAFETY: we just told all the remote entities that the next index is negative,
Contributor

This claims that things happen in a certain order, but all of the operations are Relaxed, so there are very few guarantees. For example, the compiler and CPU are free to reorder the accesses in pending_scope to be after the final next_pending_index_truth.fetch_sub. You need that fetch_sub to use Ordering::Release and the load here to use Ordering::Acquire in order to ensure that these accesses to the Vec happen after those.
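The suggested fix, in miniature: the writer's final fetch_sub must be a Release so that a reader's Acquire load of the counter also makes the preceding writes to the Vec visible. A sketch with assumed names:

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

/// Sketch of the Release/Acquire pairing described above (names assumed).
struct PendingCounter {
    next_pending_index_truth: AtomicIsize,
}

impl PendingCounter {
    /// Writer side: all mutations to the protected Vec must happen before
    /// this Release decrement, which publishes them to other threads.
    fn finish_scope(&self, reserved: isize) {
        self.next_pending_index_truth
            .fetch_sub(reserved, Ordering::Release);
    }

    /// Reader side: Acquire pairs with the Release above, so observing the
    /// decremented value also makes the earlier Vec writes visible.
    fn load(&self) -> isize {
        self.next_pending_index_truth.load(Ordering::Acquire)
    }
}
```

With Relaxed on both sides, the counter update and the Vec accesses could be reordered freely; the Release/Acquire pair is what forbids that.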

Contributor Author

That makes sense. I may not get the chance to fix this until Monday, but I do see the problem and I will fix it.

This is exactly the kind of thing I'll need help catching, so thanks!

    self.pending_scope_mut(|pending, next_pending_index, in_use_elsewhere| {
    // flush
    let new_pending_len = (next_pending_index + 1).max(0) as usize;
    for reused in pending.drain(new_pending_len..) {
Contributor

This touches pending before checking in_use_elsewhere. Won't this be UB because drain takes &mut access to the contents of the slice while another thread could have & access?

Contributor Author

The other threads can only access the contents at/before next_pending_index, so the drain doesn't affect them. That is what determines the "length" in terms of what is still pending, though it's not the length in terms of Vec::len. Hopefully that made sense :).

Drain doesn't cause reallocations or capacity changes, so it upholds the safety requirements. All it does is bring Vec::len back into agreement with next_pending_index. We have to return before the rebalancing, since that can resize the vec or access contents before next_pending_index.

It's also worth mentioning that there's almost never a conflict here. A remote registration would have to be in progress when flush is called.

Contributor

Oh, I see, the implementation of drain() never actually touches the rest of the slice, so that's not UB.

Making an &mut Vec when we don't have exclusive access to the entire slice still makes me really nervous! I think it would be a little safer to create only a &[T] when the slice was partially locked. You could combine the pending and in_use_elsewhere parameters into a Result, or even pass two closures for the two cases.
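One way to encode that suggestion, sketched with placeholder types: the closure receives Ok with exclusive access, or Err with only a shared slice when the list is partially locked, so the type system rules out the worrying &mut.

```rust
/// Run `f` with whatever access level the lock state permits. `Ok` means
/// exclusive access; `Err` carries a read-only view of the pending list.
/// Placeholder types: the real code would hold `Entity` values.
fn with_pending<'a, R>(
    access: Result<&'a mut Vec<u32>, &'a [u32]>,
    f: impl FnOnce(Result<&'a mut Vec<u32>, &'a [u32]>) -> R,
) -> R {
    f(access)
}

/// The flush path can only drain when it was granted exclusive access.
fn flush_len<'a>(access: Result<&'a mut Vec<u32>, &'a [u32]>) -> usize {
    with_pending(access, |access| match access {
        Ok(pending) => pending.drain(..).count(), // exclusive: mutation is sound
        Err(shared) => shared.len(),              // partially locked: read only
    })
}
```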

Contributor Author

Making an &mut Vec when we don't have exclusive access to the entire slice still makes me really nervous!

Agreed. I'm making better safety comments to help combat this.

I think it would be a little safer to create only a &[T] when the slice was partially locked. You could combine the pending and in_use_elsewhere parameters into a Result, or even pass two closures for the two cases.

I tried this originally. I agree this would be better in an ideal world. But even if something is accessing the pending part of the vec, we still need to be able to change the vec's length to signify that the flush happened. I don't see a good solution. Instead, I'm throwing safety comments everywhere. It's very unsafe code, but it's also only a tiny function to check the safety for.

    Entity::from_raw(
        u32::try_from(self.meta.len() as IdCursor - n).expect("too many entities"),
    )
    self.pending_scope(1, |reserved| {
Contributor

If I understand correctly, then when a flush is running, pending_scope_mut ensures that calls to pending_scope return None. And when that happens, we fall back to allocating new entity indexes. Is that right?

Won't that mean that meta_len continues to grow as the app runs, even if the total number of live entities doesn't? Even if there are always enough recycled Entity values to use, we'll occasionally lose the race for the lock and allocate more memory in entity_meta.

Contributor Author

That's exactly right. Keep in mind that this only happens if someone is remotely reserving an entity at precisely the same time as flush_pending. And even that reserved entity can be reused. We have 4 billion entities. This might eat four or five of those, maybe.

If you can think of a better solution, I'm all for it, but this is a reasonable price to pay IMO.

Contributor

My concern is that even a small memory leak will make Bevy unusable for really long-running applications.

I suppose it's mitigated here by the fact that flush is only called with &mut World access, so if you're running in a regular system then commands.spawn() is guaranteed to be able to acquire a read lock. But if we start using these for assets, then an application that cycles through different assets will leak memory.

Contributor Author

To clarify, this isn't a memory leak. The entities that are reserved in this (very rare) case are not lost, and they are recycled as well.

This only ever happens when you have something like allocating from async during an &mut World flush. Very rare. Let's say this happens 10_000 times, which is completely absurd. That's a grand total of 200_000 bytes wasted, not even a megabyte.

And it's only wasted if the app doesn't end up using them. Having extra pending entities can help balance pending vs owned. At the very least, these extra entities would speed up alloc. But again, there would be very few of these.

I really think this is a non-issue. But maybe I'm missing something. Let me know if you think of a situation that could cause a problem here.

Contributor

To clarify, this isn't a memory leak. The entities that are reserved in this (very rare) case are not lost, and they are recycled as well.

I was being a little loose with my terminology, and referring to the fact that meta: Vec<EntityMeta> and pending will grow indefinitely. So it's not a memory leak in the sense that any individual EntityMeta will eventually get re-used, but it is a memory leak in the sense that the application will consume more memory over time even if the number of live entities stays constant.

Anyway, if you're planning to switch to an actual RwLock and we think contention is rare, then we can just use blocking read() and write() locks and the problem will go away!

Contributor Author

Ok. I see. I still think this is such a small leak that it's a non-issue, but you're right, the RwLock will fix it anyway, so let's not worry about it.

Just to keep you in the loop, I've had another idea for how to do this. Should be even faster, but we'll see. I'm going to experiment with that and then come back to the RwLock if it doesn't work. I've been inspired by some of your ideas and the left_right crate. I think I can cook up something that is fully lock free with minimal atomics. But I'll know more soon.

this was the breaking change I forgot about
@ElliottjPierce
Contributor Author

Is the idea here roughly

struct EntityReservations {
    pending: RwLock<Vec<Entity>>,
    next_pending_index: AtomicIdCursor,
}

except that you're combining the atomic for the lock with the atomic for next_pending_index so that instead of three atomic operations (lock, decrement, unlock), you only have to do two (lock-decrement, unlock-decrement)?

That's pretty much exactly the idea, yes. There's some extra data in there like meta_len, but that's the core concept.

I think that could be made to work! It might be easier to reason about if you can encapsulate the lock-plus-counter in a generic lock type, and present a safe API that EntityReservations can use.

I could do that. I don't think we would reuse it elsewhere, but it may make it easier to understand. Then again, it's also very closely tied to the rest of EntityReservations, so it might become limiting. I think I'll come back to this after some more review.

But there are still a lot of ways to introduce subtle race conditions, so I'd be wary of building all of this just to go from 3 atomic ops to 2.

This is hard, and it needs lots of review, but we don't just save one atomic op with this. Here are the inner RwLock's read functions:

    #[inline]
    pub fn try_read(&self) -> bool {
        self.state.fetch_update(Acquire, Relaxed, read_lock).is_ok()
    }

    #[inline]
    pub fn read(&self) {
        if !self.try_read() {
            self.lock_contended(false)
        }
    }

The function read_lock is some bit magic.

Anyway, fetch_update is implemented with another load and store, 2 atomic ops. Unlocking also uses 2. See here for the LLVM this lowers to. Note that fetch_update isn't available.

So this trick actually goes from 5 atomic ops with some additional bit work to 2 atomic ops that just add/subtract (which has direct hardware support), so it saves quite a lot. We are leveraging the fact that these ops are commutative and that their ordering doesn't matter to get the speed up. The RwLock is only needed if ordering matters and the work is not commutative.

I know this takes careful review to get right, and I don't pretend to have caught all the race conditions here myself, but if we can sort out those details, I think this custom solution is worth the effort.

@ElliottjPierce
Contributor Author

@chescock has found a lot of nasty bugs. I have a solution for them, but it's effectively a RwLock. (It's not quite; I think it's 1 atomic op shorter, but at this point, paying that one op is better than trying to find all the little edge and corner cases here.)

When I get the chance, I'll open a v3 of this with a RwLock based on this branch. We can experiment there and come back to this if anyone can think of something better. Or I might think of something else and v3 will be different. We'll see. For now, I think we pause review on this, and come back to it later.

@ElliottjPierce
Contributor Author

I'm going to close this in favor of v4. At time of writing, the final choice is up in the air, but it's pretty clear that v4 is better than this in every way. Thanks for all the review on this!

@cart cart moved this from Respond (With Priority) to Responded in @cart's attention Mar 25, 2025
@cart cart removed this from @cart's attention Jan 28, 2026


Development

Successfully merging this pull request may close these issues.

Reserve entities from async

7 participants