Skip to content

fix(napi): use Mutex instead of Atomic in ThreadsafeFunction#1234

Merged
Brooooooklyn merged 2 commits intomainfrom
mutex-in-tsfn
Jul 10, 2022
Merged

fix(napi): use Mutex instead of Atomic in ThreadsafeFunction#1234
Brooooooklyn merged 2 commits intomainfrom
mutex-in-tsfn

Conversation

@Brooooooklyn
Copy link
Member

@Brooooooklyn Brooooooklyn merged commit 2f59c6a into main Jul 10, 2022
@Brooooooklyn Brooooooklyn deleted the mutex-in-tsfn branch July 10, 2022 02:34
));
}
drop(is_aborted);
self.ref_count.fetch_add(1, Ordering::AcqRel);
Copy link

@SevInf SevInf Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand ref counter usage in refer/unref. Original napi calls are not referenced counted and idempotent. And they don't control memory allocation, release/acquire do. So, while this fixes QueryEngine problem, doesn't it also introduce leak in other direction? For example:

let mut tsfn: ThreadsafeFunction<String> =
    func.create_threadsafe_function(0,...)

  tsfn.refer(&env)?;

This refer call is perfectly safe when using underlying APIs, albeit a little bit useless, but now it will cause function to leak.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refer/unref API should not exist at all, it should be exposed as clone/drop in Rust world. 🤔 Maybe we should remove it in next major release.

This refer call is perfectly safe when using underlying APIs, albeit a little bit useless, but now it will cause function to leak.

Nothing leak after this PR, the Arc will be dropped in the fn drop in ThreadsafeFunction

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, i think you are mixing up ref/unref and acquire/release API.
refer/unref do not have anything to do with threadsafe function reference counting, despite what their name might suggest. They solely exist for letting or forbidding llibuv event loop completion without function cleanup. No matter how many times you call napi_unref_threadsafe_function it won't release the function and no matter how many times you call napi_ref_threadsafe_function it won't prevent function from being released. Moreover, no matter how many times you'd call napi_ref_threadsafe_function, single napi_unref_threadsafe_function call will undo everything.

So that's why adding reference counter in Rust wrapper does not make sense to me — this is not how N-API calls you are wrapping work.

Now let's walk through my example:

  1. New function is created, internal reference counter is at 1.
  2. We call refer. Original C-API does not do anything since function is already referenced. Rust wrapper on the other hand increments internal counter by 1, it is now at 2.
  3. We drop the function. Rust internal ref counter is at 2. We skip release call, decrement the counter and finish the drop. Now neither the function is released, nor we have any pointer to the function. Function and it's context is leaked.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This the relevant piece from NodeApi docs I mentioned in my original report:

Neither does napi_unref_threadsafe_function mark the thread-safe functions as able to be destroyed nor does napi_ref_threadsafe_function prevent it from being destroyed.

@@ -211,17 +215,16 @@ impl<T: 'static, ES: ErrorStrategy::T> ThreadsafeFunction<T, ES> {
initial_thread_count,
Copy link

@SevInf SevInf Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might've been an issue already before, but I think there is no safe way to release a function, created with inital_thread_count > 1. The only place where release could happens is drop, and even there, release will be skipped if ref_count (which is initialized with initial_thread_count) is greater than 1.

Not sure how to fix it though. Maybe it makes sense to remove the ability to set initial_thread_count from public API altogether and always use 1 internally? You'll still need to clone ThreadsafeFunction to use it from multiple threads, so initial_thread_count > 1 does not make much sense in Rust world anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, initial_thread_count is not public, and it's always 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory and safety issues with ThreadsafeFunction

2 participants