-
Notifications
You must be signed in to change notification settings - Fork 358
feat(Rust): support context_pool to reduce context allocation && support se/de in multi-thread #2737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Btw, |
|
When in reset, will call functions named |
|
maybe I can set Fory::metastring_resolver to Arc<rwlock> |
Split into read and write, Then hold it on Read Write context |
use reset instead, it used in java/python |
|
@urlyy Nice work! Could you run fory-benchmarks and compare the performance with previous version? |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment has been minimized.
This comment has been minimized.
rust/fory-core/src/buffer.rs
Outdated
| impl Reader { | ||
| pub fn new(bf: &[u8]) -> Reader { | ||
| Reader { | ||
| bf: bf.to_vec(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this introduce a copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but only when enter a deserialize will do such a clone.
|
@chaokunyang Handling these |
@urlyy You can create a new struct to manage read state: pub struct ReadState {
pub meta_resolver: MetaReaderResolver,
meta_string_resolver: MetaStringReaderResolver,
pub ref_reader: RefReader,
}And pooling this class: struct Fory {
...,
read_context_pool: Pool<ReadState>,
}For deserialize, just create ReadContext everytime from ReadState, and use lifetime for |
|
Here is the output after replace type of |
|
@chaokunyang cargo-criterion can export output in |
|
btw, https://github.com/apache/fory/actions/runs/18417030950/job/52483058475?pr=2737, this CI takes too much time, 17 minutes! I can't bare this. |
|
Using raw pointers is unsafe, but as long as we don’t continue holding references after deserialization and only use the reference within a single thread, it won’t cause any issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I will finish remaining refactor in later PR
## Why? Simplify the serialization interface for better ergonomics API ## What does this PR do? - Remove `Fory` from `Serializer` trait - Make `WriteContext` as only enviroment for serialization - Make `ReadContext` as only enviroment for deserialization - Use `OnceLock` for thread-safe, lazily initialization for `Pool` ## Related issues #2737 ## Does this PR introduce any user-facing change? <!-- If any user-facing interface changes, please [open an issue](https://github.com/apache/fory/issues/new/choose) describing the need to do so and update the document if necessary. Delete section if not applicable. --> - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? ## Benchmark <!-- When the PR has an impact on performance (if you don't know whether the PR will have an impact on performance, you can submit the PR first, and if it will have impact on performance, the code reviewer will explain it), be sure to attach a benchmark data here. Delete section if not applicable. -->
…ort se/de in multi-thread (apache#2737) ## What does this PR do? 1. Remove the lifetime annotations from `WriteContext` and `ReadContext` because managing their lifetimes was too troublesome. The **trade-off** is that, previously, `MetaWriterResolver` held references to the corresponding `TypeDef` obtained from `TypeResolver`, and both contexts also fetched references to `Harness` objects managed by `TypeResolver`. Now, all these references have been replaced with `clone()` to simplify lifetime management. This may introduce a non-negligible performance overhead, but it was the only practical solution I could implement.🥲 2. Remove `&fory` from the member variables of `Context`. Instead, `&fory` is now passed in as a function parameter rather than being retrieved via `context.get_fory()`. This change modified many API signatures. 3. Add two pools to `Fory`’s member variables to allow reuse of both types of contexts. Tests confirmed that the same context addresses were reused multiple times. However, since automatic recycling would require `Arc<Fory>`, only manual recycling has been implemented so far — this operation is handled internally(within `serialize()/deserialize()`), and users don’t need to recycle contexts manually. Also, for the `de/serialize_with_context(context)` functions, if users call them directly, the user's manually managed contexts will **not** be returned to the pool. 4. Add `reset()` methods to be executed on `Context objects` before recycling. 5. Modified `TypeResolver::sorted_field_names_map` from `RefCell<...>` to `RwLock<...>`, and changed `MetaReaderResolver::reading_type_defs` from `Vec<Rc<TypeMeta>>` to `Vec<Arc<TypeMeta>>`. Split `Fory::MetaStringResolver` to `ReadContext::MetaStringResolver` and `WriteContext::MetaStringResolver`. In addition, `RefReader` was unsafely marked as `Send` and `Sync`. These changes allow `Fory` to support serialization across multiple threads. A concrete example can be found in `rust/tests/tests/test_multi_thread.rs`. However, I’m not sure whether using `Lock` and `Arc` will impact single-thread performance. But it may be troublesome to write another `ThreadSafeFory`. 6. The reason why I `unsafe impl Send/Sync for RefReader {}`, is that , `RefReader` contains `Box<dyn Any>`, which by default is not `Send`/`Sync`. But In our usage, each ref_reader is only accessed within a context, by `one` thread at a time, so it is safe to implement `Send`/`Sync` manually using `unsafe`. ## Related issues - close apache#2717 ## Does this PR introduce any user-facing change? yes, like implementing in EXT, need to pass a extra parameter `fory`.
## Why? Simplify the serialization interface for better ergonomics API ## What does this PR do? - Remove `Fory` from `Serializer` trait - Make `WriteContext` as only enviroment for serialization - Make `ReadContext` as only enviroment for deserialization - Use `OnceLock` for thread-safe, lazily initialization for `Pool` ## Related issues apache#2737 ## Does this PR introduce any user-facing change? <!-- If any user-facing interface changes, please [open an issue](https://github.com/apache/fory/issues/new/choose) describing the need to do so and update the document if necessary. Delete section if not applicable. --> - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? ## Benchmark <!-- When the PR has an impact on performance (if you don't know whether the PR will have an impact on performance, you can submit the PR first, and if it will have impact on performance, the code reviewer will explain it), be sure to attach a benchmark data here. Delete section if not applicable. -->
What does this PR do?
Remove the lifetime annotations from
WriteContextandReadContextbecause managing their lifetimes was too troublesome. The trade-off is that, previously,MetaWriterResolverheld references to the correspondingTypeDefobtained fromTypeResolver, and both contexts also fetched references toHarnessobjects managed byTypeResolver. Now, all these references have been replaced withclone()to simplify lifetime management. This may introduce a non-negligible performance overhead, but it was the only practical solution I could implement.🥲Remove
&foryfrom the member variables ofContext. Instead,&foryis now passed in as a function parameter rather than being retrieved viacontext.get_fory(). This change modified many API signatures.Add two pools to
Fory’s member variables to allow reuse of both types of contexts. Tests confirmed that the same context addresses were reused multiple times. However, since automatic recycling would requireArc<Fory>, only manual recycling has been implemented so far — this operation is handled internally(withinserialize()/deserialize()), and users don’t need to recycle contexts manually. Also, for thede/serialize_with_context(context)functions, if users call them directly, the user's manually managed contexts will not be returned to the pool.Add
reset()methods to be executed onContext objectsbefore recycling.Modified
TypeResolver::sorted_field_names_mapfromRefCell<...>toRwLock<...>, and changedMetaReaderResolver::reading_type_defsfromVec<Rc<TypeMeta>>toVec<Arc<TypeMeta>>. SplitFory::MetaStringResolvertoReadContext::MetaStringResolverandWriteContext::MetaStringResolver. In addition,RefReaderwas unsafely marked asSendandSync. These changes allowForyto support serialization across multiple threads. A concrete example can be found inrust/tests/tests/test_multi_thread.rs. However, I’m not sure whether usingLockandArcwill impact single-thread performance. But it may be troublesome to write anotherThreadSafeFory.The reason why I
unsafe impl Send/Sync for RefReader {}, is that ,RefReadercontainsBox<dyn Any>, which by default is notSend/Sync. But In our usage, each ref_reader is only accessed within a context, byonethread at a time, so it is safe to implementSend/Syncmanually usingunsafe.Related issues
Does this PR introduce any user-facing change?
yes, like implementing in EXT, need to pass a extra parameter
fory.