perf(c++): optimize cpp serialization performance #2944
Merged
pandalee99 (Contributor) approved these changes on Nov 28, 2025 and left a comment:
LGTM
0a38ae9 to e348131
PutVarUint32 uses bulk writes (uint64_t for 5-byte varints) which can write up to 8 bytes. The grow(5) was insufficient, causing heap corruption on macOS/Windows where allocators are stricter.
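The failure mode described above can be sketched as follows. This is a minimal illustration, not the PR's code: `SketchBuffer`, `put_var_uint32`, and the growth policy are hypothetical names and choices, the sketch always uses an 8-byte store (the PR text only mentions it for the 5-byte case), and it assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the real buffer; names are illustrative only.
struct SketchBuffer {
  std::vector<uint8_t> data;
  std::size_t writer_index = 0;

  // Ensure at least `n` writable bytes past writer_index.
  void grow(std::size_t n) {
    if (writer_index + n > data.size()) {
      data.resize((writer_index + n) * 2);
    }
  }

  // Encodes `v` as a varint (7 bits per byte, high bit = "more bytes follow")
  // and returns the number of encoded bytes (1 to 5).
  std::size_t put_var_uint32(uint32_t v) {
    // Reserve 8 bytes, not 5: the bulk store below always touches 8 bytes,
    // even though at most 5 of them carry encoded data.
    grow(8);
    uint64_t encoded = 0;
    std::size_t len = 0;
    do {
      uint64_t byte = v & 0x7F;
      v >>= 7;
      if (v != 0) byte |= 0x80;  // continuation bit
      encoded |= byte << (8 * len);
      ++len;
    } while (v != 0);
    assert(len <= 5);
    // Single 8-byte store (assumes a little-endian host).
    std::memcpy(data.data() + writer_index, &encoded, sizeof(encoded));
    writer_index += len;
    return len;
  }
};
```

With `grow(5)` in place of `grow(8)`, the `memcpy` could write up to 3 bytes past the end of the allocation whenever fewer than 8 bytes remain, which is the kind of overrun the commit message attributes the heap corruption to.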
791f4b5 to c024c26
When a Buffer wraps external data (own_data_=false) and needs to grow, the Reserve function allocated new memory but didn't copy the existing data, causing data loss. This fix adds memcpy before switching ownership.
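A minimal sketch of the fix described above, assuming a buffer that tracks ownership with an `own_data_` flag. The class name `SketchBuffer`, the lowercase `reserve`, and the use of `malloc`/`realloc` are assumptions for illustration; only `own_data_` and the copy-before-switching-ownership idea come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical buffer that can wrap caller-owned memory.
class SketchBuffer {
public:
  // Wrap external memory without taking ownership of it.
  SketchBuffer(uint8_t* data, std::size_t size)
      : data_(data), size_(size), own_data_(false) {
    assert(data != nullptr || size == 0);
  }
  ~SketchBuffer() {
    if (own_data_) std::free(data_);
  }
  SketchBuffer(const SketchBuffer&) = delete;
  SketchBuffer& operator=(const SketchBuffer&) = delete;

  void reserve(std::size_t new_size) {
    if (new_size <= size_) return;
    if (own_data_) {
      // Owned memory: realloc preserves the existing contents.
      auto* p = static_cast<uint8_t*>(std::realloc(data_, new_size));
      if (p == nullptr) std::abort();
      data_ = p;
    } else {
      // External memory: allocate, then copy the existing bytes BEFORE
      // switching ownership. Skipping this memcpy loses the wrapped data.
      auto* p = static_cast<uint8_t*>(std::malloc(new_size));
      if (p == nullptr) std::abort();
      std::memcpy(p, data_, size_);
      data_ = p;
      own_data_ = true;
    }
    size_ = new_size;
  }

  const uint8_t* data() const { return data_; }
  std::size_t size() const { return size_; }
  bool owns_data() const { return own_data_; }

private:
  uint8_t* data_;
  std::size_t size_;
  bool own_data_;
};
```

The external memory is never freed or modified by the buffer; after the first growth the buffer works on its own copy.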
chaokunyang added a commit that referenced this pull request on Nov 30, 2025:
…cost (#2951)

## Why?

Eliminate `shared_ptr<TypeInfo>` overhead from the hot serialization code path. The atomic reference counting in `shared_ptr` adds unnecessary overhead when TypeInfo objects are accessed frequently during serialization/deserialization. By switching to raw pointers with clear ownership semantics, we can improve performance on the critical path.

## What does this PR do?

This PR refactors the C++ serialization library's TypeInfo ownership model:

### TypeInfo Changes

- Changed internal fields (`type_meta`, `encoded_namespace`, `encoded_type_name`) from `shared_ptr` to `unique_ptr`
- Added `TypeInfo::deep_clone()` method for creating deep copies
- Made `TypeInfo` non-copyable (deleted copy constructor/assignment)

### TypeResolver Storage Changes

- Changed from `map<K, shared_ptr<TypeInfo>>` to primary storage pattern:
  - Primary storage: `vector<unique_ptr<TypeInfo>>` (owns all TypeInfo objects)
  - Lookup maps: Raw pointers (`TypeInfo*`) pointing to primary storage
- Lookup methods now return:
  - `const TypeInfo*` for nullable lookups (`get_type_info_by_id`, `get_type_info_by_name`)
  - `Result<const TypeInfo&>` for error-handling lookups (`get_type_info`, `get_struct_type_info<T>`)

### Context Ownership Changes

- `WriteContext` and `ReadContext` now hold `unique_ptr<TypeResolver>` instead of `shared_ptr`
- Contexts are created lazily after type resolver finalization with deep-cloned resolvers
- `Fory` class uses `std::optional` for contexts to support lazy initialization

### Code Pattern Updates

- Updated all call sites that used `FORY_TRY` with `Result<const TypeInfo&>` to use explicit error handling (since TypeInfo is now non-copyable)
- Updated `read_struct_fields_compatible` signature from `shared_ptr<TypeMeta>` to `const TypeMeta*`

## Related issues

#2906 #2944

## Does this PR introduce any user-facing change?

- [x] Does this PR introduce any public API change?
  - Internal API only. TypeResolver lookup methods now return raw pointers or references instead of shared_ptr. User-facing Fory API remains unchanged.
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

This change eliminates atomic reference counting overhead on the hot serialization path:

- `shared_ptr` copy/destruction involves atomic increment/decrement operations
- Raw pointer access is a simple dereference with no atomic operations
- Deep cloning happens once during context creation, not per-serialization

Expected improvement: Reduced CPU overhead in tight serialization loops, especially noticeable when serializing many small objects where TypeInfo lookups are frequent relative to data size.
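The primary storage pattern named in the commit message above can be sketched roughly as below. The method names `get_type_info_by_id`, `get_type_info_by_name`, and `deep_clone` appear in the commit message; everything else (the `Sketch` classes, `register_type`, the fields, the exact signatures) is assumed for illustration and is not the library's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified TypeInfo: non-copyable, with an explicit deep_clone().
struct SketchTypeInfo {
  uint32_t type_id;
  std::string name;
  SketchTypeInfo(uint32_t id, std::string n) : type_id(id), name(std::move(n)) {}
  SketchTypeInfo(const SketchTypeInfo&) = delete;
  SketchTypeInfo& operator=(const SketchTypeInfo&) = delete;
  std::unique_ptr<SketchTypeInfo> deep_clone() const {
    return std::make_unique<SketchTypeInfo>(type_id, name);
  }
};

class SketchTypeResolver {
public:
  const SketchTypeInfo* register_type(uint32_t id, std::string name) {
    assert(by_id_.find(id) == by_id_.end() && "duplicate type id");
    // Primary storage owns the object. The heap address stays stable even
    // when the vector reallocates, so raw pointers in the maps remain valid.
    storage_.push_back(std::make_unique<SketchTypeInfo>(id, std::move(name)));
    SketchTypeInfo* raw = storage_.back().get();
    by_id_[raw->type_id] = raw;
    by_name_[raw->name] = raw;
    return raw;
  }

  // Nullable lookups: no reference counting, just a pointer read.
  const SketchTypeInfo* get_type_info_by_id(uint32_t id) const {
    auto it = by_id_.find(id);
    return it == by_id_.end() ? nullptr : it->second;
  }
  const SketchTypeInfo* get_type_info_by_name(const std::string& name) const {
    auto it = by_name_.find(name);
    return it == by_name_.end() ? nullptr : it->second;
  }

  // One deep copy per context, so each context owns its resolver outright.
  std::unique_ptr<SketchTypeResolver> deep_clone() const {
    auto copy = std::make_unique<SketchTypeResolver>();
    for (const auto& info : storage_) {
      copy->register_type(info->type_id, info->name);
    }
    return copy;
  }

private:
  std::vector<std::unique_ptr<SketchTypeInfo>> storage_;     // owns
  std::unordered_map<uint32_t, SketchTypeInfo*> by_id_;       // non-owning
  std::unordered_map<std::string, SketchTypeInfo*> by_name_;  // non-owning
};
```

The returned pointers are only valid while the resolver that owns them is alive, which is the trade-off for dropping `shared_ptr` on the hot path.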
What does this PR do?
This PR significantly optimizes C++ serialization performance through multiple strategies:
Performance Optimizations
Buffer Performance Improvements
vector<uint8_t> to avoid unnecessary data copies
Thread-Safe Fory Implementation
ThreadSafeFory class for concurrent usage
ensure_finalized checks to reduce overhead
Compile-Time Type Indexing
(cpp/fory/meta/type_index.h) to eliminate runtime typeid overhead
Deserialization Performance
Varint Encoding Optimization
Code Improvements
fory.h structure
const ref support to Result type
serializer_traits.h
Related issues
#2958
Does this PR introduce any user-facing change?
Benchmark
| Datatype | Operation | Fory (ns) | Protobuf (ns) | Faster |
|----------|-----------|-----------|---------------|--------|
| Sample | Deserialize | 402.8 | 453.2 | Fory (1.1x) |
| Struct | Serialize | 10.9 | 19.7 | Fory (1.8x) |
Sample Deserialize and Struct Serialize are faster than Protobuf; the other operations are still slower. I will optimize them in the next PR by reducing type dispatch cost.
Files Changed Summary
fory.h, struct_serializer.h, serializer_traits.h
buffer.h, buffer.cc
type_resolver.h, type_index.h
result.h, result_test.cc
benchmark.cc, profile.sh, run.sh
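For the compile-time type indexing item listed under Performance Optimizations, one common way to obtain a per-type identifier without `typeid` is to hash the compiler-generated function signature at compile time. The sketch below illustrates that general technique only; it is not taken from `type_index.h`, and `SKETCH_FUNC_SIG`, `fnv1a`, and `type_index` are hypothetical names.

```cpp
#include <cstdint>

// Compiler-specific spelling of "the full signature of the enclosing function".
#if defined(_MSC_VER)
#define SKETCH_FUNC_SIG __FUNCSIG__
#else
#define SKETCH_FUNC_SIG __PRETTY_FUNCTION__
#endif

// 64-bit FNV-1a over a NUL-terminated string, usable in constant expressions.
constexpr uint64_t fnv1a(const char* s) {
  uint64_t h = 14695981039346656037ull;
  while (*s != '\0') {
    h ^= static_cast<uint8_t>(*s);
    h *= 1099511628211ull;
    ++s;
  }
  return h;
}

// The signature string embeds the name of T, so each instantiation hashes
// to its own value. No RTTI and no runtime lookup are involved.
template <typename T>
constexpr uint64_t type_index() {
  return fnv1a(SKETCH_FUNC_SIG);
}

// Evaluated entirely by the compiler.
static_assert(type_index<int>() == type_index<int>(), "stable per type");
static_assert(type_index<int>() != type_index<double>(), "distinct per type");
```

Because the result is a constant expression, it can serve as a `switch` label or template argument. The values depend on the compiler's signature format, so they are not stable across compilers and should not be written into serialized data.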