@chaokunyang chaokunyang commented Nov 28, 2025

What does this PR do?

This PR significantly optimizes C++ serialization performance through multiple strategies:

Performance Optimizations

  1. Buffer Performance Improvements

    • Zero-copy wrapping for vector<uint8_t> to avoid unnecessary data copies
    • Optimized buffer growth and memory allocation strategies
    • Refined primitive write operations with separate handling for primitive fields
  2. Thread-Safe Fory Implementation

    • Added ThreadSafeFory class for concurrent usage
    • Optimized ensure_finalized checks to reduce overhead
  3. Compile-Time Type Indexing

    • Introduced compile-time type index (cpp/fory/meta/type_index.h) to eliminate runtime typeid overhead
    • Removes runtime type identification for non-polymorphic types, improving dispatch performance
  4. Deserialization Performance

    • Optimized deserialization path for faster object reconstruction
  5. Varint Encoding Optimization

    • Uses bulk writes (uint64_t) for varint encoding to reduce memory operations
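
One way to obtain a per-type index without `typeid` is to hash a compiler-provided function signature inside a `constexpr` function. The sketch below illustrates that general technique only; it is an assumption, not the contents of `cpp/fory/meta/type_index.h`. The names `raw_type_name`, `fnv1a`, and `type_index_v` are hypothetical, and the code requires C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical sketch: returns a string that embeds the name of T.
// The exact text is compiler-specific, but it is distinct per type.
template <typename T>
constexpr std::string_view raw_type_name() {
#if defined(_MSC_VER) && !defined(__clang__)
  return __FUNCSIG__;
#else
  return __PRETTY_FUNCTION__;
#endif
}

// 64-bit FNV-1a hash, evaluated at compile time.
constexpr std::uint64_t fnv1a(std::string_view s) {
  std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
  for (char c : s) {
    h ^= static_cast<std::uint8_t>(c);
    h *= 1099511628211ull;  // FNV prime
  }
  return h;
}

// Compile-time constant per type; no RTTI lookup is needed at runtime.
template <typename T>
inline constexpr std::uint64_t type_index_v = fnv1a(raw_type_name<T>());

struct Point { int x; int y; };
struct Person { int age; };

static_assert(type_index_v<Point> == type_index_v<Point>);
static_assert(type_index_v<Point> != type_index_v<Person>);
```

Because the index is a constant expression, it can be used as a `switch` label or array key for non-polymorphic types. Hash values differ between compilers, so an index like this is only suitable for in-process dispatch, not for anything written to the wire.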

Code Improvements

  • Simplified fory.h structure
  • Added const ref support to Result type
  • Improved serializer traits with new serializer_traits.h
  • Enhanced struct serializer with better field handling

Related issues

#2958

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

| Datatype | Operation | Fory (ns) | Protobuf (ns) | Faster |
|----------|-----------|-----------|---------------|--------|
| Sample | Deserialize | 402.8 | 453.2 | Fory (1.1x) |
| Struct | Serialize | 10.9 | 19.7 | Fory (1.8x) |
Sample Deserialize and Struct Serialize are now faster than Protobuf; the other benchmarks are still slower. I will optimize them in the next PR by reducing type dispatch cost.

Files Changed Summary

| Category | Files | Key Changes |
|----------|-------|-------------|
| Serialization Core | fory.h, struct_serializer.h, serializer_traits.h | Major refactoring and optimization |
| Buffer | buffer.h, buffer.cc | Zero-copy support, overflow fixes |
| Type System | type_resolver.h, type_index.h | Compile-time type indexing |
| Result Type | result.h, result_test.cc | Const ref support, improved API |
| Benchmarks | benchmark.cc, profile.sh, run.sh | Enhanced profiling capabilities |

@chaokunyang chaokunyang changed the title from "perf(c++): optimize cpp perf" to "perf(c++): optimize cpp serialization performance" Nov 28, 2025

@pandalee99 pandalee99 left a comment

LGTM

@chaokunyang chaokunyang force-pushed the optimize_cpp_perf branch 2 times, most recently from 0a38ae9 to e348131 Compare November 30, 2025 09:41
PutVarUint32 uses bulk writes (uint64_t for 5-byte varints) which can
write up to 8 bytes. The grow(5) was insufficient, causing heap
corruption on macOS/Windows where allocators are stricter.
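
The overflow described in this commit message can be illustrated with a minimal sketch. It assumes a little-endian host, and `put_var_uint32_bulk` and `append_var_uint32` are hypothetical helpers, not Fory's `PutVarUint32`. The encoded varint is at most 5 bytes, but the single `uint64_t` store always touches 8 bytes, so the caller has to make 8 bytes writable rather than 5.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: packs the base-128 varint of `value` into one
// 64-bit word and stores it with a single 8-byte write.
// Returns the number of meaningful bytes (1..5).
// Precondition: 8 writable bytes at `dst`. Assumes a little-endian host.
inline std::size_t put_var_uint32_bulk(std::uint8_t* dst, std::uint32_t value) {
  assert(dst != nullptr);
  std::uint64_t packed = 0;
  std::size_t len = 0;
  do {
    std::uint64_t byte = value & 0x7F;
    value >>= 7;
    if (value != 0) byte |= 0x80;  // continuation bit
    packed |= byte << (8 * len);
    ++len;
  } while (value != 0);
  std::memcpy(dst, &packed, sizeof(packed));  // always writes 8 bytes
  return len;
}

// Appends a varint at `pos` and returns the new write position.
inline std::size_t append_var_uint32(std::vector<std::uint8_t>& out,
                                     std::size_t pos, std::uint32_t value) {
  // Make room for the full 8-byte store, not just the 5-byte maximum
  // encoded length; growing by 5 would let the store run past the end.
  if (out.size() < pos + 8) out.resize(pos + 8);
  return pos + put_var_uint32_bulk(out.data() + pos, value);
}
```

The bytes after the returned position are scratch space that the next write overwrites, which is why only the capacity check needs to account for the wider store.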
When a Buffer wraps external data (own_data_=false) and needs to grow,
the Reserve function allocated new memory but didn't copy the existing
data, causing data loss. This fix adds memcpy before switching ownership.
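
The data-loss fix can be sketched with a miniature buffer. `MiniBuffer` and its members are hypothetical and only mirror the `own_data_` flag named above; this is not Fory's `Buffer` class. When the buffer wraps external memory and has to grow, it must copy the existing bytes into the new allocation before it takes ownership.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical miniature buffer used only to illustrate the fix.
class MiniBuffer {
public:
  // Wraps caller-owned memory without copying it (zero-copy).
  MiniBuffer(std::uint8_t* data, std::size_t size)
      : data_(data), size_(size), own_data_(false) {}
  MiniBuffer(const MiniBuffer&) = delete;
  MiniBuffer& operator=(const MiniBuffer&) = delete;
  ~MiniBuffer() {
    if (own_data_) std::free(data_);
  }

  void reserve(std::size_t new_size) {
    if (new_size <= size_) return;
    if (own_data_) {
      // Owned memory: realloc preserves the existing contents.
      auto* grown = static_cast<std::uint8_t*>(std::realloc(data_, new_size));
      if (grown == nullptr) throw std::bad_alloc();
      data_ = grown;
    } else {
      // External memory: allocate, then copy before switching ownership.
      auto* fresh = static_cast<std::uint8_t*>(std::malloc(new_size));
      if (fresh == nullptr) throw std::bad_alloc();
      if (size_ > 0) std::memcpy(fresh, data_, size_);  // omitting this loses data
      data_ = fresh;
      own_data_ = true;
    }
    size_ = new_size;
  }

  std::uint8_t* data() const { return data_; }
  std::size_t size() const { return size_; }
  bool owns_data() const { return own_data_; }

private:
  std::uint8_t* data_;
  std::size_t size_;
  bool own_data_;
};
```

After the first growth the buffer no longer aliases the caller's memory, so later writes do not show up in the original array.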
@chaokunyang chaokunyang mentioned this pull request Nov 30, 2025
17 tasks
@chaokunyang chaokunyang merged commit 46329f4 into apache:main Nov 30, 2025
64 checks passed
chaokunyang added a commit that referenced this pull request Nov 30, 2025
…cost (#2951)

## Why?

Eliminate `shared_ptr<TypeInfo>` overhead from the hot serialization
code path. The atomic reference counting in `shared_ptr` adds
unnecessary overhead when TypeInfo objects are accessed frequently
during serialization/deserialization. By switching to raw pointers with
clear ownership semantics, we can improve performance on the critical
path.

## What does this PR do?

This PR refactors the C++ serialization library's TypeInfo ownership
model:

### TypeInfo Changes
- Changed internal fields (`type_meta`, `encoded_namespace`,
`encoded_type_name`) from `shared_ptr` to `unique_ptr`
- Added `TypeInfo::deep_clone()` method for creating deep copies
- Made `TypeInfo` non-copyable (deleted copy constructor/assignment)
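
A minimal sketch of this ownership shape, assuming a much smaller type than the real `TypeInfo`: `ClonableTypeInfo` and `TypeMetaSketch` are hypothetical names, and only the `unique_ptr` field, the deleted copy operations, and the `deep_clone()` idea come from the description above.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the metadata a TypeInfo owns.
struct TypeMetaSketch {
  std::string layout;
};

class ClonableTypeInfo {
public:
  ClonableTypeInfo(std::uint32_t id, std::unique_ptr<TypeMetaSketch> meta)
      : type_id(id), type_meta(std::move(meta)) {}

  // Copying is disabled so that every copy is an explicit deep_clone().
  ClonableTypeInfo(const ClonableTypeInfo&) = delete;
  ClonableTypeInfo& operator=(const ClonableTypeInfo&) = delete;

  // Duplicates the object together with everything it owns.
  std::unique_ptr<ClonableTypeInfo> deep_clone() const {
    std::unique_ptr<TypeMetaSketch> meta_copy;
    if (type_meta) meta_copy = std::make_unique<TypeMetaSketch>(*type_meta);
    return std::make_unique<ClonableTypeInfo>(type_id, std::move(meta_copy));
  }

  std::uint32_t type_id;
  std::unique_ptr<TypeMetaSketch> type_meta;  // sole owner, no refcount
};

static_assert(!std::is_copy_constructible_v<ClonableTypeInfo>);
```

Deleting the copy operations turns an accidental copy into a compile error, so the cost of duplicating a `TypeInfo` is always visible at the call site.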

### TypeResolver Storage Changes
- Changed from `map<K, shared_ptr<TypeInfo>>` to primary storage
pattern:
  - Primary storage: `vector<unique_ptr<TypeInfo>>` (owns all TypeInfo
objects)
  - Lookup maps: Raw pointers (`TypeInfo*`) pointing to primary storage
- Lookup methods now return:
  - `const TypeInfo*` for nullable lookups (`get_type_info_by_id`,
`get_type_info_by_name`)
  - `Result<const TypeInfo&>` for error-handling lookups (`get_type_info`,
`get_struct_type_info<T>`)
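
The primary storage pattern can be sketched as follows. `TypeInfoSketch`, `ResolverSketch`, and their methods are hypothetical and far simpler than the real `TypeResolver`; the sketch only shows one owning vector with non-owning lookup maps and a nullable pointer return.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified type record.
struct TypeInfoSketch {
  std::uint32_t type_id;
  std::string type_name;
};

class ResolverSketch {
public:
  const TypeInfoSketch* register_type(std::uint32_t id, std::string name) {
    auto info = std::make_unique<TypeInfoSketch>(
        TypeInfoSketch{id, std::move(name)});
    TypeInfoSketch* raw = info.get();
    // The vector owns the object. Growing the vector moves the unique_ptr
    // handles, not the pointees, so `raw` stays valid.
    storage_.push_back(std::move(info));
    by_id_[raw->type_id] = raw;
    by_name_[raw->type_name] = raw;
    return raw;
  }

  // Nullable lookup: nullptr means "not registered".
  const TypeInfoSketch* get_by_id(std::uint32_t id) const {
    auto it = by_id_.find(id);
    return it == by_id_.end() ? nullptr : it->second;
  }

  const TypeInfoSketch* get_by_name(const std::string& name) const {
    auto it = by_name_.find(name);
    return it == by_name_.end() ? nullptr : it->second;
  }

private:
  std::vector<std::unique_ptr<TypeInfoSketch>> storage_;      // owner
  std::unordered_map<std::uint32_t, TypeInfoSketch*> by_id_;  // non-owning
  std::unordered_map<std::string, TypeInfoSketch*> by_name_;  // non-owning
};
```

A lookup returns a plain pointer, so nothing on the hot path touches a reference count. The pointers remain valid for as long as the resolver that owns the storage is alive.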

### Context Ownership Changes
- `WriteContext` and `ReadContext` now hold `unique_ptr<TypeResolver>`
instead of `shared_ptr`
- Contexts are created lazily after type resolver finalization with
deep-cloned resolvers
- `Fory` class uses `std::optional` for contexts to support lazy
initialization
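
Lazy context creation with `std::optional` might look like the sketch below. All class names here are hypothetical stand-ins, and a plain copy of a tiny resolver takes the place of the deep clone described above.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical stand-ins; none of these are Fory's real classes.
struct ResolverSketch {
  int registered_types = 0;
};

struct WriteContextSketch {
  explicit WriteContextSketch(std::unique_ptr<ResolverSketch> r)
      : resolver(std::move(r)) {}
  std::unique_ptr<ResolverSketch> resolver;  // the context owns its copy
};

class ForySketch {
public:
  void register_type() {
    // Registration is only allowed before the context exists.
    assert(!write_ctx_.has_value());
    ++resolver_.registered_types;
  }

  // Builds the context on first use from a copy of the resolver,
  // then reuses it on every later call.
  WriteContextSketch& write_context() {
    if (!write_ctx_) {
      write_ctx_.emplace(std::make_unique<ResolverSketch>(resolver_));
    }
    return *write_ctx_;
  }

  bool has_write_context() const { return write_ctx_.has_value(); }

private:
  ResolverSketch resolver_;
  std::optional<WriteContextSketch> write_ctx_;  // empty until first use
};
```

With `std::optional` the context lives inside the owning object without a heap allocation for the wrapper itself, and the empty state doubles as the "not yet finalized" flag.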

### Code Pattern Updates
- Updated all call sites that used `FORY_TRY` with `Result<const
TypeInfo&>` to use explicit error handling (since TypeInfo is now
non-copyable)
- Updated `read_struct_fields_compatible` signature from
`shared_ptr<TypeMeta>` to `const TypeMeta*`

## Related issues

#2906 
#2944 

## Does this PR introduce any user-facing change?

- [x] Does this PR introduce any public API change?
  - Internal API only. TypeResolver lookup methods now return raw pointers
or references instead of shared_ptr. User-facing Fory API remains
unchanged.
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

This change eliminates atomic reference counting overhead on the hot
serialization path:
- `shared_ptr` copy/destruction involves atomic increment/decrement
operations
- Raw pointer access is a simple dereference with no atomic operations
- Deep cloning happens once during context creation, not
per-serialization

Expected improvement: Reduced CPU overhead in tight serialization loops,
especially noticeable when serializing many small objects where TypeInfo
lookups are frequent relative to data size.

3 participants