@chaokunyang chaokunyang commented Dec 2, 2025

Why?

What does this PR do?

  1. Type-based encoding detection: added compile-time helpers that distinguish fixed-width primitives (including unsigned 32/64-bit integers) from signed 32/64-bit integers, which use zigzag varint encoding:
    • field_is_fixed_primitive() - bool, int8, uint8, int16, uint16, uint32, uint64, float, double
    • field_is_varint_primitive() - int32_t, int, int64_t, long long (zigzag varint)
  2. Optimized fixed field reading:
    - Compute field offsets at compile time with compute_fixed_field_offset<T, I>()
    - Read all fixed fields at absolute offsets without per-field reader_index updates
    - Single reader_index update after all fixed fields
  3. Optimized varint field reading:
    - Track offset locally during batch reading
    - Removed overly conservative max-varint-bytes pre-check (varints are variable-length)
    - Single reader_index update after all varints
  4. Three-phase deserialization:
    - Phase 1: Batch read leading fixed-size primitives
    - Phase 2: Batch read consecutive varint primitives
    - Phase 3: Read remaining fields normally
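
The fixed-field path described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `is_fixed_primitive_v`, `is_varint_primitive_v`, `fixed_field_offset`, and `read_fixed_fields` are hypothetical stand-ins for the helpers named in the description, the field list is modeled as a `std::tuple`, and values are copied in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical classification traits mirroring the two lists in the description.
template <typename T>
constexpr bool is_fixed_primitive_v =
    std::is_same_v<T, bool> || std::is_same_v<T, std::int8_t> ||
    std::is_same_v<T, std::uint8_t> || std::is_same_v<T, std::int16_t> ||
    std::is_same_v<T, std::uint16_t> || std::is_same_v<T, std::uint32_t> ||
    std::is_same_v<T, std::uint64_t> || std::is_same_v<T, float> ||
    std::is_same_v<T, double>;

template <typename T>
constexpr bool is_varint_primitive_v =
    std::is_same_v<T, std::int32_t> || std::is_same_v<T, std::int64_t> ||
    std::is_same_v<T, long long>;

// Byte offset of field I: the sum of the sizes of all preceding fields,
// evaluated entirely at compile time.
template <typename Tuple, std::size_t I>
constexpr std::size_t fixed_field_offset() {
  if constexpr (I == 0) {
    return 0;
  } else {
    return fixed_field_offset<Tuple, I - 1>() +
           sizeof(std::tuple_element_t<I - 1, Tuple>);
  }
}

template <typename Tuple, std::size_t... Is>
void read_fixed_fields_impl(const std::uint8_t* base, Tuple& out,
                            std::index_sequence<Is...>) {
  // Each field is read at an absolute offset; no cursor is advanced per field.
  (std::memcpy(&std::get<Is>(out), base + fixed_field_offset<Tuple, Is>(),
               sizeof(std::tuple_element_t<Is, Tuple>)),
   ...);
}

// Reads all fields in one batch and returns the new reader index,
// which is computed once after every field has been copied.
template <typename... Ts>
std::size_t read_fixed_fields(const std::uint8_t* data, std::size_t size,
                              std::size_t reader_index,
                              std::tuple<Ts...>& out) {
  static_assert((is_fixed_primitive_v<Ts> && ...),
                "only fixed-size primitives can be batch read");
  constexpr std::size_t total = (sizeof(Ts) + ... + 0);
  assert(reader_index + total <= size);  // one bounds check for the batch
  read_fixed_fields_impl(data + reader_index, out,
                         std::index_sequence_for<Ts...>{});
  return reader_index + total;
}
```

Because the offsets are constants, the compiler can turn the batch into a sequence of plain loads, and a single bounds check covers the whole group instead of one check per field.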

Related issues

#2958
#2906

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

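| Datatype | Operation | Fory (ns) | Protobuf (ns) | Faster |
|----------|-----------|-----------|---------------|--------|
| Sample | Serialize | 103.9 | 59.2 | Protobuf (1.8x) |
| Sample | Deserialize | 329.3 | 478.1 | Fory (1.5x) |
| Struct | Serialize | 10.3 | 20.1 | Fory (1.9x) |
| Struct | Deserialize | 19.1 | 16.0 | Protobuf (1.2x) |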

@chaokunyang chaokunyang merged commit f384e4f into apache:main Dec 2, 2025
56 checks passed
@chaokunyang (Collaborator, Author) commented:

The benchmark is unfair: Fory actually performs about the same as Protobuf for Sample serialization, but in the current benchmark Fory does not write to a preallocated buffer; it allocates a new buffer on every call.

This is fixed in #2963

pandalee99 added a commit that referenced this pull request Dec 2, 2025
## Why?

The previous benchmark is not fair:
- Protobuf uses 5 bytes to encode a negative varint, whereas Fory may use
only one byte. For small varints, Fory also pays the zigzag cost. This is
not a fair comparison.
- When serializing Sample, Fory allocates a vector every time, whereas
Protobuf serializes into an existing buffer.
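
The size difference in the first point can be illustrated with a small sketch. It is not taken from Fory or Protobuf; `zigzag_encode32`, `zigzag_decode32`, and `varint_size` are hypothetical helpers, and the non-zigzag case simply reinterprets the value as an unsigned 32-bit integer, which is where a 5-byte encoding comes from.

```cpp
#include <cstddef>
#include <cstdint>

// ZigZag interleaves signed values so small magnitudes map to small
// unsigned numbers: 0, -1, 1, -2 become 0, 1, 2, 3.
constexpr std::uint32_t zigzag_encode32(std::int32_t v) {
  return (static_cast<std::uint32_t>(v) << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}

constexpr std::int32_t zigzag_decode32(std::uint32_t n) {
  return static_cast<std::int32_t>((n >> 1) ^ (0u - (n & 1u)));
}

// Number of bytes a base-128 varint needs: 7 payload bits per byte.
constexpr std::size_t varint_size(std::uint32_t v) {
  std::size_t size = 1;
  while (v >= 0x80u) {
    v >>= 7;
    ++size;
  }
  return size;
}
```

With these helpers, `-1` needs one byte after zigzag but five bytes as a plain 32-bit varint, while `64` needs one byte as a plain varint but two after zigzag. A benchmark that uses only negative or only small positive values therefore favors one encoding over the other.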

## What does this PR do?

- Make NumericStruct contain int32 values of all encoded sizes, both
positive and negative
- Make Fory serialize Sample into a buffer

With this fairer comparison, Fory performs similarly to Protobuf.
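
The allocation difference can be sketched as below. This is a generic illustration rather than the benchmark code: `write_payload`, `serialize_fresh`, and `serialize_into` are hypothetical names, and the payload is a single 32-bit value written byte by byte.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical payload writer standing in for a real serializer.
inline void write_payload(std::vector<std::uint8_t>& out, std::uint32_t value) {
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<std::uint8_t>(value >> shift));
  }
}

// Allocates a new vector on every call, so each iteration of a
// benchmark loop also measures the allocator.
inline std::vector<std::uint8_t> serialize_fresh(std::uint32_t value) {
  std::vector<std::uint8_t> out;
  write_payload(out, value);
  return out;
}

// Reuses the caller's buffer: clear() resets the size but keeps the
// capacity, so repeated calls do not reallocate once it is large enough.
inline void serialize_into(std::vector<std::uint8_t>& buffer,
                           std::uint32_t value) {
  buffer.clear();
  write_payload(buffer, value);
}
```

Timing both libraries through the second form keeps allocation cost out of the comparison.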

## Related issues

#2958 
#2960

## Does this PR introduce any user-facing change?

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

| Datatype | Operation | Fory (ns) | Protobuf (ns) | Faster |
|----------|-----------|-----------|---------------|--------|
| Sample | Serialize | 345.6 | 316.4 | Protobuf (1.1x) |
| Sample | Deserialize | 1376.4 | 1374.6 | Protobuf (1.0x) |
| Struct | Serialize | 129.4 | 157.0 | Fory (1.2x) |
| Struct | Deserialize | 207.5 | 154.4 | Protobuf (1.3x) |

---------

Co-authored-by: Pan Li <1162953505@qq.com>