Skip to content

Conversation

@chaokunyang
Copy link
Collaborator

@chaokunyang chaokunyang commented Dec 3, 2025

Why?

Type dispatch is a hot path in serialization/deserialization. The current implementation uses absl::flat_hash_map for type lookups by compile-time type ID (uint64_t) and runtime type ID (uint32_t). While absl::flat_hash_map is a high-quality general-purpose hash map, we can achieve better performance with a specialized data structure optimized specifically for integer keys.

What does this PR do?

This PR introduces a specialized FlatIntMap data structure optimized for integer keys (uint32_t/uint64_t) to accelerate type dispatch lookups in the serialization path.

Key optimizations:

  • Fibonacci hashing: Uses the golden ratio multiplier (0x9E3779B97F4A7C15) for excellent key distribution with a single multiply operation
  • Power-of-2 sizing: Enables fast modulo via bitmasking instead of expensive division
  • Linear probing: Cache-friendly collision resolution
  • Low load factor (0.5): Trades memory for faster lookups - ideal for read-heavy workloads
  • FORY_ALWAYS_INLINE: Ensures critical lookup methods are inlined

Changes:

  • Added cpp/fory/util/flat_int_map.h with FlatIntMap<K, V> template and convenience aliases (U64PtrMap, U32PtrMap, etc.)
  • Added comprehensive tests in cpp/fory/util/flat_int_map_test.cc
  • Replaced absl::flat_hash_map<uint64_t, TypeInfo*> with U64PtrMap<TypeInfo> for:
    • type_info_by_ctid_ (compile-time type ID lookups)
    • partial_type_infos_
  • Replaced absl::flat_hash_map<uint32_t, TypeInfo*> with U32PtrMap<TypeInfo> for:
    • type_info_by_id_ (runtime type ID lookups)
  • Removed unused field_name_to_index() method from TypeResolver

Related issues

#2958

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
    • Removed TypeResolver::field_name_to_index<T>() method (appears unused)
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

Benchmarks run on Apple M2 Max (arm64), 12 cores, 48GB RAM.

After this PR

Datatype Operation Fory (ns) Protobuf (ns) Faster
Sample Serialize 283.8 317.9 Fory (1.1x)
Sample Deserialize 1328.5 1374.4 Fory (1.0x)
Struct Serialize 51.2 165.6 Fory (3.2x)
Struct Deserialize 125.8 171.8 Fory (1.4x)

Before this PR

Datatype Operation Fory (ns) Protobuf (ns) Faster
Sample Serialize 265.0 246.8 Protobuf (1.1x)
Sample Deserialize 1175.3 1038.2 Protobuf (1.1x)
Struct Serialize 51.5 141.9 Fory (2.8x)
Struct Deserialize 123.3 116.9 Protobuf (1.1x)

Analysis

The benchmark results show variance between runs (note Protobuf numbers also changed significantly between measurements), but the key observation is that the FlatIntMap provides a specialized, cache-friendly implementation for the type dispatch hot path. The Fibonacci hashing technique combined with power-of-2 sizing eliminates expensive hash computations and provides excellent key distribution.

The main benefits are:

  1. Reduced instruction count in the lookup path
  2. Better cache utilization due to linear probing
  3. Inlined lookup methods eliminate function call overhead

@chaokunyang chaokunyang changed the title perf(c++): optimize type dispatch performance by a fast open address sparse map perf(c++): optimize type dispatch performance by a fast flat-int map Dec 3, 2025
@chaokunyang chaokunyang merged commit 20183af into apache:main Dec 3, 2025
55 checks passed
@chaokunyang chaokunyang mentioned this pull request Dec 3, 2025
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants