Meta: SwissTable-based Hashing Improvements for ES|QL STATS #138799

@ChrisHegarty

This meta issue tracks ongoing work to modernize and optimize the hash structures used inside the ES|QL STATS operator. The goal is to replace the legacy hash implementation with SwissTable-inspired structures that provide predictable performance, better memory locality, and improved correctness under high-cardinality workloads.

The first step in this effort is the introduction of LongSwissTable and BytesRefSwissTable, but there are several follow-up improvements, hardening steps, and integrations planned.

Goals

  • Improve STATS operator throughput and memory efficiency
  • Reduce probe lengths and worst-case behavior under high cardinality
  • Improve correctness and observability of hash-structure behavior
  • Provide a maintainable, well-tested, well-benchmarked hashing foundation for future ES|QL features

Work Breakdown

1. Integration into ES|QL STATS

  • Replace existing hash structure in STATS operator with LongSwissTable / BytesRefSwissTable
  • Validate memory accounting under circuit breakers
  • Validate that result ordering and grouping semantics match existing implementation
  • Add STATS-level tests covering edge cases (empty groups, large groups, mixed key types)

2. Benchmarking & Performance Profiling

  • Build a complete JMH benchmark suite for SwissTables vs. legacy hash
  • Measure performance across key distributions: uniform, skewed, clustered, and adversarial (collision-heavy)
  • Benchmark rehash costs and memory-growth patterns
  • Validate SIMD lane-size differences (16 vs. 32-lane control groups)
  • Publish baseline numbers for public reference
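
As an illustration of the key distributions listed above, the following is a minimal sketch of deterministic input generators that a benchmark could feed to both the SwissTable and legacy implementations. The class `KeyDistributions`, its method names, and its parameters are hypothetical and not part of the Elasticsearch codebase. The skew model (a uniform draw raised to the fourth power) and the adversarial pattern (keys whose low bits are all zero) are assumptions chosen for simplicity; whether the latter actually collides depends on the hash function in use.

```java
import java.util.SplittableRandom;

/** Hypothetical key generators for benchmark inputs; all names are illustrative. */
final class KeyDistributions {
    private KeyDistributions() {}

    /** Keys drawn uniformly from [0, cardinality). */
    static long[] uniform(int n, long cardinality, long seed) {
        SplittableRandom rnd = new SplittableRandom(seed);
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = rnd.nextLong(cardinality);
        }
        return keys;
    }

    /** Skewed keys: raising a uniform draw to the 4th power concentrates mass on small keys. */
    static long[] skewed(int n, long cardinality, long seed) {
        SplittableRandom rnd = new SplittableRandom(seed);
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = (long) (cardinality * Math.pow(rnd.nextDouble(), 4.0));
        }
        return keys;
    }

    /** Clustered keys: a few widely spaced bases, each with a narrow range of offsets. */
    static long[] clustered(int n, int clusters, int clusterWidth, long seed) {
        SplittableRandom rnd = new SplittableRandom(seed);
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            long base = (long) rnd.nextInt(clusters) << 32;
            keys[i] = base + rnd.nextInt(clusterWidth);
        }
        return keys;
    }

    /** Structured adversarial keys: distinct values whose low zeroBits bits are all zero. */
    static long[] lowBitsZero(int n, int zeroBits) {
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = (long) i << zeroBits;
        }
        return keys;
    }
}
```

Fixed seeds keep runs reproducible, so both implementations under comparison can be given identical inputs.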

3. Correctness Hardening

  • Add exhaustive property-based tests for boundary mirroring
  • Add tests validating probe sequences across table wrap-around
  • Add collision-pattern fuzz tests (random + structured adversarial inputs)
  • Add rehash stability tests (IDs, iteration order, control reconstruction)
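
To make the boundary-mirroring item concrete, here is a minimal sketch of the invariant such a property test might check. It assumes "boundary mirroring" refers to the common SwissTable layout in which the first group of control bytes is duplicated past the end of the control array, so that a group load starting near the end never has to wrap. The class `MirroredControl`, the group width of 16, and the `EMPTY` marker value are illustrative assumptions, not the actual LongSwissTable internals.

```java
import java.util.Arrays;

/** Hypothetical control-byte array with a mirrored tail; not the real implementation. */
final class MirroredControl {
    static final int GROUP_WIDTH = 16;      // assumed group size
    static final byte EMPTY = (byte) 0x80;  // assumed "empty slot" marker

    final int capacity;  // power of two, at least GROUP_WIDTH
    final byte[] ctrl;   // capacity real bytes followed by GROUP_WIDTH mirrored bytes

    MirroredControl(int capacity) {
        if (capacity < GROUP_WIDTH || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= " + GROUP_WIDTH);
        }
        this.capacity = capacity;
        this.ctrl = new byte[capacity + GROUP_WIDTH];
        Arrays.fill(ctrl, EMPTY);
    }

    /** Writes a control byte and keeps the mirrored copy in sync. */
    void set(int slot, byte value) {
        ctrl[slot] = value;
        if (slot < GROUP_WIDTH) {
            ctrl[capacity + slot] = value;
        }
    }

    /**
     * Invariant: a contiguous group read starting at any slot sees the same
     * bytes as a read that wraps around the end of the table.
     */
    boolean mirrorInvariantHolds() {
        for (int start = 0; start < capacity; start++) {
            for (int i = 0; i < GROUP_WIDTH; i++) {
                if (ctrl[start + i] != ctrl[(start + i) & (capacity - 1)]) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A property-based test could apply random sequences of `set` calls (and, in a fuller model, rehashes) and assert `mirrorInvariantHolds()` after each step. The same check also covers group reads that cross the table's wrap-around point.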

4. Implementation Improvements

  • Investigate vectorization of the key-equality phase for primitive types
  • Evaluate faster control-byte extraction (shift vs. mask experiments)
  • Improve prefetching strategy for deep probe sequences
  • Explore packing control bytes and ids into a single contiguous slab
  • Explore reducing control-byte width for cache density
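
For the control-byte extraction item, the two candidate approaches might look like the sketch below. It assumes a 64-bit hash and a 7-bit fingerprint per control byte, as in typical SwissTable designs. The class and method names are hypothetical, and the issue does not state which bits the current implementation uses.

```java
/** Hypothetical helpers contrasting two ways to split a hash into slot index and fingerprint. */
final class ControlByteExtraction {
    private ControlByteExtraction() {}

    /** Fingerprint from the top 7 bits: one unsigned shift, result in [0, 127]. */
    static byte fingerprintByShift(long hash) {
        return (byte) (hash >>> 57);
    }

    /** Fingerprint from the low 7 bits: one mask, result in [0, 127]. */
    static byte fingerprintByMask(long hash) {
        return (byte) (hash & 0x7F);
    }

    /** Slot index paired with the shift variant: the low bits are free to use directly. */
    static int slotForShift(long hash, int slotMask) {
        return (int) hash & slotMask;
    }

    /** Slot index paired with the mask variant: skip the 7 bits consumed by the fingerprint. */
    static int slotForMask(long hash, int slotMask) {
        return (int) (hash >>> 7) & slotMask;
    }
}
```

Either pairing keeps the fingerprint bits disjoint from the bits used for the slot index (as long as the index does not reach into the top 7 bits). The difference is which of the two values needs the extra shift, which is what a benchmark would need to measure.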

5. Future Enhancements

  • Explore integrating SwissTables into other high-cardinality paths (JOIN, GROUPING SETS, aggregations)
  • Evaluate further specializations, e.g. LongLongSwissTable

Status

  • First implementation merged (pending)
  • Benchmark suite + STATS integration
  • Progressive hardening and optimization

relates: #137842
