
[FEATURE] Add standalone query and aggregation operations for isolated performance benchmarking #726

@prudhvigodithi


Is your feature request related to a problem?

Currently, most query operations in workloads like big5 combine multiple query types. For example:

  • range-with-metrics - combines range query with metric aggregations
  • range_field_conjunction_small_range_small_term_query - combines range and term queries

While these combined queries represent realistic workloads, they make it difficult to isolate and measure performance improvements for specific query types.

When testing intra-segment search (RFC #20202), I observed that metric aggregations improved significantly while range queries regressed due to BKD tree traversal overhead. The combined operations showed mixed, unclear results because the metric aggregation improvement partially masked the range query regression. Without standalone operations, it was challenging to:

  • Identify which component improved and which regressed.
  • Accurately measure the magnitude of each change.
  • Make informed decisions about enabling/disabling features for specific query patterns.
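Once each query type has its own operation, separating improvements from regressions becomes a per-operation comparison between a baseline run and a contender run. The sketch below is a minimal illustration and is not part of any benchmark tooling. It assumes the results have already been exported into plain dicts that map operation name to a latency in milliseconds (for example p90 service time). The function name and the 5% noise threshold are arbitrary placeholders.

```python
from typing import Dict, Tuple


def classify_changes(
    baseline_ms: Dict[str, float],
    contender_ms: Dict[str, float],
    noise_pct: float = 5.0,  # hypothetical threshold; tune to your run-to-run noise
) -> Dict[str, Tuple[str, float]]:
    """Compare per-operation latency between two benchmark runs.

    Assumption: both inputs map an operation name to a latency in ms.
    Returns (verdict, percent_change) for every operation present in both
    runs; a negative percent change means the contender is faster.
    """
    results: Dict[str, Tuple[str, float]] = {}
    for name in sorted(baseline_ms.keys() & contender_ms.keys()):
        base = baseline_ms[name]
        if base <= 0:
            # A zero or negative baseline makes a relative change meaningless.
            continue
        change = (contender_ms[name] - base) / base * 100.0
        if change <= -noise_pct:
            verdict = "improved"
        elif change >= noise_pct:
            verdict = "regressed"
        else:
            verdict = "unchanged"
        results[name] = (verdict, round(change, 1))
    return results
```

In the intra-segment search case, a standalone metric aggregation operation would be expected to land in "improved" and a standalone range operation in "regressed", each with its own magnitude. A single combined operation produces one number in which the two effects net out.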

What solution would you like?

Add separate JSON operation files organized by query/aggregation type:

operations/
├── default.json                    # existing combined operations
├── term-queries.json               # standalone term queries
├── range-queries.json              # standalone range queries
├── match-queries.json              # standalone match/full-text queries
├── bool-queries.json               # standalone boolean queries
├── metric-aggregations.json        # standalone metric aggregations
├── bucket-aggregations.json        # standalone bucket aggregations
└── ...

With this separation:

  • Nightly benchmark coverage: Run all standalone operation files in parallel to get comprehensive per-query-type performance tracking over time.
  • Targeted development testing: Developers working on specific optimizations can target the relevant JSON file.
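Per-type files only help targeted development runs if a run can be restricted to the files of interest. Here is a minimal sketch of that selection step. It assumes each file is a plain JSON array like the sample in this issue; files containing unrendered template expressions would need to be rendered first. It also assumes schedule entries use the `operation`, `warmup-iterations` and `iterations` keys found in OpenSearch Benchmark test-procedure schedules. The helper names and default iteration counts are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def load_operation_file(path: Path) -> List[dict]:
    """Read one operations file (assumed: a plain JSON array of operations)."""
    ops = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(ops, list):
        raise ValueError(f"{path} does not contain a JSON array")
    return ops


def build_schedule(
    files: Dict[str, List[dict]],
    warmup_iterations: int = 50,  # placeholder value
    iterations: int = 100,  # placeholder value
) -> List[dict]:
    """Create one schedule entry per operation across the selected files.

    `files` maps a file name (e.g. "range-queries.json") to its operations.
    Names must be unique across the selected files; otherwise a schedule
    entry could not tell which definition it refers to.
    """
    schedule: List[dict] = []
    defined_in: Dict[str, str] = {}
    for file_name, ops in files.items():
        for op in ops:
            name = op["name"]
            if name in defined_in:
                raise ValueError(
                    f"operation {name!r} is defined in both "
                    f"{defined_in[name]} and {file_name}"
                )
            defined_in[name] = file_name
            schedule.append(
                {
                    "operation": name,
                    "warmup-iterations": warmup_iterations,
                    "iterations": iterations,
                }
            )
    return schedule


def select_files(operations_dir: Path, names: Iterable[str]) -> Dict[str, List[dict]]:
    """Load only the named files, e.g. ["range-queries.json"] for a targeted run."""
    return {name: load_operation_file(operations_dir / name) for name in names}
```

A developer working on range queries could call `build_schedule(select_files(Path("operations"), ["range-queries.json"]))`, while a nightly job could pass every standalone file. Rejecting duplicate names prevents the split files from silently shadowing one another.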

Sample metric-aggregations.json:

[
  {
    "name": "stats-agg-standalone",
    "operation-type": "search",
    "index": "{{index_name}}",
    "body": {
      "size": 0,
      "track_total_hits": false,
      "aggs": {
        "metrics_stats": { "stats": { "field": "metrics.size" } }
      }
    }
  },
  {
    "name": "sum-agg-standalone",
    "operation-type": "search",
    "index": "{{index_name}}",
    "body": {
      "size": 0,
      "track_total_hits": false,
      "aggs": {
        "total": { "sum": { "field": "metrics.size" } }
      }
    }
  },
  {
    "name": "cardinality-agg-standalone",
    "operation-type": "search",
    "index": "{{index_name}}",
    "body": {
      "size": 0,
      "track_total_hits": false,
      "aggs": {
        "unique_count": { "cardinality": { "field": "process.name" } }
      }
    }
  }
]
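Every standalone aggregation operation in this sample differs only in its name, aggregation name, aggregation type and field. Files like this one could therefore be generated from a small table instead of written by hand. The following is a sketch under that assumption. `metric_agg_operation` and `SPECS` are hypothetical names, and the `{{index_name}}` placeholder is emitted verbatim so the workload's own templating can fill it in.

```python
import json
from typing import List, Tuple


def metric_agg_operation(name: str, agg_name: str, agg_type: str, field: str) -> dict:
    """Build one standalone, aggregation-only search operation (hypothetical helper).

    size=0 returns no hits and track_total_hits=false turns off total hit
    counting, so the request does little besides the single aggregation.
    """
    return {
        "name": name,
        "operation-type": "search",
        "index": "{{index_name}}",  # left for the workload's templating to fill
        "body": {
            "size": 0,
            "track_total_hits": False,
            "aggs": {agg_name: {agg_type: {"field": field}}},
        },
    }


# (operation name, aggregation name, aggregation type, field), as in the sample
SPECS: List[Tuple[str, str, str, str]] = [
    ("stats-agg-standalone", "metrics_stats", "stats", "metrics.size"),
    ("sum-agg-standalone", "total", "sum", "metrics.size"),
    ("cardinality-agg-standalone", "unique_count", "cardinality", "process.name"),
]

if __name__ == "__main__":
    ops = [metric_agg_operation(*spec) for spec in SPECS]
    print(json.dumps(ops, indent=2))
```

Running it prints the same three operations as the sample, with different whitespace. Adding another aggregation, such as `avg`, requires only one more row in `SPECS`.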

Do you have any additional context?

Please note that this does not replace the existing operations. The existing combined queries (like range-with-metrics) remain valuable for realistic end-to-end workload simulation and production-representative benchmarking.

Standalone operations complement these by enabling isolated performance analysis.


Labels: enhancement (New feature or request)
