Is your feature request related to a problem?
Currently, most query operations in workloads like big5 combine multiple query types. For example:
- `range-with-metrics` - combines a range query with metric aggregations
- `range_field_conjunction_small_range_small_term_query` - combines range and term queries
While these combined queries represent realistic workloads, they make it difficult to isolate and measure performance improvements for specific query types.
When testing intra-segment search (RFC #20202), I observed that metric aggregations improved significantly while range queries regressed due to BKD tree traversal overhead. The combined operations showed mixed, unclear results because the metric aggregation improvement partially masked the range query regression. Without standalone operations, it was challenging to:
- Identify which component improved vs regressed.
- Accurately measure the magnitude of each change.
- Make informed decisions about enabling/disabling features for specific query patterns.
What solution would you like?
Add separate JSON operation files organized by query/aggregation type:
```
operations/
├── default.json              # existing combined operations
├── term-queries.json         # standalone term queries
├── range-queries.json        # standalone range queries
├── match-queries.json        # standalone match/full-text queries
├── bool-queries.json         # standalone boolean queries
├── metric-aggregations.json  # standalone metric aggregations
├── bucket-aggregations.json  # standalone bucket aggregations
└── ...
```
With this separation:
- Nightly benchmark coverage: Run all standalone operation files in parallel to get comprehensive per-query-type performance tracking over time.
- Targeted development testing: Developers working on specific optimizations can target the relevant JSON file.
Sample `metric-aggregations.json`:
```json
[
  {
    "name": "stats-agg-standalone",
    "operation-type": "search",
    "index": "{{index_name}}",
    "body": {
      "size": 0,
      "track_total_hits": false,
      "aggs": {
        "metrics_stats": { "stats": { "field": "metrics.size" } }
      }
    }
  },
  {
    "name": "sum-agg-standalone",
    "operation-type": "search",
    "index": "{{index_name}}",
    "body": {
      "size": 0,
      "track_total_hits": false,
      "aggs": {
        "total": { "sum": { "field": "metrics.size" } }
      }
    }
  },
  {
    "name": "cardinality-agg-standalone",
    "operation-type": "search",
    "index": "{{index_name}}",
    "body": {
      "size": 0,
      "track_total_hits": false,
      "aggs": {
        "unique_count": { "cardinality": { "field": "process.name" } }
      }
    }
  }
]
```
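For comparison, a standalone `range-queries.json` could follow the same pattern. This is only an illustrative sketch; the `@timestamp` field and the date bounds below are placeholders, not values taken from the actual big5 workload:

```json
[
  {
    "name": "range-query-standalone",
    "operation-type": "search",
    "index": "{{index_name}}",
    "body": {
      "size": 10,
      "track_total_hits": false,
      "query": {
        "range": {
          "@timestamp": {
            "gte": "2023-01-01T00:00:00Z",
            "lt": "2023-01-02T00:00:00Z"
          }
        }
      }
    }
  }
]
```

Because the operation contains only a range query and no aggregations, any latency change measured here can be attributed directly to range query execution (e.g. BKD tree traversal), with no masking from other query components.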
Do you have any additional context?
Please note this does not replace existing operations. The existing combined queries (like `range-with-metrics`) remain valuable for realistic end-to-end workload simulation and production-representative benchmarking.
Standalone operations complement these by enabling isolated performance analysis.