Describe the bug
With the search timeout set to 200ms, a user can receive the response below. It looks inconsistent: the reported took time (60ms) is well below the 200ms timeout, yet the query is marked as timed out.
{
  "took": 60,
  "timed_out": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
The underlying problem is that timeouts may be enforced prematurely or too late, depending on the estimated time interval. For example, a 200ms timeout might fire at roughly 0ms or at 400ms.
This happens because the elapsed-time computation uses an optimization over System#nanoTime that caches the time, refreshing it every 200ms by default, as controlled by the setting thread_pool.estimated_time_interval.
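To make the effect concrete, here is a minimal, deterministic sketch of how a clock that only advances in 200ms steps can distort a 200ms timeout. It is a simplified model, not the OpenSearch implementation: the class CachedClockSketch, the helpers cachedTime and firesAfter, and the assumption that both the start time and the timeout check read the same cached clock are all hypothetical. The strict flag only illustrates how the choice between ">" and ">=" shifts the error; it makes no claim about which comparison the real code uses.

```java
public class CachedClockSketch {

    // Hypothetical cached clock: it only advances on tick boundaries, so a
    // reader sees the most recent tick at or before the true time.
    static long cachedTime(long trueMillis, long intervalMillis) {
        return (trueMillis / intervalMillis) * intervalMillis;
    }

    // Returns the true elapsed time (ms) at which a timeout check based on the
    // cached clock first fires. Both the start time and the check read the
    // cached clock, which is an assumption of this model.
    static long firesAfter(long startMillis, long timeoutMillis,
                           long intervalMillis, boolean strict) {
        long deadline = cachedTime(startMillis, intervalMillis) + timeoutMillis;
        long now = startMillis;
        while (true) {
            long cached = cachedTime(now, intervalMillis);
            boolean expired = strict ? cached > deadline : cached >= deadline;
            if (expired) {
                return now - startMillis;
            }
            now++; // advance the true clock by 1ms
        }
    }

    public static void main(String[] args) {
        long timeout = 200;  // search timeout in ms
        long interval = 200; // cached clock refresh interval in ms

        // Query starts 1ms before a tick: the cached clock jumps almost at once.
        System.out.println(firesAfter(199, timeout, interval, false)); // prints 1

        // Query starts exactly on a tick with a strict comparison:
        // the check only fires one full interval after the real deadline.
        System.out.println(firesAfter(0, timeout, interval, true)); // prints 400
    }
}
```

In this model the error is bounded by one refresh interval in either direction, so shrinking the interval narrows the gap between the requested timeout and the observed one.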
Latency-sensitive queries may see discrepancies with these defaults.
We need to determine a reasonable default for the estimated time interval based on JMH benchmarks. Today it is a static setting, with essentially no documentation on what to expect from a search timeout. We could make it dynamic, with a reasonable default, and let users choose the interval within appropriate limits.
OpenSearch/server/src/main/java/org/opensearch/threadpool/ThreadPool.java
Line 632 in 996d33a