[BUG] Faster search queries might see a timeout discrepancy in search response due to cached time #2000

@Bukhtawar

Describe the bug
With the search timeout set to 200ms, a user can receive a response like the one below. It looks inconsistent: the reported took time (60ms) is well below the 200ms timeout, so the query should not have been able to time out, yet it is marked as timed out.

{
    "took": 60,
    "timed_out": true,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

The larger problem is that timeouts may be enforced at the wrong moment, either prematurely or too late, because they are based on estimated time intervals. For example, a 200ms timeout might fire at 0ms or at 400ms.

This happens because the elapsed time computation uses an optimization over System#nanoTime that caches the time and refreshes it every 200ms by default, as controlled by the setting thread_pool.estimated_time_interval.
Latency-sensitive queries might see discrepancies with these defaults.
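
To make the effect concrete, here is a minimal sketch of how a clock that is only refreshed every 200ms can distort elapsed-time measurements. It is an illustrative model, not the actual OpenSearch code: the class and method names are hypothetical, the clock is assumed to refresh exactly on multiples of the interval, and the `>=` comparison in `timedOut` is an assumption about how a timeout check might look.

```java
// Illustrative model only; not OpenSearch code. All names are hypothetical.
final class CachedClockSketch {

    // Value a cached clock would report at real time `realMillis`, assuming it
    // refreshes exactly on multiples of `intervalMillis` (non-negative inputs).
    static long cachedMillis(long realMillis, long intervalMillis) {
        return (realMillis / intervalMillis) * intervalMillis;
    }

    // Elapsed time as seen through the cached clock at both endpoints.
    static long observedElapsed(long startRealMillis, long nowRealMillis, long intervalMillis) {
        return cachedMillis(nowRealMillis, intervalMillis)
            - cachedMillis(startRealMillis, intervalMillis);
    }

    // Assumed timeout check: compares the observed (cached) elapsed time
    // against the configured timeout.
    static boolean timedOut(long observedElapsedMillis, long timeoutMillis) {
        return observedElapsedMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        long interval = 200;
        long timeout = 200;
        // Each pair is {real start time, real check time} in milliseconds.
        long[][] cases = { {190, 250}, {200, 399}, {200, 599} };
        for (long[] c : cases) {
            long real = c[1] - c[0];
            long observed = observedElapsed(c[0], c[1], interval);
            System.out.println("real=" + real + "ms observed=" + observed
                + "ms timedOut=" + timedOut(observed, timeout));
        }
    }
}
```

In this model, a query that starts at 190ms and is checked at 250ms has really run for 60ms but is observed as having run for 200ms, which would be consistent with the "took": 60, "timed_out": true response above. Conversely, a query that starts at 200ms and is checked at 399ms has run for 199ms but is observed as 0ms, and an observed 200ms can correspond to a real elapsed time of anywhere up to 399ms.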

We need to determine a reasonable default for the estimated time interval based on JMH benchmarks. Today it is a static value, with essentially no documentation on what to expect from a search timeout. We could make the setting dynamic, with reasonable defaults, and let users choose the interval within appropriate limits.
