Generate date ranges in a way that is more friendly to the cache. #20115

Closed
jpountz wants to merge 3 commits into elastic:master from jpountz:enhancement/cacheable_date_ranges

Contributor

@jpountz jpountz commented Aug 23, 2016

This commit uses rounding in order to improve the cacheability of date ranges.
For instance, [now-1M TO now] would actually be parsed as
[now-1M TO now-1M/H] OR [now-1M/H TO now/H] OR [now/H TO now]. The clause in
the middle of the disjunction has rounded bounds, which makes it more likely to
be cached. The two outer clauses are unlikely to get cached, but since they each
match less than one hour of data, they should execute quickly anyway.
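
To make the split concrete, here is a minimal Python sketch of the idea, not the code from this PR. It assumes half-open ranges and assumes the lower bound is rounded up and the upper bound rounded down, so that the middle clause stays inside the original range; `floor_hour`, `ceil_hour` and `split_range` are hypothetical names.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def floor_hour(t):
    """Round a timestamp down to the start of its hour."""
    return t.replace(minute=0, second=0, microsecond=0)


def ceil_hour(t):
    """Round a timestamp up to the next hour boundary (no-op if already on one)."""
    floored = floor_hour(t)
    return floored if floored == t else floored + HOUR


def split_range(lo, hi):
    """Split [lo, hi) into up to three half-open sub-ranges.

    The middle sub-range has hour-aligned bounds, so it stays identical
    for every query issued within the same hour and can be cached.
    """
    inner_lo, inner_hi = ceil_hour(lo), floor_hour(hi)
    if inner_lo >= inner_hi:
        # The range does not span a full aligned hour: nothing to gain.
        return [(lo, hi)]
    parts = [(lo, inner_lo), (inner_lo, inner_hi), (inner_hi, hi)]
    # Drop empty edge clauses (bounds that were already hour-aligned).
    return [(a, b) for a, b in parts if a < b]
```

With a fixed lower bound, two calls made at 10:17 and 10:45 of the same hour produce the same middle sub-range; only the short trailing clause differs.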

In the case that the range query is already rounded at a granularity greater
than or equal to one hour, the query is left as-is. So for instance, this query:
[now-2M/d TO now/d] would be executed as-is without being split into three
components like the previous query.
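
As a rough illustration of that check (a sketch only, not the logic in this PR), the following Python snippet inspects a date-math expression and reports whether it ends with a rounding suffix of one hour or coarser. It assumes the usual date-math unit letters (y, M, w, d, h, H, m, s); `rounded_to_hour_or_coarser` is a hypothetical helper name.

```python
import re

# Date-math units that are at least as coarse as one hour.
COARSE_UNITS = "yMwdhH"

# A trailing rounding suffix such as "/d" or "/H".
_ROUNDING = re.compile(r"/([yMwdhHms])$")


def rounded_to_hour_or_coarser(expr):
    """Return True if the expression ends with rounding of one hour or coarser."""
    match = _ROUNDING.search(expr)
    return match is not None and match.group(1) in COARSE_UNITS


def should_split(lower, upper):
    """Split only when at least one bound is not already rounded coarsely."""
    return not (rounded_to_hour_or_coarser(lower) and rounded_to_hour_or_coarser(upper))
```

Under these assumptions, `should_split("now-2M/d", "now/d")` is false, while `should_split("now-1M", "now")` is true.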

Closes #20106

jpountz added the >enhancement, :Search/Search and v5.0.0-beta1 labels Aug 23, 2016
Contributor Author

jpountz commented Aug 26, 2016

We discussed in FixitFriday the fact that this change might slow down random ranges. The plan is to first make sure the slowdown is contained and, if it is, to merge the change as-is and work on a fix if we later get evidence that there are actual use cases that this change hurts.

Contributor Author

jpountz commented Sep 2, 2016

I did some benchmarking on the nyc_taxis dataset and the results are interesting. If I run the query below, response times go from ~300ms to ~180ms with this PR:

GET nyc_taxis/_search
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "payment_type": "3"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "pickup_datetime": {
              "gte": "2015-05-11 16:33:52",
              "lt": "now"
            }
          }
        }
      ]
    }
  }
}

However, if I search for 2 (cash) instead of 3 (no charge) as a payment type, then this change actually slows things down from ~1s to ~1.2s. The reason is that paying in cash is much more frequent than having nothing to pay. In the first case, we only need to check a minority of the documents that match the range, so the bottleneck is the execution of the range query and caching helps. In the second case, we consume most documents from the range and the bottleneck becomes the execution of the implicit disjunction.
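
For readers who want to picture the implicit disjunction, the Python sketch below builds an explicit query-DSL filter that is roughly equivalent to what the benchmark's range becomes after splitting. It is only an illustration under assumptions: the PR performs the split internally rather than emitting this JSON, and `split_filter` and its parameters are hypothetical names.

```python
def split_filter(field, gte, next_hour):
    """Build an explicit three-clause disjunction for [gte, now).

    `next_hour` is `gte` rounded up to the next hour boundary. Only the
    middle clause has hour-aligned bounds on both sides, so it is the one
    that can be reused from the cache across requests.
    """
    return {
        "bool": {
            "should": [
                # Leading partial hour: cheap, but rarely cached.
                {"range": {field: {"gte": gte, "lt": next_hour}}},
                # Hour-aligned bulk of the range: cacheable.
                {"range": {field: {"gte": next_hour, "lt": "now/h"}}},
                # Trailing partial hour: changes on every request.
                {"range": {field: {"gte": "now/h", "lt": "now"}}},
            ],
            "minimum_should_match": 1,
        }
    }


query_filter = split_filter(
    "pickup_datetime", "2015-05-11 16:33:52", "2015-05-11 17:00:00"
)
```

When most documents in the range also match the other clause, every one of them has to flow through this three-way disjunction, which is the overhead described above.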

Since it's generally better to speed up slow queries, I'm leaning towards not merging this PR. I'll leave it open for some time in case anybody wants to discuss it.

jpountz added a commit to jpountz/elasticsearch that referenced this pull request Sep 2, 2016
 - use auto-generated ids for indexing elastic#20211
 - use rounded dates in queries elastic#20115
@jpountz jpountz closed this Sep 19, 2016
@jpountz jpountz removed the v5.0.0 label Sep 19, 2016

Labels

discuss >enhancement :Search/Search Search-related issues that do not fall into other categories
