Improve FilterPathBasedFilter performance#79826
Conversation
nik9000
left a comment
There was a problem hiding this comment.
@elasticmachine test this please
libs/x-content/src/main/java/org/elasticsearch/xcontent/XContent.java
Outdated
Show resolved
Hide resolved
| this.doubleWildcard = (segment != null) && (segment.length() == 2) && (segment.charAt(0) == '*') && (segment.charAt(1) == '*'); | ||
| public FilterPath(String field) { | ||
| this.termsChildren = new HashMap<>(); | ||
| this.wildcardChildren = new HashMap<>(); |
There was a problem hiding this comment.
I'm not sure I like that the FilterPath is mutable. I mean, it doesn't really matter in the grand scheme of things because folks won't mutate it after building it. But I wonder if we can avoid it just to help with my paranoia.
|
|
||
| public String getSegment() { | ||
| return segment; | ||
| public boolean matches(String name, List<FilterPath> nextFilters) { |
There was a problem hiding this comment.
Can we make this package private and add a comment? It's an interesting contract. It makes sense in the context we use it. I just think we should make sure as few things as possible can call it.
libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java
Show resolved
Hide resolved
| int end = filter.length(); | ||
| for (int i = 0; i < end;) { | ||
| char c = filter.charAt(i); | ||
| if (c == '.') { |
There was a problem hiding this comment.
I think I'd be more comfortable if we found the next offset in one pass and then processed it in another. I'm not sure how I feel about return from inside of a for loop, but here I feel like it's a little surprising.
| @Override | ||
| public String toString() { | ||
| return "FilterPath [filter=" + filter + ", segment=" + segment + "]"; | ||
| return Collections.singletonList(filterPath).toArray(new FilterPath[0]); |
There was a problem hiding this comment.
It's probably worth removing the array everywhere. But this is totally the kind of thing I'd do so I could test if my changes were having any effect on performance.
There was a problem hiding this comment.
In FilterPathBasedFilter's evaluate method, It may be match more than one FilterPath. so FilterPathBasedFilter Construction method need an array.
eg: test match these patten: test , te* ,tes* .
| public FilterPath getNext() { | ||
| return next; | ||
| private boolean isEnd() { | ||
| return termsChildren.isEmpty() && wildcardChildren.isEmpty(); |
There was a problem hiding this comment.
I'm curious about what happens when you have two compile a pattern like foo.bar,foo.bar.baz - in this case I think we would return all of foo.bar - the second pattern is basically redundant. So this test here may be correct, but I think it's correct for sneaky reasons worth a comment.
| private final FilterPath next; | ||
| private final boolean simpleWildcard; | ||
| private final Map<String, FilterPath> termsChildren; | ||
| private final Map<String, FilterPath> wildcardChildren; |
There was a problem hiding this comment.
It's good to have the Map for building the children but I think iterating the HashMap is actively bad for performance when running the match operation.
|
@nik9000 , thank you very much for your review, I will improve these comments. |
…formance * upstream/master: (153 commits) [ML] update truncation default & adding field output when input is truncated (elastic#79942) [ML] stop using isAllowedByLicense for model license checks (elastic#79908) [ML] Retain built-in ML roles granting Kibana privileges (elastic#80014) [Transform] remove old mixed cluster BWC layers, not required for 8x (elastic#79927) Increase test timeout for CoordinatorTests testAllSearchesExecuted [Transform] add rolling upgrade tests for upgrade endpoint (elastic#79721) [ML] Update trained model docs for truncate parameter for bert tokenization (elastic#79652) `CoordinatorTests` sometimes needs three term bumps (elastic#79574) [ML] Account for service being triggered twice in tests (elastic#80000) SearchContext: remove unused variable (elastic#79917) Revert "Deprecate resolution loss on date field (elastic#78921)" (elastic#79914) Re-enable GeoIpDownloaderIT#testStartWithNoDatabases() (elastic#79907) Fix SnapshotBasedIndexRecoveryIT#testSeqNoBasedRecoveryIsUsedAfterPrimaryFailOver (elastic#79469) Fix RecoverySourceHandlerTests (elastic#79546) SQL: stabilize SqlSearchPageTimeoutIT (elastic#79928) Wait 3 seconds for the server to reload trust (elastic#79778) Skip automatically preserved request headers when rewriting (elastic#79973) Check whether stdout is a real console (elastic#79882) Convert remote license checker to use LicensedFeature (elastic#79876) Miscellaneous fixes for LDAP SDK v6 upgrade (elastic#79891) ... # Conflicts: # libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java # libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/FilterPathGeneratorFilteringTests.java # libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/FilterPathTests.java
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
|
@weizijun you asked me for another round. I think the thing I'd most like to see is for |
|
@elasticmachine update branch |
there are many case. that can't cache FilterPath.compile result , It will call FilterPath.compile every time. I test that use a builder to build the FilterPath, but the performance isn't good. I make |
|
here is the result of current commit (without builder): |
|
here is the result with a FilterPath builder: |
|
@nik9000 I test the two case, when fields grow, the builder version is slower in NewParserConfig case. |
Could you push the builder version somewhere so I can have a look at it please? I hadn't thought it'd make a difference. |
ok, I revert the code before, and I can push it again |
nik9000
left a comment
There was a problem hiding this comment.
So! I think I still like the builder version better. I think it's easier to understand which is nice. I think at ingest and search time we have a much better chance of reusing the filters. And those are the times when the performance is most important.
Even so, I think the builder version gives us more of a chance to optimize later because we can make choices based on what we've collected.
| for (int i = 0; i < end;) { | ||
| char c = filter.charAt(i); | ||
| if (c == '.') { | ||
| String field = filter.substring(0, i).replaceAll("\\\\.", "."); |
There was a problem hiding this comment.
Losing the hasEscape flag that your non-builder version had seems to really slow down the builder version.
| assertThat(filterPath.getSegment(), is(emptyString())); | ||
| assertSame(filterPath, FilterPath.EMPTY); | ||
|
|
||
| input = "w\\.0\\.0\\.t"; |
There was a problem hiding this comment.
I think it'd be nice to still have a test for escapes.
libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java
Outdated
Show resolved
Hide resolved
|
@elasticmchine, test this please |
|
@elasticmachine, update branch |
|
@elasticmachine update branch |
|
@elasticmachine test this please |
|
run elasticsearch-ci/part-2 |
|
OK! Everything finally passed. But I'm about to step away. I'll merge this in the morning. |
FilterPathBasedFilter is used for filter XContent. Old FilterPath is implemented with a linked list. When it has many patterns. The filter performance is lower.
I implement a new FilterPath. It use a Tree to filter XContent. the FilterPath Tree has two kind of children.
I write a benchmark to test the performance.
the benchmark use the monitoring index data(cluster_stats, index_stats, node_stats).
It has a three type of filter:
Here is the benchmark result:
The benchmark run in old FilterPath branch:
When filter pattern grow, The new FilterPath is more then twice faster then old one.