Improve FilterPathBasedFilter performance by weizijun · Pull Request #79826 · elastic/elasticsearch

weizijun · 2021-10-26T14:41:28Z

FilterPathBasedFilter is used for filter XContent. Old FilterPath is implemented with a linked list. When it has many patterns. The filter performance is lower.

I implement a new FilterPath. It use a Tree to filter XContent. the FilterPath Tree has two kind of children.

One is a hashmap for term pattern. It can find the match field in O(1).
Another is a hashmap for wildcard pattern.It will check all node of hashmap.

I write a benchmark to test the performance.
the benchmark use the monitoring index data(cluster_stats, index_stats, node_stats).
It has a three type of filter:

filter 10 fields
filter half fields
filter all fields

Here is the benchmark result:

Benchmark                                                  (fieldCount)  (inclusive)         (type)  Mode  Cnt       Score        Error  Units
FilterContentBenchmark.filterWithNewParserConfig               10_field         true  cluster_stats  avgt    3  125630.012 ±  43233.062  ns/op
FilterContentBenchmark.filterWithNewParserConfig               10_field         true    index_stats  avgt    3   17199.506 ±  16205.591  ns/op
FilterContentBenchmark.filterWithNewParserConfig               10_field         true     node_stats  avgt    3   18706.487 ±   8269.021  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true  cluster_stats  avgt    3  323293.870 ± 160494.748  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true    index_stats  avgt    3   30375.369 ±  13070.669  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true     node_stats  avgt    3   36620.540 ±   7108.038  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true  cluster_stats  avgt    3  462424.507 ± 136969.111  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true    index_stats  avgt    3   44915.709 ±   9761.707  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true     node_stats  avgt    3   56310.895 ±  18217.109  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true  cluster_stats  avgt    3  177093.113 ± 160724.229  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true    index_stats  avgt    3   18092.285 ±   7952.427  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true     node_stats  avgt    3   21655.977 ±   9084.962  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true  cluster_stats  avgt    3  191182.067 ± 143725.583  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true    index_stats  avgt    3   25645.334 ±   5691.638  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true     node_stats  avgt    3   29105.775 ±  11546.573  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true  cluster_stats  avgt    3  108692.473 ±  13760.244  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true    index_stats  avgt    3   13008.599 ±   3790.006  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true     node_stats  avgt    3   16625.404 ±   7738.217  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true  cluster_stats  avgt    3  195801.756 ±  67181.793  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true    index_stats  avgt    3   19406.511 ±  11415.749  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true     node_stats  avgt    3   23041.626 ±   6924.014  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true  cluster_stats  avgt    3  231902.841 ±  39345.049  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true    index_stats  avgt    3   23504.174 ±  14444.700  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true     node_stats  avgt    3   26650.139 ±  10977.090  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true  cluster_stats  avgt    3  173536.358 ±   7948.666  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true    index_stats  avgt    3   17432.681 ±   2051.736  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true     node_stats  avgt    3   21549.923 ±   2796.963  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true  cluster_stats  avgt    3  181061.389 ±  97643.301  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true    index_stats  avgt    3   22329.902 ±  12923.636  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true     node_stats  avgt    3   25704.656 ±  14194.372  ns/op

The benchmark run in old FilterPath branch:

Benchmark                                                  (fieldCount)  (inclusive)         (type)  Mode  Cnt        Score         Error  Units
FilterContentBenchmark.filterWithNewParserConfig               10_field         true  cluster_stats  avgt    3   120303.585 ±   46813.299  ns/op
FilterContentBenchmark.filterWithNewParserConfig               10_field         true    index_stats  avgt    3    22163.350 ±   20925.677  ns/op
FilterContentBenchmark.filterWithNewParserConfig               10_field         true     node_stats  avgt    3    25838.040 ±    7408.845  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true  cluster_stats  avgt    3  1052058.673 ± 4225401.302  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true    index_stats  avgt    3    99443.623 ±   63425.197  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true     node_stats  avgt    3   119383.384 ±  283769.163  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true  cluster_stats  avgt    3  1850317.142 ± 9297280.263  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true    index_stats  avgt    3   135923.737 ±   60386.603  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true     node_stats  avgt    3   177303.659 ±   63938.643  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true  cluster_stats  avgt    3   242419.000 ± 1321488.743  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true    index_stats  avgt    3    19677.926 ±   34049.387  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true     node_stats  avgt    3    21932.954 ±   13319.564  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true  cluster_stats  avgt    3   217022.657 ±  427963.480  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true    index_stats  avgt    3    33292.234 ±   36558.138  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true     node_stats  avgt    3    41748.139 ±   18534.155  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true  cluster_stats  avgt    3   120882.036 ±  156538.031  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true    index_stats  avgt    3    16474.120 ±    4239.607  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true     node_stats  avgt    3    20947.328 ±   10435.450  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true  cluster_stats  avgt    3   393069.494 ±  567368.126  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true    index_stats  avgt    3    28891.619 ±    5963.909  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true     node_stats  avgt    3    32433.385 ±    2307.440  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true  cluster_stats  avgt    3   551217.667 ±  109176.750  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true    index_stats  avgt    3    45367.850 ±    9757.989  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true     node_stats  avgt    3    55899.647 ±   19782.583  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true  cluster_stats  avgt    3   159949.383 ±    4244.783  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true    index_stats  avgt    3    15713.951 ±    3041.668  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true     node_stats  avgt    3    19171.149 ±    1728.970  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true  cluster_stats  avgt    3   172137.559 ±   25910.266  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true    index_stats  avgt    3    24635.608 ±   12618.133  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true     node_stats  avgt    3    28866.104 ±     378.672  ns/op

When filter pattern grow, The new FilterPath is more then twice faster then old one.

new：
Benchmark                                                  (fieldCount)  (inclusive)         (type)  Mode  Cnt        Score         Error  Units
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true  cluster_stats  avgt    3  231902.841 ±  39345.049  ns/op

old：
Benchmark                                                  (fieldCount)  (inclusive)         (type)  Mode  Cnt        Score         Error  Units
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true  cluster_stats  avgt    3   551217.667 ±  109176.750  ns/op

nik9000

@elasticmachine test this please

libs/x-content/src/main/java/org/elasticsearch/xcontent/XContent.java

nik9000 · 2021-10-26T18:14:28Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java

-        this.doubleWildcard = (segment != null) && (segment.length() == 2) && (segment.charAt(0) == '*') && (segment.charAt(1) == '*');
+    public FilterPath(String field) {
+        this.termsChildren = new HashMap<>();
+        this.wildcardChildren = new HashMap<>();


I'm not sure I like that the FilterPath is mutable. I mean, it doesn't really matter in the grand scheme of things because folks won't mutate it after building it. But I wonder if we can avoid it just to help with my paranoia.

nik9000 · 2021-10-26T18:23:01Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java


-    public String getSegment() {
-        return segment;
+    public boolean matches(String name, List<FilterPath> nextFilters) {


Can we make this package private and add a comment? It's an interesting contract. It makes sense in the context we use it. I just think we should make sure as few things as possible can call it.

libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java

nik9000 · 2021-10-26T18:27:39Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java

+        int end = filter.length();
+        for (int i = 0; i < end;) {
+            char c = filter.charAt(i);
+            if (c == '.') {


I think I'd be more comfortable if we found the next offset in one pass and then processed it in another. I'm not sure how I feel about return from inside of a for loop, but here I feel like it's a little surprising.

nik9000 · 2021-10-26T18:31:55Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java

-    @Override
-    public String toString() {
-        return "FilterPath [filter=" + filter + ", segment=" + segment + "]";
+        return Collections.singletonList(filterPath).toArray(new FilterPath[0]);


It's probably worth removing the array everywhere. But this is totally the kind of thing I'd do so I could test if my changes were having any effect on performance.

In FilterPathBasedFilter's evaluate method, It may be match more than one FilterPath. so FilterPathBasedFilter Construction method need an array.
eg: test match these patten: test , te* ,tes* .

nik9000 · 2021-10-26T18:32:57Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java

-    public FilterPath getNext() {
-        return next;
+    private boolean isEnd() {
+        return termsChildren.isEmpty() && wildcardChildren.isEmpty();


I'm curious about what happens when you have two compile a pattern like foo.bar,foo.bar.baz - in this case I think we would return all of foo.bar - the second pattern is basically redundant. So this test here may be correct, but I think it's correct for sneaky reasons worth a comment.

nik9000 · 2021-10-26T18:34:42Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java

-    private final FilterPath next;
-    private final boolean simpleWildcard;
+    private final Map<String, FilterPath> termsChildren;
+    private final Map<String, FilterPath> wildcardChildren;


It's good to have the Map for building the children but I think iterating the HashMap is actively bad for performance when running the match operation.

weizijun · 2021-10-27T02:00:50Z

@nik9000 , thank you very much for your review, I will improve these comments.

…ration

…formance * upstream/master: (153 commits) [ML] update truncation default & adding field output when input is truncated (elastic#79942) [ML] stop using isAllowedByLicense for model license checks (elastic#79908) [ML] Retain built-in ML roles granting Kibana privileges (elastic#80014) [Transform] remove old mixed cluster BWC layers, not required for 8x (elastic#79927) Increase test timeout for CoordinatorTests testAllSearchesExecuted [Transform] add rolling upgrade tests for upgrade endpoint (elastic#79721) [ML] Update trained model docs for truncate parameter for bert tokenization (elastic#79652) `CoordinatorTests` sometimes needs three term bumps (elastic#79574) [ML] Account for service being triggered twice in tests (elastic#80000) SearchContext: remove unused variable (elastic#79917) Revert "Deprecate resolution loss on date field (elastic#78921)" (elastic#79914) Re-enable GeoIpDownloaderIT#testStartWithNoDatabases() (elastic#79907) Fix SnapshotBasedIndexRecoveryIT#testSeqNoBasedRecoveryIsUsedAfterPrimaryFailOver (elastic#79469) Fix RecoverySourceHandlerTests (elastic#79546) SQL: stabilize SqlSearchPageTimeoutIT (elastic#79928) Wait 3 seconds for the server to reload trust (elastic#79778) Skip automatically preserved request headers when rewriting (elastic#79973) Check whether stdout is a real console (elastic#79882) Convert remote license checker to use LicensedFeature (elastic#79876) Miscellaneous fixes for LDAP SDK v6 upgrade (elastic#79891) ... # Conflicts: # libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java # libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/FilterPathGeneratorFilteringTests.java # libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/FilterPathTests.java

elasticmachine · 2021-10-28T17:38:24Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

nik9000 · 2021-11-11T18:36:15Z

@weizijun you asked me for another round. I think the thing I'd most like to see is for FilterPath to be immutable. I think there's some interesting things going on here that'd be more clear if we had a builder or something like that.

weizijun · 2021-11-12T14:30:32Z

@elasticmachine update branch

weizijun · 2021-11-12T15:50:31Z

@weizijun you asked me for another round. I think the thing I'd most like to see is for FilterPath to be immutable. I think there's some interesting things going on here that'd be more clear if we had a builder or something like that.

there are many case. that can't cache FilterPath.compile result , It will call FilterPath.compile every time. I test that use a builder to build the FilterPath, but the performance isn't good. I make insert as a private method, so only compile method can create FilterPath, and no way can modify the FilterPath's internal variable

weizijun · 2021-11-16T14:43:56Z

here is the result of current commit (without builder):

Benchmark                                                  (fieldCount)  (inclusive)         (type)  Mode  Cnt       Score        Error  Units
FilterContentBenchmark.filterWithNewParserConfig               10_field         true  cluster_stats  avgt    3  125630.012 ±  43233.062  ns/op
FilterContentBenchmark.filterWithNewParserConfig               10_field         true    index_stats  avgt    3   17199.506 ±  16205.591  ns/op
FilterContentBenchmark.filterWithNewParserConfig               10_field         true     node_stats  avgt    3   18706.487 ±   8269.021  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true  cluster_stats  avgt    3  323293.870 ± 160494.748  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true    index_stats  avgt    3   30375.369 ±  13070.669  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true     node_stats  avgt    3   36620.540 ±   7108.038  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true  cluster_stats  avgt    3  462424.507 ± 136969.111  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true    index_stats  avgt    3   44915.709 ±   9761.707  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true     node_stats  avgt    3   56310.895 ±  18217.109  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true  cluster_stats  avgt    3  177093.113 ± 160724.229  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true    index_stats  avgt    3   18092.285 ±   7952.427  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true     node_stats  avgt    3   21655.977 ±   9084.962  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true  cluster_stats  avgt    3  191182.067 ± 143725.583  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true    index_stats  avgt    3   25645.334 ±   5691.638  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true     node_stats  avgt    3   29105.775 ±  11546.573  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true  cluster_stats  avgt    3  108692.473 ±  13760.244  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true    index_stats  avgt    3   13008.599 ±   3790.006  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true     node_stats  avgt    3   16625.404 ±   7738.217  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true  cluster_stats  avgt    3  195801.756 ±  67181.793  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true    index_stats  avgt    3   19406.511 ±  11415.749  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true     node_stats  avgt    3   23041.626 ±   6924.014  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true  cluster_stats  avgt    3  231902.841 ±  39345.049  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true    index_stats  avgt    3   23504.174 ±  14444.700  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true     node_stats  avgt    3   26650.139 ±  10977.090  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true  cluster_stats  avgt    3  173536.358 ±   7948.666  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true    index_stats  avgt    3   17432.681 ±   2051.736  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true     node_stats  avgt    3   21549.923 ±   2796.963  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true  cluster_stats  avgt    3  181061.389 ±  97643.301  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true    index_stats  avgt    3   22329.902 ±  12923.636  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true     node_stats  avgt    3   25704.656 ±  14194.372  ns/op

weizijun · 2021-11-16T14:44:36Z

here is the result with a FilterPath builder:

Benchmark                                                  (fieldCount)  (inclusive)         (type)  Mode  Cnt       Score        Error  Units
FilterContentBenchmark.filterWithNewParserConfig               10_field         true  cluster_stats  avgt    3  120724.363 ±   1847.915  ns/op
FilterContentBenchmark.filterWithNewParserConfig               10_field         true    index_stats  avgt    3   25700.478 ±  35972.505  ns/op
FilterContentBenchmark.filterWithNewParserConfig               10_field         true     node_stats  avgt    3   25455.610 ±   4487.196  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true  cluster_stats  avgt    3  551504.323 ±  88932.724  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true    index_stats  avgt    3   64639.860 ±  70283.057  ns/op
FilterContentBenchmark.filterWithNewParserConfig             half_field         true     node_stats  avgt    3   78739.087 ±   6328.356  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true  cluster_stats  avgt    3  949238.665 ± 601183.001  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true    index_stats  avgt    3  102391.586 ±   7250.291  ns/op
FilterContentBenchmark.filterWithNewParserConfig              all_field         true     node_stats  avgt    3  134091.199 ±  42111.633  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true  cluster_stats  avgt    3  158402.969 ±  11278.576  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true    index_stats  avgt    3   16604.226 ±   1184.135  ns/op
FilterContentBenchmark.filterWithNewParserConfig         wildcard_field         true     node_stats  avgt    3   20135.669 ±   1274.648  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true  cluster_stats  avgt    3  186846.661 ±  17204.705  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true    index_stats  avgt    3   27895.644 ±   6466.574  ns/op
FilterContentBenchmark.filterWithNewParserConfig      10_wildcard_field         true     node_stats  avgt    3   31325.693 ±   2198.252  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true  cluster_stats  avgt    3   94980.708 ±   6867.727  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true    index_stats  avgt    3   12325.762 ±    940.023  ns/op
FilterContentBenchmark.filterWithParserConfigCreated           10_field         true     node_stats  avgt    3   14945.972 ±    377.460  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true  cluster_stats  avgt    3  182547.734 ±  40934.271  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true    index_stats  avgt    3   17536.035 ±   1086.000  ns/op
FilterContentBenchmark.filterWithParserConfigCreated         half_field         true     node_stats  avgt    3   22170.274 ±   4365.790  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true  cluster_stats  avgt    3  210148.246 ±   2786.939  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true    index_stats  avgt    3   20630.002 ±   1005.205  ns/op
FilterContentBenchmark.filterWithParserConfigCreated          all_field         true     node_stats  avgt    3   24953.435 ±    754.936  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true  cluster_stats  avgt    3  173188.846 ±  97552.919  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true    index_stats  avgt    3   17187.918 ±   2121.380  ns/op
FilterContentBenchmark.filterWithParserConfigCreated     wildcard_field         true     node_stats  avgt    3   19932.483 ±   4762.103  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true  cluster_stats  avgt    3  173138.035 ±  22105.169  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true    index_stats  avgt    3   21296.013 ±   2022.140  ns/op
FilterContentBenchmark.filterWithParserConfigCreated  10_wildcard_field         true     node_stats  avgt    3   26567.965 ±  41173.636  ns/op

weizijun · 2021-11-16T14:48:08Z

@nik9000 I test the two case, when fields grow, the builder version is slower in NewParserConfig case.

nik9000 · 2021-11-16T14:52:22Z

@nik9000 I test the two case, when fields grow, the builder version is slower in NewParserConfig case.

Could you push the builder version somewhere so I can have a look at it please? I hadn't thought it'd make a difference.

weizijun · 2021-11-16T15:04:21Z

@nik9000 I test the two case, when fields grow, the builder version is slower in NewParserConfig case.

Could you push the builder version somewhere so I can have a look at it please? I hadn't thought it'd make a difference.

ok, I revert the code before, and I can push it again

nik9000

So! I think I still like the builder version better. I think it's easier to understand which is nice. I think at ingest and search time we have a much better chance of reusing the filters. And those are the times when the performance is most important.

Even so, I think the builder version gives us more of a chance to optimize later because we can make choices based on what we've collected.

nik9000 · 2021-11-16T19:50:09Z

libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java

+            for (int i = 0; i < end;) {
+                char c = filter.charAt(i);
+                if (c == '.') {
+                    String field = filter.substring(0, i).replaceAll("\\\\.", ".");


Losing the hasEscape flag that your non-builder version had seems to really slow down the builder version.

nik9000 · 2021-11-16T19:53:21Z

libs/x-content/src/test/java/org/elasticsearch/xcontent/support/filtering/FilterPathTests.java

-        assertThat(filterPath.getSegment(), is(emptyString()));
-        assertSame(filterPath, FilterPath.EMPTY);
-
-        input = "w\\.0\\.0\\.t";


I think it'd be nice to still have a test for escapes.

libs/x-content/src/main/java/org/elasticsearch/xcontent/support/filtering/FilterPath.java

nik9000 · 2021-11-18T17:41:30Z

@elasticmchine, test this please

nik9000 · 2021-11-18T17:55:49Z

@elasticmachine, update branch

nik9000 · 2021-11-18T17:56:14Z

@elasticmachine update branch

nik9000 · 2021-11-18T18:07:48Z

@elasticmachine test this please

nik9000 · 2021-11-18T18:39:35Z

run elasticsearch-ci/part-2

nik9000 · 2021-11-18T22:00:45Z

OK! Everything finally passed. But I'm about to step away. I'll merge this in the morning.

weizijun added 3 commits October 25, 2021 23:35

improve

ea4ec7a

some improve

3a1b0e9

change name

350db73

elasticsearchmachine added v8.0.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Oct 26, 2021

remove unused code

0fbc98d

nik9000 mentioned this pull request Oct 26, 2021

Add better support for metric data types (TSDB) #74660

Closed

nik9000 reviewed Oct 26, 2021

View reviewed changes

weizijun added 11 commits October 27, 2021 14:33

revert

a14d028

improve

06981d3

add comments

d36f01b

add comments and make the matches default scope

2f0fefb

change wildcardChildren from HashMap to LinkedHashMap to speed up Ite…

e1472ce

…ration

add some tests

9575eb0

add some benchmarks

79b061d

improve performance

364cdea

checkstyle

07d8dc7

improve

b808311

improve

22f5abb

weizijun marked this pull request as ready for review October 28, 2021 15:20

jtibshirani added the :Core/Infra/Core Core issues without another label label Oct 28, 2021

elasticmachine added the Team:Core/Infra Meta label for core/infra team label Oct 28, 2021

weizijun added 2 commits October 29, 2021 16:36

remove the benchmark

c9a043a

revert builder

a7253ff

elasticmachine and others added 2 commits November 12, 2021 06:30

Merge branch 'master' into improve-FilterPathBasedFilter-performance

46d26ef

make insert method private

8c06b8f

revert revert builder

52d2d4c

nik9000 requested changes Nov 16, 2021

View reviewed changes

add tests

de44a3a

nik9000 approved these changes Nov 18, 2021

View reviewed changes

Merge branch 'master' into improve-FilterPathBasedFilter-performance

c67afa8

nik9000 merged commit c758d1c into elastic:master Nov 19, 2021

wchaparro assigned weizijun and unassigned weizijun Dec 16, 2021

mark-vieira added v8.0.0-rc1 and removed v8.0.0 labels Jan 12, 2022

Conversation

weizijun commented Oct 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nik9000 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

weizijun commented Oct 27, 2021

Uh oh!

elasticmachine commented Oct 28, 2021

Uh oh!

nik9000 commented Nov 11, 2021

Uh oh!

weizijun commented Nov 12, 2021

Uh oh!

weizijun commented Nov 12, 2021

Uh oh!

weizijun commented Nov 16, 2021

Uh oh!

weizijun commented Nov 16, 2021

Uh oh!

weizijun commented Nov 16, 2021

Uh oh!

nik9000 commented Nov 16, 2021

Uh oh!

weizijun commented Nov 16, 2021

Uh oh!

nik9000 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

nik9000 commented Nov 18, 2021

Uh oh!

nik9000 commented Nov 18, 2021

Uh oh!

nik9000 commented Nov 18, 2021

Uh oh!

nik9000 commented Nov 18, 2021

Uh oh!

nik9000 commented Nov 18, 2021

Uh oh!

nik9000 commented Nov 18, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

weizijun commented Oct 26, 2021 •

edited

Loading