Add profiling and documentation for dfs phase by jdconrad · Pull Request #90536 · elastic/elasticsearch

jdconrad · 2022-09-29T19:55:22Z

This change adds profiling statistics for the dfs phase that look like the following:

{
    ...
    "dfs" : {
        "statistics" : {
            "type" : "statistics",
            "description" : "collect term statistics",
            "time_in_nanos" : 236955,
            "breakdown" : {
                "term_statistics" : 4815,
                "collection_statistics" : 27081,
                "collection_statistics_count" : 1,
                "create_weight" : 153278,
                "term_statistics_count" : 1,
                "rewrite_count" : 0,
                "create_weight_count" : 1,
                "rewrite" : 0
            }
        }
    }
    ...
}
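As a reading aid for the breakdown above, here is a minimal Java sketch. It is not Elasticsearch code, and the class and method names are hypothetical. It separates the timing entries from the `_count` entries, assuming, as in the other profile breakdowns, that un-suffixed keys are nanoseconds and `_count` keys are invocation counts. For the example values the timed entries add up to 185174 ns, which is less than the reported `time_in_nanos` of 236955, so the sketch does not treat the two as interchangeable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DfsBreakdownSummary {
    // Sums the timing entries of a profile breakdown. Keys ending in "_count"
    // hold invocation counts rather than nanoseconds, so they are skipped.
    static long totalTimedNanos(Map<String, Long> breakdown) {
        long total = 0;
        for (Map.Entry<String, Long> entry : breakdown.entrySet()) {
            if (!entry.getKey().endsWith("_count")) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Values copied from the example "statistics" breakdown above.
        Map<String, Long> breakdown = new LinkedHashMap<>();
        breakdown.put("term_statistics", 4815L);
        breakdown.put("collection_statistics", 27081L);
        breakdown.put("collection_statistics_count", 1L);
        breakdown.put("create_weight", 153278L);
        breakdown.put("term_statistics_count", 1L);
        breakdown.put("rewrite_count", 0L);
        breakdown.put("create_weight_count", 1L);
        breakdown.put("rewrite", 0L);
        // prints "timed breakdown total: 185174 ns"
        System.out.println("timed breakdown total: " + totalTimedNanos(breakdown) + " ns");
    }
}
```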

This change also adds documentation for both the above dfs phase profiling and kNN profiling.

Closes #89713

github-actions · 2022-09-29T19:55:35Z

Documentation preview:

✨ Changed pages

elasticsearchmachine · 2022-09-29T19:55:46Z

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine · 2022-09-29T19:58:14Z

Hi @jdconrad, I've created a changelog YAML for you.

nik9000 · 2022-09-29T22:46:22Z

docs/reference/search/profile.asciidoc

+As an example, let's first setup an index with multiple shards and index
+a pair of documents with different values on a keyword field.
+
+[source,console]


If you use `[source,console,id=profile_dfs]`, it'll name the YAML test it generates and add an invisible but linkable anchor tag in the docs. I've been naming everything lately because it makes the error messages nicer when the test fails.

nik9000 · 2022-09-29T22:49:20Z

docs/reference/search/profile.asciidoc

+    ...
+}
+--------------------------------------------------
+// NOTCONSOLE


You can make this `[source,console-result]` with a bit of hacking to replace the `...`. Something like:

// TESTRESPONSE[s/: \{\.\.\.\}/: $body.$_path/]
// TESTRESPONSE[s/: (\-)?[0-9]+/: $body.$_path/]

might do it, depending on how much you are willing to fight with the test generator. The advantage of doing this is that the test will fail if the names of the result fields change, so it'll force us to keep at least the JSON keys up to date.
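To make the two substitutions concrete, this hypothetical Java sketch applies the same regular expressions to a snippet shaped like the example response. It only mimics the text rewrite; the real docs test generator lives in the Elasticsearch build and is not shown here. Each matched value becomes `$body.$_path`, which, as described in the comment, lets the test accept whatever value the response holds at that path while still failing if a key is renamed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TestResponseSubstitution {
    // quoteReplacement stops Java from treating "$body" and "$_path" as group references.
    private static final String REPLACEMENT = Matcher.quoteReplacement(": $body.$_path");

    // First substitution: an elided object written as ": {...}".
    private static final Pattern ELIDED_OBJECT = Pattern.compile(": \\{\\.\\.\\.\\}");
    // Second substitution: an optionally negative integer ("-?" is equivalent to "(\-)?").
    private static final Pattern NUMBER = Pattern.compile(": -?[0-9]+");

    static String apply(String snippet) {
        String withObjects = ELIDED_OBJECT.matcher(snippet).replaceAll(REPLACEMENT);
        return NUMBER.matcher(withObjects).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        String snippet = "\"time_in_nanos\" : 236955,\n\"breakdown\" : {...}";
        // prints:
        // "time_in_nanos" : $body.$_path,
        // "breakdown" : $body.$_path
        System.out.println(apply(snippet));
    }
}
```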

@nik9000 Thank you for walking me through this and creating the appropriate test responses with me!

nik9000 · 2022-09-29T22:51:11Z

docs/reference/search/profile.asciidoc

+[[profiling-knn]]
+===== Profiling kNN
+
+A k-nearest neighbor (kNN) search runs as part of the dfs phase. To


Maybe this should link to `<<approximate-knn>>`?

nik9000 · 2022-09-29T22:52:35Z

docs/reference/search/profile.asciidoc

 ordinals (an internal data structure used to speed up search).
 - Profiling statistics are currently not available for suggestions,
-highlighting, `dfs_query_then_fetch`.
+highlighting


Could you remove highlighting as well? Profiling is available for highlighting as part of the fetch phase work I did a while back. I just never noticed this line.

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/370_profile.yml

nik9000 · 2022-09-29T22:54:05Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/370_profile.yml

+  - is_true: profile.shards.0.dfs.statistics.breakdown
+  - match: { profile.shards.1.dfs.statistics.type: "statistics" }
+  - match: { profile.shards.1.dfs.statistics.description: "collect term statistics" }
+  - gt: { profile.shards.1.dfs.statistics.time_in_nanos: 0 }


I think asserting that both shards have time > 0 relies on the hashing of the _ids landing on different shards, right? Maybe worth a comment, but ok, I guess.

Could this cause test flakes though, in cases where all docs happen to be hashed to the same shard?

My understanding is that which shards the docs are on shouldn't matter, as we still need to hit each shard to collect dfs information, so each shard will have a dfs profile even if no info is ultimately aggregated. (Am I missing some caching or shard skipping in this case?)
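A small model of the reasoning above, as a hedged sketch: it assumes the dfs phase visits every shard and emits a `statistics` entry per shard regardless of how many documents that shard holds. Routing here uses `String.hashCode` purely as a stand-in; Elasticsearch actually hashes the routing value (the `_id` by default) with Murmur3, so the shard assignments will not match a real cluster. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DfsPerShardSketch {
    // Stand-in routing function for illustration only (not Elasticsearch's Murmur3 routing).
    static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    // Hypothetical model of the dfs phase: every shard is visited and reports
    // a "statistics" entry, whether or not any documents were routed to it.
    static Map<Integer, String> dfsProfiles(List<String> ids, int numShards) {
        Map<Integer, Integer> docsPerShard = new TreeMap<>();
        for (String id : ids) {
            docsPerShard.merge(shardFor(id, numShards), 1, Integer::sum);
        }
        Map<Integer, String> profiles = new TreeMap<>();
        for (int shard = 0; shard < numShards; shard++) {
            int docs = docsPerShard.getOrDefault(shard, 0);
            profiles.put(shard, "statistics (docs=" + docs + ")");
        }
        return profiles;
    }

    public static void main(String[] args) {
        // Empty index: both shards still report a dfs profile entry.
        System.out.println(dfsProfiles(List.of(), 2));
        // Two documents: the entry count stays the same wherever they land.
        System.out.println(dfsProfiles(List.of("1", "2"), 2));
    }
}
```

Under this model, asserting on the presence of a per-shard `dfs.statistics` entry is independent of document placement, which is the property the updated tests rely on.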

I have updated the test to change/add the following:

1. changed `gt` to `gte` for `time_in_nanos`
2. added a no-documents case where we still expect to get dfs profile info
3. added a single-document case where we still expect to get dfs profile info
4. added a three-document case where two share the same keyword and we still expect to get dfs profile info
5. ensured we get no profile info if the search type is `query_then_fetch`

👍 This makes sense. I think we may not need both tests (3) and (4), since having the same keyword shouldn't affect shard routing or whether we use DFS?

I removed 3 and 4 since they are extraneous as you mentioned.

nik9000 · 2022-09-29T22:55:43Z

server/src/main/java/module-info.java

    opens org.elasticsearch.common.logging to org.apache.logging.log4j.core;

+    exports org.elasticsearch.search.profile.dfs;
+


What's this bit for? Should this be in alphabetic order?

Not sure why spotless didn't pick this up. I moved it to the appropriate location. The export mirrors the rest of the profile package, allowing other modules to use these classes.

server/src/main/java/org/elasticsearch/search/profile/SearchProfileDfsPhaseResult.java

nik9000 · 2022-09-29T23:01:07Z

server/src/main/java/org/elasticsearch/search/profile/dfs/DfsProfiler.java

+        QueryProfileShardResult queryProfileShardResult = profiledVectorQuery
+            ? new QueryProfileShardResult(queryProfiler.getTree(), queryProfiler.getRewriteTime(), queryProfiler.getCollector())
+            : null;
+        return new SearchProfileDfsPhaseResult(dfsProfileResult, queryProfileShardResult);


Is there any way to use the "emptiness" of some object to infer this? Like, if the breakdown map is empty it's all the statistics? I don't know that that's a perfect thing, but it might make this easier for someone to read in nine months.

I agree this is a bit confusing. I'll give some more thought to a better way to handle it. It's a bit tricky because of what is available at which times.

+1, it'd be nice to avoid mutable booleans like `profiledVectorQuery` if possible (though I haven't looked deeply into how we could do that!)

I tried to clean this up a bit. If the dfs phase runs, we always attempt to collect term stats, so there's no reason to check anything. For kNN, we have to set the collector on the profiler, so I track in `DfsProfiler` whether the collector has been set to determine whether we should include a kNN profile section.
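One way to read the cleanup above, as a hypothetical sketch: the types below are invented and are not the real `DfsProfiler`, `SearchProfileDfsPhaseResult`, or `QueryProfileShardResult` signatures. The statistics section is built unconditionally, and the presence of the kNN section is inferred from whether a collector was registered rather than from a separate mutable boolean.

```java
import java.util.Optional;

final class DfsProfilerSketch {
    // Hypothetical result shapes, loosely modeled on the dfs profile output.
    record StatisticsProfile(long timeInNanos) {}
    record KnnProfile(String collectorName) {}
    record DfsPhaseProfile(StatisticsProfile statistics, Optional<KnnProfile> knn) {}

    private long statisticsNanos;
    // Stays null unless a kNN search registers its collector.
    private String knnCollectorName;

    void recordStatisticsTime(long nanos) {
        statisticsNanos += nanos;
    }

    // Only called for requests with a kNN section; having a collector is
    // what tells buildResult() to emit the kNN profile.
    void setCollector(String collectorName) {
        this.knnCollectorName = collectorName;
    }

    DfsPhaseProfile buildResult() {
        // Term statistics are always collected when the dfs phase runs.
        StatisticsProfile statistics = new StatisticsProfile(statisticsNanos);
        Optional<KnnProfile> knn = Optional.ofNullable(knnCollectorName).map(KnnProfile::new);
        return new DfsPhaseProfile(statistics, knn);
    }

    public static void main(String[] args) {
        DfsProfilerSketch termsOnly = new DfsProfilerSketch();
        termsOnly.recordStatisticsTime(1_000);
        System.out.println(termsOnly.buildResult());

        DfsProfilerSketch withKnn = new DfsProfilerSketch();
        withKnn.recordStatisticsTime(1_000);
        withKnn.setCollector("example_collector"); // hypothetical collector name
        System.out.println(withKnn.buildResult());
    }
}
```

The design choice is that the state which must exist anyway (the collector) doubles as the signal, so there is no flag that can drift out of sync with it.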

jtibshirani · 2022-10-03T17:13:11Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/370_profile.yml

+  - is_true: profile.shards.0.dfs.statistics.breakdown
+  - match: { profile.shards.1.dfs.statistics.type: "statistics" }
+  - match: { profile.shards.1.dfs.statistics.description: "collect term statistics" }
+  - gt: { profile.shards.1.dfs.statistics.time_in_nanos: 0 }


Could this cause test flakes though, in cases where all docs happen to be hashed to the same shard?

docs/reference/search/profile.asciidoc

jtibshirani · 2022-10-03T17:42:26Z

server/src/main/java/org/elasticsearch/search/profile/dfs/DfsProfiler.java

+        QueryProfileShardResult queryProfileShardResult = profiledVectorQuery
+            ? new QueryProfileShardResult(queryProfiler.getTree(), queryProfiler.getRewriteTime(), queryProfiler.getCollector())
+            : null;
+        return new SearchProfileDfsPhaseResult(dfsProfileResult, queryProfileShardResult);


+1, it'd be nice to avoid mutable booleans like `profiledVectorQuery` if possible (though I haven't looked deeply into how we could do that!)

jdconrad · 2022-10-03T17:58:26Z

@nik9000 @jtibshirani Thank you both for the review. I will work on addressing all of the feedback.

jdconrad · 2022-10-04T19:09:21Z

@nik9000 @jtibshirani I think I have addressed all of the given feedback, so this is ready for another round of review.

jtibshirani

This looks good to me too, I just left some tiny last comments.

jtibshirani · 2022-10-04T23:03:00Z

docs/reference/search/profile.asciidoc

+the dfs phase.
+
+The following is an example of setting `profile` to `true` on a search
+that has a knn section:


Tiny comment: we usually style the names of API parameters as literals (`knn` section).

Good catch. Fixed.

jtibshirani · 2022-10-04T23:03:30Z

docs/reference/search/profile.asciidoc

+the of timings for <<query-section, query>>, <<rewrite-section, rewrite>>,
+and <<collectors-section, collector>>. Unlike many other queries, kNN
+search does the bulk of the work during the query rewrite. This means
+rewrite_time represents the time spent on kNN search.


Same comment here: we should say `rewrite_time`.

jtibshirani · 2022-10-04T23:04:23Z

docs/reference/search/profile.asciidoc

-highlighting, `dfs_query_then_fetch`.
+- Profiling statistics are currently not available for suggestions.
 - Profiling of the reduce phase of aggregation is currently not available.
 - The Profiler is instrumenting internals that can change from version to


Just noticed this existing comment and glad we have it -- it will let us tweak the DFS output format if needed!

jtibshirani · 2022-10-04T23:12:29Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/370_profile.yml

+  - is_true: profile.shards.0.dfs.statistics.breakdown
+  - match: { profile.shards.1.dfs.statistics.type: "statistics" }
+  - match: { profile.shards.1.dfs.statistics.description: "collect term statistics" }
+  - gt: { profile.shards.1.dfs.statistics.time_in_nanos: 0 }


👍 This makes sense. I think we may not need both tests (3) and (4), since having the same keyword shouldn't affect shard routing or whether we use DFS?

mark-vieira · 2022-10-05T15:45:11Z

@elasticmachine retest this please


Introduced in: #90536. Profiling for DFS has produced odd-looking timing numbers. It could also trigger assertion failures, because `timer.start()` was called twice without a `stop()` in between. The key issue was around query `Weight` creation: `Weight` creation could be called recursively, so `start` was called on the timer more than once before `stop` was called.
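The failure mode described above (a timer started again by a nested call before it is stopped) can be avoided by counting nesting depth so that only the outermost call is timed. The following Java sketch shows that technique with hypothetical names; it is not necessarily how #92421 fixed the issue.

```java
final class ReentrantTimerSketch {
    private int depth;        // number of start() calls not yet matched by stop()
    private long startNanos;  // start time of the outermost open interval
    private long totalNanos;  // time accumulated across completed outermost intervals
    private long count;       // number of outermost intervals timed

    void start() {
        // Only the outermost call starts the clock, so a recursive call
        // neither restarts it nor trips a "start without stop" check.
        if (depth++ == 0) {
            startNanos = System.nanoTime();
        }
    }

    void stop() {
        if (depth == 0) {
            throw new IllegalStateException("stop() called without a matching start()");
        }
        if (--depth == 0) {
            totalNanos += System.nanoTime() - startNanos;
            count++;
        }
    }

    long totalNanos() { return totalNanos; }
    long count() { return count; }

    // Hypothetical recursive operation standing in for nested Weight creation.
    static void createWeight(ReentrantTimerSketch timer, int nestedLevels) {
        timer.start();
        try {
            if (nestedLevels > 0) {
                createWeight(timer, nestedLevels - 1);
            }
        } finally {
            timer.stop();
        }
    }

    public static void main(String[] args) {
        ReentrantTimerSketch timer = new ReentrantTimerSketch();
        createWeight(timer, 3); // four nested start/stop pairs
        // count is 1: only the outermost call was timed
        System.out.println("intervals=" + timer.count() + " nanos=" + timer.totalNanos());
    }
}
```

Counting only the outermost interval also avoids double-counting time that nested calls spend inside their parent.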


jdconrad added 3 commits September 28, 2022 14:04

add profiling for dfs stats collection

81ca149

Add docs for dfs stats section

ee448e9

add docs for knn profile

e3dd1ad

jdconrad added >enhancement, :Search/Search, and v8.6.0 labels Sep 29, 2022

jdconrad requested review from jtibshirani and nik9000 September 29, 2022 19:55

elasticsearchmachine added the Team:Search label Sep 29, 2022

jdconrad added 2 commits September 29, 2022 12:58

Update docs/changelog/90536.yaml

0e5271e

Merge branch 'main' into dfsprofile

b74ff00

nik9000 reviewed Sep 29, 2022

jtibshirani reviewed Oct 3, 2022

jdconrad and others added 10 commits October 3, 2022 11:29

Merge branch 'main' into dfsprofile

f84d6fe

updated docs based on pr feedback

f6a83a3

clean up if we need portions of the dfs profile response

80f9d6f

clean up if/else block

cd38147

Move dfsprofile package to be alphabetical.

b8421f6

Update dfs test without knn

e89bbcf

improve dfs query then fetch profile tests

a9191e5

attempt to add docs test response

9edf462

WEEEE

4d3287b

Merge branch 'main' into dfsprofile

6fd4635

nik9000 approved these changes Oct 4, 2022

jtibshirani approved these changes Oct 4, 2022

jdconrad added 2 commits October 5, 2022 08:13

response to pr comments

2359bed

Merge branch 'main' into dfsprofile

b547b1f

jdconrad merged commit 8b0d071 into elastic:main Oct 5, 2022

benwtrent mentioned this pull request Dec 16, 2022

Fix timing bug with DFS profiling #92421

Merged

pquentin mentioned this pull request Sep 16, 2024

Fix search response types elastic/elasticsearch-specification#2893

Merged
