Using BulkProcessor2 in Deprecation Logging #94211
Conversation
 * performance (adding a periodic task to check the queue).
 */
if (flushEnabled.get()) {
    processor.add(request);
Had an idea about this. What if we drain the queue in this location as well? It shouldn't replay any previously removed items, and it will clear any items that might have snuck into the queue since the drain operation in the cluster management thread.
Might be overkill, but this is a clunky thing already without fully implementing a producer consumer model.
That's a good idea. Thanks.
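The drain-at-add idea above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class, field, and method names are hypothetical stand-ins, and a plain `String` stands in for the real bulk request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: drain the buffer at add-time too, so any items that
// snuck into the queue after the cluster-management-thread drain get flushed.
class BufferedSink {
    private final AtomicBoolean flushEnabled = new AtomicBoolean(false);
    private final Queue<String> buffer = new ConcurrentLinkedQueue<>();
    final List<String> sent = new ArrayList<>(); // stands in for the processor

    void add(String request) {
        if (flushEnabled.get()) {
            drainBuffer();     // clear any stragglers first, preserving order
            sent.add(request); // stands in for processor.add(request)
        } else {
            buffer.add(request);
        }
    }

    void enableFlush() {
        flushEnabled.set(true);
        drainBuffer();
    }

    private void drainBuffer() {
        String queued;
        // poll() removes each item exactly once, so nothing is ever replayed
        while ((queued = buffer.poll()) != null) {
            sent.add(queued);
        }
    }
}
```

Because `poll()` removes each element before it is sent, draining in both places cannot replay previously removed items; at worst the second drain finds an empty queue.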
Pinging @elastic/es-data-management (Team:Data Management)
 * drained and its contents are sent to the processor. The queue is unbounded because we are first going through processor::add, which
 * starts rejecting documents if the total number of bytes in flight gets too large.
 */
private final BlockingQueue<Tuple<BulkRequest, ActionListener<BulkResponse>>> requestAndListenerBuffer = new LinkedBlockingQueue<>();
I'd be inclined to make this a queue of Runnable; there's no need to capture the request and listener separately. Also, maybe use a ConcurrentLinkedQueue, which avoids the need to handle any InterruptedException since there won't be any blocking here anyway.
In light of fa87b0b, maybe a queue of AbstractRunnable; that way you can fail any requests that you never got around to even sending. Better that than just leaking them, IMO.
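A rough sketch of that suggestion, with a minimal stand-in for Elasticsearch's AbstractRunnable (the real class has more to it; `FailableTask` and `TaskBuffer` here are illustrative names, not the PR's code):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal stand-in for AbstractRunnable: a task that can either run
// or be failed with an exception if it is never sent.
abstract class FailableTask {
    abstract void doRun();
    abstract void onFailure(Exception e);
}

class TaskBuffer {
    // ConcurrentLinkedQueue is non-blocking, so draining it never throws
    // InterruptedException and needs no special interrupt handling.
    private final Queue<FailableTask> buffer = new ConcurrentLinkedQueue<>();

    void add(FailableTask task) {
        buffer.add(task);
    }

    void drainAndRun() {
        FailableTask t;
        while ((t = buffer.poll()) != null) {
            t.doRun();
        }
    }

    // On close, fail anything that was never sent instead of silently leaking it,
    // so callers' listeners are always completed one way or the other.
    void failRemaining(Exception cause) {
        FailableTask t;
        while ((t = buffer.poll()) != null) {
            t.onFailure(cause);
        }
    }
}
```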
 */
public void close() {
    try {
        awaitClose(Integer.MAX_VALUE, TimeUnit.MILLISECONDS);
Oh yeah, also I think this should not wait so long, especially since it's only used in tests. A minute should be more than enough. It would be good to return something from awaitClose() to indicate whether the wait was successful or not, so that you can e.g. assertTrue(processor.awaitClose(1, TimeUnit.MINUTES)). Also, I'm not sure this needs doing here; it seems independent of the other changes.
I've actually got several related PRs going on at once. One of them (#94133, just merged to main) changes awaitClose to return a boolean. Another one (#94197) depends on having this close method. The DeprecationIndexingComponent actually calls close(), which is why I made the change here (and modified a BulkProcessor test just to make sure it had a little coverage). I've been going back and forth over what the correct behavior here ought to be: wait forever? Wait some amount of time? Remove this method and make the client think about what is right for them (although that complicates the double try-with-resources in #94197)?
It's usual to wait a bit (~30s say) but then to proceed regardless. We don't want to block the node shutting down forever, but bailing out straight away will sometimes drop some messages even if everything is working properly.
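The suggested shutdown contract can be sketched as below. This is a simplified illustration, not BulkProcessor2's implementation: a single CountDownLatch stands in for the real in-flight request tracking, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: awaitClose returns whether in-flight work finished within the
// timeout, and close() waits a bounded time (~30s) but then proceeds
// regardless, so node shutdown is never blocked forever.
class ClosableProcessor {
    private final CountDownLatch pendingWork = new CountDownLatch(1);

    // Called when the last in-flight request completes.
    void finishPendingWork() {
        pendingWork.countDown();
    }

    // Returns true if all pending work completed before the timeout,
    // so tests can assertTrue(processor.awaitClose(1, TimeUnit.MINUTES)).
    boolean awaitClose(long timeout, TimeUnit unit) throws InterruptedException {
        return pendingWork.await(timeout, unit);
    }

    void close() {
        try {
            // Bounded wait: give in-flight messages a chance to land,
            // but proceed with shutdown even if they don't.
            awaitClose(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```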
@elasticmachine update branch
In #91238 we rewrote BulkProcessor to avoid a deadlock that had been seen in the IlmHistoryStore. This PR ports deprecation logging over to the new BulkProcessor2 implementation. The PR is somewhat complicated because we queue up deprecation log requests until the deprecation index template and ILM policy are in place. This was previously handled by BulkProcessor via the logic added in #80406, but that logic was not ported to BulkProcessor2 since it was unique to this use case. Instead, there is now a bounded queue in DeprecationIndexingComponent to handle it.
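The bounded-queue idea can be sketched as follows. The capacity, drop policy, and names here are assumptions for illustration only, not the values or code in DeprecationIndexingComponent:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: buffer deprecation-log requests in a bounded queue
// until the index template and ILM policy are ready, dropping new requests
// rather than blocking or growing without limit when the queue is full.
class PreFlushBuffer {
    private final BlockingQueue<String> queue;

    PreFlushBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // offer() is non-blocking: it returns false instead of waiting when the
    // queue is full, so callers are never blocked by a slow startup.
    boolean buffer(String request) {
        return queue.offer(request);
    }

    // Once the template and policy exist, drain everything to the processor.
    List<String> drainAll() {
        List<String> sink = new ArrayList<>();
        queue.drainTo(sink);
        return sink;
    }
}
```

The bound matters because, unlike the in-processor queue discussed above, nothing is consuming this buffer until cluster setup completes, so an unbounded queue could grow indefinitely.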