[ML] Ensure bulk requests are not over memory limit by dimitris-athanasiou · Pull Request #60219 · elastic/elasticsearch

dimitris-athanasiou · 2020-07-27T14:30:52Z

Data frame analytics jobs that work with very large datasets
may produce bulk requests that exceed the memory limit
for indexing. This commit adds a helper class that bundles
index requests into bulk requests that stay under the
memory limit. We then use this class from both the results
joiner and the inference runner, ensuring that data frame
analytics jobs do not generate bulk requests that are too large.

Note the limit was implemented in #58885.
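
As a rough illustration of the batching idea described above, here is a minimal, dependency-free sketch. It is not the class added in this PR: the name LimitAwareBatcher, its constructor parameters, and the use of raw byte arrays in place of Elasticsearch index requests are all assumptions made for the example. The class comment in the PR mentions a cap of 1000 operations and half the available memory limit for indexing; the sketch takes both limits as plain arguments instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: groups documents into batches and hands a batch to the
 * executor before it would exceed an operation cap or a byte budget.
 * Documents are modelled as byte arrays, not real index requests.
 */
final class LimitAwareBatcher implements AutoCloseable {

    private final int maxOperations;
    private final long maxBytes;
    private final Consumer<List<byte[]>> executor;

    private List<byte[]> current = new ArrayList<>();
    private long currentBytes = 0;

    LimitAwareBatcher(int maxOperations, long maxBytes, Consumer<List<byte[]>> executor) {
        this.maxOperations = maxOperations;
        this.maxBytes = maxBytes;
        this.executor = executor;
    }

    /** Adds a document, flushing first if it would push the batch over a limit. */
    void add(byte[] doc) {
        boolean tooManyOps = current.size() + 1 > maxOperations;
        boolean tooManyBytes = currentBytes + doc.length > maxBytes;
        // A single document larger than maxBytes is still sent, alone in its batch.
        if (!current.isEmpty() && (tooManyOps || tooManyBytes)) {
            flush();
        }
        current.add(doc);
        currentBytes += doc.length;
    }

    /** Sends the pending batch, if any, and starts a new one. */
    void flush() {
        if (current.isEmpty()) {
            return;
        }
        executor.accept(current);
        current = new ArrayList<>();
        currentBytes = 0;
    }

    /** Flushes whatever is left so no document is dropped. */
    @Override
    public void close() {
        flush();
    }
}
```

Implementing AutoCloseable, as the class in this PR does, lets callers wrap the batcher in try-with-resources so the final partial batch is flushed when the block exits.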

elasticmachine · 2020-07-27T14:30:54Z

Pinging @elastic/ml-core (:ml)

benwtrent

We should try to use this type of logic in the anomaly job result bulk indexer as well.

przemekwitek · 2020-07-28T07:54:43Z

...src/test/java/org/elasticsearch/xpack/ml/dataframe/process/AnalyticsProcessManagerTests.java

        modelLoadingService = mock(ModelLoadingService.class);
-        processManager = new AnalyticsProcessManager(client, executorServiceForJob, executorServiceForProcess, processFactory, auditor,
-            trainedModelProvider, modelLoadingService, resultsPersisterService, 1);
+        processManager = new AnalyticsProcessManager(Settings.builder().build(), client, executorServiceForJob, executorServiceForProcess,


Would Settings.EMPTY work here as well?

I wanted to change those and forgot. Thanks for spotting!

przemekwitek · 2020-07-28T07:55:14Z

.../ml/src/test/java/org/elasticsearch/xpack/ml/dataframe/process/DataFrameRowsJoinerTests.java


-        verifyNoMoreInteractions(resultsPersisterService);
+        List<BulkRequest> capturedBulkRequests = bulkRequestCaptor.getAllValues();
+        assertThat(capturedBulkRequests.size(), equalTo(1));


hasSize matcher could be used here instead.

przemekwitek · 2020-07-28T07:55:45Z

...plugin/ml/src/test/java/org/elasticsearch/xpack/ml/utils/persistence/MlBulkIndexerTests.java

+
+public class MlBulkIndexerTests extends ESTestCase {
+
+    private List<BulkRequest> executedBulkRequests = new ArrayList<>();


This is getting reinitialised for each test so it cannot be final.

przemekwitek · 2020-07-28T07:56:34Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/utils/persistence/MlBulkIndexer.java

+ * that do not exceed 1000 operations or half the available memory
+ * limit for indexing.
+ */
+public class MlBulkIndexer implements AutoCloseable {


There is no ML-specific code in this class. Would it make sense to rename it to BulkIndexer?

I wanted to avoid calling it plainly BulkIndexer as it sounds too much like a basic ES component. I decided to prefix with Ml to indicate this is a utility designed to serve purposes within the ML plugin. If we end up using it in other places then we shall move it and rename it accordingly.

dimitris-athanasiou · 2020-07-28T09:38:07Z

@elasticmachine update branch

dimitris-athanasiou added >bug :ml Machine learning v8.0.0 v7.9.0 v7.10.0 labels Jul 27, 2020

benwtrent self-requested a review July 27, 2020 14:38

benwtrent approved these changes Jul 27, 2020

przemekwitek reviewed Jul 28, 2020

dimitris-athanasiou added 2 commits July 28, 2020 12:06

Address review points

195acdb

Add debug log message

9dd6221

elasticmachine and others added 2 commits July 28, 2020 03:38

Merge branch 'master' into ensure-ml-bulk-requests-are-not-over-limit

037e9f3

Rename to LimitAwareBulkIndexer

639f0fc

dimitris-athanasiou merged commit 9528f9c into elastic:master Jul 28, 2020

dimitris-athanasiou deleted the ensure-ml-bulk-requests-are-not-over-limit branch July 28, 2020 11:46

dimitris-athanasiou mentioned this pull request Jul 28, 2020

[7.x][ML] Ensure bulk requests are not over memory limit (#60219) #60283

Merged

dimitris-athanasiou mentioned this pull request Jul 28, 2020

[7.9][ML] Ensure bulk requests are not over memory limit (#60219) #60284

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

dimitris-athanasiou merged 5 commits into elastic:master from dimitris-athanasiou:ensure-ml-bulk-requests-are-not-over-limit
