[ML] Ensure bulk requests are not over memory limit #60219
Data frame analytics jobs that work with very large datasets may produce bulk requests that exceed the memory limit for indexing. This commit adds a helper class that bundles index requests into bulk requests sized to stay under that limit. Both the results joiner and the inference runner now use this class, ensuring that data frame analytics jobs do not generate bulk requests that are too large. Note that the limit was implemented in elastic#58885.
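To illustrate the batching idea described above, here is a minimal sketch that uses only the JDK. It is not the class added by this PR: `LimitAwareBatcher`, its constructor parameters, and the use of plain strings in place of index requests are all assumptions made for illustration. The operation and byte limits are passed in by the caller rather than derived from any Elasticsearch setting.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for the helper described above.
// Documents are plain strings and a "bulk request" is just a List<String>.
final class LimitAwareBatcher implements AutoCloseable {
    private final int maxOperations;
    private final long maxBytes;
    private final Consumer<List<String>> bulkExecutor;
    private List<String> currentBatch = new ArrayList<>();
    private long currentBytes = 0;

    LimitAwareBatcher(int maxOperations, long maxBytes, Consumer<List<String>> bulkExecutor) {
        this.maxOperations = maxOperations;
        this.maxBytes = maxBytes;
        this.bulkExecutor = bulkExecutor;
    }

    void add(String doc) {
        long docBytes = doc.getBytes(StandardCharsets.UTF_8).length;
        // Flush first if this document would push the pending batch over the byte limit.
        if (!currentBatch.isEmpty() && currentBytes + docBytes > maxBytes) {
            flush();
        }
        currentBatch.add(doc);
        currentBytes += docBytes;
        // Flush as soon as the operation count limit is reached.
        if (currentBatch.size() >= maxOperations) {
            flush();
        }
    }

    private void flush() {
        if (currentBatch.isEmpty()) {
            return;
        }
        bulkExecutor.accept(currentBatch);
        currentBatch = new ArrayList<>();
        currentBytes = 0;
    }

    // Closing sends whatever is still pending, so no document is dropped.
    @Override
    public void close() {
        flush();
    }
}

class LimitAwareBatcherDemo {
    public static void main(String[] args) {
        List<List<String>> executed = new ArrayList<>();
        try (LimitAwareBatcher batcher = new LimitAwareBatcher(2, 1024, executed::add)) {
            batcher.add("{\"a\":1}");
            batcher.add("{\"b\":2}");
            batcher.add("{\"c\":3}");
        }
        System.out.println(executed.size() + " bulk requests executed");
    }
}
```

With a limit of two operations, the three documents in the demo end up in two batches: the first is flushed when it reaches two operations, the second on `close()`. In this sketch a single document larger than the byte limit is still sent, alone in its own batch.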
Pinging @elastic/ml-core (:ml)
benwtrent
left a comment
We should try to use this type of logic in the anomaly job result bulk indexer as well.
modelLoadingService = mock(ModelLoadingService.class);
processManager = new AnalyticsProcessManager(client, executorServiceForJob, executorServiceForProcess, processFactory, auditor,
    trainedModelProvider, modelLoadingService, resultsPersisterService, 1);
processManager = new AnalyticsProcessManager(Settings.builder().build(), client, executorServiceForJob, executorServiceForProcess,
Would Settings.EMPTY work here as well?
I wanted to change those and forgot. Thanks for spotting!
verifyNoMoreInteractions(resultsPersisterService);
List<BulkRequest> capturedBulkRequests = bulkRequestCaptor.getAllValues();
assertThat(capturedBulkRequests.size(), equalTo(1));
hasSize matcher could be used here instead.
public class MlBulkIndexerTests extends ESTestCase {

    private List<BulkRequest> executedBulkRequests = new ArrayList<>();
This is getting reinitialised for each test so it cannot be final.
 * that do not exceed 1000 operations or half the available memory
 * limit for indexing.
 */
public class MlBulkIndexer implements AutoCloseable {
There is no ML-specific code in this class. Would it make sense to rename it to BulkIndexer?
I wanted to avoid calling it plain BulkIndexer, as that sounds too much like a basic ES component. I decided to prefix it with Ml to indicate that this is a utility designed to serve purposes within the ML plugin. If we end up using it in other places, we can move it and rename it accordingly.
@elasticmachine update branch
Data frame analytics jobs that work with very large datasets may produce bulk requests that are over the memory limit for indexing. This commit adds a helper class that bundles index requests in bulk requests that steer away from the memory limit. We then use this class both from the results joiner and the inference runner ensuring data frame analytics jobs do not generate bulk requests that are too large. Note the limit was implemented in elastic#58885. Backport of elastic#60219
…0283) Backport of #60219
…0284) Backport of #60219