EQL: Add circuit breaker to sampling#87545
EQL: Add circuit breaker to sampling#87545luigidellaquila merged 7 commits intoelastic:feature/eql_samplesfrom
Conversation
astefan
left a comment
There was a problem hiding this comment.
I'm wondering if another code line is a better place for calculating the memory usage. Added a comment related to that.
| if (match != null) { | ||
| finalSamples.add(match); | ||
| samples.add(new Sample(sampleKeys.get(responseIndex / maxCriteria), match)); | ||
| addSample(new Sample(sampleKeys.get(responseIndex / maxCriteria), match)); |
There was a problem hiding this comment.
This may not be the best place to compute the memory usage... At this point in code, one Sample is found and added to the final results list. I'm worried that the algorithm will run multiple cycles, not finding any samples, but still use memory to go over pages and pages of results.
There was a problem hiding this comment.
This is the only place where a Page is added to the stack. The page size depends on how many join keys are returned and its max size is configurable (defaults to 1000). I'm wondering if this shouldn't be a better place for calculating the memory usage. We add a new page elements counter and every 1000 (?) new elements in pages, we calculate the memory usage.
There was a problem hiding this comment.
Yes, it makes sense, I'm adding a memory check there as well.
As a side effect, the page size becomes a constraint for circuit breaker precision (ie. the bigger the pages, the less precise the circuit breaker becomes), limiting our ability of fine tuning.
To mitigate this problem, probably it's worth having the memory check both on stack push and on sample add (the samples collection is monotonically growing and potentially unlimited, so it's a hotspot for memory usage)
costin
left a comment
There was a problem hiding this comment.
I agree that computing memory for every sample is expensive however I'm not sure why there are two sampling sizes.
I would pick only one strategy and see how well it works, for example pick one of:
a. check the memory at the end of every listener call; essentially after the results have been parsed
b. check the memory every X samples added to the stack
| protected void pushToStack(Page nextPage) { | ||
| stack.push(nextPage); | ||
| totalItemsAddedToStack += nextPage.size; | ||
| if ((totalItemsAddedToStack - lastStackSizeWhenCalculatedMemory) / CB_STACK_SIZE_PRECISION > 0) { |
There was a problem hiding this comment.
No need to do division just substraction since this checks if a threshold was met, it doesn't matter by how much:
if (totalItemsAddedToStack - lastStackSizeWhenCalculatedMemory >= CB_STACK_SIZE_PRECISION)...
| private long stackRamBytesUsed = 0; | ||
| private long totalRamBytesUsed = 0; | ||
| private long totalItemsAddedToStack = 0; | ||
| private long lastStackSizeWhenCalculatedMemory = 0; |
There was a problem hiding this comment.
Despite the long name I find the variable name non-descriptive. How about: samplesCount and lastSampleCount?
There was a problem hiding this comment.
It's not really samples count (it's intermediate data, not samples), but I see your point.
Probably we can go with totalPageSize and lastTotalPageSize, WDYT?
|
Thanks @costin, I think your comment (better having only one strategy to about memory calculation) together with what @astefan suggested (use the stack to decide when to execute the memory check) leads us to a simplification: I'm removing the check on the samples for now, and leaving only the one on the stack insertions. |
- simplify circuit breaker logic - variable naming
|
|
||
| int initialSize = samples.size(); | ||
| client.multiQuery(searches, ActionListener.wrap(r -> { | ||
| List<List<SearchHit>> finalSamples = new ArrayList<>(); |
| private long samplesRamBytesUsed = 0; | ||
| private long stackRamBytesUsed = 0; | ||
| private long totalRamBytesUsed = 0; | ||
| /** |
There was a problem hiding this comment.
I guess this might be a practice by now, with ES's heap pages being locked in RAM, but just xxxMemSize might have been a bit less assuming of where that memory is.
|
|
||
| private void testMemoryCleared(boolean fail) { | ||
| List<BreakerSettings> eqlBreakerSettings = Collections.singletonList( | ||
| new BreakerSettings( |
There was a problem hiding this comment.
Wondering if it'd be worth making EqlPlugin#getCircuitBreaker() static and feed it Settings.EMPTY to get this value.
There was a problem hiding this comment.
Good point, having a helper method for that in the test suite would save some code.
Unfortunately we cannot make EqlPlugin#getCircuitBreaker() static though, since it's part of CircuitBreakerPlugin interface contract...
| ESMockClient esClient = new ESMockClient(service.getBreaker(CIRCUIT_BREAKER_NAME)); | ||
| ) { | ||
| CircuitBreaker eqlCircuitBreaker = service.getBreaker(CIRCUIT_BREAKER_NAME); | ||
| EqlConfiguration eqlConfiguration = new EqlConfiguration( |
There was a problem hiding this comment.
Also here, wondering if it'd be worth adding a EqlTestUtils method to configure a configuration, since most of this is contained in the TEST_CFG.
There was a problem hiding this comment.
I think that's what EqlTestUtils#randomConfiguration() already does, actually. Double-checking to see if there are significant differences, but I guess we can just use that.
| Sample sample = new Sample(new SequenceKey(randomAlphaOfLength(10)), searchHits); | ||
| return sample; | ||
| } |
There was a problem hiding this comment.
"Local variable 'sample' is redundant"
|
|
||
| @Override | ||
| public void readBytes(byte[] b, int offset, int len) throws IOException { | ||
|
|
There was a problem hiding this comment.
Nit: could these empty methods be collapsed on one line?
There was a problem hiding this comment.
Unfortunately our style check rules complain about that
astefan
left a comment
There was a problem hiding this comment.
LGTM
Left two minor comments.
| /** | ||
| * total number of hits (ie. sum of page sizes) added to the stack when last memory check was executed | ||
| */ | ||
| private long lastTotalPageSize = 0; |
There was a problem hiding this comment.
previousTotalPageSize is a better name?
|
|
||
| CIRCUIT_BREAKER.startBreaking(); | ||
| iterator.pushToStack(new SampleIterator.Page(CB_STACK_SIZE_PRECISION / 2)); | ||
| iterator.pushToStack(new SampleIterator.Page(CB_STACK_SIZE_PRECISION - (CB_STACK_SIZE_PRECISION / 2) - 1)); |
There was a problem hiding this comment.
CB_STACK_SIZE_PRECISION - (CB_STACK_SIZE_PRECISION / 2) meaning CB_STACK_SIZE_PRECISION / 2. Why showing it like that? I think it's relevant enough that the first push adds half of the limit, the second push adds another half minus one, and the last step it's triggering the circuit breaker.
There was a problem hiding this comment.
I guess that's due to rounding (should the constant be changed to an odd number). But yes, I also thought it could be simplified (smth like a push with CB_STACK_SIZE_PRECISION, then +1).
There was a problem hiding this comment.
Not really needed indeed, I'm simplifying it
No description provided.