Make Releasing Aggregation Buffers Safer#76741
Make Releasing Aggregation Buffers Safer#76741original-brownbear merged 3 commits intoelastic:masterfrom original-brownbear:aggs-buffer-leaks
Conversation
Fixing a few possible leaks and making the logic more obviously correct.
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
| runningTask.compareAndSet(task, null); | ||
| onPartialMergeFailure.accept(exc); | ||
| List<MergeTask> toCancels = new ArrayList<>(); | ||
| final MergeTask task = runningTask.getAndSet(null); |
There was a problem hiding this comment.
This one seems to me to be the most likely place where something could go wrong and we should really guard against a cancel throwing in the loop.
| } | ||
| success = true; | ||
| } finally { | ||
| if (success == false) { |
There was a problem hiding this comment.
This whole thing is admittedly a little questionable, but IMO we shouldn't have an obvious leak during serialization bugs (that could always be triggered by some other bug in network buffer handling or so).
| } | ||
| success = true; | ||
| } finally { | ||
| if (success == false) { |
There was a problem hiding this comment.
I think if we want to clean the aggregations only in case of an error, wouldn't it be better to do that in a catch block, instead of creating an additional variable and tracking its status in finally?
There was a problem hiding this comment.
I figured this was the approach we used in most other places and it helps in so far (relative to catching) that we wouldn't leak here if we hit an error (most likely by tripping an assertion) and thus make test debugging easier.
There was a problem hiding this comment.
Just to add that not only does this pattern deal with the Error case without having to catch Throwable but it's also robust against future changes that introduce an early exit somewhere (e.g. a try-catch-log-and-return with smaller scope). Seems unlikely here but it's a good pattern anyway, I much prefer to signal the change of ownership explicitly like this than to do it implicitly in an exception handler.
| toCancels.add(task); | ||
| toCancels.add(task::cancel); | ||
| } | ||
| for (MergeTask mergeTask : queue) { |
There was a problem hiding this comment.
I was wondering if we could clear the queue and iterate it one go here with
while (queue.isEmpty() == false){
toCancels.add(queue.removeFirst()::cancel);
}There was a problem hiding this comment.
Sure saves a line :) I suppose we can even just pollFirst in a loop :) Will do.
| onPartialMergeFailure.accept(exc); | ||
| List<MergeTask> toCancels = new ArrayList<>(); | ||
| final MergeTask task = runningTask.getAndSet(null); | ||
| final List<Releasable> toCancels = new ArrayList<>(); |
There was a problem hiding this comment.
Does it make sense to name the list releasables instead of toCancels? It sounds to me more logical because its consists from elements that implement Relesable not Closeable
There was a problem hiding this comment.
Not sure, left the name as is for now just because it's mostly cancelling here, probably doesn't change a whole lot right? :)
server/src/main/java/org/elasticsearch/action/search/QueryPhaseResultConsumer.java
Outdated
Show resolved
Hide resolved
|
General comment: I think it would be nice to have additional tests in |
|
I tried looking at this when I woke up this morning at 6:30 and just couldn't. QueryPhaseController needs full awake, I guess. I've added this to my list to look at too. I know things about QueryPhaseController - but a lot of what I know is "don't trust your instincts, it's always more complicated than you think". |
nik9000
left a comment
There was a problem hiding this comment.
It's a little more paranoia. I like that.
Is it worth adding a unit test that we release things in the way we expect? My instinct is "yes", but if the unit it test is totally unreal it'd be more of pain than it's worth. Not sure. Depends on the mocking load, I guess.
|
Thanks Artem + Nik!
I'll try to come up with a clean follow-up that stresses the error handling of this logic a little later in the week (can't do it quickly here I'm afraid, the mocking at least with my level of understanding of this code will take a little). I think you two are def. right that we should have coverage though. |
* master: (21 commits) [Test] More robust assertions for sorting and pagination (elastic#76654) [Test] Fix filename check on Windows (elastic#76807) Upgrade build scan plugin to 3.6.4 (elastic#76784) Remove keystore initial_md5sum (elastic#76835) Don't export docker images on assemble (elastic#76817) Fix testMasterStatsOnSuccessfulUpdate (elastic#76844) AwaitsFix for elastic#76840 Make Releasing Aggregation Buffers Safer (elastic#76741) Re-enable BWC tests after backport of elastic#76771 (elastic#76839) Dispatch large bulk requests to write thread (elastic#76736) Disable BWC tests for elastic#76771 Pull down beats artifacts when performing release tests Add timing stats to publication process (elastic#76771) Fix BanFailureLoggingTests some more (elastic#76668) Mention "warn threshold" in master service slowlog (elastic#76815) Fix DockerTests.test010Install Re-enable tests affected by elastic#75097 (elastic#76814) Fix testRecoveryIsCancelledAfterDeletingTheIndex (elastic#76644) Test fix -WildcardFieldMapperTests bad test data. (elastic#76819) Updating supported version after backporting the feature (elastic#76794) ... # Conflicts: # server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java
Fixing a few possible leaks and making the logic more obviously correct.
Did not label this >bug as non of these failure scenarios are confirmed.