fix: Fix various memory leaks problems#890
Conversation
| (0 until batch.numCols()).foreach { index => | ||
| batch.column(index) match { | ||
| case a: CometVector => | ||
| val valueVector = a.getValueVector | ||
|
|
||
| val provider = if (valueVector.getField.getDictionary != null) { | ||
| a.getDictionaryProvider | ||
| } else { | ||
| null | ||
| } | ||
|
|
There was a problem hiding this comment.
This is for moving the sanity check for column types prior to the actual construction of the Arrow C data structure. It is tricky to release already constructed FFI data structures before raising the exception.
| exportedVectors += arrowSchema.memoryAddress() | ||
| } | ||
|
|
||
| exportedVectors.toArray |
There was a problem hiding this comment.
You can return ExportedBatch without any above change.
There was a problem hiding this comment.
If you don't like the restructuring that moves the sanity checks, I can revert it to the original control flow.
There was a problem hiding this comment.
It makes sense for the reason https://github.com/apache/datafusion-comet/pull/890/files#r1736808325
| // The native executor should have moved the previous batch, it is safe for us to deallocate | ||
| // the ArrowSchema and ArrowArray base structures. |
There was a problem hiding this comment.
Not exactly. A native operator could possibly save batches internally.
There was a problem hiding this comment.
Yes, even though the native operator save batches internally, the batch would be moved to the native operator to be saved.
|
|
||
| out.flush() | ||
| out.close() | ||
| writer.close() |
There was a problem hiding this comment.
Closing the writer will close the dictionary provider. If the dictionary arrays are shared across batches, you will close them and empty later batches. I remember we hit the issue before.
There was a problem hiding this comment.
Hmm... I need to take further look at this to fully understand if this fix is correct or not.
There was a problem hiding this comment.
This should be correct. The dictionary provider held by the writer contains copied vectors, so closing them does not interfere with the rest parts of the system.
There was a problem hiding this comment.
I remember closing it will cause some errors in CI due the reason I mentioned. Let's see if CI can pass or not.
There was a problem hiding this comment.
The CI passes on all commits of this PR, it has run 3 rounds with no problem.
There was a problem hiding this comment.
Yea, maybe there was some other changes before. Anyway it is good to close the writer without issue.
| def close(): Unit = { | ||
| arrowSchemas.foreach { schema => | ||
| val snapshot = schema.snapshot | ||
| if (snapshot.release != 0) schema.release() |
There was a problem hiding this comment.
I don't think we should call release here. The release of exported array should be done when native side drops the imported array.
There was a problem hiding this comment.
Yes. I did this in case the native side not move away the array (by mistake maybe). This could be removed if the native side always move the array.
There was a problem hiding this comment.
Yea, I took another look. Native side moved them.
| arrowArrays.foreach { array => | ||
| val snapshot = array.snapshot | ||
| if (snapshot.release != 0) array.release() | ||
| array.close() |
There was a problem hiding this comment.
It makes sense to close the internal ArrowBuf of ArrowArray and ArrowSchema. Good catch.
83230a9 to
a0c46ee
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #890 +/- ##
============================================
- Coverage 55.16% 55.01% -0.15%
+ Complexity 857 854 -3
============================================
Files 109 110 +1
Lines 10542 10592 +50
Branches 2010 2020 +10
============================================
+ Hits 5815 5827 +12
- Misses 3714 3750 +36
- Partials 1013 1015 +2 ☔ View full report in Codecov by Sentry. |
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
|
@Kontinuation I revised the current approach in #893. So the ArrowSchema and ArrowArray are allocated in native. By doing that, I think we don't need to release the JVM structures. |
Yes. They are allocated by the native side in #893, so they should be released by the native side accordingly. |
|
Thank you @Kontinuation I will merge this first. And in #893, I will remove |
* Try to fix the JVM Unsafe memory leak issue * Fixed leaks when AQE coalesce partitions is enabled * Fixes according to reviewer's comments * Update spark/src/main/java/org/apache/comet/CometBatchIterator.java Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com> --------- Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Which issue does this PR close?
Closes #884.
Rationale for this change
Please refer to the comments in #884 for details.
What changes are included in this PR?
ArrowSchemaandArrowArraybase structure leaks in JVMHow are these changes tested?
It is pretty hard to add tests for this fix, so we manually tested this and relying on existing tests to make sure that it does not break anything.