Describe the bug, including details regarding any error messages, version, and platform.
Hi team,
I've been playing recently with the Go SQL Driver For DuckDB which uses the arrow code here for some functionalities. Specifically the RegisterView method here is making use of the arrow.cdata.ExportRecordReader() function.
The issue is that in my testing I found that this method is causing a pretty big memory leak that cannot be prevented without changes in arrow-go.
I was able to fix it by forking and doing changes both in this repo and go-duckdb, however as I'm new to arrow-go could you please help me understand if I'm missing something and my fixes (described below) are actually unnecessary or they are OK and I should contribute them!
Here is the sequence of events I observed when the memory leak is occurring:
- In a function...
- I create a new
array.RecordReader (it is created with refCount = 1)
- I pass that record reader to the
go-duckdb RegisterView() method and therefore to cdata.ExportRecordReader()
cdata.ExportRecordReader() internally does a Retain() on the array.RecordReader (thus making refCount = 2)
- some work is done
- The
release() function returned by the DuckDB method is called. It only frees the memory for cdata.CArrowArrayStream (I don't observe the refCount going down, so Release() is not called to compensate for the internal Retain())
- I call
Release() on the array.RecordReader to release its memory, but because the refCount was internally increased to 2, now it is 1 and its memory is not released
- then the function is called again and again and on every call a new
array.RecordReader is created whose memory is not released ... and so on and so on until I have millions of record readers in memory.
My initial solution was just to call Release() on the array.RecordReader one more time and that fixed the memory issue right away. However, this doesn't seem like a proper solution given that not my code did the extra Retain(). Therefore I attempted to do the change in go-duckdb but initially wasn't able to figure out how.
The only example of using the cdata.ExportRecordReader() where the memory is explicitly released is this test TestExportRecordReaderStreamLifetime here. It is using the non-exported C.ArrowArrayStreamRelease() function which I guess is somehow calling the streamRelease function . Therefore I went on to see what would happen if I added in to cdata a new ReleaseCArrowArrayStream() function only calling C.ArrowArrayStreamRelease() (similarly to the ReleaseCArrowArray here) and then calling that in the release() function for RegisterView() in go-duckdb.
This worked as I expected (no visible memory leak) however I'm still not sure if I and/or go-duckdb is using the cdata.ExportRecordReader() function incorrectly and all those code changes are unnecessary.
Please advise!
Thanks,
Kalin
Component(s)
Other
Describe the bug, including details regarding any error messages, version, and platform.
Hi team,
I've been playing recently with the Go SQL Driver For DuckDB which uses the arrow code here for some functionalities. Specifically the RegisterView method here is making use of the
arrow.cdata.ExportRecordReader()function.The issue is that in my testing I found that this method is causing a pretty big memory leak that cannot be prevented without changes in
arrow-go.I was able to fix it by forking and doing changes both in this repo and
go-duckdb, however as I'm new toarrow-gocould you please help me understand if I'm missing something and my fixes (described below) are actually unnecessary or they are OK and I should contribute them!Here is the sequence of events I observed when the memory leak is occurring:
array.RecordReader(it is created withrefCount = 1)go-duckdbRegisterView()method and therefore tocdata.ExportRecordReader()cdata.ExportRecordReader()internally does aRetain()on thearray.RecordReader(thus makingrefCount = 2)release()function returned by the DuckDB method is called. It only frees the memory forcdata.CArrowArrayStream(I don't observe therefCountgoing down, soRelease()is not called to compensate for the internalRetain())Release()on thearray.RecordReaderto release its memory, but because therefCountwas internally increased to 2, now it is 1 and its memory is not releasedarray.RecordReaderis created whose memory is not released ... and so on and so on until I have millions of record readers in memory.My initial solution was just to call
Release()on thearray.RecordReaderone more time and that fixed the memory issue right away. However, this doesn't seem like a proper solution given that not my code did the extraRetain(). Therefore I attempted to do the change ingo-duckdbbut initially wasn't able to figure out how.The only example of using the
cdata.ExportRecordReader()where the memory is explicitly released is this test TestExportRecordReaderStreamLifetime here. It is using the non-exportedC.ArrowArrayStreamRelease()function which I guess is somehow calling the streamRelease function . Therefore I went on to see what would happen if I added in tocdataa newReleaseCArrowArrayStream()function only callingC.ArrowArrayStreamRelease()(similarly to the ReleaseCArrowArray here) and then calling that in therelease()function forRegisterView()ingo-duckdb.This worked as I expected (no visible memory leak) however I'm still not sure if I and/or
go-duckdbis using thecdata.ExportRecordReader()function incorrectly and all those code changes are unnecessary.Please advise!
Thanks,
Kalin
Component(s)
Other