fix(go/adbc/driver/snowflake): Records dropped on ingestion when empty batch is present#1866
Conversation
|
If you change line 344 of bulk_ingestion.go from |
|
|
||
| "github.com/apache/arrow-adbc/go/adbc" | ||
| "github.com/apache/arrow-adbc/go/adbc/driver/internal" | ||
| "github.com/apache/arrow-adbc/go/adbc/driver/snowflake" |
There was a problem hiding this comment.
this is already imported as driver on the line below this, that's why the CI is failing
There was a problem hiding this comment.
Thanks @zeroshade. I had fixed that locally but have been dealing with another issue causing tests to fail. The bufferedwriter was tripping the checkedallocator test whereas the regular writer previously wasn't.
I was able to plug the original leak and have this all working but it will require a separate PR to arrow which I will push up. I did slightly expand scope here however because as a part of the testing I enabled the checked allocator for all snowflake tests and have been chasing down a few unrelated issues as well. I'll push up some commits so you can see exactly what I mean.
There was a problem hiding this comment.
thanks for chasing all this down
There was a problem hiding this comment.
Most tests should pass once apache/arrow#41698 is merged and pulled in.
There is still just one test remaining with a memory leak, which is TestNonIntDecimalLowPrecision. It seems rely on compute kernels to cast decimal values to floats. This involves buffer preallocation in some cases which doesn't appear to get released properly. I haven't gotten to the bottom of this yet but if anyone else is more familiar with it or has the time, I could use a second set of eyes.
There was a problem hiding this comment.
Looks like all the tests passed on the last run. Do you still need a second set of eyes or is this ready?
There was a problem hiding this comment.
Hmm, running locally all the failures cleared up except for the one I mentioned above. I took a look and it seems Github Actions is skipping the snowflake tests because the SNOWFLAKE_URI is not defined.
I'll have some time to look at it today but a second set of eyes would still be a great help.
There was a problem hiding this comment.
Ok I found the issue and pushed up the fix. We were missing a Release on a fairly narrow code path for certain precisions. All snowflake tests pass locally for me now.
| } | ||
|
|
||
| func (suite *SnowflakeTests) SetupSuite() { | ||
| func (suite *SnowflakeTests) SetupTest() { |
There was a problem hiding this comment.
why the switch from SetupSuite (once for all the tests) to SetupTest (once for each test)?
There was a problem hiding this comment.
This allows all tests to use the SetupDriver and TeardownDriver methods defined on the DriverQuirks. This came to my attention because only the tests in validation.go which were using Setup/TeardownDriver caught the memory leak. This allows us to have the same test setup for the driver tests as well. Without this, the driver was getting created with one checked allocator and each test would create another one, preventing a full view of memory allocations.
zeroshade
left a comment
There was a problem hiding this comment.
In general this looks good to me, just a few nitpicks.
Reproduces and fixes: #1847
Parquet files with empty row groups are valid per the spec, but Snowflake does not currently handle them properly. To mitigate this we buffer writes to the parquet file so that a row group is not written until some amount of data has been received.
The CheckedAllocator was enabled for all tests as part of this fix, which detected a leak in the BufferWriter that was fixed in: apache/arrow#41698.
There was an unrelated test failure that surfaced once the CheckedAllocator was enabled which had to do with casting decimals of certain precision. The fix is included in this PR as well.