adbc_ingest() is dropping rows in Snowflake #1847
Description
What happened?
I'm trying to load 98 million rows from a set of CSV files (covering a 5-year period), but only 95 to 96 million rows are getting inserted into Snowflake using adbc_ingest(). The missing rows are spread fairly randomly, at roughly 16k records per day.
I tried passing both a pyarrow table and record batches to adbc_ingest(). In both cases rows are being dropped.
Here's a screenshot of my notebook code.

The odd thing is that the result is not consistent: sometimes it inserts 95 million rows and other times 96 million. The total number of inserted rows matches what I'm seeing in the Snowflake logs if I add up all the rows created by the COPY INTO SQL commands.
So it looks like we're not sending all the batches across the wire.
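To narrow down where rows go missing, one option is to ingest the record batches one at a time and compare each batch's row count with the value adbc_ingest() returns. The following is only a diagnostic sketch, not the code from my notebook: `ingest_with_row_check` is a hypothetical helper name, `cursor` is assumed to be an ADBC DBAPI cursor, the target table is assumed to already exist (hence `mode="append"`), and each batch is assumed to expose `num_rows` like a pyarrow RecordBatch.

```python
from typing import Any, Iterable, List, Tuple


def ingest_with_row_check(
    cursor: Any, table_name: str, batches: Iterable[Any]
) -> Tuple[int, int, List[int]]:
    """Ingest batches one by one and compare sent vs. reported row counts.

    Hypothetical helper. Returns (rows_sent, rows_reported, mismatched),
    where mismatched lists the indices of batches whose reported count
    differs from the number of rows sent.
    """
    rows_sent = 0
    rows_reported = 0
    mismatched: List[int] = []
    for index, batch in enumerate(batches):
        # adbc_ingest() returns the number of rows inserted,
        # or -1 if the driver cannot report it.
        reported = cursor.adbc_ingest(table_name, batch, mode="append")
        rows_sent += batch.num_rows
        if reported >= 0:
            rows_reported += reported
            if reported != batch.num_rows:
                mismatched.append(index)
    return rows_sent, rows_reported, mismatched
```

Ingesting batch by batch is much slower than a single bulk call, so this is only meant for isolating the problem. If the driver reports -1 for every batch, running `SELECT COUNT(*)` on the target table after the load is an independent check against `rows_sent`.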
How can we reproduce the bug?
No response
Environment/Setup
Python 3.9.10 on Red Hat 8 Linux with ADBC drivers 0.10.0.