go/adbc/driver/flightsql: Default Value (10 MB) For adbc.snowflake.rpc.ingest_target_file_size Not Used In 1.1.0 #1997
Description
What happened?
Jobs calling adbc_ingest() failed with a memory error. On inspection, the data had been split into one Parquet file per processor, rather than into files of ~10 MB each as in 1.0.0.
Stack Trace
adbc_driver_manager.InternalError: INTERNAL: unknown error type: cannot allocate memory
cursor.adbc_ingest(table, data, mode)
File "/usr/local/lib/python3.12/site-packages/adbc_driver_manager/dbapi.py", line 937, in adbc_ingest
return _blocking_call(self._stmt.execute_update, (), {}, self._stmt.cancel)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "adbc_driver_manager/_lib.pyx", line 1569, in adbc_driver_manager._lib._blocking_call_impl
File "adbc_driver_manager/_lib.pyx", line 1562, in adbc_driver_manager._lib._blocking_call_impl
File "adbc_driver_manager/_lib.pyx", line 1295, in adbc_driver_manager._lib.AdbcStatement.execute_update
File "adbc_driver_manager/_lib.pyx", line 260, in adbc_driver_manager._lib.check_error
How can we reproduce the bug?
Unfortunately, I cannot share the data. However, the behavior should be reproducible on a four-core VM: a moderately sized dataset (about 500 MB as Parquet) is split into four ~125 MB files when adbc_ingest() uploads it to Snowflake, instead of fifty ~10 MB files.
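For reference, the file counts quoted above follow from simple arithmetic. The sketch below only reproduces the numbers in this report; it is not the driver's actual splitting logic, and the helper names are hypothetical.

```python
import math

MB = 1024 * 1024  # bytes per megabyte, used only for this illustration


def files_by_target_size(total_bytes: int, target_file_bytes: int) -> int:
    """Number of files if the data is cut into chunks of a target size (hypothetical helper)."""
    return math.ceil(total_bytes / target_file_bytes)


def size_per_file_by_worker_count(total_bytes: int, workers: int) -> float:
    """Size of each file if the data is cut into one file per worker (hypothetical helper)."""
    return total_bytes / workers


total = 500 * MB   # dataset size from the report
target = 10 * MB   # documented default target file size
cores = 4          # cores on the VM from the report

expected_files = files_by_target_size(total, target)
observed_file_mb = size_per_file_by_worker_count(total, cores) / MB

print(f"expected with 10 MB target: {expected_files} files")
print(f"observed with {cores} cores: {cores} files of {observed_file_mb:.0f} MB each")
```

Each file in the observed case is more than ten times the documented target size, which is consistent with the higher memory use described above.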
Environment/Setup
Packages:
adbc-driver-manager==1.1.0
adbc-driver-snowflake==1.1.0
Operating system: Windows/Linux
Package manager: pip