Closed
Description
Apache Iceberg version
0.7.0 (latest release)
Please describe the bug 🐞
When doing an overwrite with a specific overwrite_filter, I sometimes get a division-by-zero error coming from:
iceberg-python/pyiceberg/io/pyarrow.py, line 2263 in 255e527:

```python
avg_row_size_bytes = tbl.nbytes / tbl.num_rows
```
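That expression divides by `tbl.num_rows`, so it raises `ZeroDivisionError` whenever the Arrow table it receives is empty. The sketch below is a hypothetical, pure-Python stand-in, not pyiceberg's code. It only shows one way such a computation could be guarded. `rows_per_file` and its parameters are illustrative names, and returning 0 for an empty table assumes the caller then writes no files.

```python
def rows_per_file(nbytes: int, num_rows: int, target_file_size: int) -> int:
    """Estimate how many rows fit in one data file of about target_file_size bytes.

    Hypothetical helper with the same shape as the failing computation
    (average row size = total bytes / row count), plus guards for the
    degenerate inputs that would otherwise divide by zero.
    """
    if num_rows == 0:
        # Empty table: nothing to write, so skip the division entirely.
        return 0
    if nbytes == 0:
        # Rows reported as zero bytes: a zero average would make the next
        # division fail too, so put every row in a single file.
        return num_rows
    avg_row_size_bytes = nbytes / num_rows
    # Keep at least one row per file even if one row exceeds the target.
    return max(1, int(target_file_size // avg_row_size_bytes))


print(rows_per_file(nbytes=0, num_rows=0, target_file_size=512 * 1024 * 1024))  # → 0
print(rows_per_file(nbytes=1_000, num_rows=10, target_file_size=500))  # → 5
```

Returning 0 lets a caller skip writing files for an empty remainder. Raising a clearer error would be an equally reasonable choice.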
I can't confirm it, but I think this happens only when the overwrite filter matches all the rows in the table.
This is the code I'm using:
```python
from pyiceberg.expressions import In

overwrite_filter = In(primary_key, df[primary_key].to_list())
self.table.overwrite(arrow_df, overwrite_filter=overwrite_filter)
```
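The traceback below shows that `overwrite` first calls `delete` with the same filter, and that `delete` then rewrites data through `_dataframe_to_data_files`. The pure-Python model here only illustrates the hypothesis above and is not pyiceberg's implementation. Rows are plain dicts, the `In` filter is modeled as set membership, and `preserved_rows` and `avg_row_size` are hypothetical names. In this model, when every key in a data file appears in the filter, no rows are left to rewrite. That is exactly the case where `tbl.nbytes / tbl.num_rows` divides by zero.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def preserved_rows(data_file: List[Row], primary_key: str, keys: Set[Any]) -> List[Row]:
    """Rows that survive a delete whose filter is In(primary_key, keys).

    Models only the row-filtering step; Iceberg file handling is out of scope.
    """
    return [row for row in data_file if row[primary_key] not in keys]


def avg_row_size(num_bytes: int, rows: List[Row]) -> float:
    # Same shape as the failing line: total bytes divided by row count.
    return num_bytes / len(rows)


existing_file = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming_keys = {1, 2}  # the incoming dataframe covers every key in this file

remaining = preserved_rows(existing_file, "id", incoming_keys)
print(len(remaining))  # → 0

try:
    avg_row_size(1024, remaining)
except ZeroDivisionError:
    print("ZeroDivisionError raised")  # → ZeroDivisionError raised
```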
And the traceback:
```
    self.table.overwrite(arrow_df, overwrite_filter=overwrite_filter)
  File "/.pyenv/versions/3.9.16/envs/datalake/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 1598, in overwrite
    tx.overwrite(df=df, overwrite_filter=overwrite_filter, snapshot_properties=snapshot_properties)
  File "/.pyenv/versions/3.9.16/envs/datalake/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 554, in overwrite
    self.delete(delete_filter=overwrite_filter, snapshot_properties=snapshot_properties)
  File "/.pyenv/versions/3.9.16/envs/datalake/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 628, in delete
    list(
  File "/.pyenv/versions/3.9.16/envs/datalake/lib/python3.9/site-packages/pyiceberg/io/pyarrow.py", line 2353, in _dataframe_to_data_files
    for batches in bin_pack_arrow_table(df, target_file_size)
  File "/.pyenv/versions/3.9.16/envs/datalake/lib/python3.9/site-packages/pyiceberg/io/pyarrow.py", line 2203, in bin_pack_arrow_table
    avg_row_size_bytes = tbl.nbytes / tbl.num_rows
ZeroDivisionError: division by zero
```
I can't share the data used here, but I'll update this soon with more information. I'll try to generate a dummy dataset that reproduces the bug.