Overwrite with filter division by zero error #1020

@Minfante377

Apache Iceberg version

0.7.0 (latest release)

Please describe the bug 🐞

When doing an overwrite with a specific overwrite_filter, I sometimes get a division by zero error coming from:

avg_row_size_bytes = tbl.nbytes / tbl.num_rows

I cannot confirm it, but I think this happens only when the overwrite filter matches all the rows in the table.
This is the code I'm using:

        overwrite_filter = In(primary_key, df[primary_key].to_list())
        self.table.overwrite(arrow_df, overwrite_filter=overwrite_filter)

And the traceback:

    self.table.overwrite(arrow_df, overwrite_filter=overwrite_filter)
  File "/.pyenv/versions/3.9.16/envs/datalake/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 1598, in overwrite
    tx.overwrite(df=df, overwrite_filter=overwrite_filter, snapshot_properties=snapshot_properties)
  File "/.pyenv/versions/3.9.16/envs/datalake/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 554, in overwrite
    self.delete(delete_filter=overwrite_filter, snapshot_properties=snapshot_properties)
  File "/.pyenv/versions/3.9.16/envs/datalake/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 628, in delete
    list(
  File "/.pyenv/versions/3.9.16/envs/datalake/lib/python3.9/site-packages/pyiceberg/io/pyarrow.py", line 2353, in _dataframe_to_data_files
    for batches in bin_pack_arrow_table(df, target_file_size)
  File "/.pyenv/versions/3.9.16/envs/datalake/lib/python3.9/site-packages/pyiceberg/io/pyarrow.py", line 2203, in bin_pack_arrow_table
    avg_row_size_bytes = tbl.nbytes / tbl.num_rows
ZeroDivisionError: division by zero
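
To make the suspected failure mode concrete: the last frame of the traceback divides `tbl.nbytes` by `tbl.num_rows`, so any table with zero rows reaching that line would raise this error. The snippet below is only a minimal sketch of that arithmetic. `FakeTable` is a hypothetical stand-in that exposes just the two attributes used in the failing line, and the guarded variant is an illustration of one possible check, not PyIceberg's actual code or fix.

```python
from dataclasses import dataclass


@dataclass
class FakeTable:
    """Hypothetical stand-in with only the attributes used in the failing line."""
    nbytes: int
    num_rows: int


def avg_row_size_unguarded(tbl):
    # Same expression as the line shown in the traceback.
    return tbl.nbytes / tbl.num_rows


def avg_row_size_guarded(tbl):
    # Hypothetical guard (an assumption): return 0.0 for an empty table
    # instead of dividing by zero.
    if tbl.num_rows == 0:
        return 0.0
    return tbl.nbytes / tbl.num_rows


# An empty table, as might be left over if the filter matched every row.
empty = FakeTable(nbytes=0, num_rows=0)

try:
    avg_row_size_unguarded(empty)
    raised = False
except ZeroDivisionError:
    raised = True
```

Whether an empty table is really what reaches `bin_pack_arrow_table` in this case is still unconfirmed; the sketch only shows that such an input would be enough to trigger the error.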

I cannot share the data used here, so I'll try to generate a dummy dataset that reproduces the bug and will update this issue with more information soon.
