
[C++][Python][Parquet] pyarrow.lib.ArrowInvalid: Invalid number of indices: 0 when reading a parquet file #47981

@TomAugspurger

Describe the bug, including details regarding any error messages, version, and platform.

Something about this Parquet file from https://github.com/Parquet/parquet-compatibility/ causes an exception when it is read with pyarrow 22.0.0:

```python
import urllib.request
import pathlib
import pyarrow.parquet as pq

p = pathlib.Path("nation.impala.parquet")
if not p.exists():
    urllib.request.urlretrieve(
        "https://github.com/Parquet/parquet-compatibility/raw/master/parquet-testdata/impala/1.1.1-NONE/nation.impala.parquet",
        p
    )

pq.read_table(p)
```

which fails with:

```
Traceback (most recent call last):
  File "/Users/toaugspurger/gh/dask/dask/bug.py", line 12, in <module>
    pq.read_table(p)
  File "/Users/toaugspurger/gh/dask/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1899, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/toaugspurger/gh/dask/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1538, in read
    table = self._dataset.to_table(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 589, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3969, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Invalid number of indices: 0
```

pyarrow 21.0.0 was able to read that file.

Component(s)

Parquet, Python
