fastparquet 0.8.2 causing failures #9424

@jsignell

@martindurant there are failures cropping up in tests. It looks like fastparquet 0.8.2 just made it to conda-forge an hour or so ago.

=================================== FAILURES ===================================
__________________________ test_parquet[fastparquet] ___________________________
[gw1] linux -- Python 3.9.13 /usr/share/miniconda3/envs/test-environment/bin/python

engine = 'fastparquet'

    @pytest.mark.network
    @pytest.mark.parametrize("engine", ("pyarrow", "fastparquet"))
    def test_parquet(engine):
        pytest.importorskip("requests", minversion="2.21.0")
        dd = pytest.importorskip("dask.dataframe")
        pytest.importorskip(engine)
>       df = dd.read_parquet(
            [
                "https://github.com/Parquet/parquet-compatibility/raw/"
                "master/parquet-testdata/impala/1.1.1-NONE/"
                "nation.impala.parquet"
            ],
            engine=engine,
        ).compute()

dask/bytes/tests/test_http.py:179: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dask/dataframe/io/parquet/core.py:471: in read_parquet
    read_metadata_result = engine.read_metadata(
dask/dataframe/io/parquet/fastparquet.py:835: in read_metadata
    dataset_info = cls._collect_dataset_info(
dask/dataframe/io/parquet/fastparquet.py:469: in _collect_dataset_info
    pf = ParquetFile(
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/fastparquet/api.py:123: in __init__
    writer.consolidate_categories(fmd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

fmd = column_orders: null
created_by: b'impala version 1.2-INTERNAL (build a462ec42e550c75fccbff98c720f37f3ee9d55a3)'
encryp...ent
  num_children: null
  precision: null
  repetition_type: 1
  scale: null
  type: 6
  type_length: null
version: 1


    def consolidate_categories(fmd):
>       key_value = [k for k in fmd.key_value_metadata
                     if k.key == b'pandas']
E       TypeError: 'NoneType' object is not iterable

/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/fastparquet/writer.py:1331: TypeError

ref: https://github.com/dask/dask/runs/7998537202?check_suite_focus=true
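
The traceback suggests that `fmd.key_value_metadata` is `None` for this Impala-written file, so the list comprehension in `consolidate_categories` fails when it tries to iterate over it. Below is a minimal sketch of a None-safe version of that lookup. It uses stand-in objects rather than fastparquet's real metadata classes: the helper name `find_pandas_metadata` and the `SimpleNamespace` stand-ins are hypothetical, and this is not necessarily how fastparquet would fix it.

```python
from types import SimpleNamespace


def find_pandas_metadata(fmd):
    """Return the key/value entries whose key is b'pandas'.

    Hypothetical helper mirroring the failing comprehension in the
    traceback; a missing (None) key_value_metadata is treated as empty.
    """
    return [k for k in (fmd.key_value_metadata or []) if k.key == b"pandas"]


# Stand-in for a file written without any key/value metadata (assumption:
# this is what the Impala test file looks like after parsing).
fmd_without_kv = SimpleNamespace(key_value_metadata=None)

# Stand-in for a file that does carry pandas metadata.
fmd_with_kv = SimpleNamespace(
    key_value_metadata=[
        SimpleNamespace(key=b"pandas", value=b"{}"),
        SimpleNamespace(key=b"other", value=b"x"),
    ]
)

no_kv_result = find_pandas_metadata(fmd_without_kv)
kv_result = find_pandas_metadata(fmd_with_kv)
```

With the `or []` guard, the first call returns an empty list instead of raising `TypeError`, and the second call still picks out only the `b'pandas'` entry.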
