read_parquet with filters and arrow engine returns index in the columns #6277

@TomAugspurger

Description

What happened:

When using dd.read_parquet with filters, the original index ends up in the columns

What you expected to happen:

The original index should remain the index (not show up as an `index` column), and only the rows matching the filter should be returned.

Minimal Complete Verifiable Example:

import numpy as np
import pandas as pd
import dask.dataframe as dd
import dask.datasets
import shutil

N = 1_000
P = 4
b = np.arange(P).repeat(N // P)

df = pd.DataFrame({
    "A": np.random.random(size=N),
    "B": pd.Categorical(b)
})
ddf = dd.from_pandas(df, P)

shutil.rmtree("parquet-filters.parquet", ignore_errors=True)
ddf.to_parquet("parquet-filters.parquet", partition_on="B", engine="pyarrow")

dd.read_parquet("parquet-filters.parquet", engine="pyarrow", filters=[[("B", "=", 1)]]).compute()
Out[1]:
            A  index  B
0    0.410034    250  1
1    0.313085    251  1
2    0.848964    252  1
3    0.910770    253  1
4    0.922110    254  1
..        ...    ... ..
245  0.028301    495  1
246  0.392158    496  1
247  0.759204    497  1
248  0.122554    498  1
249  0.339143    499  1

[250 rows x 3 columns]
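Until this is fixed, one workaround for the pyarrow path is to restore the index after reading. The snippet below is a minimal sketch using plain pandas, with a hand-built frame standing in for the buggy `read_parquet` output (the column name `index` matches what the reproducer above shows):

```python
import pandas as pd

# Simulated output of the buggy read: the original index
# has been reset into an "index" column.
df = pd.DataFrame({
    "A": [0.410034, 0.313085],
    "index": [250, 251],
    "B": pd.Categorical([1, 1]),
})

# Workaround sketch: move the stray column back into the index.
fixed = df.set_index("index")
print(fixed.columns.tolist())  # ['A', 'B']
```

After `set_index`, the frame has the expected shape: `A` and `B` as columns, the original row labels as the index.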

Anything else we need to know?:

Haven't debugged yet, but with the fastparquet engine the behavior is inverted: the index stays in the index, yet the output is not filtered:

ddf.to_parquet("parquet-filters.parquet", partition_on="B", engine="fastparquet")

dd.read_parquet("parquet-filters.parquet", engine="fastparquet", filters=[[("B", "=", 1)]]).compute()

              A  B
index
0      0.304616  0
1      0.626666  0
2      0.986920  0
3      0.585114  0
4      0.663830  0
...         ... ..
995    0.999620  3
996    0.223592  3
997    0.796024  3
998    0.063832  3
999    0.621195  3

[1000 rows x 2 columns]
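Since the fastparquet path ignores the filter entirely, one way to sanity-check the expected result is to apply the predicate manually after reading. `apply_filters` below is a hypothetical helper, not a dask API; it implements the DNF semantics that the `filters` argument uses (outer list = OR of conjunctions, inner list = AND of `(column, op, value)` terms):

```python
import operator
import pandas as pd

# Map dask-style filter operators to pandas comparisons.
OPS = {
    "=": operator.eq, "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
}

def apply_filters(df, filters):
    """Apply a DNF filter list like [[("B", "=", 1)]] to a DataFrame."""
    mask = pd.Series(False, index=df.index)
    for conjunction in filters:          # OR over inner lists
        term = pd.Series(True, index=df.index)
        for col, op, value in conjunction:  # AND within each inner list
            term &= OPS[op](df[col], value)
        mask |= term
    return df[mask]

df = pd.DataFrame({
    "A": [0.304616, 0.626666, 0.986920, 0.621195],
    "B": pd.Categorical([0, 1, 1, 3]),
})
print(apply_filters(df, [[("B", "=", 1)]]))  # keeps only the B == 1 rows
```

This is only a reference for what the filtered output should look like; the fix itself belongs in the engine's row-group/partition filtering.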

Environment:

  • Dask version: master
  • Python version: 3.7
