set_index with sorted=True can drop rows in dataframes with empty partitions

**What happened**: When setting a new index on a Dask DataFrame with the `set_index` method, some rows are dropped in some cases where the DataFrame has empty partitions. This only seems to happen when passing `sorted=True`.

**What you expected to happen**: `set_index` should not drop rows.

**Minimal Complete Verifiable Example**:

```python
import pandas as pd
import dask.dataframe as dd

# Create dataframe with empty partitions
data1 = dd.from_pandas(
    pd.DataFrame(
        index=[0, 1, 2],
        data={
            "datA": ["A1", "A2", "A3"],
            "datB": ["B1", "B2", "B3"],
            "datC": ["C1", "C2", "C3"],
            "new_index": [1, 2, 3]
        }
    ),
    npartitions=1
)
data_dummy = dd.from_pandas(  # These data will be removed to create empty partitions
    pd.DataFrame(
        index=[9999],
        data={
            "datA": ["xxx"],
            "datB": ["xxx"],
            "datC": ["xxx"],
            "new_index": [9999]
        }
    ),
    npartitions=1
)
data2 = dd.from_pandas(
    pd.DataFrame(
        index=[3],
        data={
            "datA": ["A4"],
            "datB": ["B4"],
            "datC": ["C4"],
            "new_index": [4]
        }
    ),
    npartitions=1
)
ddf = dd.concat([data1, data_dummy, data2, data_dummy])
ddf = ddf[ddf["datA"] != "xxx"]

# Set index, the last row gets dropped in the result
print(ddf.set_index("new_index", sorted=True).compute())
```

The test DataFrame `ddf` looks like this:

```
  datA datB datC  new_index
0   A1   B1   C1          1
1   A2   B2   C2          2
2   A3   B3   C3          3
3   A4   B4   C4          4
```

with the following partitions:
* Partition 0: indices `0`, `1` and `2`
* Partition 1: empty
* Partition 2: index `3`
* Partition 3: empty

After `set_index`, the resulting DataFrame is:

```
          datA datB datC
new_index               
1           A1   B1   C1
2           A2   B2   C2
3           A3   B3   C3
```
The last row (`new_index` 4, `datA` "A4") has been dropped.
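The report does not say why the row disappears. One plausible explanation, which is only an assumption here, is that with `sorted=True` the new divisions are derived from per-partition minimum and maximum values of `new_index`, and an empty partition contributes `NaN` to those. This pure-pandas sketch uses hypothetical helpers `naive_divisions` and `divisions_skipping_empty` (they are not Dask functions and not Dask's actual code) to show how such bookkeeping breaks with empty partitions and how skipping them avoids it.

```python
import math

import pandas as pd

# "new_index" values held by each of the four partitions of `ddf`
partitions = [
    pd.Series([1, 2, 3]),
    pd.Series([], dtype="int64"),  # partition 1: empty
    pd.Series([4]),
    pd.Series([], dtype="int64"),  # partition 3: empty
]


def naive_divisions(parts):
    """Hypothetical: each partition's min, then the last partition's max."""
    return [p.min() for p in parts] + [parts[-1].max()]


def divisions_skipping_empty(parts):
    """Hypothetical: same computation, but ignoring empty partitions first."""
    nonempty = [p for p in parts if len(p) > 0]
    return [p.min() for p in nonempty] + [nonempty[-1].max()]


naive = naive_divisions(partitions)
fixed = divisions_skipping_empty(partitions)

# The min/max of an empty Series is NaN, so the naive boundaries are no longer
# a valid non-decreasing sequence (every comparison with NaN is False).
print([float(x) for x in naive])           # [1.0, nan, 4.0, nan, nan]
print(any(math.isnan(x) for x in naive))   # True
print([int(x) for x in fixed])             # [1, 4, 4]
```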

The row is not dropped when `sorted=True` is omitted:

```python
print(ddf.set_index("new_index").compute())
```

```
          datA datB datC
new_index               
1           A1   B1   C1
2           A2   B2   C2
3           A3   B3   C3
4           A4   B4   C4
```

**Environment**:

- Dask version: 2022.01.0
- Python version: 3.9.7
- Operating System: Ubuntu 16.04.6 LTS
- Install method (conda, pip, source): pip

set_index with sorted=True can drop rows in dataframes with empty partitions #8735

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

set_index with sorted=True can drop rows in dataframes with empty partitions #8735

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions