### Describe the bug, including details regarding any error messages, version, and platform.
- I have a table with ~100k records and 4 chunks of various sizes: [32768, 32768, 29599, 16692]
- key1 has high cardinality (30k)
- key2 has low cardinality (10)
| key1 | key2 | value |
|---|---|---|
| KEY1_12226 | KEY1_08 | 0.348599 |
| KEY1_10214 | KEY1_08 | 0.954173 |
| KEY1_26821 | KEY1_09 | 0.416615 |
| KEY1_24557 | KEY1_06 | 0.883226 |
| KEY1_27823 | KEY1_08 | 0.0127225 |
I'm trying to do a group-by on (key1, key2) to get the sum of value. It works fine in general, but when I preprocess the data I misalign the chunks in my table and it fails.
```python
import numpy as np
import pyarrow as pa

KEYS_1 = [f"KEY1_{i:05d}" for i in range(30_000)]
KEYS_2 = [f"KEY1_{i:02d}" for i in range(10)]
SIDE = ["LEFT", "RIGHT"]

def generate_table(sizes):
    batches = [
        pa.record_batch(
            [
                np.random.choice(KEYS_1, size),
                np.random.choice(KEYS_2, size),
                np.random.rand(size),
            ],
            ["key1", "key2", "value"],
        )
        for size in sizes
    ]
    return pa.Table.from_batches(batches)

table = generate_table([32768, 32768, 29599, 16692])

# This works well:
pa.TableGroupBy(table, ["key1", "key2"]).aggregate(
    [
        ["value", "sum"],
    ]
)

# This misaligns the chunks
table = table.set_column(
    table.schema.get_field_index("value"),
    "value",
    table["value"].combine_chunks(),
)

print("HERE")
pa.TableGroupBy(table, ["key1", "key2"]).aggregate(
    [
        ["value", "sum"],
    ]
)  # segfault :-(
print("NEVER THERE")
```
It took me a while to get to the bottom of the problem. The size of the chunks and the cardinality of the keys seem to play an important role in whether it fails or not.
The short-term solution for me is to call combine_chunks before the group-by.
This is on pyarrow 11.0.0, Python 3.9.
### Component(s)
Python