Improve IMPORT performance for larger unsorted data sets of

**Is your feature request related to a problem? Please describe.**

When running IMPORT on unsorted Avro data, the IMPORT runs very slowly for larger data sets (e.g., > 1 TiB).

For example, running IMPORT on a 16-node cluster (16 vCPUs per node) with 64 x 24 GiB unsorted Avro files (1.5 TiB total) takes about 30 hours, an average of ~2.8 MiB/node/sec.

Smaller IMPORTs are much faster: running IMPORT on a 10-node cluster (16 vCPUs per node) with 20 x 5.7 GiB unsorted Avro files (114 GiB total) takes about 35 minutes, an average of ~17 MiB/node/sec.
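As a sanity check on the per-node figures, the sketch below divides total input size by wall-clock time and node count. With the numbers above it gives roughly 0.91 MiB/node/sec for the large IMPORT and roughly 5.56 MiB/node/sec for the small one, about one third of the quoted ~2.8 and ~17. The issue does not say how the quoted rates were derived; one possible reading (an assumption, not confirmed here) is that they count replicated bytes, since CockroachDB's default replication factor is 3. Either way, the large IMPORT is about 6x slower per node than the small one. `perNodeMiBps` is a hypothetical helper for illustration only.

```go
package main

import "fmt"

// perNodeMiBps is a hypothetical helper: it returns the average ingest rate
// in MiB per node per second, computed from the raw input size in GiB,
// the wall-clock duration in seconds, and the number of nodes.
func perNodeMiBps(totalGiB, seconds float64, nodes int) float64 {
	totalMiB := totalGiB * 1024
	return totalMiB / seconds / float64(nodes)
}

func main() {
	// Large IMPORT: 64 x 24 GiB files, ~30 hours, 16 nodes.
	large := perNodeMiBps(64*24, 30*3600, 16)
	// Small IMPORT: 20 x 5.7 GiB files, ~35 minutes, 10 nodes.
	small := perNodeMiBps(20*5.7, 35*60, 10)

	// Assumption: if the quoted figures count replicated bytes, scale the
	// raw-input rate by CockroachDB's default replication factor of 3.
	const replicationFactor = 3
	fmt.Printf("large: %.2f MiB/node/s raw, %.2f with x%d replication\n",
		large, large*replicationFactor, replicationFactor)
	fmt.Printf("small: %.2f MiB/node/s raw, %.2f with x%d replication\n",
		small, small*replicationFactor, replicationFactor)
	fmt.Printf("small/large per-node ratio: %.1fx\n", small/large)
}
```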

**Describe the solution you'd like**

Ideally, IMPORT would achieve higher throughput and scale predictably for larger unsorted Avro data sets.

**Describe alternatives you've considered**

Pre-sorting the Avro data within and across files was considered, but it does not fit into the existing workflow and data source.

**Additional context**

https://github.com/cockroachlabs/support/issues/1464

Jira issue: CRDB-14945

---

### Work in progress

- [x] https://github.com/cockroachdb/cockroach/pull/79967
  - [x] https://github.com/cockroachdb/cockroach/pull/80386
- [x] https://github.com/cockroachdb/cockroach/pull/81062
  - [x] https://github.com/cockroachdb/cockroach/pull/82746

#### Investigations & prior art

- https://github.com/cockroachdb/cockroach/pull/73514 (mostly reverted by the PR below; considering reviving it)
- https://github.com/cockroachdb/cockroach/pull/73981 (mostly reverts above)

Epic CRDB-16237