Shuffle spilled_bytes metric is incorrect #1437

@andygrove

Describe the bug

Example:

CometExchange

shuffle records written: 65,254,713
number of spills: 17,160
spilled bytes: 16,134,291,652,608
shuffle bytes written total (min, med, max): 5.6 GiB (2.1 MiB, 8.7 MiB, 9.2 MiB)

The number of spilled bytes should never substantially exceed the shuffle bytes written. In the example above, we report that about 16 TB was spilled for a final output of 5.6 GiB (compressed).
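As a rough sanity check on those figures, the sketch below divides the reported spilled bytes by the number of spills and by the shuffle bytes written. The constants are copied from the metrics above; the function names are made up for illustration, and 5.6 GiB is a rounded UI value, so the ratio is only approximate.

```rust
/// Figures copied from the metrics in the report above.
const SPILLED_BYTES: u64 = 16_134_291_652_608;
const NUM_SPILLS: u64 = 17_160;
/// 5.6 GiB in bytes (approximate, because the UI rounds the total).
const SHUFFLE_BYTES_WRITTEN: f64 = 5.6 * 1024.0 * 1024.0 * 1024.0;

/// Average number of bytes attributed to each spill (integer division).
fn avg_bytes_per_spill() -> u64 {
    SPILLED_BYTES / NUM_SPILLS
}

/// How many times larger the reported spill volume is than the final output.
fn spill_to_output_ratio() -> f64 {
    SPILLED_BYTES as f64 / SHUFFLE_BYTES_WRITTEN
}

fn main() {
    println!("average bytes per spill: {}", avg_bytes_per_spill());
    println!("spilled / written ratio: {:.0}", spill_to_output_ratio());
}
```

This works out to roughly 940 MB attributed to every spill and a spilled-to-written ratio of roughly 2,700. A per-spill figure that large looks more like a memory reservation size than like the amount of data written in one spill.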

In ShuffleWriterExec, we are writing incorrect data for spilled_bytes. We are adding the size of the current memory reservation rather than the number of bytes written to the shuffle file.

        let offsets = spill_into(
            &mut self.buffered_partitions,
            spillfile.path(),
            self.num_output_partitions,
            &self.metrics,
        )?;

        let mut spills = self.spills.lock().await;
        let used = self.reservation.size();
        self.metrics.spill_count.add(1);
        self.metrics.spilled_bytes.add(used);
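One possible direction for a fix, sketched under assumptions: have the spill routine return the number of bytes it actually wrote and add that value to the metric. Everything below (`SpillMetrics`, `spill_partitions`, `record_spill`) is a hypothetical stand-in that uses only the standard library. It is not Comet's or DataFusion's API, and the real `spill_into` may need a different change.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the spill metrics; not Comet's actual type.
#[derive(Default, Debug)]
struct SpillMetrics {
    spill_count: usize,
    spilled_bytes: usize,
}

/// Writes every buffered partition to `out` and returns the number of bytes
/// actually written, which is the value the metric should record.
fn spill_partitions<W: Write>(partitions: &mut [Vec<u8>], out: &mut W) -> io::Result<usize> {
    let mut written = 0usize;
    for partition in partitions.iter_mut() {
        out.write_all(partition)?;
        written += partition.len();
        partition.clear(); // the buffer is empty once its contents are spilled
    }
    out.flush()?;
    Ok(written)
}

/// Performs one spill and updates the metrics from the bytes written,
/// not from the size of any memory reservation.
fn record_spill<W: Write>(
    partitions: &mut [Vec<u8>],
    out: &mut W,
    metrics: &mut SpillMetrics,
) -> io::Result<()> {
    let written = spill_partitions(partitions, out)?;
    metrics.spill_count += 1;
    metrics.spilled_bytes += written;
    Ok(())
}

fn main() -> io::Result<()> {
    let mut partitions = vec![b"abc".to_vec(), b"defg".to_vec()];
    let mut file: Vec<u8> = Vec::new(); // in-memory stand-in for the spill file
    let mut metrics = SpillMetrics::default();
    record_spill(&mut partitions, &mut file, &mut metrics)?;
    println!("{:?}", metrics);
    Ok(())
}
```

With this shape, `spilled_bytes` tracks what reached the file, so it cannot drift from the written output the way a reservation-based figure can.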

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response
