Improve data density visualization by sampling dense chunks#11766

Merged
oxkitsune merged 8 commits into main from gijs/sample-from-chunk on Nov 4, 2025

@oxkitsune oxkitsune commented Nov 3, 2025

Member

Related

What

Rendering the data density graph now uniformly samples rows from chunks with many rows, instead of skipping them.
This does not solve the underlying problem; a better, albeit more involved, solution is discussed in #7200.
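
To illustrate the idea of uniform sampling, here is a minimal sketch that picks evenly spaced row indices once a chunk has more rows than the sampling budget. This is not the code from this PR: `uniform_sample_indices` and `weight_per_sample` are hypothetical names, the evenly-spaced stride is one possible way to sample uniformly, and weighting each sample by the number of rows it stands for is an assumption about how the total density could be preserved.

```rust
/// Hypothetical helper (not from the PR): choose at most `max_samples`
/// row indices, evenly spaced over a chunk with `num_rows` rows.
fn uniform_sample_indices(num_rows: usize, max_samples: usize) -> Vec<usize> {
    if num_rows == 0 || max_samples == 0 {
        // Assumption: a budget of 0 means "do not sample"; the caller
        // would fall back to whatever it did before.
        return Vec::new();
    }
    if num_rows <= max_samples {
        // Small chunk: every row fits in the budget, so keep them all.
        return (0..num_rows).collect();
    }
    // Large chunk: index i maps to floor(i * num_rows / max_samples).
    // Because num_rows > max_samples, consecutive indices differ by at
    // least 1, so the result is strictly increasing and below num_rows.
    (0..max_samples)
        .map(|i| i * num_rows / max_samples)
        .collect()
}

/// Hypothetical helper: how many original rows each sampled row stands for.
/// Assumption: scaling samples by this factor keeps the total density
/// comparable to rendering every row.
fn weight_per_sample(num_rows: usize, num_samples: usize) -> f64 {
    if num_samples == 0 {
        0.0
    } else {
        num_rows as f64 / num_samples as f64
    }
}
```

With these helpers, a 100k-row chunk and a budget of 8000 would yield 8000 indices, each standing for 12.5 rows.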

Benchmarks (M3 Pro):

For chunks with 5k rows (around the threshold) I get:

// 5k
sampling/5000/sample_0     time:   [2.1716 µs 2.2432 µs 2.3375 µs]
sampling/5000/sample_4000  time:   [57.836 µs 57.937 µs 58.033 µs]
sampling/5000/sample_8000  time:   [62.420 µs 68.867 µs 76.966 µs]

So around the threshold, we see that for 5k rows with max_sampled_events_per_chunk=4000, we sample 4k events, which is actually faster than sampling the full chunk at max_sampled_events_per_chunk=8000.

For chunks with 20k-100k rows I get the following:

// 20k
sampling/20000/sample_0     time:   [2.1293 µs 2.1467 µs 2.1631 µs]
sampling/20000/sample_4000  time:   [130.47 µs 131.00 µs 131.39 µs]
sampling/20000/sample_8000  time:   [159.53 µs 159.68 µs 159.78 µs]

// 50k
sampling/50000/sample_0     time:   [2.1483 µs 2.2347 µs 2.3770 µs]
sampling/50000/sample_4000  time:   [278.60 µs 279.56 µs 281.00 µs]
sampling/50000/sample_8000  time:   [307.62 µs 317.30 µs 345.24 µs]

// 100k
sampling/100000/sample_0     time:   [2.1437 µs 2.2881 µs 2.6224 µs]
sampling/100000/sample_4000  time:   [521.40 µs 522.86 µs 525.53 µs]
sampling/100000/sample_8000  time:   [552.32 µs 553.91 µs 555.89 µs]

Here, sample_0 is the original behavior. We'd be able to go with 8000 samples without much of a performance hit.
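
To put a number on "without much of a performance hit", the relative cost of 8000 samples over 4000 can be worked out from the figures above. The sketch below assumes the output is Criterion-style, where the middle value of each bracket is the point estimate; the names `overhead_percent`, `ESTIMATES`, and `overhead_table` are made up for this example.

```rust
/// Relative slowdown of `candidate_us` over `baseline_us`, in percent.
fn overhead_percent(baseline_us: f64, candidate_us: f64) -> f64 {
    (candidate_us / baseline_us - 1.0) * 100.0
}

/// (rows per chunk, sample_4000 estimate in µs, sample_8000 estimate in µs),
/// copied from the middle values of the benchmark output above.
const ESTIMATES: [(usize, f64, f64); 4] = [
    (5_000, 57.937, 68.867),
    (20_000, 131.00, 159.68),
    (50_000, 279.56, 317.30),
    (100_000, 522.86, 553.91),
];

/// Overhead of 8000 samples relative to 4000 samples, per chunk size.
fn overhead_table() -> Vec<(usize, f64)> {
    ESTIMATES
        .iter()
        .map(|&(rows, s4000, s8000)| (rows, overhead_percent(s4000, s8000)))
        .collect()
}
```

By this measure the extra cost of 8000 samples is roughly 19% at 5k rows, 22% at 20k, 13.5% at 50k, and 6% at 100k, which in absolute terms is between about 11 µs and 38 µs per chunk.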

@oxkitsune oxkitsune added 📺 re_viewer affects re_viewer itself include in changelog labels Nov 3, 2025
github-actions Bot commented Nov 3, 2025

Contributor

Web viewer built successfully.

Result Commit Link Manifest
cf6707c https://rerun.io/viewer/pr/11766 +nightly +main

View image diff on kitdiff.

Note: This comment is updated whenever you push a commit.

@emilk emilk left a comment

Member

Nice idea!

Please make sure crates/viewer/re_time_panel/benches/bench_density_graph.rs covers this, and share some numbers of what effect it has 🙏


// When chunks are too large to render all events, sample this many events uniformly
// to create a good enough density estimate.
max_sampled_events_per_chunk: 4_000,

Member

What's the motivation behind this particular number?

Member Author

The limit is configured at 8k for unsorted chunks, so I figured let's take half. However, after doing some benchmarking, it seems we can use 8k without much of a loss.

For chunks with 5k rows (around the threshold) I get:

// 5k
sampling/5000/sample_0     time:   [2.1716 µs 2.2432 µs 2.3375 µs]
sampling/5000/sample_4000  time:   [57.836 µs 57.937 µs 58.033 µs]
sampling/5000/sample_8000  time:   [62.420 µs 68.867 µs 76.966 µs]

So around the threshold, we see that for 5k rows with max_sampled_events_per_chunk=4000, we sample 4k events, which is actually faster than sampling the full chunk at max_sampled_events_per_chunk=8000.

For chunks with 20k-100k rows I get the following:

// 20k
sampling/20000/sample_0     time:   [2.1293 µs 2.1467 µs 2.1631 µs]
sampling/20000/sample_4000  time:   [130.47 µs 131.00 µs 131.39 µs]
sampling/20000/sample_8000  time:   [159.53 µs 159.68 µs 159.78 µs]

// 50k
sampling/50000/sample_0     time:   [2.1483 µs 2.2347 µs 2.3770 µs]
sampling/50000/sample_4000  time:   [278.60 µs 279.56 µs 281.00 µs]
sampling/50000/sample_8000  time:   [307.62 µs 317.30 µs 345.24 µs]

// 100k
sampling/100000/sample_0     time:   [2.1437 µs 2.2881 µs 2.6224 µs]
sampling/100000/sample_4000  time:   [521.40 µs 522.86 µs 525.53 µs]
sampling/100000/sample_8000  time:   [552.32 µs 553.91 µs 555.89 µs]

Here, sample_0 is the original behavior. We'd be able to go with 8000 samples without much of a performance hit.

Comment thread crates/viewer/re_time_panel/src/data_density_graph.rs Outdated
Comment thread crates/viewer/re_time_panel/src/data_density_graph.rs Outdated
Comment thread crates/viewer/re_time_panel/src/data_density_graph.rs Outdated
@oxkitsune oxkitsune force-pushed the gijs/sample-from-chunk branch from c5bb0b5 to fa872db Compare November 4, 2025 13:14
Comment thread crates/viewer/re_time_panel/src/data_density_graph.rs Outdated
@oxkitsune oxkitsune merged commit 13470a2 into main Nov 4, 2025
69 of 70 checks passed
@oxkitsune oxkitsune deleted the gijs/sample-from-chunk branch November 4, 2025 15:37
IsseW added a commit that referenced this pull request Nov 12, 2025
IsseW added a commit that referenced this pull request Nov 13, 2025
oxkitsune added a commit that referenced this pull request Nov 18, 2025
Development

Successfully merging this pull request may close these issues.

Sample rows from a chunk uniformly instead of giving up and filling the chunk

2 participants