bench performance and iter_custom

Recently I had an opportunity to try out the new and *much-appreciated* bench feature in wasm-bindgen-test (#4812 #4823) to investigate a performance issue in wasm in my codebase.

What I learned: nothing about my actual performance issue. What I did learn is that browsers, in their wisdom, perform drastically differently depending on whether I run the benchmark myself (NO_HEADLESS=1) or through a driver (```CHROMEDRIVER=`which chromedriver`, SAFARIDRIVER=...```). This took me a long time to figure out. Since the performance issue happens in my own browser (the NO_HEADLESS=1 case), the benchmark results tell me a lot less about how my code actually performs than I had hoped.

It occurs to me that I might be able to work around this, and see whether that helps (or even sheds new light on the problem), like so:
```rust
  #[wasm_bindgen_bench]
  async fn bench(c: &mut Criterion) {
      c.bench_async_function("my_bench", |b| {
          Box::pin(b.iter_custom_future(|iters| async move {
              let mut total = Duration::ZERO;
              for _ in 0..iters {
                  // Wait for a while before starting the test proper (untimed).
                  // `magic_wait` is a placeholder: try various strategies
                  // (wall time, internal metrics).
                  magic_wait().await;

                  // Begin the test proper; `performance_now` returns milliseconds.
                  let start = performance_now();
                  operation_to_benchmark().await;
                  total += Duration::from_secs_f64((performance_now() - start) / 1000.0);
              }
              total
          }))
      }).await;
  }
```

This would require adding the hypothetical `iter_custom_future` function to support this type of benchmark. The main idea is that, by waiting before each timed iteration (but excluding that wait from the measurement), I might be able to get results that are closer to those in real-world (NO_HEADLESS=1) scenarios.
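To make the timing pattern concrete outside the browser, here is a minimal synchronous sketch using only `std`. It is not the proposed API: `time_excluding_wait`, `settle` and `op` are hypothetical names I made up for illustration, and `std::time::Instant` stands in for `performance_now()`. The function has the same shape as a Criterion `iter_custom` closure: it receives an iteration count and returns the total measured `Duration`.

```rust
use std::time::{Duration, Instant};

/// Runs `op` exactly `iters` times and returns only the time spent inside `op`.
/// `settle` is called before every run and is deliberately left out of the
/// measurement. Both closures are hypothetical stand-ins for `magic_wait()`
/// and `operation_to_benchmark()` from the snippet above.
fn time_excluding_wait<W, F>(iters: u64, mut settle: W, mut op: F) -> Duration
where
    W: FnMut(),
    F: FnMut(),
{
    let mut total = Duration::ZERO;
    for _ in 0..iters {
        // Untimed: let the environment settle before measuring.
        settle();

        // Timed: only this section contributes to the reported total.
        let start = Instant::now();
        op();
        total += start.elapsed();
    }
    total
}
```

Calling it with a `settle` closure that sleeps and an `op` closure that does the real work should report roughly the cost of `op` alone, no matter how long the wait is. The async version would need the harness to await the future returned for each sample, which is the part that is missing today as far as I can tell.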

I don't know if this will actually work in my situation, but it seems like the kind of thing people benchmarking their code will want to try in order to rule out other causes, and once enough people have tried things, a more robust solution can be proposed.

Meanwhile, is it worth PRing this design, or something along these lines?  Or would it be more expedient to write a custom benchmarking harness that is more suited to my situation?


bench performance and iter_custom #4840
