Skip to content

Optimize SlidingTimeWindowMovingAverages sumBuckets#4936

Merged
joschi merged 4 commits intodropwizard:release/4.2.xfrom
schlosna:davids/optimize-sliding-time-window-averages
Sep 15, 2025
Merged

Optimize SlidingTimeWindowMovingAverages sumBuckets#4936
joschi merged 4 commits intodropwizard:release/4.2.xfrom
schlosna:davids/optimize-sliding-time-window-averages

Conversation

@schlosna
Copy link
Contributor

@schlosna schlosna commented Sep 3, 2025

SlidingTimeWindowMovingAverages sumBuckets() method could be optimized to perform indexed list access and remove allocations as it currently allocates a LongAdder as well as one or two streams. If one is using the 1, 5, or 15 minute rates on hot paths, these can be unnecessary expensive allocations as well as less optimized computations.

We can avoid the allocations completely and accumulate the sum directly to a long via optimized direct indexed list access.

MovingAverageBenchmarks demonstrates the difference in implementations in both execution time (22x faster) and allocations:

Benchmark                                             (recordings)                                    (type)  Mode  Cnt     Score     Error   Units
MovingAverageBenchmarks.getM1Rate                               10           SlidingTimeWindowMovingAverages  avgt   20  1364.969 ±  12.040   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm            10           SlidingTimeWindowMovingAverages  avgt   20   688.037 ±   0.001    B/op
MovingAverageBenchmarks.getM1Rate                               10  OptimizedSlidingTimeWindowMovingAverages  avgt   20    59.182 ±   0.343   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm            10  OptimizedSlidingTimeWindowMovingAverages  avgt   20     0.002 ±   0.001    B/op
MovingAverageBenchmarks.getM1Rate                             1000           SlidingTimeWindowMovingAverages  avgt   20  1401.864 ± 107.134   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm          1000           SlidingTimeWindowMovingAverages  avgt   20   688.038 ±   0.003    B/op
MovingAverageBenchmarks.getM1Rate                             1000  OptimizedSlidingTimeWindowMovingAverages  avgt   20    61.157 ±   1.393   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm          1000  OptimizedSlidingTimeWindowMovingAverages  avgt   20     0.002 ±   0.001    B/op

@schlosna schlosna requested review from a team as code owners September 3, 2025 18:39
@github-actions github-actions bot added this to the 4.2.37 milestone Sep 3, 2025
schlosna added a commit to palantir/tritium that referenced this pull request Sep 3, 2025
`SlidingTimeWindowMovingAverages` `sumBuckets()` method could be optimized to
perform indexed list access and remove allocations as it currently allocates a
`LongAdder` as well as one or two streams. If one is using the 1, 5, or 15
minute rates on hot paths, these can be unnecessary expensive allocations as
well as less optimized computations.

We can avoid the allocations completely and accumulate the sum directly to a
`long` via optimized direct indexed list access.

See upstream PR dropwizard/metrics#4936
* time window (i.e. 15 minutes, see TIME_WINDOW_DURATION_MINUTES)
*/
private ArrayList<LongAdder> buckets;
private final List<LongAdder> buckets;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fixed size and could just be an array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, updated

* time window (i.e. 15 minutes, see TIME_WINDOW_DURATION_MINUTES)
*/
private ArrayList<LongAdder> buckets;
private final LongAdder[] buckets;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a separate optimization would be to avoid the LongAdders and use AtomicLongArray

Suggested change
private final LongAdder[] buckets;
private final AtomicLongArray buckets;

Copy link
Contributor Author

@schlosna schlosna Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using AtomicLongArray cuts optimized getM1Rate from ~60ns to ~45ns, but there's additional overhead on the concurrent update side so going to hold off on that change.

Benchmark                          (recordings)                                    (type)  Mode  Cnt     Score    Error  Units
MovingAverageBenchmarks.getM1Rate            10           SlidingTimeWindowMovingAverages  avgt   20  1366.241 ± 24.341  ns/op
MovingAverageBenchmarks.getM1Rate            10  OptimizedSlidingTimeWindowMovingAverages  avgt   20    43.180 ±  0.153  ns/op
MovingAverageBenchmarks.getM1Rate          1000           SlidingTimeWindowMovingAverages  avgt   20  1377.545 ± 65.159  ns/op
MovingAverageBenchmarks.getM1Rate          1000  OptimizedSlidingTimeWindowMovingAverages  avgt   20    43.967 ±  2.058  ns/op

@ash211
Copy link

ash211 commented Sep 5, 2025

Thanks for opening this PR @schlosna !

The reasoning about doing fewer allocations and more performant array indexing makes sense, and the benchmarks show a clear performance improvement. It does not seem like a reduction in code readability either. I also checked if LongAdder does anything special for long overflow and didn't see anything, so I believe this is behavior-equivalent.

Requesting review from maintainer @joschi

Copy link
Member

@joschi joschi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schlosna Thank you for your contribution and providing benchmarks for your change!

@joschi joschi merged commit b4a19b7 into dropwizard:release/4.2.x Sep 15, 2025
4 checks passed
joschi pushed a commit that referenced this pull request Sep 15, 2025
`SlidingTimeWindowMovingAverages` `sumBuckets()` method could be optimized to perform indexed list access and remove allocations as it currently allocates a `LongAdder` as well as one or two streams. If one is using the 1, 5, or 15 minute rates on hot paths, these can be unnecessary expensive allocations as well as less optimized computations.

We can avoid the allocations completely and accumulate the sum directly to a `long` via optimized direct indexed list access.


[MovingAverageBenchmarks](https://github.com/palantir/tritium/blob/davids/OptimizedSlidingTimeWindowMovingAverages/tritium-jmh/src/jmh/java/com/palantir/tritium/microbenchmarks/MovingAverageBenchmarks.java) demonstrates the difference in implementations in both execution time (22x faster) and allocations:


```
Benchmark                                             (recordings)                                    (type)  Mode  Cnt     Score     Error   Units
MovingAverageBenchmarks.getM1Rate                               10           SlidingTimeWindowMovingAverages  avgt   20  1364.969 ±  12.040   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm            10           SlidingTimeWindowMovingAverages  avgt   20   688.037 ±   0.001    B/op
MovingAverageBenchmarks.getM1Rate                               10  OptimizedSlidingTimeWindowMovingAverages  avgt   20    59.182 ±   0.343   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm            10  OptimizedSlidingTimeWindowMovingAverages  avgt   20     0.002 ±   0.001    B/op
MovingAverageBenchmarks.getM1Rate                             1000           SlidingTimeWindowMovingAverages  avgt   20  1401.864 ± 107.134   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm          1000           SlidingTimeWindowMovingAverages  avgt   20   688.038 ±   0.003    B/op
MovingAverageBenchmarks.getM1Rate                             1000  OptimizedSlidingTimeWindowMovingAverages  avgt   20    61.157 ±   1.393   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm          1000  OptimizedSlidingTimeWindowMovingAverages  avgt   20     0.002 ±   0.001    B/op
```
joschi added a commit that referenced this pull request Sep 15, 2025
`SlidingTimeWindowMovingAverages` `sumBuckets()` method could be optimized to perform indexed list access and remove allocations as it currently allocates a `LongAdder` as well as one or two streams. If one is using the 1, 5, or 15 minute rates on hot paths, these can be unnecessary expensive allocations as well as less optimized computations.

We can avoid the allocations completely and accumulate the sum directly to a `long` via optimized direct indexed list access.


[MovingAverageBenchmarks](https://github.com/palantir/tritium/blob/davids/OptimizedSlidingTimeWindowMovingAverages/tritium-jmh/src/jmh/java/com/palantir/tritium/microbenchmarks/MovingAverageBenchmarks.java) demonstrates the difference in implementations in both execution time (22x faster) and allocations:


```
Benchmark                                             (recordings)                                    (type)  Mode  Cnt     Score     Error   Units
MovingAverageBenchmarks.getM1Rate                               10           SlidingTimeWindowMovingAverages  avgt   20  1364.969 ±  12.040   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm            10           SlidingTimeWindowMovingAverages  avgt   20   688.037 ±   0.001    B/op
MovingAverageBenchmarks.getM1Rate                               10  OptimizedSlidingTimeWindowMovingAverages  avgt   20    59.182 ±   0.343   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm            10  OptimizedSlidingTimeWindowMovingAverages  avgt   20     0.002 ±   0.001    B/op
MovingAverageBenchmarks.getM1Rate                             1000           SlidingTimeWindowMovingAverages  avgt   20  1401.864 ± 107.134   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm          1000           SlidingTimeWindowMovingAverages  avgt   20   688.038 ±   0.003    B/op
MovingAverageBenchmarks.getM1Rate                             1000  OptimizedSlidingTimeWindowMovingAverages  avgt   20    61.157 ±   1.393   ns/op
MovingAverageBenchmarks.getM1Rate:gc.alloc.rate.norm          1000  OptimizedSlidingTimeWindowMovingAverages  avgt   20     0.002 ±   0.001    B/op
```

Co-authored-by: David Schlosnagle <davids@palantir.com>
@schlosna schlosna deleted the davids/optimize-sliding-time-window-averages branch September 16, 2025 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants