[RLlib] Add ability to compute percentiles to MetricsLogger/Stats#52963

Merged
ArturNiederfahrenhorst merged 43 commits into ray-project:master from ArturNiederfahrenhorst:percentiles on Jun 12, 2025
Conversation

ArturNiederfahrenhorst (Contributor) commented on May 13, 2025

Why are these changes needed?

Today, it's not possible to report percentiles of metrics. This PR introduces optional tracking of percentiles in RLlib's metrics reporting. Even after this change, we don't compute percentiles by default because of the considerable overhead.
Instead, we reserve this option for power users who can modify the relevant RLlib components (for example, by changing the relevant `MetricsLogger.log_value()` call).

I did some benchmarking to validate that this would not eat up too much runtime.
https://gist.github.com/ArturNiederfahrenhorst/5ded71ebb5ac28d24d1d63c37c4600f2

Sorting Performance (average times):

1,000 values:
  Stats peek(compile=False): 77.8 μs
  Stats peek(compile=True):  67.2 μs
  Pure list.sort():          72.8 μs

10,000 values:
  Stats peek(compile=False): 951.1 μs
  Stats peek(compile=True):  916.2 μs
  Pure list.sort():          929.2 μs

1,000,000 values:
  Stats peek(compile=False): 174.7 ms
  Stats peek(compile=True):  157.5 ms
  Pure list.sort():          158.2 ms

Merging Performance (average times):
(These assume merging 10 stats objects.)

~1,000 total values:
  Stats merge_in_parallel(): 229.5 μs
  Pure heapq.merge():        177.5 μs

~10,000 total values:
  Stats merge_in_parallel(): 2.1 ms
  Pure heapq.merge():        1.8 ms

~1,000,000 total values:
  Stats merge_in_parallel(): 302.9 ms
  Pure heapq.merge():        192.1 ms

With the above in mind, we use heapq and list.sort() to implement an efficient distributed sort that computes exact percentiles. In the future, we may look into approximate methods of computing percentiles. Since we only have to reduce once per Algorithm iteration, the costs above don't seem prohibitive.
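As a standalone sketch of the merge-then-index idea (not RLlib's actual implementation; the helper name and the linear-interpolation scheme are assumptions for illustration):

```python
import heapq

def exact_percentiles(sorted_lists, percentiles):
    """Merge pre-sorted per-worker lists, then read off exact percentiles
    via linear interpolation between the two nearest ranks."""
    # heapq.merge does O(n log k) comparisons for n total values, k lists.
    merged = list(heapq.merge(*sorted_lists))
    n = len(merged)
    results = {}
    for p in percentiles:
        index = (p / 100) * (n - 1)
        lower = int(index)
        upper = min(lower + 1, n - 1)
        frac = index - lower
        # Interpolate between the two surrounding values (exact when frac == 0).
        results[p] = merged[lower] * (1 - frac) + merged[upper] * frac
    return results
```

For example, `exact_percentiles([[1, 3, 5], [2, 4]], [0, 50, 100])` merges the two sorted lists into `[1, 2, 3, 4, 5]` and returns 1, 3, and 5 for the 0th, 50th, and 100th percentiles.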

@ArturNiederfahrenhorst ArturNiederfahrenhorst requested a review from a team as a code owner May 22, 2025 03:29
simonsays1980 (Contributor) left a comment
Awesome work @ArturNiederfahrenhorst! Just a few nits and a question: why not use numpy.percentile to compute percentiles?

reduce: Optional[str] = "mean",
window: Optional[Union[int, float]] = None,
ema_coeff: Optional[float] = None,
percentiles: Union[List[int], bool] = False,
Nice :)

for p in percentiles:
index = (p / 100) * (n - 1)

if index.is_integer():
Dumb question: why not use numpy.percentile here?

ArturNiederfahrenhorst (Contributor, Author) replied on Jun 12, 2025
numpy.percentile does not assume sorted values, so it ends up sorting in at best O(n log n).
We could simply not sort values during aggregation and use np.percentile at the end.
But if we train at scale with 1000 parallel env runners, it is much quicker to sort on the env runners first, so that we can use heapq to merge the already-sorted lists in near-linear time after aggregation.
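The trade-off described here can be demonstrated in isolation: each runner sorts locally, and central aggregation only needs a k-way merge of pre-sorted lists (O(n log k)) instead of a full re-sort (O(n log n)). A minimal illustration, with arbitrary runner counts and sizes:

```python
import heapq
import random

random.seed(0)

# Simulate 4 "env runners", each reporting a locally sorted list of metrics.
runner_values = [sorted(random.random() for _ in range(1000)) for _ in range(4)]

# Central aggregation: merge the pre-sorted lists instead of re-sorting
# the concatenation from scratch.
merged = list(heapq.merge(*runner_values))

# The merged list is globally sorted, so a percentile is a direct index lookup.
p95 = merged[int(0.95 * (len(merged) - 1))]
```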

check(nan_stats3.values, stats_with_values3.values)


def test_percentiles():
Nice! :)

@ArturNiederfahrenhorst ArturNiederfahrenhorst enabled auto-merge (squash) June 12, 2025 14:35
@github-actions github-actions bot disabled auto-merge June 12, 2025 14:35
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Jun 12, 2025
@ArturNiederfahrenhorst ArturNiederfahrenhorst merged commit 87b2fe5 into ray-project:master Jun 12, 2025
7 checks passed
elliot-barn pushed a commit that referenced this pull request Jun 18, 2025
elliot-barn pushed a commit that referenced this pull request Jul 2, 2025
Labels

go add ONLY when ready to merge, run all tests

4 participants