[RLlib] Add ability to compute percentiles to MetricsLogger/Stats by ArturNiederfahrenhorst · Pull Request #52963 · ray-project/ray

ArturNiederfahrenhorst · 2025-05-13T10:01:45Z

Why are these changes needed?

Today, it's not possible to report percentiles of metrics. This PR introduces optional tracking of percentiles in RLlib's metrics reporting. Even after this change, we don't compute percentiles by default because doing so adds considerable overhead.
Instead, we keep this option for power users who can modify the relevant RLlib components (for example, by changing the relevant MetricsLogger.log_value() call).

I ran some benchmarks to verify that this does not add too much runtime:
https://gist.github.com/ArturNiederfahrenhorst/5ded71ebb5ac28d24d1d63c37c4600f2

Sorting Performance (average times):

1,000 values:
Stats peek(compile=False): 77.8 μs
Stats peek(compile=True): 67.2 μs
Pure list.sort(): 72.8 μs

10,000 values:
Stats peek(compile=False): 951.1 μs
Stats peek(compile=True): 916.2 μs
Pure list.sort(): 929.2 μs

1,000,000 values:
Stats peek(compile=False): 174.7 ms
Stats peek(compile=True): 157.5 ms
Pure list.sort(): 158.2 ms

Merging Performance (average times):
(These assume merging 10 stats objects.)

~1,000 total values:
Stats merge_in_parallel(): 229.5 μs
Pure heapq.merge(): 177.5 μs

~10,000 total values:
Stats merge_in_parallel(): 2.1 ms
Pure heapq.merge(): 1.8 ms

~1,000,000 total values:
Stats merge_in_parallel(): 302.9 ms
Pure heapq.merge(): 192.1 ms

Given these numbers, we use list.sort() and heapq.merge() to implement a distributed sort that yields exact percentiles: values are sorted locally, and the sorted lists are merged during aggregation. In the future, we may look into approximate methods of computing percentiles. Since we only have to reduce once per Algorithm iteration, these costs seem acceptable.
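A minimal sketch of the sort-then-merge idea, using only the standard library. The function names `sort_on_runner` and `merge_on_driver` are hypothetical and do not correspond to RLlib's `Stats` or `MetricsLogger` API; they only illustrate where each step would run.

```python
import heapq
import random
from typing import List


def sort_on_runner(values: List[float]) -> List[float]:
    """Hypothetical per-runner step: sort this runner's values locally, O(m log m)."""
    return sorted(values)


def merge_on_driver(sorted_chunks: List[List[float]]) -> List[float]:
    """Hypothetical driver step: merge k pre-sorted lists in O(n log k)."""
    # heapq.merge only requires each input iterable to be sorted already.
    return list(heapq.merge(*sorted_chunks))


# Usage: 10 simulated runners (mirroring the 10-object merge benchmark above),
# each reporting 100 random values.
rng = random.Random(0)
runner_values = [[rng.random() for _ in range(100)] for _ in range(10)]
merged = merge_on_driver([sort_on_runner(v) for v in runner_values])

# The merged list is identical to a single global sort over all values.
assert merged == sorted(x for chunk in runner_values for x in chunk)
```

Because each input is already sorted, the driver performs a k-way merge instead of re-sorting all n values, which is what keeps the aggregation step cheap.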

simonsays1980

Awesome work @ArturNiederfahrenhorst! Just a few nits, and one question: why not use numpy.percentile to compute percentiles?

simonsays1980 · 2025-06-12T14:01:08Z

rllib/utils/metrics/metrics_logger.py

        reduce: Optional[str] = "mean",
        window: Optional[Union[int, float]] = None,
        ema_coeff: Optional[float] = None,
+        percentiles: Union[List[int], bool] = False,



simonsays1980 · 2025-06-12T14:04:03Z

rllib/utils/metrics/stats.py

+    for p in percentiles:
+        index = (p / 100) * (n - 1)
+
+        if index.is_integer():


Dumb question: why not use numpy.percentile here?

numpy.percentile does not assume sorted input, so it has to sort (or partition) the full list of values itself.
So we could skip sorting during aggregation and just call np.percentile at the end.
But if we train at scale and have 1000 parallel env runners, it will be much quicker to sort on the env runners first, so that we can use heapq to merge the already-sorted lists after aggregation in O(n log k) time (k being the number of lists), which is close to linear.
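The diff in this thread shows only the start of the percentile loop. A minimal sketch of how it could be completed, assuming `sorted_values` is already sorted and every requested percentile lies in [0, 100]. It interpolates linearly between neighbouring elements, the same rule as NumPy's default `method="linear"` for `numpy.percentile`. The name `compute_percentiles` and this signature are illustrative assumptions, not the PR's actual code.

```python
from typing import Dict, List, Optional


def compute_percentiles(
    sorted_values: List[float], percentiles: List[int]
) -> Dict[int, Optional[float]]:
    """Exact percentiles of an already-sorted list via linear interpolation.

    Illustrative sketch only; name and signature are assumptions.
    """
    n = len(sorted_values)
    if n == 0:
        # No data: report None for every requested percentile.
        return {p: None for p in percentiles}

    result: Dict[int, Optional[float]] = {}
    for p in percentiles:
        # Fractional position of the p-th percentile between index 0 and n - 1.
        index = (p / 100) * (n - 1)
        if index.is_integer():
            # The position falls exactly on an element.
            result[p] = sorted_values[int(index)]
        else:
            # Interpolate between the two neighbouring elements.
            lower = int(index)
            weight = index - lower
            low_val = sorted_values[lower]
            high_val = sorted_values[lower + 1]
            result[p] = low_val + weight * (high_val - low_val)
    return result


print(compute_percentiles([10, 20, 30, 40], [0, 50, 100]))
# → {0: 10, 50: 25.0, 100: 40}
```

Since the input is sorted once (or produced by a k-way merge), each requested percentile then costs only constant time to look up.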


simonsays1980 · 2025-06-12T14:05:32Z

rllib/utils/metrics/tests/test_stats.py

    check(nan_stats3.values, stats_with_values3.values)


+def test_percentiles():


Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

ArturNiederfahrenhorst added 6 commits May 9, 2025 19:53

initial

c5341be

typo

4c77a1e

Add to BUILD

ceba1bb

init

b5e62e9

merge master

98bbaac

initial stats

8583d12

ArturNiederfahrenhorst requested review from simonsays1980 and sven1977 as code owners May 13, 2025 10:01

ArturNiederfahrenhorst added 20 commits May 13, 2025 18:04

Merge branch 'reduce_per_index_on_merge' into percentiles

697e6a3

wip

78e383b

wip

3ad56d5

Merge branch 'reduce_per_index_on_merge' into percentiles

d91fe8a

wip

f599257

fix from_state

1e3345a

wip

cc38d96

Delete some metrics logger tests that are better covered by stats tests

58d64bc

add back ASCII image of hierarchical metrics system test

554d4fd

most tests succeeding

5b9e57a

Rest of tests and fixes

46aff30

Fix usage of logging dicts in torch meta learner

538df4c

Fix merging empty stats into empty stats

1b57982

Fix log_dict usage

8b74c3d

Add some more docstrings to Stats and MetricsLogger to clarify usage

5e40af9

Rename merge_and_log_n_dicts to aggregate

af5901a

Merge branch 'reduce_per_index_on_merge' into percentiles

d00a5ca

Fix doctest

a7a1a48

fix docstring

6ee4873

Merge branch 'reduce_per_index_on_merge' into percentiles

7ef5eea

ArturNiederfahrenhorst requested review from a team and maxpumperla as code owners May 21, 2025 10:50

ArturNiederfahrenhorst added 2 commits May 21, 2025 18:32

Add API annotation

4bd4659

Remove ASCII stuff from docstring

83a830d

hainesmichaelc added the community-backlog label May 22, 2025

Merge branch 'reduce_per_index_on_merge' into percentiles

aa6ae41

ArturNiederfahrenhorst requested a review from a team as a code owner May 22, 2025 03:29

ArturNiederfahrenhorst added 3 commits May 22, 2025 18:55

Sven's comments

c50c38c

Sven's comments

4d1ae37

Add deprecation warning

3b4de24

hainesmichaelc removed the community-backlog label May 22, 2025

ArturNiederfahrenhorst added 9 commits May 23, 2025 01:01

move Sven's negative throughput fix to new merge method

311f9c8

Merge branch 'reduce_per_index_on_merge' into percentiles

e71857d

Merge branch 'master' into percentiles

4994891

wip

2516955

wip

fea3ff0

wip

4981470

complete percentiles calculation when compiling

355e8dc

Add some more info in docstrings

724006f

add annotation to compute_percentiles

bf5e746

ArturNiederfahrenhorst assigned sven1977 Jun 12, 2025

simonsays1980 approved these changes Jun 12, 2025


ArturNiederfahrenhorst and others added 2 commits June 12, 2025 16:35

simon's remarks

4108c14

Merge branch 'master' into percentiles

0e32f3a

ArturNiederfahrenhorst enabled auto-merge (squash) June 12, 2025 14:35

github-actions bot disabled auto-merge June 12, 2025 14:35

github-actions bot added the go add ONLY when ready to merge, run all tests label Jun 12, 2025

ArturNiederfahrenhorst merged commit 87b2fe5 into ray-project:master Jun 12, 2025
7 checks passed


ArturNiederfahrenhorst merged 43 commits into ray-project:master from ArturNiederfahrenhorst:percentiles
