Add zero_std all_zero_ratio and all_one_ratio metrics #1034
Merged
The existing `zero_std/count_{reward}` metrics report the raw number of
GRPO groups where every sample shares the same reward (and therefore
contributes zero gradient under group-normalized advantages). Comparing
these counts across runs with different rollout batch sizes, or
understanding them without first looking up
`rollout_batch_size` / `n_samples_per_prompt`, is awkward.
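For context, a minimal sketch of why such groups contribute no gradient. It assumes the common `(r - mean) / (std + eps)` form of group normalization; the function name and `eps` value are illustrative and not taken from the miles code, whose exact normalization may differ.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-6):
    """Advantage of each sample relative to its own group.

    Hypothetical helper: assumes (r - mean) / (std + eps) normalization.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# A mixed group yields non-zero advantages of both signs.
mixed = group_normalized_advantages([1.0, 0.0, 0.0, 1.0])

# Groups where every sample shares the same reward yield all-zero advantages,
# so they add nothing to the policy gradient.
all_zero = group_normalized_advantages([0.0, 0.0, 0.0, 0.0])
all_one = group_normalized_advantages([1.0, 1.0, 1.0, 1.0])
```

Whatever the reward value, a constant group has `r - mean == 0` for every sample, which is why both the "too hard" and "too easy" cases are dead weight in the batch.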
Add two derived ratios over the total number of groups in the rollout:
- `zero_std/all_zero_ratio` - fraction of groups where every sample
scored 0.0 (binary reward: "too hard", no attempt succeeded).
- `zero_std/all_one_ratio` - fraction of groups where every sample
scored 1.0 ("too easy", every attempt succeeded).
Both are meaningful only for binary rewards; for non-binary reward
schemes the individual `count_{reward}` buckets remain available.
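A rough sketch of how the two ratios relate to the existing counts. This is not the code in `miles/ray/rollout.py`; the function name, the flat reward layout (consecutive samples belong to the same prompt), and the exact formatting of the `count_{reward}` key are assumptions made for illustration.

```python
from collections import Counter


def zero_std_metrics(rewards, n_samples_per_prompt):
    """Hypothetical helper: count zero-std groups and derive the two ratios."""
    n = n_samples_per_prompt
    assert len(rewards) % n == 0, "rewards must split evenly into groups"

    # Assumption: samples of the same prompt are stored contiguously.
    groups = [rewards[i:i + n] for i in range(0, len(rewards), n)]
    num_groups = len(groups)

    # A group has zero std when all of its rewards are identical.
    counts = Counter(g[0] for g in groups if len(set(g)) == 1)

    metrics = {f"zero_std/count_{r}": c for r, c in counts.items()}
    metrics["zero_std/all_zero_ratio"] = counts.get(0.0, 0) / num_groups
    metrics["zero_std/all_one_ratio"] = counts.get(1.0, 0) / num_groups
    return metrics


# Four groups of two samples: two all-zero, one all-one, one mixed.
metrics = zero_std_metrics([0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0], 2)
```

Dividing by the number of groups rather than the number of samples is what makes the value comparable across runs with different `rollout_batch_size` / `n_samples_per_prompt` settings.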
added 2 commits
April 21, 2026 22:27
Per review: the metric names are more intuitive as "percentage" than "ratio", so rename and multiply by 100 so the values match the name (0-100 instead of 0-1).
- zero_std/all_zero_ratio -> zero_std/all_zero_percentage
- zero_std/all_one_ratio -> zero_std/all_one_percentage
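As an illustration of the renamed metrics, a small sketch of the 0-100 scaling. The helper name and the guard for an empty rollout are assumptions, not part of the actual change.

```python
def to_percentage(count, num_groups):
    """Hypothetical helper: share of groups on a 0-100 scale."""
    # Assumption: report 0.0 rather than dividing by zero on an empty rollout.
    return 100.0 * count / num_groups if num_groups else 0.0


num_groups = 8  # example rollout with 8 GRPO groups
percentages = {
    "zero_std/all_zero_percentage": to_percentage(2, num_groups),
    "zero_std/all_one_percentage": to_percentage(1, num_groups),
}
```

With 2 all-zero groups and 1 all-one group out of 8, the logged values would be 25.0 and 12.5 rather than 0.25 and 0.125.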
Code Review
This pull request updates the reward logging in miles/ray/rollout.py to include ratios for all-zero and all-one reward groups relative to the total number of groups. This change allows for better comparison across runs with different rollout batch sizes. I have no feedback to provide.
maocheng23
approved these changes
Apr 23, 2026
fzyzcjy
added a commit
that referenced
this pull request
May 5, 2026
Note: rollout_ft/0 was already merged into main as PR #897 and deleted from origin, so the cascade starts from main → rollout_ft/1.

Conflict 1: miles/ray/rollout.py (modify/delete)
- HEAD (rollout_ft/1) deleted the file as part of a mechanical split: commit d3fc26e "mechanically move" splits the original 1298-line miles/ray/rollout.py into the directory miles/ray/rollout/{addr_allocator,metrics,observability,rollout_manager,rollout_server,router_manager,server_group}.py.
- origin/main modified the file (4 commits since merge-base):
  a772c33 zero_std all_zero_ratio/all_one_ratio metrics (#1034)
  41615af weight staleness control for fully async rollout (#958)
  c198efa consistent hashing routing policy (#891)
  eaa36a2 heartbeat and id to session server (#866)

Resolution: removed miles/ray/rollout.py (preserve mechanical split) and re-applied main's 7 hunks to the new directory files:
- miles/ray/rollout/router_manager.py:
  + import uuid
  + router_args.policy = args.sglang_router_policy (in start_router)
  + args.session_server_instance_id = uuid.uuid4().hex (in start_session_server)
- miles/ray/rollout/train_data_conversion.py:
  + train_data["weight_versions"] population (after multimodal block)
  + "weight_versions" added to per-DP-split key whitelist
- miles/ray/rollout/metrics.py:
  + oldest_weight_version statistics + mixed_version_ratio in _compute_metrics_from_samples
  + zero_std/all_zero_percentage and all_one_percentage in _compute_zero_std_metrics

Verified: all dependent symbols (Sample.weight_versions, Sample.oldest_weight_version, args.sglang_router_policy, args.session_server_instance_id) are already present in the merged tree from cleanly-merged peer files (miles/utils/types.py, miles/backends/sglang_utils/arguments.py, etc.).

Verified: merge_diff_check.py shows the only "lost" deviations are PR #897 skill files (already in main); no real loss.
Summary
Why
The raw counts are hard to compare across runs with different `rollout_batch_size` / `n_samples_per_prompt` configs, and they require cross-referencing the launcher to compute a percentage. Having both `all_zero_ratio` and `all_one_ratio` in wandb makes "dead" GRPO groups directly trackable.
Meaningful only for binary rewards; for non-binary schemes the per-bucket `count_{reward}` metrics remain the source of truth.
Test plan