Add zero_std all_zero_ratio and all_one_ratio metrics by Shi-Dong · Pull Request #1034 · radixark/miles

Shi-Dong · 2026-04-22T05:24:51Z

Summary

Add two derived metrics inside `_compute_zero_std_metrics` so you don't need to know the rollout batch size to interpret them:
- `zero_std/all_zero_ratio` — fraction of GRPO groups where every sample scored 0.0 ("too hard").
- `zero_std/all_one_ratio` — fraction where every sample scored 1.0 ("too easy").
The existing `zero_std/count_{reward}` counters are unchanged.
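The two ratios can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the code inside `_compute_zero_std_metrics`. The hypothetical `zero_std_metrics` helper takes rewards already grouped per prompt (one inner list per GRPO group), and the exact `count_{reward}` key formatting is assumed. Key names follow the Summary; a later commit in this PR renamed them to `*_percentage`.

```python
from collections import Counter


def zero_std_metrics(reward_groups: list[list[float]]) -> dict[str, float]:
    """Hypothetical sketch: zero-std bookkeeping over GRPO groups.

    Each inner list holds the rewards of one prompt's samples.
    """
    counts: Counter = Counter()
    for rewards in reward_groups:
        # A group has zero std when every sample received the same reward.
        if len(set(rewards)) == 1:
            counts[rewards[0]] += 1

    # Raw per-reward counters (key format assumed for illustration).
    metrics: dict[str, float] = {f"zero_std/count_{r}": c for r, c in counts.items()}

    total = len(reward_groups)
    if total > 0:
        # Derived ratios: normalize by the total number of groups in the rollout.
        metrics["zero_std/all_zero_ratio"] = counts.get(0.0, 0) / total
        metrics["zero_std/all_one_ratio"] = counts.get(1.0, 0) / total
    return metrics


# Four groups of four samples: two "too hard", one "too easy", one mixed.
groups = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
m = zero_std_metrics(groups)
print(m)
```

With two all-zero groups and one all-one group out of four, the sketch yields `all_zero_ratio == 0.5` and `all_one_ratio == 0.25`, and the mixed group contributes to neither.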

Why

The raw counts are hard to compare across runs with different `rollout_batch_size` / `n_samples_per_prompt` configs, and they require cross-referencing the launcher to compute a percentage. Having both `all_zero_ratio` and `all_one_ratio` in wandb makes "dead" GRPO groups directly trackable.
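As a worked example of the comparability problem, consider two hypothetical runs that both log an all-zero count of 16. The sketch assumes `rollout_batch_size` counts prompts, so a rollout holds `rollout_batch_size * n_samples_per_prompt` samples split into groups of `n_samples_per_prompt`; `groups_in_rollout` is a hypothetical helper, not part of miles.

```python
def groups_in_rollout(num_samples: int, n_samples_per_prompt: int) -> int:
    # Hypothetical helper: each GRPO group is one prompt's n_samples_per_prompt samples.
    if num_samples % n_samples_per_prompt != 0:
        raise ValueError("samples do not split evenly into groups")
    return num_samples // n_samples_per_prompt


# Two hypothetical runs with the same raw all-zero count but different configs.
runs = {
    "a": {"all_zero_count": 16, "rollout_batch_size": 64, "n_samples_per_prompt": 8},
    "b": {"all_zero_count": 16, "rollout_batch_size": 256, "n_samples_per_prompt": 4},
}

ratios = {}
for name, run in runs.items():
    num_samples = run["rollout_batch_size"] * run["n_samples_per_prompt"]
    total_groups = groups_in_rollout(num_samples, run["n_samples_per_prompt"])
    ratios[name] = run["all_zero_count"] / total_groups
    print(name, total_groups, ratios[name])
```

Run a has 16 of 64 groups (0.25) that are too hard, run b only 16 of 256 (0.0625), even though the raw counters are identical.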

Meaningful only for binary rewards; for non-binary schemes the per-bucket `count_{reward}` metrics remain the source of truth.
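To see why the ratios only make sense for binary rewards, the sketch below uses hypothetical partial-credit rewards. A group where every sample scored 0.5 also has zero std, but it lands in neither the all-zero nor the all-one ratio, so their sum undercounts the dead groups. Both helper functions are illustrative only.

```python
def zero_std_fraction(reward_groups: list[list[float]]) -> float:
    # Fraction of groups whose rewards are all identical, whatever the value.
    dead = sum(1 for g in reward_groups if len(set(g)) == 1)
    return dead / len(reward_groups)


def binary_ratios(reward_groups: list[list[float]]) -> tuple[float, float]:
    # Fractions of groups that are all 0.0 and all 1.0 respectively.
    n = len(reward_groups)
    all_zero = sum(1 for g in reward_groups if g and all(r == 0.0 for r in g)) / n
    all_one = sum(1 for g in reward_groups if g and all(r == 1.0 for r in g)) / n
    return all_zero, all_one


# Hypothetical partial-credit rewards: the second group is zero-std at 0.5.
groups = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.0, 0.5, 1.0], [1.0, 1.0, 1.0]]
all_zero, all_one = binary_ratios(groups)
dead = zero_std_fraction(groups)
print(all_zero, all_one, dead)
```

Here the two ratios sum to 0.5 while three quarters of the groups have zero std; the missing 0.25 shows up only in the per-reward `count_{reward}` bucket for 0.5.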

Test plan

- Touches one function; no call-site signature change; logs additional keys only when the existing GRPO condition already fired.
- Eyeball the new keys in the first rollout of a live run.

The existing `zero_std/count_{reward}` metrics report the raw number of GRPO groups in which every sample shares the same reward (and which therefore contribute zero gradient under group-normalized advantages). Comparing these counts across runs with different rollout batch sizes, or interpreting them without first looking up `rollout_batch_size` / `n_samples_per_prompt`, is awkward.

Add two derived ratios over the total number of groups in the rollout:
- `zero_std/all_zero_ratio` - fraction of groups where every sample scored 0.0 (binary reward: "too hard", no attempt succeeded).
- `zero_std/all_one_ratio` - fraction of groups where every sample scored 1.0 ("too easy", every attempt succeeded).

Both are meaningful only for binary rewards; for non-binary reward schemes the individual `count_{reward}` buckets remain available.
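The "zero gradient" claim follows from how group-normalized advantages are formed: each reward is centered on its group mean and divided by the group std. The sketch below assumes a GRPO-style normalization with population std and a small `eps` in the denominator; the exact estimator and epsilon used in miles are assumptions. When all rewards in a group are equal, every advantage is exactly zero, so the policy-gradient term (advantage times the log-prob gradient) vanishes for that group, ignoring any separate KL or entropy terms.

```python
import statistics


def group_normalized_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # GRPO-style advantage (assumed form): (r - group mean) / (group std + eps).
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


all_zero = group_normalized_advantages([0.0, 0.0, 0.0, 0.0])  # "too hard"
all_one = group_normalized_advantages([1.0, 1.0, 1.0, 1.0])   # "too easy"
mixed = group_normalized_advantages([0.0, 1.0, 1.0, 0.0])     # informative group

print(all_zero, all_one, mixed)
```

Both uniform groups produce all-zero advantages, while the mixed group produces advantages of roughly -1 and +1, which is why the zero-std ratios track the share of groups that give no learning signal.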

Per review: the metric names are more intuitive as "percentage" than "ratio", so rename and multiply by 100 so the values match the name (0-100 instead of 0-1).
- zero_std/all_zero_ratio -> zero_std/all_zero_percentage
- zero_std/all_one_ratio -> zero_std/all_one_percentage

gemini-code-assist

Code Review

This pull request updates the reward logging in miles/ray/rollout.py to include ratios for all-zero and all-one reward groups relative to the total number of groups. This change allows for better comparison across runs with different rollout batch sizes. I have no feedback to provide.

Note: rollout_ft/0 was already merged into main as PR #897 and deleted from origin, so the cascade starts from main → rollout_ft/1.

Conflict 1: miles/ray/rollout.py (modify/delete)
- HEAD (rollout_ft/1) deleted the file as part of a mechanical split: commit d3fc26e "mechanically move" splits the original 1298-line miles/ray/rollout.py into the directory miles/ray/rollout/{addr_allocator,metrics,observability,rollout_manager,rollout_server,router_manager,server_group}.py.
- origin/main modified the file (4 commits since merge-base):
  - a772c33 zero_std all_zero_ratio/all_one_ratio metrics (#1034)
  - 41615af weight staleness control for fully async rollout (#958)
  - c198efa consistent hashing routing policy (#891)
  - eaa36a2 heartbeat and id to session server (#866)

Resolution: removed miles/ray/rollout.py (preserving the mechanical split) and re-applied main's 7 hunks to the new directory files:
- miles/ray/rollout/router_manager.py:
  - + import uuid
  - + router_args.policy = args.sglang_router_policy (in start_router)
  - + args.session_server_instance_id = uuid.uuid4().hex (in start_session_server)
- miles/ray/rollout/train_data_conversion.py:
  - + train_data["weight_versions"] population (after the multimodal block)
  - + "weight_versions" added to the per-DP-split key whitelist
- miles/ray/rollout/metrics.py:
  - + oldest_weight_version statistics
  - + mixed_version_ratio in _compute_metrics_from_samples
  - + zero_std/all_zero_percentage and all_one_percentage in _compute_zero_std_metrics

Verified: all dependent symbols (Sample.weight_versions, Sample.oldest_weight_version, args.sglang_router_policy, args.session_server_instance_id) are already present in the merged tree from cleanly merged peer files (miles/utils/types.py, miles/backends/sglang_utils/arguments.py, etc.).

Verified: merge_diff_check.py shows the only "lost" deviations are PR #897 skill files (already in main), so nothing was actually lost.

Shi-Dong requested review from fzyzcjy, maocheng23 and yueming-yuan as code owners April 22, 2026 05:24

Shi Dong added 2 commits April 21, 2026 22:27

Keep zero_std percentage values on 0.0-1.0 scale (13e86da)

gemini-code-assist Bot reviewed Apr 22, 2026


yushengsu-thu added the run-ci-megatron label Apr 22, 2026

maocheng23 approved these changes Apr 23, 2026


Shi-Dong merged commit a772c33 into main Apr 23, 2026
83 of 86 checks passed

Shi-Dong deleted the shi/zero-std-ratios branch April 23, 2026 05:46

fzyzcjy mentioned this pull request May 5, 2026

Refactor rollout.py by file decompositions #899 (open)


Shi-Dong merged 3 commits into main from shi/zero-std-ratios


3 participants
