
Add zero_std all_zero_ratio and all_one_ratio metrics#1034

Merged
Shi-Dong merged 3 commits into main from shi/zero-std-ratios
Apr 23, 2026


@Shi-Dong
Contributor

Summary

  • Add two derived metrics inside `_compute_zero_std_metrics` so you don't need to know the rollout batch size to interpret them:
    • `zero_std/all_zero_ratio` — fraction of GRPO groups where every sample scored 0.0 ("too hard").
    • `zero_std/all_one_ratio` — fraction where every sample scored 1.0 ("too easy").
  • The existing `zero_std/count_{reward}` counters are unchanged.
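A minimal sketch of how the two ratios relate to the existing counters, assuming rewards arrive as a flat list in which each consecutive run of `n_samples_per_prompt` entries forms one GRPO group. The helper name `zero_std_ratios`, the flat-list layout, and the exact formatting of the `count_{reward}` keys are assumptions for illustration; this is not the body of `_compute_zero_std_metrics`.

```python
from collections import Counter


def zero_std_ratios(rewards, n_samples_per_prompt):
    """Return zero-std counts plus all-zero / all-one ratios.

    Hypothetical helper for illustration only; not the repository's code.
    """
    if n_samples_per_prompt <= 0 or len(rewards) % n_samples_per_prompt != 0:
        raise ValueError("rewards must split evenly into groups")

    # Split the flat reward list into consecutive GRPO groups.
    groups = [
        rewards[i : i + n_samples_per_prompt]
        for i in range(0, len(rewards), n_samples_per_prompt)
    ]
    num_groups = len(groups)

    # Count groups whose samples all share one reward, bucketed by that reward.
    counts = Counter(g[0] for g in groups if all(r == g[0] for r in g))

    # Existing-style raw counters (key format is an assumption).
    metrics = {f"zero_std/count_{reward}": n for reward, n in counts.items()}

    # Derived ratios: normalize by the total number of groups in the rollout.
    if num_groups > 0:
        metrics["zero_std/all_zero_ratio"] = counts.get(0.0, 0) / num_groups
        metrics["zero_std/all_one_ratio"] = counts.get(1.0, 0) / num_groups
    return metrics


# Four groups of two samples: [0,0], [1,1], [0,1], [0,0].
rewards = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
metrics = zero_std_ratios(rewards, n_samples_per_prompt=2)
```

In this toy rollout two of the four groups are all-zero and one is all-one, so the ratios are 0.5 and 0.25 regardless of how large the batch is.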

Why

The raw counts are hard to compare across runs with different `rollout_batch_size` / `n_samples_per_prompt` configs, and they require cross-referencing the launcher to compute a percentage. Having both `all_zero_ratio` and `all_one_ratio` in wandb makes "dead" GRPO groups directly trackable.

Both ratios are meaningful only for binary rewards; for non-binary schemes the per-bucket `count_{reward}` metrics remain the source of truth.

Test plan

  • Touches one function; no call-site signature change; the additional keys are logged only when the existing GRPO condition is already met.
  • Eyeball the new keys in the first rollout of a live run.

The existing `zero_std/count_{reward}` metrics report the raw number of
GRPO groups where every sample shares the same reward (and therefore
contributes zero gradient under group-normalized advantages). Comparing
these counts across runs with different rollout batch sizes, or
understanding them without first looking up
`rollout_batch_size` / `n_samples_per_prompt`, is awkward.
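To illustrate why such a group contributes no gradient, here is a small sketch of group-normalized advantages, assuming the common formulation `(r - mean) / (std + eps)` computed per group. The function name, the use of the population standard deviation, and the `eps` value are assumptions for illustration and may differ from what the training code does.

```python
from statistics import mean, pstdev


def group_normalized_advantages(group_rewards, eps=1e-6):
    """Normalize rewards within one group (illustrative formulation only)."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)  # population std; 0.0 when all rewards match
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# Every sample shares the same reward: each advantage is exactly 0.0.
dead_group = group_normalized_advantages([1.0, 1.0, 1.0, 1.0])

# Mixed outcomes: advantages are non-zero and carry a learning signal.
live_group = group_normalized_advantages([1.0, 0.0, 1.0, 0.0])
```

Because every reward equals the group mean in the first case, the numerator is zero for all samples, which is what makes an all-0.0 or all-1.0 group "dead" for the policy update.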

Add two derived ratios over the total number of groups in the rollout:
- `zero_std/all_zero_ratio` - fraction of groups where every sample
  scored 0.0 (binary reward: "too hard", no attempt succeeded).
- `zero_std/all_one_ratio` - fraction of groups where every sample
  scored 1.0 ("too easy", every attempt succeeded).

Both are meaningful only for binary rewards; for non-binary reward
schemes the individual `count_{reward}` buckets remain available.
Shi Dong added 2 commits April 21, 2026 22:27
Per review: the metric names are more intuitive as "percentage" than
"ratio", so rename and multiply by 100 so the values match the name
(0-100 instead of 0-1).

- zero_std/all_zero_ratio -> zero_std/all_zero_percentage
- zero_std/all_one_ratio  -> zero_std/all_one_percentage
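The relationship between the old and new keys can be sketched as below. The helper `to_percentage_metrics` is hypothetical and exists only to show the rename and the factor of 100; the commit itself changes the metric at the point where it is computed.

```python
def to_percentage_metrics(ratio_metrics):
    """Rename *_ratio keys to *_percentage and scale 0-1 values to 0-100.

    Hypothetical helper for illustration only.
    """
    suffix = "_ratio"
    converted = {}
    for key, value in ratio_metrics.items():
        if key.endswith(suffix):
            converted[key[: -len(suffix)] + "_percentage"] = value * 100
        else:
            # Raw counters and any other keys pass through unchanged.
            converted[key] = value
    return converted


old_style = {
    "zero_std/all_zero_ratio": 0.5,
    "zero_std/all_one_ratio": 0.25,
    "zero_std/count_0.0": 2,
}
new_style = to_percentage_metrics(old_style)
```

Dashboards that tracked the earlier `*_ratio` names would need to switch to the `*_percentage` keys and expect values on a 0-100 scale.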
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the reward logging in miles/ray/rollout.py to include ratios for all-zero and all-one reward groups relative to the total number of groups. This change allows for better comparison across runs with different rollout batch sizes. I have no feedback to provide.

@Shi-Dong Shi-Dong merged commit a772c33 into main Apr 23, 2026
83 of 86 checks passed
@Shi-Dong Shi-Dong deleted the shi/zero-std-ratios branch April 23, 2026 05:46
fzyzcjy added a commit that referenced this pull request May 5, 2026
Note: rollout_ft/0 was already merged into main as PR #897 and deleted
from origin, so the cascade starts from main → rollout_ft/1.

Conflict 1: miles/ray/rollout.py (modify/delete)
  - HEAD (rollout_ft/1) deleted the file as part of a mechanical split:
    commit d3fc26e "mechanically move" splits the original 1298-line
    miles/ray/rollout.py into the directory miles/ray/rollout/{addr_allocator,
    metrics,observability,rollout_manager,rollout_server,router_manager,
    server_group}.py.
  - origin/main modified the file (4 commits since merge-base):
    a772c33 zero_std all_zero_ratio/all_one_ratio metrics (#1034)
    41615af weight staleness control for fully async rollout (#958)
    c198efa consistent hashing routing policy (#891)
    eaa36a2 heartbeat and id to session server (#866)

  Resolution: removed miles/ray/rollout.py (preserve mechanical split)
  and re-applied main's 7 hunks to the new directory files:

  - miles/ray/rollout/router_manager.py:
      + import uuid
      + router_args.policy = args.sglang_router_policy (in start_router)
      + args.session_server_instance_id = uuid.uuid4().hex (in
        start_session_server)
  - miles/ray/rollout/train_data_conversion.py:
      + train_data["weight_versions"] population (after multimodal block)
      + "weight_versions" added to per-DP-split key whitelist
  - miles/ray/rollout/metrics.py:
      + oldest_weight_version statistics + mixed_version_ratio in
        _compute_metrics_from_samples
      + zero_std/all_zero_percentage and all_one_percentage in
        _compute_zero_std_metrics

  Verified: all dependent symbols (Sample.weight_versions,
  Sample.oldest_weight_version, args.sglang_router_policy,
  args.session_server_instance_id) are already present in the merged
  tree from cleanly-merged peer files (miles/utils/types.py,
  miles/backends/sglang_utils/arguments.py, etc.).

  Verified: merge_diff_check.py shows the only "lost" deviations are
  PR #897 skill files (already in main) — no real loss.

3 participants