Skip to content

Add mechanical refactor verify skills#897

Merged
fzyzcjy merged 4 commits intomainfrom
rollout_ft/0
May 5, 2026
Merged

Add mechanical refactor verify skills#897
fzyzcjy merged 4 commits intomainfrom
rollout_ft/0

Conversation

@fzyzcjy
Copy link
Copy Markdown
Collaborator

@fzyzcjy fzyzcjy commented Apr 6, 2026

No description provided.

@fzyzcjy fzyzcjy changed the title Refactor rollout.py to make it separate of concerns Refactor rollout.py for file decompositions Apr 6, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the rollout logic by decomposing a large monolithic implementation into several modular files, including dedicated modules for address allocation, metrics logging, server management, and router orchestration. This restructuring significantly improves the maintainability and organization of the codebase. My feedback focuses on a hardcoded port value that overlaps with Ray's internal port range, which could lead to deployment conflicts.

Comment thread miles/ray/rollout/server_group.py Outdated
@fzyzcjy fzyzcjy changed the title Refactor rollout.py for file decompositions Refactor rollout.py by file decompositions Apr 6, 2026
@fzyzcjy fzyzcjy closed this Apr 6, 2026
@fzyzcjy fzyzcjy reopened this Apr 6, 2026
@fzyzcjy fzyzcjy changed the title Refactor rollout.py by file decompositions Add mechanical refactor verify skills Apr 6, 2026
@fzyzcjy fzyzcjy merged commit b4d30fc into main May 5, 2026
18 checks passed
@fzyzcjy fzyzcjy deleted the rollout_ft/0 branch May 5, 2026 13:22
fzyzcjy added a commit that referenced this pull request May 5, 2026
Note: rollout_ft/0 was already merged into main as PR #897 and deleted
from origin, so the cascade starts from main → rollout_ft/1.

Conflict 1: miles/ray/rollout.py (modify/delete)
  - HEAD (rollout_ft/1) deleted the file as part of a mechanical split:
    commit d3fc26e "mechanically move" splits the original 1298-line
    miles/ray/rollout.py into the directory miles/ray/rollout/{addr_allocator,
    metrics,observability,rollout_manager,rollout_server,router_manager,
    server_group}.py.
  - origin/main modified the file (4 commits since merge-base):
    a772c33 zero_std all_zero_ratio/all_one_ratio metrics (#1034)
    41615af weight staleness control for fully async rollout (#958)
    c198efa consistent hashing routing policy (#891)
    eaa36a2 heartbeat and id to session server (#866)

  Resolution: removed miles/ray/rollout.py (preserve mechanical split)
  and re-applied main's 7 hunks to the new directory files:

  - miles/ray/rollout/router_manager.py:
      + import uuid
      + router_args.policy = args.sglang_router_policy (in start_router)
      + args.session_server_instance_id = uuid.uuid4().hex (in
        start_session_server)
  - miles/ray/rollout/train_data_conversion.py:
      + train_data["weight_versions"] population (after multimodal block)
      + "weight_versions" added to per-DP-split key whitelist
  - miles/ray/rollout/metrics.py:
      + oldest_weight_version statistics + mixed_version_ratio in
        _compute_metrics_from_samples
      + zero_std/all_zero_percentage and all_one_percentage in
        _compute_zero_std_metrics

  Verified: all dependent symbols (Sample.weight_versions,
  Sample.oldest_weight_version, args.sglang_router_policy,
  args.session_server_instance_id) are already present in the merged
  tree from cleanly-merged peer files (miles/utils/types.py,
  miles/backends/sglang_utils/arguments.py, etc.).

  Verified: merge_diff_check.py shows the only "lost" deviations are
  PR #897 skill files (already in main) — no real loss.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants