Separate OutputSplitter._locality_hints from actor_locality_enabled & locality_with_output by srinathk10 · Pull Request #52005 · ray-project/ray

srinathk10 · 2025-04-05T00:36:52Z

Why are these changes needed?

OutputSplitter._locality_hints semantics need to be kept separate from locality_with_output and actor_locality_enabled, so that each of these can be set explicitly. To achieve this:

1. Make streaming_split_locality an argument in ray.train.DataConfig. If it's true, pass the Train worker nodes as locality hints to streaming_split.
2. Remove all usage of locality_with_output and actor_locality_enabled in the OutputSplitter / Ray Train DataConfig code.
   - locality_with_output is now a config that ONLY affects TaskPoolMapOperator.
   - actor_locality_enabled is now a config that ONLY affects ActorPoolMapOperator.
   - Also, maybe in the future, individual map operators can be configured separately from each other, since right now each config affects all operators globally.
3. Setting streaming_split_locality = True and locality_with_output = True guarantees that ALL map task outputs end up on one of the Train workers, since every map task gets scheduled on the Train worker nodes. But this also means that all other nodes will be underutilized. It is probably not great to affect the scheduling of ALL map operators globally.
4. Setting streaming_split_locality = True and locality_with_output = False will have a lower "hit rate": a locality hit depends only on the pipeline's last map task getting randomly scheduled on a Train worker node. (All map tasks before the last one don't matter for streaming split locality.) However, cluster utilization will be better, because tasks are also spread across nodes with no Train worker.
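The trade-off between the two settings can be sketched with a toy model. This is a back-of-the-envelope illustration, not Ray code: `expected_hit_rate` and `nodes_used_by_map_tasks` are hypothetical helpers, and the model assumes that with locality_with_output = False the last map task lands uniformly at random on any node in the cluster.

```python
from fractions import Fraction


def expected_hit_rate(
    num_nodes: int, num_train_nodes: int, locality_with_output: bool
) -> Fraction:
    """Toy estimate of the streaming-split locality hit rate.

    Assumption (not taken from Ray): with locality_with_output=False the
    last map task is placed uniformly at random over all nodes.
    """
    if not 0 < num_train_nodes <= num_nodes:
        raise ValueError("need 0 < num_train_nodes <= num_nodes")
    if locality_with_output:
        # Every map task runs on a Train worker node, so every output
        # block is already local to some Train worker.
        return Fraction(1)
    # Only the last map task matters; it is a hit when it happens to land
    # on a node that hosts a Train worker.
    return Fraction(num_train_nodes, num_nodes)


def nodes_used_by_map_tasks(
    num_nodes: int, num_train_nodes: int, locality_with_output: bool
) -> int:
    """Number of nodes that can receive map tasks under the same toy model."""
    return num_train_nodes if locality_with_output else num_nodes


if __name__ == "__main__":
    # Hypothetical cluster: 8 nodes, 2 of which host Train workers.
    for flag in (True, False):
        print(
            f"locality_with_output={flag}: "
            f"hit rate {expected_hit_rate(8, 2, flag)}, "
            f"nodes used {nodes_used_by_map_tasks(8, 2, flag)}"
        )
```

In this model, turning locality_with_output on raises the hit rate to 1 but confines map tasks to the Train worker nodes, which is the underutilization concern raised in the description.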

Related issue number

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

… locality hints Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

srinathk10 · 2025-04-05T05:20:44Z

The following test shows that the locality hints are wired through.

Test Program

import os
import tempfile
import time
import pandas as pd
import numpy as np
import torch
from torch import nn
from torch.nn.parallel import DistributedDataParallel

import ray
from ray.train import Checkpoint, CheckpointConfig, RunConfig, ScalingConfig
from ray.train.torch import TorchTrainer


use_gpu = False
num_workers = 4


class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.layer1 = nn.Linear(1, 32)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(32, 1)

    def forward(self, input):
        return self.layer2(self.relu(self.layer1(input)))


def train_loop_per_worker(config):
    import logging
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    lr = config["lr"]
    batch_size = config["batch_size"]
    num_epochs = config["num_epochs"]

    logger.info(f"Worker started with config: {config}")

    train_dataset_shard = ray.train.get_dataset_shard("train")
    logger.info("Obtained dataset shard for training")

    model = NeuralNetwork()
    model = ray.train.torch.prepare_model(model)

    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    dataloader = train_dataset_shard.iter_torch_batches(
        batch_size=batch_size, dtypes=torch.float, device=model.device
    )

    for epoch in range(num_epochs):
        epoch_start = time.time()
        total_loss = 0
        total_samples = 0

        logger.info(f"Epoch {epoch} started")

        for i, batch in enumerate(dataloader):
            # Ensure inputs have the shape (batch_size, 1)
            inputs = batch["input"].view(-1, 1)  # Reshape to (batch_size, 1)
            labels = batch["label"].view(-1, 1)  # Ensure labels have shape (batch_size, 1)

            output = model(inputs)
            loss = loss_fn(output, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            total_loss += loss.item() * len(inputs)
            total_samples += len(inputs)

            if i % 50 == 0:
                logger.info(f"Epoch {epoch}, batch {i}, batch loss = {loss.item():.4f}")

        epoch_duration = time.time() - epoch_start
        throughput = total_samples / epoch_duration if epoch_duration > 0 else 0

        base_model = model.module if isinstance(model, DistributedDataParallel) else model
        checkpoint_dir = tempfile.mkdtemp()
        torch.save(
            {"model_state_dict": base_model.state_dict()},
            os.path.join(checkpoint_dir, "model.pt"),
        )
        checkpoint = Checkpoint.from_directory(checkpoint_dir)

        logger.info(
            f"[Epoch {epoch}] Avg loss: {total_loss / total_samples:.4f}, "
            f"Samples: {total_samples}, Duration: {epoch_duration:.2f}s, "
            f"Throughput: {throughput:.2f} samples/sec"
        )

        ray.train.report(
            {
                "epoch": epoch,
                "avg_loss": total_loss / total_samples,
                "throughput_samples_per_sec": throughput,
                "epoch_duration_sec": epoch_duration,
            },
            checkpoint=checkpoint,
        )

def create_dataset(n):
    return pd.DataFrame({
        "input": np.random.rand(n).astype(np.float32),
        "label": np.random.randint(0, 2, size=n).astype(np.float32),
    })

train_loop_config = {
    "num_epochs": 5,
    "lr": 0.01,
    "batch_size": 32
}
scaling_config = ScalingConfig(num_workers=num_workers, use_gpu=use_gpu)
run_config = RunConfig(checkpoint_config=CheckpointConfig(num_to_keep=1))

print("Shutting down existing Ray instance if any...")
ray.shutdown()

print("Initializing Ray...")
ray.init(ignore_reinit_error=True, dashboard_host=None)

print("Generating dataset...")
train_dataset = ray.data.from_pandas(create_dataset(100_000))
print(f"Dataset schema: {train_dataset.schema()}, Count: {train_dataset.count()}")

datasets = {"train": train_dataset}

print("Creating trainer...")
ray_data_execution_options = ray.train.DataConfig.default_ingest_options()
ray_data_execution_options.locality_with_output = True
ray_data_execution_options.actor_locality_enabled = True

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    train_loop_config=train_loop_config,
    scaling_config=scaling_config,
    dataset_config=ray.train.DataConfig(
        datasets_to_split="all",
        execution_options=ray_data_execution_options,
    ),
    run_config=run_config,
    datasets=datasets
)

print("Starting training...")
result = trainer.fit()
print("Training complete.")

print("Final loss:", result.metrics["avg_loss"])

print("Shutting down Ray...")
ray.shutdown()

Test Output

ec2-user@ip-172-31-41-154$ python  ./test.py
Shutting down existing Ray instance if any...
Initializing Ray...
...
2025-04-04 21:50:44,102	INFO worker.py:1858 -- Started a local Ray instance.
Generating dataset...
Dataset schema: Column  Type
------  ----
input   float
label   float, Count: 100000
Creating trainer...
Starting training...

View detailed results here: /home/ec2-user/ray_results/TorchTrainer_2025-04-04_21-50-45
To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2025-04-04_21-50-23_331031_30293/artifacts/2025-04-04_21-50-45/TorchTrainer_2025-04-04_21-50-45/driver_artifacts`

Training started with configuration:
╭─────────────────────────────────────╮
│ Training config                     │
├─────────────────────────────────────┤
│ train_loop_config/batch_size     32 │
│ train_loop_config/lr           0.01 │
│ train_loop_config/num_epochs      5 │
╰─────────────────────────────────────╯
(RayTrainWorker pid=31369) Setting up process group for: env:// [rank=0, world_size=4]
(TorchTrainer pid=31308) Started distributed worker processes:
(TorchTrainer pid=31308) - (node_id=66183672d5dff6ed2b61946f2238b747d0ee116357fed24146dae054, ip=172.31.41.154, pid=31369) world_rank=0, local_rank=0, node_rank=0
(TorchTrainer pid=31308) - (node_id=66183672d5dff6ed2b61946f2238b747d0ee116357fed24146dae054, ip=172.31.41.154, pid=31366) world_rank=1, local_rank=1, node_rank=0
(TorchTrainer pid=31308) - (node_id=66183672d5dff6ed2b61946f2238b747d0ee116357fed24146dae054, ip=172.31.41.154, pid=31367) world_rank=2, local_rank=2, node_rank=0
(TorchTrainer pid=31308) - (node_id=66183672d5dff6ed2b61946f2238b747d0ee116357fed24146dae054, ip=172.31.41.154, pid=31368) world_rank=3, local_rank=3, node_rank=0
(RayTrainWorker pid=31369) INFO:__main__:Worker started with config: {'num_epochs': 5, 'lr': 0.01, 'batch_size': 32}
(RayTrainWorker pid=31369) INFO:__main__:Obtained dataset shard for training
(RayTrainWorker pid=31369) Moving model to device: cpu
(RayTrainWorker pid=31369) Wrapping provided model in DistributedDataParallel.
(RayTrainWorker pid=31369) INFO:__main__:Epoch 0 started
(pid=31553) ✔️  Dataset execution finished in 0.02 seconds: : 25.0k row [00:00, 60.3M row/s]

(pid=31553) - split(4, equal=True): Tasks: 0; Queued blocks: 0; Resources: 0.0 CPU, 781.4KB object store; [all objects local]: : 100k row [00:00, 174M row/s]
(SplitCoordinator pid=31553) Starting execution of Dataset. Full logs are in /tmp/ray/session_2025-04-04_21-50-23_331031_30293/logs/ray-data
(SplitCoordinator pid=31553) Execution plan of Dataset: InputDataBuffer[Input] -> OutputSplitter[split(4, equal=True)]
(RayTrainWorker pid=31369) INFO:__main__:Epoch 0, batch 0, batch loss = 0.3055

Training finished iteration 1 at 2025-04-04 21:50:54. Total running time: 8s
╭────────────────────────────────────────────────╮
│ Training result                                │
├────────────────────────────────────────────────┤
│ checkpoint_dir_name          checkpoint_000000 │
│ time_this_iter_s                       6.49972 │
│ time_total_s                           6.49972 │
│ training_iteration                           1 │
│ avg_loss                               0.25023 │
│ epoch                                        0 │
│ epoch_duration_sec                     2.97126 │
│ throughput_samples_per_sec          8413.93192 │
╰────────────────────────────────────────────────╯
Training saved a checkpoint for iteration 1 at: (local)/home/ec2-user/ray_results/TorchTrainer_2025-04-04_21-50-45/TorchTrainer_8907d_00000_0_2025-04-04_21-50-45/checkpoint_000000
(RayTrainWorker pid=31369) INFO:__main__:[Epoch 0] Avg loss: 0.2502, Samples: 25000, Duration: 2.97s, Throughput: 8413.93 samples/sec
(RayTrainWorker pid=31369) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/home/ec2-user/ray_results/TorchTrainer_2025-04-04_21-50-45/TorchTrainer_8907d_00000_0_2025-04-04_21-50-45/checkpoint_000000)
(SplitCoordinator pid=31553) Starting execution of Dataset. Full logs are in /tmp/ray/session_2025-04-04_21-50-23_331031_30293/logs/ray-data
(SplitCoordinator pid=31553) Execution plan of Dataset: InputDataBuffer[Input] -> OutputSplitter[split(4, equal=True)]
(pid=31553) ✔️  Dataset execution finished in 0.01 seconds: : 25.0k row [00:00, 242k row/s]

(pid=31553) - split(4, equal=True): Tasks: 0; Queued blocks: 0; Resources: 0.0 CPU, 781.4KB object store; [all objects local]: : 100k row [00:00, 964k row/s]

Training finished iteration 2 at 2025-04-04 21:50:55. Total running time: 10s
Training saved a checkpoint for iteration 5 at: (local)/home/ec2-user/ray_results/TorchTrainer_2025-04-04_21-50-45/TorchTrainer_8907d_00000_0_2025-04-04_21-50-45/checkpoint_000004
2025-04-04 21:51:00,258	WARNING experiment_state.py:206 -- Experiment state snapshotting has been triggered multiple times in the last 5.0 seconds and may become a bottleneck. A snapshot is forced if `CheckpointConfig(num_to_keep)` is set, and a trial has checkpointed >= `num_to_keep` times since the last snapshot.
You may want to consider increasing the `CheckpointConfig(num_to_keep)` or decreasing the frequency of saving checkpoints.
You can suppress this warning by setting the environment variable TUNE_WARN_EXCESSIVE_EXPERIMENT_CHECKPOINT_SYNC_THRESHOLD_S to a smaller value than the current threshold (5.0). Set it to 0 to completely suppress this warning.

Training completed after 5 iterations at 2025-04-04 21:51:01. Total running time: 15s
2025-04-04 21:51:01,320	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/home/ec2-user/ray_results/TorchTrainer_2025-04-04_21-50-45' in 0.0028s.

Training complete.
Final loss: 0.24995868369579316
Shutting down Ray...
(RayTrainWorker pid=31368) INFO:__main__:Epoch 4 started [repeated 8x across cluster]
(RayTrainWorker pid=31368) INFO:__main__:Epoch 4, batch 750, batch loss = 0.2489 [repeated 102x across cluster]
(RayTrainWorker pid=31368) INFO:__main__:[Epoch 4] Avg loss: 0.2502, Samples: 25000, Duration: 1.49s, Throughput: 16763.69 samples/sec [repeated 4x across cluster]
(RayTrainWorker pid=31368) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/home/ec2-user/ray_results/TorchTrainer_2025-04-04_21-50-45/TorchTrainer_8907d_00000_0_2025-04-04_21-50-45/checkpoint_000004) [repeated 4x across cluster]

Ray Data Log

ec2-user@ip-172-31-41-154$
ec2-user@ip-172-31-41-154$ cat /tmp/ray/session_2025-04-04_21-50-23_331031_30293/logs/ray-data/ray-data.log  | grep locality
2025-04-04 21:50:51,989	DEBUG streaming_executor.py:111 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, object_store_memory=inf), exclude_resources=ExecutionResources(cpu=5.0, gpu=0.0, object_store_memory=0.0B), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=True)
2025-04-04 21:50:54,293	DEBUG streaming_executor.py:111 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, object_store_memory=inf), exclude_resources=ExecutionResources(cpu=5.0, gpu=0.0, object_store_memory=0.0B), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=True)
2025-04-04 21:50:55,788	DEBUG streaming_executor.py:111 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, object_store_memory=inf), exclude_resources=ExecutionResources(cpu=5.0, gpu=0.0, object_store_memory=0.0B), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=True)
2025-04-04 21:50:57,276	DEBUG streaming_executor.py:111 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, object_store_memory=inf), exclude_resources=ExecutionResources(cpu=5.0, gpu=0.0, object_store_memory=0.0B), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=True)
2025-04-04 21:50:58,762	DEBUG streaming_executor.py:111 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, object_store_memory=inf), exclude_resources=ExecutionResources(cpu=5.0, gpu=0.0, object_store_memory=0.0B), locality_with_output=True, preserve_order=False, actor_locality_enabled=True, verbose_progress=True)
ec2-user@ip-172-31-41-154$
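
The same check can be scripted instead of reading the grep output by eye. A minimal sketch, assuming the log line format shown above; `parse_locality_flags` is a hypothetical helper and not part of Ray.

```python
import re

# Matches fields such as `locality_with_output=True` or
# `actor_locality_enabled=False` inside an ExecutionOptions(...) repr.
_FLAG_RE = re.compile(
    r"\b(locality_with_output|actor_locality_enabled)=(True|False)\b"
)


def parse_locality_flags(log_line: str) -> dict:
    """Extract the two locality flags from one ray-data.log line."""
    return {name: value == "True" for name, value in _FLAG_RE.findall(log_line)}


# One of the DEBUG lines from the log above, copied verbatim.
sample = (
    "2025-04-04 21:50:51,989\tDEBUG streaming_executor.py:111 -- Execution config: "
    "ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, "
    "object_store_memory=inf), exclude_resources=ExecutionResources(cpu=5.0, "
    "gpu=0.0, object_store_memory=0.0B), locality_with_output=True, "
    "preserve_order=False, actor_locality_enabled=True, verbose_progress=True)"
)

flags = parse_locality_flags(sample)
print(flags)
```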

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

…on (#52000)

## Why are these changes needed?

Ray Train Release test: Add locality_with_output, actor_locality_enabled option

## Related issue number

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

---------

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
Signed-off-by: srinathk10 <68668616+srinathk10@users.noreply.github.com>

raulchen

Can you also add a test to check that locality_hints are properly set for streaming_split when actor_locality is disabled?
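
A sketch of the shape such a test could take. It uses plain-Python stand-ins rather than Ray's classes: `FakeExecutionOptions` and `locality_hints_for_split` are hypothetical names that model the intended behavior (hints depend only on the split-locality switch), not the code added in this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeExecutionOptions:
    """Stand-in for the two operator-level flags discussed in this PR."""

    locality_with_output: bool = False
    actor_locality_enabled: bool = False


def locality_hints_for_split(
    worker_node_ids: List[str],
    streaming_split_locality: bool,
    options: FakeExecutionOptions,
) -> Optional[List[str]]:
    """Return the hints a streaming split would receive in this toy model.

    `options` is accepted but deliberately ignored: the point of the
    separation is that neither flag influences the hints.
    """
    del options
    return list(worker_node_ids) if streaming_split_locality else None


def test_hints_set_when_actor_locality_disabled():
    nodes = ["node-a", "node-b"]
    opts = FakeExecutionOptions(actor_locality_enabled=False)
    assert locality_hints_for_split(nodes, True, opts) == nodes


def test_hints_unset_when_split_locality_disabled():
    nodes = ["node-a", "node-b"]
    opts = FakeExecutionOptions(
        locality_with_output=True, actor_locality_enabled=True
    )
    assert locality_hints_for_split(nodes, False, opts) is None
```

Written this way, the tests can be collected by pytest; a real test would instead inspect the hints passed to the actual streaming_split call.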

python/ray/data/_internal/iterator/stream_split_iterator.py

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

justinvyu

I think we should:

1. Make the "streaming_split_locality" an argument in ray.train.DataConfig. If it's true, then pass in the Train worker nodes as locality hints to streaming_split.
2. Remove all usage of locality_with_output and actor_locality_enabled in the OutputSplitter / Ray Train DataConfig code.
   - locality_with_output is now a config that ONLY affects TaskPoolMapOperator.
   - actor_locality_enabled is now a config that ONLY affects ActorPoolMapOperator.
   - Also, maybe in the future, individual map operators can be configured separately from each other, since right now it affects all operators globally.

Setting streaming_split_locality = True and locality_with_output = True guarantees that ALL map task outputs end up on one of the Train workers, since every map task gets scheduled on the Train worker nodes. But this also means that all other nodes will be underutilized. It is probably not great to affect the scheduling of ALL map operators globally.

Setting streaming_split_locality = True and locality_with_output = False will have a lower "hit rate," and a locality hit just depends on the pipeline's last map task getting randomly scheduled on a Train worker. (All map tasks before the last one don't matter for streaming split locality.) However, there will be better cluster utilization due to spreading out tasks across nodes with no Train worker.
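
How a locality hint turns into a "hit" can be sketched as follows. This is an illustrative simplification, not OutputSplitter's actual logic: `pick_consumer` is a hypothetical function, and the fallback to the least-loaded consumer is an assumption made for the example.

```python
from typing import List, Tuple


def pick_consumer(
    block_node: str, locality_hints: List[str], rows_assigned: List[int]
) -> Tuple[int, bool]:
    """Choose which split (Train worker) receives a block.

    locality_hints[i] is the node id of the Train worker reading split i.
    Returns (split_index, is_locality_hit).
    """
    local = [i for i, node in enumerate(locality_hints) if node == block_node]
    if local:
        # Prefer a worker on the block's own node; break ties by load.
        return min(local, key=lambda i: rows_assigned[i]), True
    # No worker shares the block's node, so the block crosses nodes
    # regardless; just balance the load.
    least_loaded = min(range(len(locality_hints)), key=lambda i: rows_assigned[i])
    return least_loaded, False


# Hypothetical layout: two Train workers on node-a, one on node-b.
hints = ["node-a", "node-a", "node-b"]
rows = [0, 0, 0]
hits = 0
# (node where the last map task produced the block, rows in the block)
blocks = [("node-a", 10), ("node-a", 10), ("node-c", 10), ("node-b", 10)]
for node, num_rows in blocks:
    idx, hit = pick_consumer(node, hints, rows)
    rows[idx] += num_rows
    hits += hit

print(f"rows per split: {rows}, locality hits: {hits}/{len(blocks)}")
```

The block produced on node-c has no co-located Train worker, which is the miss case described above for locality_with_output = False.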

python/ray/data/_internal/iterator/stream_split_iterator.py

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

python/ray/data/_internal/execution/operators/output_splitter.py

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

python/ray/train/_internal/data_config.py

python/ray/air/tests/test_new_dataset_config.py

python/ray/data/_internal/iterator/stream_split_iterator.py

release/train_tests/benchmark/config.py

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

python/ray/data/_internal/execution/operators/output_splitter.py

Signed-off-by: Hao Chen <chenh1024@gmail.com>

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

python/ray/train/constants.py

justinvyu · 2025-04-08T22:05:03Z

python/ray/train/_internal/data_config.py

+            streaming_split_locality: If it's true, then pass in the Train worker nodes
+                as locality hints to streaming_split operations. On by default.


Suggested change

-            streaming_split_locality: If it's true, then pass in the Train worker nodes
-                as locality hints to streaming_split operations. On by default.
+            streaming_split_locality: If it's true, then pass in the Train worker nodes
+                as locality hints to streaming_split operations,
+                which prefers assigning data shards to Train workers
+                located on the same node as the ready data.
+                On by default.

@matthewdeng What do you think about the name? Users may not know about streaming_split details, but maybe that's ok since this is also a more advanced config.

How about we just prefix it with an underscore, and add [Advanced] to the start of the docstring?

Good point. The current name and doc probably have too many implementation details. What about this:

enable_split_locality: If true, when splitting the datasets across Train workers, locality will be considered to minimize cross-node data transfer.

enable_shard_locality is also a good name.

release/train_tests/benchmark/config.py

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

python/ray/train/_internal/data_config.py

python/ray/data/_internal/planner/plan_udf_map_op.py

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

… locality_with_output (ray-project#52005)

## Why are these changes needed?

`OutputSplitter._locality_hints` semantics needs to be kept separate from `locality_with_output` and `actor_locality_enabled`, so each of these can be set explicitly. To achieve this,

1. Make the `streaming_split_locality` an argument in `ray.train.DataConfig`. If it's true, then pass in the Train worker nodes as locality hints to `streaming_split`.
2. Remove all usage of `locality_with_output` and `actor_locality_enabled` in the OutputSplitter / Ray Train DataConfig code.
   - `locality_with_output` is now a config that ONLY affects `TaskPoolMapOperator`.
   - `actor_locality_enabled` is now a config that ONLY affects `ActorPoolMapOperator`.
   - Also, maybe in the future, individual map operators can be configured separately from each other, since right now it affects all operators globally.
3. Setting `streaming_split_locality` = True, and `locality_with_output` = True guarantees that ALL map tasks outputs end up on one of the Train workers, since every map task gets scheduled on the Train worker nodes. But this also means that all other nodes will be underutilized. It is probably not great to affect the scheduling of ALL map operators globally.
4. Setting `streaming_split_locality` = True, and `locality_with_output` = False will have a lower "hit rate," and a locality hit just depends on the pipeline's last map task getting randomly scheduled on a Train worker. (All map tasks before the last one don't matter for streaming split locality.) However, there will be better cluster utilization due to spreading out tasks across nodes with no Train worker.

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
Signed-off-by: srinathk10 <68668616+srinathk10@users.noreply.github.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Steve Han <stevehan2001@gmail.com>

actor_locality_enabled should not be used to configure OutputSplitter…

2279fc7

… locality hints Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

srinathk10 changed the base branch from master to srinathk10-train_benchmark_locality_hints April 5, 2025 04:57

srinathk10 changed the base branch from srinathk10-train_benchmark_locality_hints to master April 5, 2025 04:57

Merge branch 'master' into srinathk10-train_cleanup_locality_hints

ec5d226

srinathk10 added the actor-based-usecase and go (add ONLY when ready to merge, run all tests) labels Apr 5, 2025

srinathk10 marked this pull request as ready for review April 5, 2025 04:59

srinathk10 requested a review from a team as a code owner April 5, 2025 04:59

srinathk10 and others added 4 commits April 6, 2025 05:34

Fixes

2964c03

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

Merge branch 'master' into srinathk10-train_cleanup_locality_hints

1f2b769

Merge branch 'master' into srinathk10-train_cleanup_locality_hints

d0bd26e

raulchen reviewed Apr 7, 2025

python/ray/data/_internal/iterator/stream_split_iterator.py

srinathk10 added 2 commits April 8, 2025 16:54

Addressed review comments

fde914a

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

Lint

db12254

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

justinvyu reviewed Apr 8, 2025

raulchen reviewed Apr 8, 2025

python/ray/data/_internal/iterator/stream_split_iterator.py (outdated)

Addressed review comments

8538c87

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

srinathk10 requested review from hongpeng-guo, matthewdeng and woshiyyya as code owners April 8, 2025 19:24

srinathk10 changed the title ~~actor_locality_enabled should not be used to configure OutputSplitter.locality_hints~~ Separate OutputSplitter._locality_hints from actor_locality_enabled & locality_with_output Apr 8, 2025

justinvyu reviewed Apr 8, 2025

python/ray/data/_internal/execution/operators/output_splitter.py

Addressed review comments

5dbe78d

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

raulchen approved these changes Apr 8, 2025

srinathk10 and others added 3 commits April 8, 2025 21:33

Addressed review comments

f9794b6

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

Addressed review comments

0edd31f

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

Merge branch 'master' into srinathk10-train_cleanup_locality_hints

2e72a15

raulchen approved these changes Apr 8, 2025

python/ray/data/_internal/execution/operators/output_splitter.py (outdated)

raulchen and others added 2 commits April 8, 2025 14:52

Update python/ray/data/_internal/execution/operators/output_splitter.py

ca33f2d

Signed-off-by: Hao Chen <chenh1024@gmail.com>

Addressed review comments

ec1aeb6

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

raulchen enabled auto-merge (squash) April 8, 2025 21:57

justinvyu reviewed Apr 8, 2025

Addressed review comments

47f6f81

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

github-actions bot disabled auto-merge April 8, 2025 22:50

Merge branch 'master' into srinathk10-train_cleanup_locality_hints

30ea0d0

justinvyu approved these changes Apr 8, 2025

python/ray/train/_internal/data_config.py (outdated)

python/ray/data/_internal/planner/plan_udf_map_op.py

justinvyu and others added 6 commits April 8, 2025 16:23

Update python/ray/train/_internal/data_config.py

1cf0138

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

Addressed review comments

2aabfff

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

Merge branch 'master' into srinathk10-train_cleanup_locality_hints

0545a12

Addressed review comments

352ea27

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

Addressed comments

3d1b80a

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>

Merge branch 'master' into srinathk10-train_cleanup_locality_hints

13fb443

raulchen enabled auto-merge (squash) April 9, 2025 21:50

raulchen merged commit 66de974 into master Apr 9, 2025
6 checks passed

raulchen deleted the srinathk10-train_cleanup_locality_hints branch April 9, 2025 22:14

cszhu added the data (Ray Data-related issues) label Apr 11, 2025

hainesmichaelc added the community-backlog label May 22, 2025

justinvyu mentioned this pull request Feb 17, 2026

[Data] Remove locality_with_output #61044

Merged
