
[Data] Fix ProgressBar with use_ray_tqdm#59996

Merged
bveeramani merged 4 commits into master from srinathk10/fix_progress_bar_with_use_ray_tqdm
Jan 13, 2026

Conversation

@srinathk10
Contributor

@srinathk10 srinathk10 commented Jan 9, 2026


Description


[Data] Fix ProgressBar with use_ray_tqdm

  • Fix ProgressBar to honor use_ray_tqdm in DataContext.
  • Note that tqdm_ray is designed to work in non-interactive contexts (workers/actors) by sending JSON progress updates to the driver.
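For context, a minimal sketch of the selection logic the fix describes. This is illustrative only: the simplified `DataContext` stand-in and `choose_progress_output` are hypothetical names, not Ray's actual implementation, though `use_ray_tqdm` and the logging fallback mirror the behavior discussed in this PR.

```python
import sys
from dataclasses import dataclass


@dataclass
class DataContext:
    # Simplified stand-in for ray.data.context.DataContext; only the
    # flag relevant to this PR is modeled.
    use_ray_tqdm: bool = True


def choose_progress_output(ctx: DataContext, stdout=sys.stdout) -> str:
    """Pick how progress should be reported.

    tqdm_ray forwards structured progress updates to the driver, so the
    PR argues it can be used even when stdout is not a terminal; without
    it, a non-interactive stream falls back to plain log messages.
    """
    if ctx.use_ray_tqdm:
        return "tqdm_ray"
    if stdout.isatty():
        return "tqdm"
    return "logging"
```

With `use_ray_tqdm=False` and a non-tty stream (for example `io.StringIO()`), this returns `"logging"`, matching the `_use_logging is True` assertions in the tests quoted in the review comments.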


Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
@srinathk10 srinathk10 requested a review from a team as a code owner January 9, 2026 06:47
@srinathk10 srinathk10 added the go add ONLY when ready to merge, run all tests label Jan 9, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes the ProgressBar to honor the use_ray_tqdm setting from DataContext, especially in non-interactive terminals. The logic change is sound and the accompanying test updates are thorough. I've added a few comments to further improve code quality and maintainability, including suggestions to move an import to the top level for better code organization, add a missing assertion in a test for completeness, and refactor duplicated logic in the tests to make them more maintainable.


self._use_logging = False

from ray.data.context import DataContext
Contributor


medium

To align with PEP 8 and improve code organization, it's best practice to place all imports at the top of the file. Moving from ray.data.context import DataContext to the top-level imports would make the code cleaner and also resolve the redundant import of the same module on line 95.

assert pb._use_logging is False
else:
# Without tqdm_ray, falls back to logging
assert pb._bar is None
Contributor


medium

This assertion is incomplete. When use_ray_tqdm is False in a non-interactive terminal, _use_logging is set to True in the ProgressBar constructor. You should also assert this behavior to make the test more robust.

Suggested change
assert pb._bar is None
assert pb._bar is None
assert pb._use_logging is True

Comment on lines +165 to +182
if enable_tqdm_ray:
# tqdm_ray works in non-interactive contexts
assert pb._bar is not None
assert pb._use_logging is False
else:
# Without tqdm_ray, falls back to logging
assert pb._bar is None
assert pb._use_logging is True

# Reset mock to clear the "progress bar disabled" log call
mock_logger.info.reset_mock()

# Update progress - should log
pb.update(5)
# Verify logger.info was called exactly twice with expected messages
assert mock_logger.info.call_count == 2
mock_logger.info.assert_any_call("=== Ray Data Progress {test} ===")
mock_logger.info.assert_any_call("test: Progress Completed 5 / 10")
Contributor


medium

This test logic is very similar to test_progress_bar_logging_in_non_interactive_terminal_without_total. There's a lot of duplicated code between these two tests due to the if/else structure for enable_tqdm_ray. To improve maintainability and reduce redundancy, consider refactoring this shared logic. You could either combine them into a single test parametrized with total and the expected log string, or extract the common testing logic into a helper function.
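One way the suggested refactor could look. This is a hedged sketch: `assert_progress_bar_state` and its parameters are hypothetical names, and the real tests would pass Ray's actual `ProgressBar` and patched logger.

```python
from unittest.mock import MagicMock


def assert_progress_bar_state(pb, mock_logger, enable_tqdm_ray, expected_logs):
    """Shared assertions for the two non-interactive-terminal tests."""
    if enable_tqdm_ray:
        # tqdm_ray works in non-interactive contexts
        assert pb._bar is not None
        assert pb._use_logging is False
    else:
        # Without tqdm_ray, falls back to logging
        assert pb._bar is None
        assert pb._use_logging is True

        # Reset mock to clear the "progress bar disabled" log call
        mock_logger.info.reset_mock()
        pb.update(5)
        assert mock_logger.info.call_count == len(expected_logs)
        for message in expected_logs:
            mock_logger.info.assert_any_call(message)
```

Each parametrized test would then collapse to building the bar and calling the helper with its `total` and expected log strings.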

@ray-gardener ray-gardener bot added the data Ray Data-related issues label Jan 9, 2026
Member

@bveeramani bveeramani left a comment


I think this PR will reintroduce the issue where Ray Data writes garbled outputs to log files.

Rather than determining what type of progress bar to use in ProgressBar (which could be constructed in an actor), an alternative could be to choose the progress bar type on the driver and propagate that. I'm not sure how complicated that would currently be to implement, though.
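The alternative described here could look something like the following sketch. It is purely illustrative: `ProgressStyle`, `select_style_on_driver`, and the simplified `ProgressBar` are hypothetical names, not Ray's code, and the policy in the branches is just one possibility.

```python
import sys
from enum import Enum


class ProgressStyle(Enum):
    TQDM = "tqdm"          # classic tqdm on an interactive driver terminal
    RAY_TQDM = "ray_tqdm"  # structured updates forwarded to the driver
    LOGGING = "logging"    # plain log lines for non-interactive output


def select_style_on_driver(use_ray_tqdm: bool, stdout=sys.stdout) -> ProgressStyle:
    # The key idea: isatty() is evaluated once, on the driver, where it
    # reflects the user's real terminal. Actors constructing a progress
    # bar receive the decision instead of re-detecting it locally.
    if not stdout.isatty():
        return ProgressStyle.LOGGING  # avoid escape codes in log files
    return ProgressStyle.RAY_TQDM if use_ray_tqdm else ProgressStyle.TQDM


class ProgressBar:
    # Simplified stand-in: the style is injected, so a bar built inside
    # an actor (e.g., a SplitCoordinator) inherits the driver's choice.
    def __init__(self, name: str, style: ProgressStyle):
        self.name = name
        self.style = style
```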

Comment on lines +60 to +61
# Exception: tqdm_ray is designed to work in non-interactive contexts
# (workers/actors) by sending JSON progress updates to the driver.
Member


I don't think this is true.

tqdm_ray is designed to not spam the terminal when you use a task or actor. It doesn't work with non-interactive terminals (e.g., log files).

For example, here's what I see when I write the output to a log file using this PR:

2026-01-09 10:16:07,305	INFO worker.py:2001 -- Started a local Ray instance.
/Users/balaji/ray/python/ray/_private/worker.py:2040: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
2026-01-09 10:16:08,726	INFO logging.py:397 -- Registered dataset logger for dataset dataset_2_0
2026-01-09 10:16:08,734	INFO streaming_executor.py:182 -- Starting execution of Dataset dataset_2_0. Full logs are in /tmp/ray/session_2026-01-09_10-16-05_275216_89693/logs/ray-data
2026-01-09 10:16:08,734	INFO streaming_executor.py:183 -- Execution plan of Dataset dataset_2_0: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(sleep)->Project] -> AggregateNumRows[AggregateNumRows]
2026-01-09 10:16:08,865	WARNING resource_manager.py:134 -- ⚠️  Ray's object store is configured to use only 4.7% of available memory (2.0GiB out of 42.6GiB total). For optimal Ray Data performance, we recommend setting the object store to at least 50% of available memory. You can do this by setting the 'object_store_memory' parameter when calling ray.init() or by setting the RAY_DEFAULT_OBJECT_STORE_MEMORY_PROPORTION environment variable.
2026-01-09 10:16:08,865	INFO streaming_executor.py:661 -- [dataset]: A new progress UI is available. To enable, set `ray.data.DataContext.get_current().enable_rich_progress_bars = True` and `ray.data.DataContext.get_current().use_ray_tqdm = False`.

Running Dataset dataset_2_0.: 0.00 row [00:00, ? row/s]

- ReadRange->Map(sleep)->Project:   0%|          | 0.00/1.00 [00:00<?, ? row/s]�[A


- AggregateNumRows:   0%|          | 0.00/1.00 [00:00<?, ? row/s]�[A�[A2026-01-09 10:16:08,883	WARNING resource_manager.py:791 -- Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadRange->Map(sleep)->Project]. The job may hang forever unless the cluster scales up.
2026-01-09 10:16:08,905	WARNING utils.py:33 -- Truncating long operator name to 100 characters. To disable this behavior, set `ray.data.DataContext.get_current().DEFAULT_ENABLE_PROGRESS_BAR_NAME_TRUNCATION = False`.

Running Dataset: dataset_2_0. Active & requested resources: 2/10 CPU, 768.0MiB/1.0GiB object store: : 0.00 row [00:01, ? row/s]
Running Dataset: dataset_2_0. Active & requested resources: 2/10 CPU, 768.0MiB/1.0GiB object store: : 0.00 row [00:01, ? row/s]

- ReadRange->...->Project: Tasks: 2 [backpressured:tasks(ConcurrencyCap)]; Actors: 0; Queued blocks: 98 (0.0B); Resources: 2.0 CPU, 768.0MiB object store:   0%|          | 0.00/1.00 [00:01<?, ? row/s]�[A

- ReadRange->...->Project: Tasks: 2 [backpressured:tasks(ConcurrencyCap)]; Actors: 0; Queued blocks: 98 (0.0B); Resources: 2.0 CPU, 768.0MiB object store:   0%|          | 0.00/1.00 [00:01<?, ? row/s]�[A


- AggregateNumRows: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store:   0%|          | 0.00/1.00 [00:01<?, ? row/s]�[A�[A


- AggregateNumRows: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store:   0%|          | 0.00/1.00 [00:01<?, ? row/s]�[A�[A
Running Dataset: dataset_2_0. Active & requested resources: 10/10 CPU, 80.0B/1.0GiB object store: : 0.00 row [00:02, ? row/s]  
Running Dataset: dataset_2_0. Active & requested resources: 10/10 CPU, 80.0B/1.0GiB object store: : 0.00 row [00:02, ? row/s]

- ReadRange->...->Project: Tasks: 10 [backpressured:tasks(ResourceBudget)]; Actors: 0; Queued blocks: 88 (0.0B); Resources: 10.0 CPU, 80.0B object store:   0%|          | 0.00/1.00 [00:02<?, ? row/s] �[A

- ReadRange->...->Project: Tasks: 10 [backpressured:tasks(ResourceBudget)]; Actors: 0; Queued blocks: 88 (0.0B); Resources: 10.0 CPU, 80.0B object store:   0%|          | 0.00/100 [00:02<?, ? row/s] �[A

- ReadRange->...->Project: Tasks: 10 [backpressured:tasks(ResourceBudget)]; Actors: 0; Queued blocks: 88 (0.0B); Resources: 10.0 CPU, 80.0B object store:   2%|▏         | 2.00/100 [00:02<01:43, 1.06s/ row]�[A

- ReadRange->...->Project: Tasks: 10 [backpressured:tasks(ResourceBudget)]; Actors: 0; Queued blocks: 88 (0.0B); Resources: 10.0 CPU, 80.0B object store:   2%|▏         | 2.00/100 [00:02<01:43, 1.06s/ row]�[A
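The garbling in the log above comes from tqdm's ANSI cursor-movement codes (the `�[A` runs are `ESC [ A`, "cursor up") being written to a stream that is not a terminal. A small standalone illustration of the check such fallbacks key on (not Ray code):

```python
import io

# A redirected stream (log file, pipe, StringIO) is not a terminal, so
# escape sequences like "\x1b[A" (cursor up) are recorded literally
# instead of repositioning the cursor, producing garbled log files.
log_like_stream = io.StringIO()
print("log-like stream is a tty:", log_like_stream.isatty())  # False

# A progress reporter that respects the check emits plain lines instead:
if not log_like_stream.isatty():
    log_like_stream.write("ReadParquet: 2/100 rows\n")  # no escape codes
```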

Contributor Author


Ah, the issue was that the progress bar broke when dataset execution was driven from the SplitCoordinator. With this PR, it works now.

ec2-user@ip-172-31-41-154$ pytest -s test_data_integration.py::test_parquet_file_shuffle_with_executions
======================================================== test session starts ========================================================
platform linux -- Python 3.10.19, pytest-8.3.3, pluggy-1.5.0
rootdir: /home/ec2-user/myworkspace/ray
configfile: pytest.ini
plugins: anyio-4.6.0, lazy-fixtures-1.4.0
collected 2 items

test_data_integration.py 2026-01-10 00:48:24,254        INFO worker.py:1992 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
Parquet dataset sampling: 100%|████████████████████████████████████████████████████████████████| 2.00/2.00 [00:00<00:00, 5.01 file/s]
2026-01-10 00:48:25,039 INFO parquet_datasource.py:1046 -- Estimated parquet encoding ratio is 0.235.
2026-01-10 00:48:25,039 INFO parquet_datasource.py:1106 -- Estimated parquet reader batch size at 4007393 rows
Parquet dataset sampling: 100%|█████████████████████████████████████████████████████████████████| 2.00/2.00 [00:00<00:00, 501 file/s]
2026-01-10 00:48:25,459 INFO parquet_datasource.py:1046 -- Estimated parquet encoding ratio is 0.235.
2026-01-10 00:48:25,459 INFO parquet_datasource.py:1106 -- Estimated parquet reader batch size at 4007393 rows
(TrainController pid=2204180) Attempting to start training worker group of size 2 with the following resources: [{'CPU': 1}] * 2
(TrainController pid=2204180) Started training worker group of size 2:
(TrainController pid=2204180) - (ip=172.31.41.154, pid=2204315) world_rank=0, local_rank=0, node_rank=0
(TrainController pid=2204180) - (ip=172.31.41.154, pid=2204316) world_rank=1, local_rank=1, node_rank=0
(SplitCoordinator pid=2204466) Registered dataset logger for dataset train1_2_0
(SplitCoordinator pid=2204466) Starting execution of Dataset train1_2_0. Full logs are in /tmp/ray/session_2026-01-10_00-48-21_356543_2203409/logs/ray-data
(SplitCoordinator pid=2204466) Execution plan of Dataset train1_2_0: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)]
(SplitCoordinator pid=2204466) ⚠️  Ray's object store is configured to use only 42.9% of available memory (17.6GiB out of 41.1GiB total). For optimal Ray Data performance, we recommend setting the object store to at least 50% of available memory. You can do this by setting the 'object_store_memory' parameter when calling ray.init() or by setting the RAY_DEFAULT_OBJECT_STORE_MEMORY_PROPORTION environment variable.
(SplitCoordinator pid=2204466) [dataset]: A new progress UI is available. To enable, set `ray.data.DataContext.get_current().enable_rich_progress_bars = True` and `ray.data.DataContext.get_current().use_ray_tqdm = False`.
(SplitCoordinator pid=2204466) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up.
(pid=2204466) ✔️  Dataset train1_2_0 execution finished in 0.40 seconds: 100%|█████████████████████| 150/150 [00:00<00:00, 369 row/s]
(pid=2204466) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2204466) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 972.0B object store; [localit
(SplitCoordinator pid=2204466) ✔️  Dataset train1_2_0 execution finished in 0.40 seconds
(SplitCoordinator pid=2204466) Registered dataset logger for dataset train1_2_1
(SplitCoordinator pid=2204466) Starting execution of Dataset train1_2_1. Full logs are in /tmp/ray/session_2026-01-10_00-48-21_356543_2203409/logs/ray-data
(SplitCoordinator pid=2204466) Execution plan of Dataset train1_2_1: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)]
(SplitCoordinator pid=2204466) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up.
(pid=2204466) ✔️  Dataset train1_2_1 execution finished in 0.11 seconds: 100%|███████████████████| 135/135 [00:00<00:00, 1.39k row/s]
(pid=2204466) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2204466) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 3.5KiB object store; [localit
(SplitCoordinator pid=2204466) ✔️  Dataset train1_2_1 execution finished in 0.11 seconds
(SplitCoordinator pid=2204466) Registered dataset logger for dataset train1_2_2
(SplitCoordinator pid=2204466) Starting execution of Dataset train1_2_2. Full logs are in /tmp/ray/session_2026-01-10_00-48-21_356543_2203409/logs/ray-data
(SplitCoordinator pid=2204466) Execution plan of Dataset train1_2_2: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)]
(SplitCoordinator pid=2204466) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up.
(pid=2204466) ✔️  Dataset train1_2_2 execution finished in 0.12 seconds: 100%|███████████████████| 150/150 [00:00<00:00, 1.68k row/s]
(pid=2204466) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2204466) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 982.0B object store; [localit
(SplitCoordinator pid=2204466) ✔️  Dataset train1_2_2 execution finished in 0.12 seconds
(SplitCoordinator pid=2204466) Registered dataset logger for dataset train1_2_3
(SplitCoordinator pid=2204466) Starting execution of Dataset train1_2_3. Full logs are in /tmp/ray/session_2026-01-10_00-48-21_356543_2203409/logs/ray-data
(SplitCoordinator pid=2204466) Execution plan of Dataset train1_2_3: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)]
(SplitCoordinator pid=2204466) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up.
(pid=2204466) ✔️  Dataset train1_2_3 execution finished in 0.48 seconds: 100%|█████████████████████| 150/150 [00:00<00:00, 302 row/s]
(pid=2204466) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2204466) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 1.6KiB object store; [localit
(SplitCoordinator pid=2204466) ✔️  Dataset train1_2_3 execution finished in 0.48 seconds
(SplitCoordinator pid=2204466) Registered dataset logger for dataset train1_2_4
(SplitCoordinator pid=2204466) Starting execution of Dataset train1_2_4. Full logs are in /tmp/ray/session_2026-01-10_00-48-21_356543_2203409/logs/ray-data
(SplitCoordinator pid=2204466) Execution plan of Dataset train1_2_4: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)]
(SplitCoordinator pid=2204466) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up.
(pid=2204466) ✔️  Dataset train1_2_4 execution finished in 0.06 seconds: 100%|███████████████████| 75.0/75.0 [00:00<00:00, 795 row/s]
(pid=2204466) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2204466) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 3.5KiB object store; [localit
(SplitCoordinator pid=2204466) ✔️  Dataset train1_2_4 execution finished in 0.06 seconds
(pid=2204465) ✔️  Dataset train2_4_0 execution finished in 0.09 seconds: 100%|███████████████████| 150/150 [00:00<00:00, 1.46k row/s]
(pid=2204465) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2204465) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 972.0B object store; [localit
(pid=2204465) ✔️  Dataset train2_4_1 execution finished in 0.07 seconds: 100%|███████████████████| 75.0/75.0 [00:00<00:00, 784 row/s]
(pid=2204465) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2204465) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 2.9KiB object store; [localit
(pid=2204465) ✔️  Dataset train2_4_2 execution finished in 0.07 seconds: 100%|███████████████████| 135/135 [00:00<00:00, 1.38k row/s]
(pid=2204465) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2204465) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 3.5KiB object store; [localit
(pid=2204465) ✔️  Dataset train2_4_3 execution finished in 0.07 seconds: 100%|██████████████████| 75.0/75.0 [00:00<00:00, 113k row/s]
(pid=2204465) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2204465) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 2.9KiB object store; [localit
(pid=2204465) ✔️  Dataset train2_4_4 execution finished in 0.07 seconds: 100%|████████████████████| 135/135 [00:00<00:00, 174k row/s]
(pid=2204465) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2204465) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 2.9KiB object store; [localit
.(SplitCoordinator pid=2204465) Registered dataset logger for dataset train2_4_4 [repeated 5x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(SplitCoordinator pid=2204465) Starting execution of Dataset train2_4_4. Full logs are in /tmp/ray/session_2026-01-10_00-48-21_356543_2203409/logs/ray-data [repeated 5x across cluster]
(SplitCoordinator pid=2204465) Execution plan of Dataset train2_4_4: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)] [repeated 5x across cluster]
(SplitCoordinator pid=2204465) ⚠️  Ray's object store is configured to use only 42.9% of available memory (17.6GiB out of 41.1GiB total). For optimal Ray Data performance, we recommend setting the object store to at least 50% of available memory. You can do this by setting the 'object_store_memory' parameter when calling ray.init() or by setting the RAY_DEFAULT_OBJECT_STORE_MEMORY_PROPORTION environment variable.
(SplitCoordinator pid=2204465) [dataset]: A new progress UI is available. To enable, set `ray.data.DataContext.get_current().enable_rich_progress_bars = True` and `ray.data.DataContext.get_current().use_ray_tqdm = False`.
(SplitCoordinator pid=2204465) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up. [repeated 5x across cluster]
(SplitCoordinator pid=2204465) ✔️  Dataset train2_4_4 execution finished in 0.07 seconds [repeated 5x across cluster]
2026-01-10 00:48:37,551 INFO worker.py:1992 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
Parquet dataset sampling: 100%|████████████████████████████████████████████████████████████████| 2.00/2.00 [00:00<00:00, 5.07 file/s]
2026-01-10 00:48:38,325 INFO parquet_datasource.py:1046 -- Estimated parquet encoding ratio is 0.235.
2026-01-10 00:48:38,326 INFO parquet_datasource.py:1106 -- Estimated parquet reader batch size at 4007393 rows
Parquet dataset sampling: 100%|█████████████████████████████████████████████████████████████████| 2.00/2.00 [00:00<00:00, 491 file/s]
2026-01-10 00:48:38,733 INFO parquet_datasource.py:1046 -- Estimated parquet encoding ratio is 0.235.
2026-01-10 00:48:38,733 INFO parquet_datasource.py:1106 -- Estimated parquet reader batch size at 4007393 rows
(TrainController pid=2205761) Attempting to start training worker group of size 2 with the following resources: [{'CPU': 1}] * 2
(TrainController pid=2205761) Started training worker group of size 2:
(TrainController pid=2205761) - (ip=172.31.41.154, pid=2205896) world_rank=0, local_rank=0, node_rank=0
(TrainController pid=2205761) - (ip=172.31.41.154, pid=2205895) world_rank=1, local_rank=1, node_rank=0
(SplitCoordinator pid=2206046) Registered dataset logger for dataset train1_2_0
(SplitCoordinator pid=2206046) Starting execution of Dataset train1_2_0. Full logs are in /tmp/ray/session_2026-01-10_00-48-34_686687_2203409/logs/ray-data
(SplitCoordinator pid=2206046) Execution plan of Dataset train1_2_0: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)]
(SplitCoordinator pid=2206046) ⚠️  Ray's object store is configured to use only 42.9% of available memory (17.6GiB out of 41.1GiB total). For optimal Ray Data performance, we recommend setting the object store to at least 50% of available memory. You can do this by setting the 'object_store_memory' parameter when calling ray.init() or by setting the RAY_DEFAULT_OBJECT_STORE_MEMORY_PROPORTION environment variable.
(SplitCoordinator pid=2206046) [dataset]: A new progress UI is available. To enable, set `ray.data.DataContext.get_current().enable_rich_progress_bars = True` and `ray.data.DataContext.get_current().use_ray_tqdm = False`.
(SplitCoordinator pid=2206046) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up.
(pid=2206046) ✔️  Dataset train1_2_0 execution finished in 0.46 seconds: 100%|█████████████████████| 150/150 [00:00<00:00, 368 row/s]
(pid=2206046) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2206046) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 982.0B object store; [localit
(SplitCoordinator pid=2206046) ✔️  Dataset train1_2_0 execution finished in 0.46 seconds
(SplitCoordinator pid=2206046) Registered dataset logger for dataset train1_2_1
(SplitCoordinator pid=2206046) Starting execution of Dataset train1_2_1. Full logs are in /tmp/ray/session_2026-01-10_00-48-34_686687_2203409/logs/ray-data
(SplitCoordinator pid=2206046) Execution plan of Dataset train1_2_1: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)]
(SplitCoordinator pid=2206046) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up.
(pid=2206046) ✔️  Dataset train1_2_1 execution finished in 0.12 seconds: 100%|███████████████████| 150/150 [00:00<00:00, 1.79k row/s]
(pid=2206046) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2206046) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 982.0B object store; [localit
(SplitCoordinator pid=2206046) ✔️  Dataset train1_2_1 execution finished in 0.12 seconds
(SplitCoordinator pid=2206046) Registered dataset logger for dataset train1_2_2
(SplitCoordinator pid=2206046) Starting execution of Dataset train1_2_2. Full logs are in /tmp/ray/session_2026-01-10_00-48-34_686687_2203409/logs/ray-data
(SplitCoordinator pid=2206046) Execution plan of Dataset train1_2_2: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)]
(SplitCoordinator pid=2206046) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up.
(pid=2206046) ✔️  Dataset train1_2_2 execution finished in 0.49 seconds: 100%|█████████████████████| 135/135 [00:00<00:00, 332 row/s]
(pid=2206046) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2206046) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 2.9KiB object store; [localit
(SplitCoordinator pid=2206046) ✔️  Dataset train1_2_2 execution finished in 0.49 seconds
(SplitCoordinator pid=2206046) Registered dataset logger for dataset train1_2_3
(SplitCoordinator pid=2206046) Starting execution of Dataset train1_2_3. Full logs are in /tmp/ray/session_2026-01-10_00-48-34_686687_2203409/logs/ray-data
(SplitCoordinator pid=2206046) Execution plan of Dataset train1_2_3: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)]
(SplitCoordinator pid=2206046) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up.
(pid=2206046) ✔️  Dataset train1_2_3 execution finished in 0.07 seconds: 100%|███████████████████| 75.0/75.0 [00:00<00:00, 789 row/s]
(pid=2206046) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2206046) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 2.9KiB object store; [localit
(SplitCoordinator pid=2206046) ✔️  Dataset train1_2_3 execution finished in 0.07 seconds
(SplitCoordinator pid=2206046) Registered dataset logger for dataset train1_2_4
(SplitCoordinator pid=2206046) Starting execution of Dataset train1_2_4. Full logs are in /tmp/ray/session_2026-01-10_00-48-34_686687_2203409/logs/ray-data
(SplitCoordinator pid=2206046) Execution plan of Dataset train1_2_4: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)]
(SplitCoordinator pid=2206046) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up.
(pid=2206046) ✔️  Dataset train1_2_4 execution finished in 0.07 seconds: 100%|███████████████████| 75.0/75.0 [00:00<00:00, 763 row/s]
(pid=2206046) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2206046) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 2.9KiB object store; [localit
(SplitCoordinator pid=2206046) ✔️  Dataset train1_2_4 execution finished in 0.07 seconds
(pid=2206045) ✔️  Dataset train2_4_0 execution finished in 0.08 seconds: 100%|███████████████████| 150/150 [00:00<00:00, 1.46k row/s]
(pid=2206045) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2206045) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 982.0B object store; [localit
(pid=2206045) ✔️  Dataset train2_4_1 execution finished in 0.07 seconds: 100%|██████████████████| 75.0/75.0 [00:00<00:00, 128k row/s]
(pid=2206045) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2206045) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 2.9KiB object store; [localit
(pid=2206045) ✔️  Dataset train2_4_2 execution finished in 0.07 seconds: 100%|███████████████████| 135/135 [00:00<00:00, 1.30k row/s]
(pid=2206045) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2206045) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 2.9KiB object store; [localit
(pid=2206045) ✔️  Dataset train2_4_3 execution finished in 0.07 seconds: 100%|██████████████████| 75.0/75.0 [00:00<00:00, 105k row/s]
(pid=2206045) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2206045) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 2.9KiB object store; [localit
(pid=2206045) ✔️  Dataset train2_4_4 execution finished in 0.07 seconds: 100%|████████████████████| 150/150 [00:00<00:00, 199k row/s]
(pid=2206045) - ReadParquet: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|█| 150/150 [00
(pid=2206045) - split(2, equal=True): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 2.9KiB object store; [localit
.(SplitCoordinator pid=2206045) Registered dataset logger for dataset train2_4_4 [repeated 5x across cluster]
(SplitCoordinator pid=2206045) Starting execution of Dataset train2_4_4. Full logs are in /tmp/ray/session_2026-01-10_00-48-34_686687_2203409/logs/ray-data [repeated 5x across cluster]
(SplitCoordinator pid=2206045) Execution plan of Dataset train2_4_4: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> OutputSplitter[split(2, equal=True)] [repeated 5x across cluster]
(SplitCoordinator pid=2206045) ⚠️  Ray's object store is configured to use only 42.9% of available memory (17.6GiB out of 41.1GiB total). For optimal Ray Data performance, we recommend setting the object store to at least 50% of available memory. You can do this by setting the 'object_store_memory' parameter when calling ray.init() or by setting the RAY_DEFAULT_OBJECT_STORE_MEMORY_PROPORTION environment variable.
(SplitCoordinator pid=2206045) [dataset]: A new progress UI is available. To enable, set `ray.data.DataContext.get_current().enable_rich_progress_bars = True` and `ray.data.DataContext.get_current().use_ray_tqdm = False`.
(SplitCoordinator pid=2206045) Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadParquet]. The job may hang forever unless the cluster scales up. [repeated 5x across cluster]
(SplitCoordinator pid=2206045) ✔️  Dataset train2_4_4 execution finished in 0.07 seconds [repeated 5x across cluster]


======================================================== 2 passed in 26.76s =========================================================
ec2-user@ip-172-31-41-154$

@bveeramani
Member

cc @kyuds since this is adjacent to some code you have context on

@kyuds
Member

kyuds commented Jan 9, 2026

@bveeramani for my part: the part about accessing use_ray_tqdm was implemented before I started touching the codebase. I took everything from the original implementation when there was no "progress manager", etc. The isatty() part was introduced by another person to log progress instead of using interactive elements; this is going to be removed in the future anyway (I will be creating a separate LoggingExecutionProgressManager), so I'm good on that part.

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
@bveeramani bveeramani merged commit 0b12788 into master Jan 13, 2026
6 checks passed
@bveeramani bveeramani deleted the srinathk10/fix_progress_bar_with_use_ray_tqdm branch January 13, 2026 07:56
rushikeshadhav pushed a commit to rushikeshadhav/ray that referenced this pull request Jan 14, 2026
- Fix ProgressBar to honor `use_ray_tqdm` in `DataContext`. 
- Note that `tqdm_ray` is designed to work in non-interactive contexts
(workers/actors) by sending JSON progress updates to the driver.

---------

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
jeffery4011 pushed a commit to jeffery4011/ray that referenced this pull request Jan 20, 2026
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Feb 3, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026

Labels

  • data — Ray Data-related issues
  • go — add ONLY when ready to merge, run all tests


Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

3 participants