Skip to content

[core] Fix error info timestamp unit#48763

Merged
rynewang merged 14 commits intoray-project:masterfrom
dentiny:hjiang/fix-error-info-ts-unit
Nov 26, 2024
Merged

[core] Fix error info timestamp unit#48763
rynewang merged 14 commits intoray-project:masterfrom
dentiny:hjiang/fix-error-info-ts-unit

Conversation

@dentiny
Copy link
Copy Markdown
Contributor

@dentiny dentiny commented Nov 15, 2024

Addresses #48760

Reading through the initial PR (https://github.com/ray-project/ray/pull/2256/files), the timestamp unit for error information is actually seconds.
After refactor PR (#5024), it's refactored from python class into protobuf.

In this PR, I tried to use absl::Time instead of integer as argument, to avoid confusion on units.

Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny requested review from jjyao and rynewang November 15, 2024 23:40
@dentiny dentiny requested a review from a team as a code owner November 15, 2024 23:40
@dentiny dentiny added the go add ONLY when ready to merge, run all tests label Nov 15, 2024
Signed-off-by: dentiny <dentinyhao@gmail.com>
@rynewang
Copy link
Copy Markdown
Contributor

Where do you find the original timestamp is in seconds? I find this and I think this meant to be ms.

https://github.com/ray-project/ray/pull/2256/files#diff-d2f22b8f1bf5f9be47dacae8b467a72ee94629df12ffcc18b13447192ff3dbcfR395

@dentiny
Copy link
Copy Markdown
Contributor Author

dentiny commented Nov 16, 2024

Where do you find the original timestamp is in seconds? I find this and I think this meant to be ms.

https://github.com/ray-project/ray/pull/2256/files#diff-d2f22b8f1bf5f9be47dacae8b467a72ee94629df12ffcc18b13447192ff3dbcfR395

ray/python/ray/utils.py

Lines 80 to 81 in 16a6e45

worker.local_scheduler_client.push_error(
ray.ObjectID(driver_id), error_type, message, time.time())

time.time returns timestamp in seconds.

@dentiny
Copy link
Copy Markdown
Contributor Author

dentiny commented Nov 16, 2024

@rynewang Somehow I think the purpose for the fix is more of "fixate a unit of timestamp" rather than "get the original intention for what timestamp unit we use".

Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny requested a review from rynewang November 16, 2024 07:08
Signed-off-by: dentiny <dentinyhao@gmail.com>
@jcotant1 jcotant1 added the core Issues that should be addressed in Ray Core label Nov 18, 2024
@dentiny
Copy link
Copy Markdown
Contributor Author

dentiny commented Nov 19, 2024

@rynewang Do you have any comments on this PR? Thanks!

@dentiny dentiny requested a review from rynewang November 19, 2024 07:47
@dayshah
Copy link
Copy Markdown
Contributor

dayshah commented Nov 20, 2024

some cpp tests are failing on this

} else {
error_info_ptr->set_error_message(error_msg);
}
error_info_ptr->set_timestamp(absl::ToUnixSeconds(timestamp));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this limits time resolution to seconds, which is not acceptable. let's use milliseconds, or ToDoubleSeconds. Either way, please annotate the proto on what unit it uses.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this limits time resolution to seconds, which is not acceptable.

But I think from the code pointer I referenced, it's intended to be seconds at the very beginning.
Do you mean we should change the semantics, whatever it is now?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or ToDoubleSeconds

I'm not sure what do you mean here.
It's made for absl::Duration, instead of absl::Time.
https://github.com/abseil/abseil-cpp/blob/2fcebef792241578e8c9dcfbba5745f834509dbb/absl/time/time.h#L647

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to millisecond as you wish, also add a comment for unit in the protobuf.
But checking the code base, if we want to update the field name, there're way too many places to update in the PR, I leave a TODO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has already been over a week, I strongly prefer to merge it if no other issues.

Signed-off-by: hjiang <hjiang@anyscale.com>
Signed-off-by: hjiang <hjiang@anyscale.com>
Signed-off-by: hjiang <hjiang@anyscale.com>
@dentiny dentiny requested a review from rynewang November 25, 2024 08:55
@rynewang rynewang merged commit 5896b3f into ray-project:master Nov 26, 2024
@dentiny dentiny deleted the hjiang/fix-error-info-ts-unit branch November 26, 2024 21:04
jecsand838 pushed a commit to jecsand838/ray that referenced this pull request Dec 4, 2024
Addresses ray-project#48760

Reading through the initial PR
(https://github.com/ray-project/ray/pull/2256/files), the timestamp unit
for error information is actually seconds.
After refactor PR (ray-project#5024), it's
refactored from python class into protobuf.

In this PR, I tried to use `absl::Time` instead of integer as argument,
to avoid confusion on units.

---------

Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: hjiang <hjiang@anyscale.com>
Co-authored-by: Ruiyang Wang <56065503+rynewang@users.noreply.github.com>
Signed-off-by: Connor Sanders <connor@elastiflow.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-backlog core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants