Closed
Labels
P1 (issue that should be fixed within a few weeks), bug (something that is supposed to be working, but isn't), data (Ray Data-related issues)
Description
What happened + What you expected to happen
When running Dataset.random_sample() multiple times with the same seed, the resulting dataset is not consistent. We would expect that with a fixed seed, the output dataset is reproducible and deterministic.
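To make the expected property concrete, here is a minimal sketch in plain Python. It is not Ray's implementation and says nothing about the cause of the bug; `keep_row` and `sample_ids` are hypothetical helpers introduced only for illustration. The idea is that if the keep/drop decision for each row depends only on the seed and a stable row identifier, then repeated executions select the same rows regardless of how the data is split or scheduled.

```python
import hashlib


def keep_row(row_id: int, fraction: float, seed: int) -> bool:
    """Decide whether to keep a row using only (seed, row_id)."""
    digest = hashlib.sha256(f"{seed}:{row_id}".encode()).digest()
    # Take the top 53 bits of the hash and scale them to a float in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < fraction


def sample_ids(n: int, fraction: float, seed: int) -> list[int]:
    """Return the ids in range(n) selected for the given fraction and seed."""
    return [i for i in range(n) if keep_row(i, fraction, seed)]


# Two independent "executions" with the same seed select the same rows.
first = sample_ids(1219, 0.1, seed=0)
second = sample_ids(1219, 0.1, seed=0)
assert first == second
assert len(first) == len(second)
```

With this kind of sampling, the equivalent of calling count() twice would always return the same number, which is the behavior the report expects from a fixed seed.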
Versions / Dependencies
ray master (ray 2.38)
Reproduction script
import ray
ds = ray.data.range(1219)
ds = ds.random_sample(0.1, seed=0)
check1 = ds.count()
print(f"=== Check 1: {check1}")
check2 = ds.count()
print(f"=== Check 2: {check2}")
assert check1 == check2, f"{check1=} vs. {check2=}"
Without the
Issue Severity
Medium: It is a significant difficulty but I can work around it.