[Ray Data] Dataset.random_sample() does not return deterministic results even when seed is set  #40406

@keerthanvasist

What happened + What you expected to happen

When I call ray.data.Dataset.random_sample with a fixed seed, I expect deterministic results. Instead, repeated calls with the same seed return different counts:

d.random_sample(0.2, seed=1234).count()
Out[8]: 2525
d.random_sample(0.2, seed=1234).count()
Out[9]: 2492
d.random_sample(0.2, seed=1234).count()
Out[10]: 2502
d.random_sample(0.2, seed=1234).count()
Out[11]: 2529
d.random_sample(0.2, seed=1234).count()
Out[12]: 2474

Versions / Dependencies

Python 3.10.12
Ray 2.6.3 (Also checked with 2.7.1)

Reproduction script

Shared above.
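
For anyone trying to reproduce this without the original data, here is a minimal self-contained sketch. The dataset `d` is not shown in the report, so `ray.data.range(12_500)` is an assumption, picked only because a 0.2 fraction of it lands near the counts listed above. The helper names (`collect_counts`, `is_deterministic`, `run_with_ray`) are hypothetical and not part of Ray.

```python
from typing import Callable, List


def collect_counts(sample_count: Callable[[], int], runs: int = 5) -> List[int]:
    """Call sample_count() `runs` times and return every result in order."""
    return [sample_count() for _ in range(runs)]


def is_deterministic(counts: List[int]) -> bool:
    """True when every run produced the same count."""
    return len(set(counts)) <= 1


def run_with_ray(num_rows: int = 12_500, fraction: float = 0.2, seed: int = 1234) -> List[int]:
    """Repeat the seeded sample from the report and return the observed counts."""
    # Imported lazily so the helpers above can be used without Ray installed.
    import ray

    # Assumption: a stand-in dataset, since the original `d` is not shown.
    ds = ray.data.range(num_rows)
    return collect_counts(lambda: ds.random_sample(fraction, seed=seed).count())
```

Calling `run_with_ray()` in an environment with Ray installed and passing the result to `is_deterministic` should show whether the seeded sample is stable on a given Ray version. If the counts differ between runs, as in the output above, the check returns `False`.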

Issue Severity

High

Metadata

Labels

P1 (Issue that should be fixed within a few weeks), bug (Something that is supposed to be working; but isn't), data (Ray Data-related issues), ray-2.11
