Suspicious JournalFileStorage behavior with GrpcStorageProxy in a multi-threaded setup #6172

@gen740

Expected behavior

The number of enqueued trials and the number of distinct trial IDs popped should be the same.
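To make the expected invariant concrete, here is a minimal sketch using a toy in-memory queue. `ToyWaitingQueue` and `pop_all_concurrently` are hypothetical names made up for illustration and are not Optuna's implementation; the sketch only shows that when the check-and-claim step is atomic, every enqueued trial is claimed exactly once regardless of the pool size.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


class ToyWaitingQueue:
    """Hypothetical in-memory stand-in for a storage holding WAITING trials."""

    def __init__(self, num_trials: int) -> None:
        self._waiting = list(range(num_trials))
        self._lock = threading.Lock()

    def pop_waiting_trial_id(self) -> Optional[int]:
        # The check and the claim happen under one lock, so no two
        # threads can claim the same trial.
        with self._lock:
            return self._waiting.pop(0) if self._waiting else None


def pop_all_concurrently(num_trials: int, num_workers: int) -> List[Optional[int]]:
    """Pop `num_trials` times from a fresh queue using `num_workers` threads."""
    queue = ToyWaitingQueue(num_trials)
    with ThreadPoolExecutor(num_workers) as pool:
        futures = [pool.submit(queue.pop_waiting_trial_id) for _ in range(num_trials)]
        return [future.result() for future in futures]


popped = pop_all_concurrently(num_trials=50, num_workers=100)
# Every enqueued trial is claimed exactly once.
assert len(set(popped)) == 50
```

If the claim step were not atomic, two threads could observe the same waiting trial and both return its ID, which would shrink the set in the same way as the failure reported here.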

Environment

  • Optuna version: 4.4.0.dev
  • Python version: 3.12.10
  • OS: Linux-6.14.10-orbstack-00291-g1b252bd3edea-aarch64-with-glibc2.40

Error messages, stack traces, or logs

num_enqueued = 50,  len(trial_id_set) = 47
Traceback (most recent call last):
  File "/Users/gen/Projects/Optuna/optuna/workdir/test.py", line 27, in <module>
    assert len(trial_id_set) == num_enqueued
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Steps to reproduce

The following is a modified version of the study test.

from optuna import create_study
from optuna.storages import GrpcStorageProxy
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import as_completed

num_enqueued = 50
# Connect to a gRPC proxy server listening on localhost:13000.
study = create_study(
    storage=GrpcStorageProxy(
        host="localhost",
        port=13000,
    )
)
# Enqueue trials with distinct parameters so each one gets its own trial ID.
for i in range(num_enqueued):
    study.enqueue_trial({"i": i})

trial_id_set = set()

# Pop the waiting trials concurrently; the pool is larger than the number of tasks.
with ThreadPoolExecutor(100) as pool:
    futures = []
    for i in range(num_enqueued):
        future = pool.submit(study._pop_waiting_trial_id)
        futures.append(future)

    for future in as_completed(futures):
        trial_id_set.add(future.result())
print(f"{num_enqueued = },  {len(trial_id_set) = }")
# Each enqueued trial should have been popped exactly once.
assert len(trial_id_set) == num_enqueued

This test passes on CI, but it fails when the ThreadPoolExecutor size is increased.
The failure occurs only on Linux. I cannot reproduce it on macOS.
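The set size alone (47 instead of 50) does not show whether some trial IDs were returned more than once or whether some calls came back empty-handed. As a possible debugging aid, the results could be collected into a list instead of a set and summarized. `summarize_popped` is a hypothetical helper, and the sketch assumes that a call which finds no waiting trial returns `None`.

```python
from collections import Counter
from typing import Iterable, Optional


def summarize_popped(ids: Iterable[Optional[int]]) -> dict:
    """Split popped results into distinct IDs, None results, and duplicates."""
    ids = list(ids)
    counts = Counter(i for i in ids if i is not None)
    return {
        "distinct": len(counts),
        "none_results": sum(1 for i in ids if i is None),
        "duplicates": {i: n for i, n in counts.items() if n > 1},
    }


# Made-up input for illustration: ID 1 claimed twice, one call returned None.
print(summarize_popped([0, 1, 1, 2, None]))
```

In the reproduction script this would mean appending each `future.result()` to a list and passing that list to the helper before the assertion.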

Additional context (optional)

This issue emerged when I tried to parallelize the CI test in #6170.
For now, I have skipped this test on grpc_journal_file in #6170.

I think this issue may be related to #6084.
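For local reproduction, the script in "Steps to reproduce" expects a gRPC proxy server on localhost:13000. The following is a rough sketch of how such a server might be started in a separate process in front of a journal-file storage. The function name `serve_journal_file_storage` and the path `./journal.log` are assumptions made for illustration; the actual CI fixture may be set up differently.

```python
def serve_journal_file_storage(path: str = "./journal.log", port: int = 13000) -> None:
    """Start a blocking gRPC proxy server in front of a journal-file storage."""
    # Imported lazily so this module still loads where optuna/grpcio are absent.
    from optuna.storages import JournalStorage, run_grpc_proxy_server
    from optuna.storages.journal import JournalFileBackend

    # The journal file path is a placeholder, not the one used in CI.
    storage = JournalStorage(JournalFileBackend(path))
    # Blocks until the process is interrupted.
    run_grpc_proxy_server(storage, host="localhost", port=port)
```

Calling `serve_journal_file_storage()` in one terminal and then running the reproduction script in another should exercise the same client and server combination described in this issue.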

Labels: bug (Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself.)
