[Bugfix] Fix flaky entrypoint logitproc test forced to spawn - CI failures #37484
wojciech-wais wants to merge 3 commits into
The test_custom_logitsprocs[ENTRYPOINT] test patches importlib.metadata.entry_points and relies on fork to propagate the patch to the worker processes. However, _maybe_force_spawn() switches the start method to spawn when CUDA is already initialized (common in CI after earlier tests have run). Spawned workers start from a fresh interpreter, so the patch is lost and the logits processor is never applied.

Fix: run the entrypoint test in in-process mode instead. Entrypoint discovery works the same way in-process, and the FQCN and CLASS source tests already cover multi-process propagation.

Signed-off-by: Wojciech Wais <wojciech.wais@gmail.com>
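The failure mode above can be reproduced outside vLLM with a minimal sketch (not code from this PR or from vLLM). It patches importlib.metadata.entry_points with unittest.mock and checks what the current process, a fork child, and a freshly started interpreter see. The group name example.logits_processors and the dummy entry point are hypothetical. A fresh interpreter launched with subprocess stands in for the spawn start method, since both begin with an unpatched importlib.metadata. The sketch assumes a POSIX system (for fork) and Python 3.10+ (for entry_points(group=...)).

```python
import importlib.metadata
import multiprocessing as mp
import subprocess
import sys
from unittest import mock

# Hypothetical entry-point group and entry point, for illustration only.
GROUP = "example.logits_processors"
FAKE_EP = importlib.metadata.EntryPoint(
    name="dummy", value="collections:OrderedDict", group=GROUP
)


def discovered_names():
    """Entry-point names in GROUP, as seen by the calling process."""
    # Looks up importlib.metadata.entry_points at call time, so it sees a patch
    # only if that patch exists in this process's memory.
    return [ep.name for ep in importlib.metadata.entry_points(group=GROUP)]


def _report(queue):
    queue.put(discovered_names())


def names_in_fork_child():
    # fork copies the parent's memory, including the patched module attribute.
    ctx = mp.get_context("fork")  # POSIX only
    queue = ctx.Queue()
    proc = ctx.Process(target=_report, args=(queue,))
    proc.start()
    names = queue.get(timeout=60)
    proc.join()
    return names


def names_in_fresh_interpreter():
    # Stand-in for spawn: a new interpreter imports importlib.metadata from
    # scratch, so the parent's in-memory patch does not exist there.
    code = (
        "import importlib.metadata as m\n"
        f"print(','.join(ep.name for ep in m.entry_points(group={GROUP!r})))"
    )
    out = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return [name for name in out.stdout.strip().split(",") if name]


with mock.patch("importlib.metadata.entry_points", return_value=[FAKE_EP]):
    in_process = discovered_names()
    forked = names_in_fork_child()
    fresh = names_in_fresh_interpreter()

print(in_process, forked, fresh)  # expected: ['dummy'] ['dummy'] []
```

The patched entry point is visible in-process and in the fork child but not in the fresh interpreter. Once _maybe_force_spawn() switches to spawn, the workers behave like the last case and never see the patched entry_points.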
Code Review
The changes effectively resolve the flakiness in the test_custom_logitsprocs[ENTRYPOINT] test. By setting VLLM_ENABLE_V1_MULTIPROCESSING to "0" and disabling the environment variable cache, the test is forced to run in in-process mode. This correctly addresses the issue where the importlib.metadata.entry_points patch was lost when _maybe_force_spawn() overrode the multiprocessing method to spawn in CI environments with initialized CUDA. The solution is direct and appropriate for the identified problem.
This pull request has merge conflicts that must be resolved before it can be merged.
Closing as superseded: main's #42040 (df2636a) already fixes this exact flaky LOGITPROC_SOURCE_ENTRYPOINT test, using a more robust spawn-compatible dist-info registration (setup_fake_entrypoint) that also works on XPU/ROCm. This PR's in-process/fork-avoidance approach is mutually exclusive with it and is no longer needed.
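A rough sketch of why a dist-info registration survives spawn: instead of patching a function in memory, it writes a fake .dist-info directory containing entry_points.txt onto a path that child interpreters also search, so importlib.metadata rediscovers it from scratch in every process. This is not the setup_fake_entrypoint helper from #42040. The distribution name, group name, and directory layout below are hypothetical, and the sketch assumes Python 3.10+.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical entry-point group and plugin distribution; not vLLM's real names.
GROUP = "example.logits_processors"
DIST_NAME = "fake_logitproc_plugin"


def write_fake_dist_info(root: Path) -> None:
    """Register a fake installed distribution under root via a .dist-info dir."""
    dist_info = root / f"{DIST_NAME}-0.0.1.dist-info"
    dist_info.mkdir(parents=True)
    (dist_info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {DIST_NAME}\nVersion: 0.0.1\n"
    )
    # entry_points.txt is the file importlib.metadata reads entry points from.
    (dist_info / "entry_points.txt").write_text(
        f"[{GROUP}]\ndummy = collections:OrderedDict\n"
    )


def names_in_fresh_interpreter(extra_path: Optional[Path]) -> List[str]:
    """Entry-point names in GROUP as seen by a brand-new Python process."""
    env = dict(os.environ)
    if extra_path is not None:
        # Prepend the directory so the child's sys.path, and therefore
        # importlib.metadata's distribution search, includes it.
        env["PYTHONPATH"] = os.pathsep.join(
            p for p in (str(extra_path), os.environ.get("PYTHONPATH", "")) if p
        )
    code = (
        "import importlib.metadata as m\n"
        f"print(','.join(ep.name for ep in m.entry_points(group={GROUP!r})))"
    )
    out = subprocess.run(
        [sys.executable, "-c", code], env=env, capture_output=True, text=True, check=True
    )
    return [name for name in out.stdout.strip().split(",") if name]


with tempfile.TemporaryDirectory() as tmp:
    write_fake_dist_info(Path(tmp))
    with_plugin = names_in_fresh_interpreter(Path(tmp))
    without_plugin = names_in_fresh_interpreter(None)

print(with_plugin, without_plugin)  # expected: ['dummy'] []
```

Because the registration lives on disk rather than in process memory, it does not depend on the start method, which is why this style of fix is independent of fork vs. spawn.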