[Bugfix] Fix flaky entrypoint logitproc test forced to spawn - CI failures #37484

Closed
wojciech-wais wants to merge 3 commits into vllm-project:main from wojciech-wais:fix/custom-logitproc-entrypoint-test

@wojciech-wais

@wojciech-wais wojciech-wais commented Mar 18, 2026

Contributor

The test_custom_logitsprocs[ENTRYPOINT] test patches importlib.metadata.entry_points and relies on the fork start method to propagate the patch to worker processes. However, _maybe_force_spawn() switches the start method to spawn when CUDA is already initialized (common in CI after earlier tests), so the spawned workers never see the patch and the logit processor is never applied.
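The failure mode above can be reproduced without vLLM or CUDA. The following is a minimal sketch, not the actual test: it uses a fresh interpreter started with subprocess as a stand-in for a spawned worker (the spawn start method also starts a new Python interpreter), and `fake_entry_points` and `names_seen` are hypothetical names introduced only for illustration.

```python
import importlib.metadata
import subprocess
import sys
from unittest import mock


def fake_entry_points(*args, **kwargs):
    """Hypothetical stand-in for the patched discovery function."""
    return ["patched"]


# Code run by the fresh interpreter: report which function it sees.
CHILD_CODE = "import importlib.metadata as md; print(md.entry_points.__name__)"


def names_seen():
    """Return the entry_points function name seen in-process and in a fresh interpreter."""
    with mock.patch.object(importlib.metadata, "entry_points", fake_entry_points):
        # The patch is visible in the process that applied it.
        in_process = importlib.metadata.entry_points.__name__
        # A new interpreter re-imports importlib.metadata and gets the
        # unpatched function, as a spawned worker would.
        fresh = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
    return in_process, fresh


if __name__ == "__main__":
    print(names_seen())
```

A forked worker would instead inherit the parent's memory, including the patched attribute, which is why the test only passed when fork was actually used.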

Fix by running the entrypoint test in in-process mode instead. The entrypoint discovery mechanism is identical in-process, and the FQCN and CLASS source tests already validate multi-process propagation.

wojciech-wais and others added 2 commits March 18, 2026 22:37
The test_custom_logitsprocs[ENTRYPOINT] test patches
importlib.metadata.entry_points and relies on fork to propagate
the patch to worker processes. However, _maybe_force_spawn()
overrides to spawn when CUDA is already initialized (common in
CI after earlier tests), so the patch is lost in the spawned
workers and the logit processor is never applied.

Fix by running the entrypoint test in in-process mode instead. The
entrypoint discovery mechanism is identical in-process, and the FQCN
and CLASS source tests already validate multi-process propagation.

Signed-off-by: Wojciech Wais <wojciech.wais@gmail.com>
@mergify mergify Bot added the v1 and bug labels Mar 18, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The changes effectively resolve the flakiness in the test_custom_logitsprocs[ENTRYPOINT] test. By setting VLLM_ENABLE_V1_MULTIPROCESSING to "0" and disabling the environment variable cache, the test is forced to run in in-process mode. This correctly addresses the issue where the importlib.metadata.entry_points patch was lost when _maybe_force_spawn() overrode the multiprocessing method to spawn in CI environments with initialized CUDA. The solution is direct and appropriate for the identified problem.

ivanium added a commit to ivanium/vllm that referenced this pull request Apr 2, 2026
ivanium added a commit to ivanium/vllm that referenced this pull request Apr 6, 2026
@mergify

mergify Bot commented May 23, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wojciech-wais.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 23, 2026
huangyibo pushed a commit to huangyibo/vllm that referenced this pull request Jun 4, 2026
@wojciech-wais

Contributor Author

Closing as superseded: main's #42040 (df2636a) already fixes this exact flaky LOGITPROC_SOURCE_ENTRYPOINT test, using a more robust spawn-compatible dist-info registration (setup_fake_entrypoint) that also works on XPU/ROCm. This PR's in-process/fork-avoidance approach is mutually exclusive with that fix and is no longer needed.
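For readers unfamiliar with the dist-info approach, here is a rough sketch of the general idea. It is not the setup_fake_entrypoint helper from #42040, whose implementation is not shown in this thread. The group name, package name, and all function names below are hypothetical. The sketch writes a `.dist-info` directory with an `entry_points.txt` file to disk and exposes it through PYTHONPATH, so that a fresh interpreter (standing in for a spawned worker) can discover the entry point without inheriting any in-memory patch.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

GROUP = "demo.logits_processors"  # hypothetical group name, for illustration only


def write_fake_dist(root: Path) -> None:
    """Create a tiny module plus dist-info metadata that advertises it."""
    (root / "demo_plugin.py").write_text("class DummyProc:\n    pass\n")
    dist_info = root / "demo_plugin-0.0.0.dist-info"
    dist_info.mkdir()
    (dist_info / "METADATA").write_text(
        "Metadata-Version: 2.1\nName: demo-plugin\nVersion: 0.0.0\n"
    )
    (dist_info / "entry_points.txt").write_text(
        f"[{GROUP}]\ndummy = demo_plugin:DummyProc\n"
    )


# Code run by the fresh interpreter: scan installed metadata for the group,
# load each matching entry point, and print the loaded class names.
CHILD_CODE = (
    "import importlib.metadata as md\n"
    f"group = {GROUP!r}\n"
    "eps = [ep for d in md.distributions() for ep in d.entry_points"
    " if ep.group == group]\n"
    "print(','.join(sorted({ep.load().__name__ for ep in eps})))\n"
)


def discover_in_fresh_interpreter(root: Path) -> str:
    """Run discovery in a new interpreter that only shares the filesystem."""
    env = dict(os.environ)
    extra = [env["PYTHONPATH"]] if env.get("PYTHONPATH") else []
    env["PYTHONPATH"] = os.pathsep.join([str(root)] + extra)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def run_demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_fake_dist(root)
        return discover_in_fresh_interpreter(root)


if __name__ == "__main__":
    print(run_demo())
```

Because the registration lives on disk rather than in the parent's memory, it does not depend on the multiprocessing start method, which is presumably why this style of fix works under spawn.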