test: Add dynamo serve TRTLLM example to pytest#1417
Conversation
WalkthroughThe changes introduce a mechanism to selectively skip tests marked with Changes
Sequence Diagram(s)sequenceDiagram
participant Pytest as Pytest Runner
participant Conftest as pytest_collection_modifyitems
participant Test as Test Item
Pytest->>Conftest: Collect test items
Conftest->>Test: Check for 'tensorrtllm' marker
alt '-m tensorrtllm' not specified
Conftest->>Test: Mark as skipped with message
else '-m tensorrtllm' specified
Conftest->>Test: No change
end
sequenceDiagram
participant Test as test_serve_deployment
participant DeploymentGraph as DeploymentGraph
participant ManagedProcess as ManagedProcess
Test->>DeploymentGraph: Select deployment config (may include delayed_start)
Test->>ManagedProcess: Start process with delayed_start
ManagedProcess->>ManagedProcess: Start process
ManagedProcess->>ManagedProcess: Sleep for delayed_start seconds
ManagedProcess->>ManagedProcess: Perform health checks
Test->>ManagedProcess: Send request to endpoint (chat or completions)
ManagedProcess->>Test: Return response
Test->>Test: Validate response
loop repeat_count - 1 times
Test->>ManagedProcess: Send repeated request
ManagedProcess->>Test: Return response
Test->>Test: Validate response
end
Poem
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/utils/managed_process.py (1)
63-64: Use proper logging instead of print statement.The delayed start implementation is correct, but consider using the existing logger instead of print for better consistency with the rest of the codebase.
- print(f"delaying start: {self.delayed_start}") + self._logger.info("Delaying start: %s seconds", self.delayed_start)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tests/conftest.py(1 hunks)tests/serve/test_dynamo_serve.py(5 hunks)tests/utils/deployment_graph.py(1 hunks)tests/utils/managed_process.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: Build and Test - vllm
🔇 Additional comments (7)
tests/utils/deployment_graph.py (1)
32-32: LGTM!The
delayed_startfield addition is well-implemented with appropriate type annotation and sensible default value.tests/conftest.py (1)
35-46: Excellent implementation of selective test execution.The pytest hook correctly implements the intended behavior to skip TensorRT-LLM tests unless explicitly requested via
-m tensorrtllm. The logic is sound and follows pytest conventions.tests/serve/test_dynamo_serve.py (5)
152-199: Verify the delayed start times are intentional.The new TensorRT-LLM deployment graphs are well-configured with appropriate marks and directories. However, the delayed start times of 60 and 120 seconds seem quite long.
Please confirm that these extended delayed start times are necessary for TensorRT-LLM router variants:
trtllm_agg_router: 60 secondstrtllm_disagg_router: 120 secondsThese delays significantly extend test execution time, so verification that they're truly required would be valuable.
226-226: LGTM!Correctly passes the
delayed_startattribute from the deployment graph to the managed process, completing the integration chain.
248-257: LGTM!The new pytest parameters are properly marked with
tensorrtllmand appropriate GPU requirements, enabling selective execution via the pytest hook.
282-290: Excellent refactoring with the helper function.The
check_responsefunction effectively consolidates response validation logic, improving code reusability and maintainability. The implementation correctly handles status code validation, content extraction, and expected response verification.
344-353: Good addition for comprehensive validation.The second request validates that the server remains healthy after the initial request and ensures router mode activation. This enhances test coverage and reliability.
nnshah1
left a comment
There was a problem hiding this comment.
left comment about moving repeat request as a parameter
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/serve/test_dynamo_serve.py (1)
408-414: Verify repeat count testing logicThe repeat request loop correctly uses
payload.repeat_countbut may not provide optimal coverage. Consider logging the iteration count or response variations to ensure the repeated requests are actually exercising different code paths.- for _ in range(payload.repeat_count): + for i in range(payload.repeat_count): response = requests.post( url, json=request_body, timeout=deployment_graph.timeout - elapsed, ) + logger.info("Repeat request %d/%d completed", i+1, payload.repeat_count) check_response(response, response_handler)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tests/serve/test_dynamo_serve.py(13 hunks)tests/utils/deployment_graph.py(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/serve/test_dynamo_serve.py (2)
tests/utils/deployment_graph.py (4)
chat_completions_response_handler(49-60)completions_response_handler(63-73)Payload(37-46)DeploymentGraph(21-33)tests/utils/managed_process.py (1)
ManagedProcess(30-232)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (13)
tests/utils/deployment_graph.py (4)
29-32: Enhancement: Multi-endpoint support with delayed start capabilityThe transition from single
endpoint/response_handlerto lists enables testing multiple API endpoints per deployment graph, which aligns well with the PR objective of comprehensive TensorRT-LLM testing. Thedelayed_startfield addresses the race condition mentioned in the past review comments for KV router scenarios.
42-46: Enhanced payload structure for dual API supportThe separation of
payload_chatandpayload_completionsproperly supports both OpenAI chat completions and completions API formats. Therepeat_countfield addresses the past review comment requesting the ability to control request repetition for robustness testing.
49-61: Clear function naming for chat completions APIRenaming to
chat_completions_response_handlerimproves clarity about which API format this function handles, distinguishing it from the new completions handler below.
63-73: Robust completions API response validationThe new handler correctly validates the completions API response structure, which differs from chat completions by expecting
textin choices rather thanmessage.content. This addresses the past review comment about having sanity tests forv1/completions.tests/serve/test_dynamo_serve.py (9)
27-28: Updated imports reflect function renameThe import correctly references the renamed
chat_completions_response_handlerfunction, maintaining consistency with the changes indeployment_graph.py.
35-57: Enhanced multimodal payload structureThe updated structure properly separates chat and completions payloads while adding repeat count control. The payload content remains appropriate for multimodal testing.
72-82: Comprehensive payload support for both API formatsExcellent addition of
payload_completionswith appropriate parameters. Therepeat_count=10provides good robustness testing for the text scenario, addressing the past review comment about request repetition control.
183-196: TensorRT-LLM aggregated deployment configurationThe new deployment graph correctly uses the TensorRT-LLM directory and appropriate markers. The configuration aligns with existing patterns for aggregated deployments.
197-214: Router deployment with delayed start workaroundThe
delayed_start=60with detailed FIXME comment properly addresses the KV router race condition mentioned in past review comments. This temporary solution prevents the non-recoverable state until the underlying issue is resolved.
229-246: Disaggregated router with extended delayThe longer
delayed_start=120for disaggregated router configuration suggests this scenario needs more time for endpoint registration, which makes sense given the distributed nature of disaggregated deployments.
273-273: Proper delayed start parameter propagationThe
delayed_startparameter is correctly passed through to theManagedProcessbase class, ensuring the timing workaround functions as intended.
329-337: Extract response validation into helper functionGood refactoring that centralizes response validation logic, improving maintainability and consistency across multiple endpoint testing.
338-407: Robust multi-endpoint testing with comprehensive error handlingThe refactored test logic properly handles multiple endpoints with appropriate payload selection, retry mechanisms for various error conditions, and timeout management. The specific handling of "no instances" and "Model not found" errors demonstrates good understanding of the deployment failure modes.
Overview:
Updates pytest to test tensorrt_llm examples.
As next step, would add the test to be run on gitlab CI.
Important note: will run TensorRT-LLM specific tests only when tensorrtllm marker is specified.
Details:
Within Dynamo + TensorRT-LLM container on current main:
Summary by CodeRabbit