Checklist
Describe the bug
1. Overview
While refactoring the OpenAI-Compatible Server with @CatherineSue, we found that with SGLang v0.4.7.post1, requests including a custom `rid` field fail under two specific conditions:

- When parameter `n > 1` in the `/v1/chat/completions` endpoint.
- When the input to `/v1/embeddings` is a list of strings.

The issue originates from inconsistencies between the server's adapter and the low-level `TokenizerManager`. Specifically, the adapter currently forwards `rid` as a single string, whereas the internal batching logic expects `rid` as a list of strings matching the batch size.
We temporarily addressed this by disabling `rid` handling in both `serving_base` and client payloads. However, a comprehensive fix is required, involving adjustments in `tokenizer_manager.py` and `io_struct.py`, so we are opening this issue to track it.
Additionally, `openai_api/protocol.py` accepts only a single string for `rid`. This design choice originally aimed to serve enterprise clients who use internal correlation IDs (e.g., `X-Request-ID`) for tracking requests via logs and metrics. However, during batch processing, `TokenizerManager` inherently expects a `List[str]` for `rid`, leading to runtime crashes when a scalar is mistakenly treated as a list.
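For concreteness, here are hypothetical payloads illustrating the two failing shapes (model names and the `req-123` id are placeholders, not values from this report):

```python
# Case 1: /v1/chat/completions with n > 1 and a scalar `rid`.
chat_payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "hello"}],
    "n": 2,                # two choices -> internal batch size of 2
    "rid": "req-123",      # adapter forwards this single string as-is
}

# Case 2: /v1/embeddings with a list input and a scalar `rid`.
embedding_payload = {
    "model": "my-embed-model",
    "input": ["first text", "second text"],  # batch of 2 prompts
    "rid": "req-123",
}

# What the internal batching logic expects instead: one rid per batch element.
expected_internal_rid = ["req-123", "req-123"]
print(expected_internal_rid)
```

In both cases the batch size is 2, but the adapter hands the tokenizer a single string where a two-element list is expected.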
2. Expected Behavior
- `rid` should function purely as an opaque correlation ID, analogous to headers such as `X-OpenAI-Request-ID` from the official API, without affecting batch logic.
- Both `/v1/chat/completions` and `/v1/embeddings` endpoints should function correctly regardless of the value of `n` or the length of the input.
3. Temporary Workaround
Disable `rid` in client payloads. Internal testing confirms that removing this field restores normal endpoint functionality.
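A client-side sketch of the workaround (the helper name is ours, not part of SGLang): drop the field before sending the request.

```python
def strip_rid(payload: dict) -> dict:
    """Return a copy of the request payload without the custom `rid` field."""
    return {k: v for k, v in payload.items() if k != "rid"}

payload = {"model": "my-model", "input": ["a", "b"], "rid": "req-1"}
safe = strip_rid(payload)
print(safe)  # {'model': 'my-model', 'input': ['a', 'b']}
```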
4. Potential Fixes
| Approach | Solution | Pros | Cons |
| --- | --- | --- | --- |
| A | Adapter broadcasts scalar `rid` to a list of identical values when `batch_size > 1`. | Easy fix; preserves external contract. | Loses per-choice ID granularity. |
| B | Adapter accepts both `str` and `List[str]`; validates length matches `batch_size`; else returns `400`. | Strict; supports unique per-choice IDs. | Breaking change for callers sending invalid input. |
| C | Remove `rid` from the public schema until the server refactor completes. | Eliminates ambiguity; buys redesign time. | May break existing clients using `rid`. |
| D | Adopt vLLM's `extra_body` pattern: move non-OpenAI params (e.g., `rid`) into a dedicated sub-dict. | Future-proof; avoids OpenAI spec collisions. | Requires client changes and deeper refactoring. |
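Approaches A and B could be combined in a small normalization helper on the adapter side. This is only a sketch (the function name and error message are ours, not existing SGLang code):

```python
from typing import List, Optional, Union


def normalize_rid(
    rid: Union[str, List[str], None], batch_size: int
) -> Optional[List[str]]:
    """Coerce a request-level `rid` into the List[str] the batching logic expects."""
    if rid is None:
        return None
    if isinstance(rid, str):
        # Approach A: broadcast the scalar to one copy per batch element.
        return [rid] * batch_size
    if len(rid) != batch_size:
        # Approach B: reject mismatched lists (the server would map this to HTTP 400).
        raise ValueError(
            f"rid has {len(rid)} entries but batch size is {batch_size}"
        )
    return rid


print(normalize_rid("req-1", 3))     # ['req-1', 'req-1', 'req-1']
print(normalize_rid(["a", "b"], 2))  # ['a', 'b']
```

Broadcasting keeps the external contract intact for existing clients, while the length check gives callers who want per-choice IDs a strict, early failure instead of a crash deep in `TokenizerManager`.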
Reproduction
Running `test_embedding_openai_server.py` with SGLang v0.4.7.post1 produced the following error (before the workaround was applied):

Environment
Under lmsysorg/sglang:v0.4.7.post1-cu124