Summary
This RFC proposes removing the existing /v1/batches and /v1/files endpoints from the main OpenAI-compatible server and replacing them with a standalone offline batch processing service.
Note: As part of the ongoing OpenAI API refactor, the batch support has already been removed from the main server. This RFC serves to document the rationale and formalize the replacement plan.
Problem
7.1 Fundamental Issues with the Current Batch API (#7068)
The current design for online batch processing is flawed and not production-safe. Key issues include:
- Server Stability Risk: Uploading and processing thousands of requests at once can overwhelm online API servers.
- Timing Constraints: Difficult to enforce `completion_window` in a real-time environment.
- Resource Contention: Batch jobs run alongside latency-sensitive requests without proper isolation.
- Architecture Mismatch: Batch workloads are inherently asynchronous/offline, conflicting with the synchronous nature of standard OpenAI endpoints.
Proposed Solution
1. Simplify Online Endpoints
- Remove logic for handling list-wrapped input in `/v1/chat/completions`, `/v1/embeddings`, etc.
- Accept only a single request per HTTP call (OpenAI spec-compliant).
- Cleaner code and better performance for common-case usage.
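The simplified endpoints can reject list-wrapped bodies up front. A minimal sketch of that validation step, assuming the server has already parsed the HTTP body into a Python object (the function name is hypothetical, not an existing handler):

```python
def validate_single_request(body):
    """Reject list-wrapped bodies; online endpoints accept exactly one request.

    Hypothetical helper illustrating the simplified validation path:
    list-wrapped input is an error, directing callers to the batch runner.
    """
    if isinstance(body, list):
        raise ValueError(
            "List-wrapped input is not supported on online endpoints; "
            "submit a .jsonl job to the offline batch runner instead."
        )
    if not isinstance(body, dict):
        raise ValueError("Request body must be a single JSON object.")
    return body
```

Failing fast here keeps the common-case request path a single branch instead of a batch/non-batch dispatch.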
2. Split Out Batch Service
Implement batch processing as a separate offline job runner, modeled after how vLLM does it.
This batch runner will:
- Accept batch jobs in the OpenAI-compatible `.jsonl` format
- Spawn a new process/container to handle the job
- Stream output to a results file (local or presigned S3 URLs)
- Optionally enforce `completion_window` guarantees in the background
3. Remove from Main Server
- Remove `/v1/batches` and `/v1/files` routes from the main OpenAI-compatible HTTP server.
- These should live in a separate service (`batch-runner`) to enforce separation of concerns.
📌 Action Items