server: add prompt processing progress streaming for /completion endpoint #14685 #14728
…oint
- Add server_task_result_cmpl_progress struct for streaming progress updates
- Implement send_progress_response() function for real-time progress reporting
- Send progress info during prompt processing phase before token generation
- Support all compatibility modes (non-OAI, OAI completion, OAI chat)
- Include detailed progress data: n_past, n_prompt_tokens, n_prompt_tokens_processed, progress percentage
- Only send progress updates in streaming mode (stream: true)
- Maintains backward compatibility with existing clients

Closes ggml-org#14685
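As a rough illustration of the progress data listed above, the sketch below shows one way the fields could be bundled and the progress value derived. The struct and function names are hypothetical and are not taken from the PR; the sketch also assumes that progress is the fraction of prompt tokens processed so far, which the description does not spell out.

```cpp
#include <cstdint>

// Hypothetical stand-in for the progress data named in the PR description.
// This is NOT the actual server_task_result_cmpl_progress definition.
struct prompt_progress_sketch {
    int32_t n_past;                    // tokens already in the context
    int32_t n_prompt_tokens;           // total tokens in the prompt
    int32_t n_prompt_tokens_processed; // prompt tokens processed so far
    float   progress;                  // assumed: fraction in [0, 1]
};

// Assumption: progress = processed / total, clamped to 1 and defined as 0
// for an empty prompt to avoid dividing by zero.
inline prompt_progress_sketch make_prompt_progress(int32_t n_past,
                                                   int32_t n_prompt_tokens,
                                                   int32_t n_processed) {
    float frac = 0.0f;
    if (n_prompt_tokens > 0) {
        frac = static_cast<float>(n_processed) / static_cast<float>(n_prompt_tokens);
    }
    if (frac > 1.0f) {
        frac = 1.0f;
    }
    return { n_past, n_prompt_tokens, n_processed, frac };
}
```

A client that wants a percentage can multiply the fraction by 100; whether the server reports a fraction or a percentage is not stated in the description.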
ngxson
left a comment
Things we need to consider:
- This is not part of the official OAI spec, so we should only send it if the user explicitly requests it
- Maybe reuse server_task_result_cmpl_partial instead of creating a dedicated response type.
No, this will break clients that assume the first response is non-empty.

As the feature requester, I would be happy with a flag in the request to send processing progress, defaulting to false.
On backward compatibility: I understand that some clients may assume the first response contains content. I will reuse the existing server_task_result_cmpl_partial structure and make sure the progress field is only included when explicitly requested. Does that work?
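A minimal sketch of the opt-in behaviour proposed here, under stated assumptions: the request flag name `return_progress` is hypothetical, the struct is a simplified stand-in rather than the real server_task_result_cmpl_partial, and the JSON is built by hand only to keep the example self-contained (the content string is not escaped).

```cpp
#include <string>

// Simplified, hypothetical stand-in for a partial streaming result.
struct partial_result_sketch {
    std::string content;
    bool        has_progress = false; // true only when the client opted in
    float       progress     = 0.0f;  // assumed: fraction in [0, 1]
};

// return_progress mirrors a hypothetical request flag that defaults to false.
inline partial_result_sketch make_partial(const std::string & content,
                                          bool return_progress,
                                          float progress) {
    partial_result_sketch res;
    res.content = content;
    if (return_progress) {
        res.has_progress = true;
        res.progress     = progress;
    }
    return res;
}

// Serialize by hand; the "progress" key is omitted entirely unless requested,
// so clients that never set the flag see the same payload shape as before.
inline std::string to_json(const partial_result_sketch & res) {
    std::string out = "{\"content\":\"" + res.content + "\"";
    if (res.has_progress) {
        out += ",\"progress\":" + std::to_string(res.progress);
    }
    out += "}";
    return out;
}
```

Omitting the key, rather than sending a null or zero value, is one way to address the compatibility concern raised above, since existing clients would receive no new fields.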

Yes, and a small line explaining how it works is appreciated. It should be added to the documentation at
Closes #14685
Make sure to read the contributing guidelines before submitting a PR