server: add prompt processing progress streaming for /completion endpoint #14685 #14728

Open
baonudesifeizhai wants to merge 1 commit into ggml-org:master from baonudesifeizhai:feature/prompt-processing-progress-stream

@baonudesifeizhai


#14685

  • Add server_task_result_cmpl_progress struct for streaming progress updates
  • Implement send_progress_response() function for real-time progress reporting
  • Send progress info during prompt processing phase before token generation
  • Support all compatibility modes (non-OAI, OAI completion, OAI chat)
  • Include detailed progress data: n_past, n_prompt_tokens, n_prompt_tokens_processed, progress percentage
  • Only send progress updates in streaming mode (stream: true)
  • Maintains backward compatibility with existing clients
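
As an illustration of the progress data listed above, here is a minimal Python sketch of how one progress record might be assembled. The field names come from the PR description; the dictionary layout and the formula for the percentage (`n_past / n_prompt_tokens`) are assumptions for illustration, not the server's actual C++ code.

```python
def make_progress_payload(n_past: int,
                          n_prompt_tokens: int,
                          n_prompt_tokens_processed: int) -> dict:
    """Build an illustrative progress record for one prompt-processing step.

    Assumption: progress is the fraction of prompt tokens already in the
    context (n_past / n_prompt_tokens), clamped to [0, 1].
    """
    if n_prompt_tokens <= 0:
        progress = 0.0  # avoid division by zero on an empty prompt
    else:
        progress = min(max(n_past / n_prompt_tokens, 0.0), 1.0)
    return {
        "n_past": n_past,
        "n_prompt_tokens": n_prompt_tokens,
        "n_prompt_tokens_processed": n_prompt_tokens_processed,
        "progress": progress,
    }


if __name__ == "__main__":
    # One batch of 256 tokens processed out of a 1024-token prompt.
    print(make_progress_payload(256, 1024, 256))
```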

Closes #14685

Make sure to read the contributing guidelines before submitting a PR

@ngxson ngxson left a comment

Collaborator

Things we need to consider:

  • This is not part of the official OAI spec, so we should only send it if the user explicitly requests it
  • Maybe reuse server_task_result_cmpl_partial instead of having to create a dedicated response type.

@ngxson

ngxson commented Jul 16, 2025

Collaborator

> Maintains backward compatibility with existing clients

No, this will break clients that assume the first response is non-empty. prompt_processing is NOT part of the official OAI spec.
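
To make the compatibility concern concrete, here is a minimal Python sketch of a streaming client that tolerates chunks without generated text, so an initial progress-only chunk would not break it. It assumes the stream arrives as server-sent event lines of the form `data: {...}` with the generated text in a `content` field; the shape of the progress chunk (an empty `content` plus a `prompt_processing` object) is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_content(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield generated text from SSE lines, skipping chunks with no content."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        body = line[len("data:"):].strip()
        if body == "[DONE]":
            break  # end-of-stream marker used by OAI-style streams
        chunk = json.loads(body)
        text = chunk.get("content")
        if text:  # progress-only chunks carry no text and are ignored
            yield text


if __name__ == "__main__":
    stream = [
        'data: {"content": "", "prompt_processing": {"progress": 0.5}}',
        "",
        'data: {"content": "Hi"}',
        'data: {"content": "!"}',
        "data: [DONE]",
    ]
    print("".join(iter_content(stream)))
```

A client that instead reads the first chunk and indexes into its text unconditionally is the kind that would break.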

@BradHutchings


As the feature requester, I would be happy with a flag in the request to send processing progress, default false.
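
A request flag of this kind could be read as in the Python sketch below. The flag name `report_progress` is hypothetical (the thread does not name it); `stream` is the existing request option, and the check follows the PR description's rule that progress is only sent in streaming mode.

```python
import json


def wants_progress(request_body: str) -> bool:
    """Return True only if the client streams AND explicitly opts in.

    `report_progress` is a hypothetical flag name; it defaults to False so
    existing clients see no change in the response stream.
    """
    data = json.loads(request_body)
    streaming = bool(data.get("stream", False))
    opted_in = bool(data.get("report_progress", False))
    return streaming and opted_in


if __name__ == "__main__":
    print(wants_progress('{"prompt": "Hello", "stream": true}'))
    print(wants_progress('{"prompt": "Hello", "stream": true, "report_progress": true}'))
```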

@baonudesifeizhai

Author

> > Maintains backward compatibility with existing clients
>
> No, this will break clients that assume the first response is non-empty. prompt_processing is NOT part of the official OAI spec.

On backward compatibility: I understand that some clients may assume the first response contains content. I will reuse the existing server_task_result_cmpl_partial structure and make sure the progress field is only included when explicitly requested. Does that work?
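
The proposal above, one partial-result type with an optional progress field, can be sketched in Python as follows. This is an illustration only: the class stands in for `server_task_result_cmpl_partial`, and the `progress` attribute and the `prompt_processing` output key are assumed names, not the actual C++ definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartialResult:
    """Stand-in for a streamed partial completion result."""
    content: str = ""
    # Set only when the client explicitly asked for progress updates.
    progress: Optional[dict] = None

    def to_json(self) -> dict:
        out = {"content": self.content}
        if self.progress is not None:
            # Key omitted entirely otherwise, so the default output is unchanged.
            out["prompt_processing"] = self.progress
        return out


if __name__ == "__main__":
    print(PartialResult(content="Hi").to_json())
    print(PartialResult(progress={"progress": 0.5}).to_json())
```

Omitting the key, rather than emitting it as null, keeps the default response byte-for-byte identical for clients that never opt in.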

@ngxson

ngxson commented Jul 16, 2025

Collaborator

Yes, and a short note explaining how it works would be appreciated. It should be added to the documentation at server/README.md.

Development

Successfully merging this pull request may close these issues.

Feature Request: Server stream response for "prompt processing progress"
