See for example here.
More helpful (because the actual types are really helpful to read and reason about the code, and because autocomplete and code analysis tools can help much better) would be something like:
from typing import Generator
def _handle_streaming_request(self, prompt: llm.Prompt, headers: dict[str,str], payload: dict[str,str], model_name: str) -> Generator[str, None, None]:
...
This pattern repeats many times, and I think it would make the code much shorter and easier to work with changing this.
What do you think?