Get token streaming working #2

@simonw

This proved a bit tricky, because the MLC library is built around a callback mechanism:

from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout

cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1")
cm.generate(
    prompt="A poem about a bunny eating lunch",
    progress_callback=StreamToStdout(callback_interval=1),
)

But... LLM expects to be able to do something like this:

for chunk in cm.generate(...):
    yield chunk
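
One possible way to bridge the two styles (a rough sketch, not necessarily how the plugin should do it): run cm.generate() in a background thread and have a custom callback push each new chunk onto a queue.Queue that a generator drains. The callback interface below is an assumption modelled on StreamToStdout: a callable with a callback_interval attribute that MLC invokes with each new piece of text, and with stopped=True when generation finishes. The real DeltaCallback contract may differ.

import queue
import threading

from mlc_chat import ChatModule

cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1")


class StreamToQueue:
    # Assumed callback shape, modelled on StreamToStdout: MLC calls this
    # object with each new delta of text, and with stopped=True at the end.
    def __init__(self, chunks, callback_interval=1):
        self.chunks = chunks
        self.callback_interval = callback_interval

    def __call__(self, message="", stopped=False):
        if stopped:
            self.chunks.put(None)  # sentinel: generation finished
        else:
            self.chunks.put(message)


def stream_generate(prompt):
    chunks = queue.Queue()

    def run():
        cm.generate(prompt=prompt, progress_callback=StreamToQueue(chunks))
        chunks.put(None)  # in case stopped=True is never delivered

    threading.Thread(target=run, daemon=True).start()

    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        yield chunk


for chunk in stream_generate("A poem about a bunny eating lunch"):
    print(chunk, end="", flush=True)

If something along those lines works, the plugin's streaming code could then just yield from stream_generate(prompt).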

Labels: enhancement, help wanted
