
Add CFG to vllm serving #517

Merged
rlouf merged 1 commit into dottxt-ai:main from mory91:vllm-cfg on Jan 12, 2024

Conversation

mory91 (Contributor) commented Jan 10, 2024

Hi,
This pull request adds support for CFG (context-free grammar guided generation) in vLLM serving.
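
For context, a request against the serving endpoint might look like the sketch below. The /generate route follows outlines' existing serve script; the "cfg" field name and the toy Lark grammar are assumptions made for illustration, not details confirmed in this PR.

import requests

# Hypothetical request: ask the server to constrain generation with a grammar.
# The "cfg" key is an assumed name for the field this PR introduces.
arithmetic_grammar = """
?start: expression
?expression: NUMBER (("+" | "-") NUMBER)*
%import common.NUMBER
"""

response = requests.post(
    "http://127.0.0.1:8000/generate",
    json={"prompt": "Write an addition of two numbers: ", "cfg": arithmetic_grammar},
)
print(response.json())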

@rlouf rlouf linked an issue Jan 10, 2024 that may be closed by this pull request
@rlouf rlouf added the structured generation (Linked to structured generation) and vLLM (Things involving vLLM support) labels Jan 10, 2024
Review thread on outlines/serve/serve.py (outdated):
# Sets default for the model (`facebook/opt-125m`)
engine = AsyncLLMEngine.from_engine_args(engine_args)

_adapt_tokenizer(engine.engine.tokenizer)
rlouf (Member)

Why are you calling this function here? The result is not used.

mory91 (Contributor, Author)

The tokenizer is modified in place inside the function anyway. I've now assigned the result back to the tokenizer, though.
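
For readers following the thread, here is a rough sketch of what an in-place adapter like _adapt_tokenizer does. The attribute names (vocabulary, special_tokens, convert_token_to_string) match the tokenizer interface outlines' FSMs expect, but the body below is an illustration, not the exact code under review.

def _adapt_tokenizer(tokenizer):
    # The tokenizer is mutated in place; returning it is only a convenience,
    # which is why a caller does not strictly need to use the result.
    tokenizer.vocabulary = tokenizer.get_vocab()
    tokenizer.special_tokens = set(tokenizer.all_special_tokens)

    def convert_token_to_string(token):
        # Map a raw vocabulary token back to its surface string.
        return tokenizer.convert_tokens_to_string([token])

    tokenizer.convert_token_to_string = convert_token_to_string
    return tokenizer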

rlouf (Member)

Ah, that makes sense. It's not needed, however, as vLLM handles tokenisation on its end during encoding/decoding.

mory91 (Contributor, Author) commented Jan 12, 2024

It is needed: here, https://github.com/outlines-dev/outlines/blob/fde61a80a58de0401fdecdee7408db53e17ca4f4/outlines/fsm/fsm.py#L345, outlines expects the tokenizer to return a list, but vLLM's tokenizer returns a string.
I also just realized that this introduces a breaking change to the library. If that is acceptable to the project maintainers, we can keep it as is; if not, we need a different approach. For example, we could call _adapt_tokenizer inside the __init__ methods of the logits processors.
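
A minimal sketch of that alternative, assuming a CFG logits processor along the lines of outlines' existing regex processor: adapt a copy of the tokenizer inside __init__, and wrap decode so it returns a list as the CFG FSM expects, instead of changing the engine's tokenizer globally. The class name, constructor signature, and CFGFSM usage below are illustrative assumptions, not the code that was merged.

from copy import copy

from outlines.fsm.fsm import CFGFSM

class CFGLogitsProcessor:
    def __init__(self, cfg_string, tokenizer):
        # Adapt a copy so the engine's own tokenizer keeps its original decode,
        # avoiding the breaking change mentioned above.
        adapted = _adapt_tokenizer(copy(tokenizer))

        original_decode = adapted.decode

        def decode_to_list(token_ids):
            # outlines' CFG FSM expects a list of strings here, while vLLM's
            # tokenizer returns a single string.
            return [original_decode(token_ids)]

        adapted.decode = decode_to_list
        self.fsm = CFGFSM(cfg_string, adapted)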

@mory91 mory91 force-pushed the vllm-cfg branch 2 times, most recently from 012a0e9 to 281c0af on January 11, 2024 at 23:20
rlouf (Member) commented Jan 12, 2024

Thank you for your contributions! I added some documentation before merging.


Labels

structured generation (Linked to structured generation), vLLM (Things involving vLLM support)

Development

Successfully merging this pull request may close these issues.

Add CFG guided generation to vLLM integration
