Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from f96784e to f3b49e3
run-slow: gpt_oss

This comment contains models: ["models/gpt_oss"]

CI Results: ✅ No failing test specific to this PR 🎉!

[For maintainers] Suggested jobs to run (before merge): run-slow: gpt_oss
"quantized=false|model=20b|kernels=false|attn_impl=eager|mode=eval": [
    "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
    "Roses are red, violets are blue, I love you, and I love you too.\n\nRoses are red, vio",
    "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
],
"quantized=false|model=20b|kernels=false|attn_impl=eager|mode=train": [
    "Roses are red, violets are blue.\" -> from which we can derive a rule: if we have a red object that is",
    "How are you? Tell me the name of the president of the United States.\n\nI am an AI language model and I do not have a personal life or",
    "Roses are red, violets are blue, I love you, and I love you too.\n\nRoses are red, vio",
    "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
],
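For context, each expected-output entry in this fixture is looked up by a parameter string of the form `quantized=…|model=…|kernels=…|attn_impl=…|mode=…`. A minimal sketch of how such a key could be assembled (the helper name is hypothetical, not the actual test code):

```python
def make_key(quantized: bool, model: str, kernels: bool, attn_impl: str, mode: str) -> str:
    """Build a fixture key matching the "quantized=false|..." style above.

    Illustrative only: booleans are lowercased so that Python's True/False
    match the "true"/"false" spellings used in the fixture keys.
    """
    return "|".join([
        f"quantized={str(quantized).lower()}",
        f"model={model}",
        f"kernels={str(kernels).lower()}",
        f"attn_impl={attn_impl}",
        f"mode={mode}",
    ])

print(make_key(False, "20b", False, "eager", "eval"))
# quantized=false|model=20b|kernels=false|attn_impl=eager|mode=eval
```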
non-quantized + no-kernels + eager attn seems to work fine
"quantized=false|model=20b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=eval": [
    "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
    "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for",
    "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
    "How are you? Tell me the name of the president of the name of the president of the name of the president of the name of the president of the name"
],
"quantized=false|model=20b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=train": [
    "Roses are red, violets are blue\" (makes sense). But the phrase \"the answer is 3\" is not a",
    "How are you? Tell me the name of the president of the United States.\" The answer to that is \"Joe Biden.\" The user is asking for the name",
    "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
    "How are you? Tell me the name of the president of the name of the president of the name of the president of the name of the president of the name"
],
kernels-community/vllm-flash-attn3 seems to be broken
On which platform have you tested? Interestingly, I tested on A100 and get the same garbage output. Switching to H100 produces well-formed outputs.
A100 as well! I guess it is a platform issue; the kernel is optimized for H100 but should work on A100 (cc @MekkCyber).
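One hypothesis consistent with this thread: FlashAttention-3 targets Hopper (sm_90), while A100 is Ampere (sm_80), so a pre-Hopper part may silently take a broken path. A hedged sketch of the kind of early platform guard that would surface this (the compute-capability numbers are real; the function and threshold are assumptions, not the kernel's actual code):

```python
# Illustrative only: decide FA3 support from CUDA compute capability.
# A100 reports (8, 0); H100 reports (9, 0). In a real setup the tuple
# would come from torch.cuda.get_device_capability().
def flash_attn3_supported(capability: tuple) -> bool:
    """Assume FlashAttention-3 requires Hopper (sm_90) or newer."""
    major, _minor = capability
    return major >= 9

print(flash_attn3_supported((8, 0)))  # A100 -> False
print(flash_attn3_supported((9, 0)))  # H100 -> True
```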
"quantized=true|model=20b|kernels=true|attn_impl=eager|mode=eval": [
    "Did not work",
    "Roses are red, violets are green, and the world is a beautiful place.\n\nIt sounds like you're sharing a poetic and",
    "How are you? Tell me the name of the president of the company. The president is the CEO. The president is the CEO. The president is the CEO"
],
"quantized=true|model=20b|kernels=true|attn_impl=eager|mode=train": [
    "Did not work",
    "Roses are red, violets are green, and the sky is blue.\n\nIt seems like you're sharing a playful and whimsical line",
    "How are you? Tell me the name of the president of the company. The president is the CEO. The president is the CEO. The president is the CEO"
],
kernels (megablocks) seems to be broken
The repetition in the second sample looks like a bug / quality degradation.
"quantized=true|model=20b|kernels=false|attn_impl=eager|mode=eval": [
    "Roses are red, violets are blue, I love you, and I love you too.\n\nIt sounds like you're expressing a",
    "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
    "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
],
"quantized=true|model=20b|kernels=false|attn_impl=eager|mode=train": [
    "Roses are red, violets are blue, I love you, and I love you too.\n\nIt sounds like you're expressing a",
    "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
    "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
],
quantized seems to work fine
What does this PR do?
The GptOss slow tests are currently failing on main (tested locally on A100); they also don't seem to respect the "quantized" test parameter.
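Since the description notes the tests ignore the "quantized" parameter, here is a minimal sketch of the gating the fixture presumably expects (the helper and kwarg values are hypothetical, not the real test code; the gpt-oss checkpoint ships natively quantized, so the non-quantized case would have to request dequantized weights explicitly):

```python
def build_load_kwargs(quantized: bool) -> dict:
    """Return model-loading kwargs that actually honor the `quantized` flag.

    Illustrative only: because the checkpoint is quantized on disk,
    quantized=False must explicitly ask for dequantized weights instead
    of loading the same way in both cases.
    """
    kwargs = {"torch_dtype": "auto"}
    if not quantized:
        # Placeholder for a real quantization-config override
        # (e.g. a "dequantize" option); names here are assumptions.
        kwargs["quantization_config"] = {"dequantize": True}
    return kwargs
```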
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.