GptOss slow tests #43246

Draft
IlyasMoutawwakil wants to merge 6 commits into main from gpt-oss-slow-tests
@IlyasMoutawwakil
Member

@IlyasMoutawwakil IlyasMoutawwakil commented Jan 13, 2026

What does this PR do?

The GptOss slow tests are currently failing on main (tested locally on an A100). They also don't seem to respect the "quantized" test parameter.
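For context, the expected outputs in this PR's diff are stored under keys of the form `quantized=...|model=...|kernels=...|attn_impl=...|mode=...` (see the review excerpts further down). A minimal sketch, assuming hypothetical helper names `make_results_key` and `parse_results_key` that are not taken from the actual test file, of how such a key could be built so that the `quantized` flag always selects a distinct entry:

```python
# Hypothetical helpers for illustration only; the real test suite may
# build and read its expected-output keys differently.

def make_results_key(quantized, model, kernels, attn_impl, mode):
    """Join the test parameters into a key shaped like the ones in this PR."""
    fields = {
        "quantized": str(quantized).lower(),  # True -> "true", False -> "false"
        "model": model,
        "kernels": str(kernels).lower(),
        "attn_impl": attn_impl,
        "mode": mode,
    }
    # Dicts preserve insertion order, so the field order is stable.
    return "|".join(f"{name}={value}" for name, value in fields.items())


def parse_results_key(key):
    """Split a key back into a {name: value} dict of strings."""
    return dict(field.split("=", 1) for field in key.split("|"))
```

With keys built this way, a quantized and a non-quantized run can never share an expected-output entry, which is the property the failing tests appear to lack.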

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil IlyasMoutawwakil changed the title from "GptOss non-quantized slow tests" to "GptOss slow tests" Jan 13, 2026
@IlyasMoutawwakil
Member Author

run-slow: gpt_oss

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/gpt_oss"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as ready for review January 13, 2026 17:50
@github-actions github-actions bot requested a review from ydshieh January 13, 2026 17:50
@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as draft January 13, 2026 17:50
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_oss

Comment on lines 110 to 117
  "quantized=false|model=20b|kernels=false|attn_impl=eager|mode=eval": [
-     "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
+     "Roses are red, violets are blue, I love you, and I love you too.\n\nRoses are red, vio",
      "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
  ],
  "quantized=false|model=20b|kernels=false|attn_impl=eager|mode=train": [
-     "Roses are red, violets are blue.\" -> from which we can derive a rule: if we have a red object that is",
-     "How are you? Tell me the name of the president of the United States.\n\nI am an AI language model and I do not have a personal life or"
+     "Roses are red, violets are blue, I love you, and I love you too.\n\nRoses are red, vio",
+     "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
  ],
Member Author

Non-quantized + no kernels + eager attention seems to work fine.

Comment on lines 94 to 101
  "quantized=false|model=20b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=eval": [
-     "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
-     "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
+     "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
+     "How are you? Tell me the name of the president of the name of the president of the name of the president of the name of the president of the name"
  ],
  "quantized=false|model=20b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=train": [
-     "Roses are red, violets are blue\" (makes sense). But the phrase \"the answer is 3\" is not a",
-     "How are you? Tell me the name of the president of the United States.\" The answer to that is \"Joe Biden.\" The user is asking for the name"
+     "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
+     "How are you? Tell me the name of the president of the name of the president of the name of the president of the name of the president of the name"
  ],
Member Author

kernels-community/vllm-flash-attn3 seems to be broken

Contributor

Which platform did you test on? Interestingly, I tested on an A100 and got the same garbage output. Switching to an H100 produces well-formed outputs.

Member Author

A100 as well! I guess it's a platform issue: the kernel is optimised for H100 but should still work on A100 (cc @MekkCyber).
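Until the kernel is fixed, one possible stopgap is to pick the attention implementation from the GPU's compute capability. This is only a sketch under stated assumptions: `pick_attn_impl` is a hypothetical helper, and the fallback-to-eager policy is not something proposed in this thread. The compute capabilities themselves are known values: A100 reports (8, 0) and H100 reports (9, 0), which is what `torch.cuda.get_device_capability()` returns as a `(major, minor)` tuple.

```python
# Hypothetical helper for illustration; the fallback policy is an assumption,
# not something the PR or the kernel authors have agreed on.

HOPPER_MAJOR = 9  # H100 is compute capability (9, 0); A100 is (8, 0)


def pick_attn_impl(capability,
                   requested="kernels-community/vllm-flash-attn3",
                   fallback="eager"):
    """Return `requested`, unless it is the flash-attn3 kernel on a pre-Hopper GPU."""
    major, _minor = capability
    if requested.endswith("flash-attn3") and major < HOPPER_MAJOR:
        return fallback
    return requested


# Usage on a machine with a CUDA GPU (requires torch, not imported here):
#   import torch
#   attn_impl = pick_attn_impl(torch.cuda.get_device_capability())
```

The helper takes the capability tuple as an argument rather than querying the device itself, so it can be exercised on machines without a GPU.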

Comment on lines 54 to +60
  "quantized=true|model=20b|kernels=true|attn_impl=eager|mode=eval": [
-     "Did not work"
+     "Roses are red, violets are green, and the world is a beautiful place.\n\nIt sounds like you're sharing a poetic and",
+     "How are you? Tell me the name of the president of the company. The president is the CEO. The president is the CEO. The president is the CEO"
  ],
  "quantized=true|model=20b|kernels=true|attn_impl=eager|mode=train": [
-     "Did not work"
+     "Roses are red, violets are green, and the sky is blue.\n\nIt seems like you're sharing a playful and whimsical line",
+     "How are you? Tell me the name of the president of the company. The president is the CEO. The president is the CEO. The president is the CEO"
Member Author

@IlyasMoutawwakil IlyasMoutawwakil Jan 16, 2026

The kernels (megablocks) seem to be broken.

Contributor

Might also be a GPU difference.

Member Author

The repetition in the second sample looks like a bug or a quality degradation.
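A rough way to flag this kind of degenerate output automatically is to measure how many word n-grams in a generation occur more than once. The following is a minimal sketch, assuming a hypothetical helper `repeated_ngram_fraction` and whitespace tokenization; it is not part of the test suite, and any threshold applied to its result would need tuning.

```python
from collections import Counter


def repeated_ngram_fraction(text, n=3):
    """Fraction of word n-grams in `text` that occur more than once.

    Tokenization is a plain whitespace split, which is an assumption made
    for simplicity; punctuation stays attached to words.
    """
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of an n-gram that appears at least twice.
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(ngrams)
```

On the looping tail of the second sample ("The president is the CEO." repeated three times) every 3-gram recurs, whereas the opening line of the eager-attention output has no repeated 3-gram.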

Comment on lines 46 to 53
  "quantized=true|model=20b|kernels=false|attn_impl=eager|mode=eval": [
-     "Roses are red, violets are blue, I love you, and I love you too.\n\nIt sounds like you're expressing a",
+     "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
      "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
  ],
  "quantized=true|model=20b|kernels=false|attn_impl=eager|mode=train": [
-     "Roses are red, violets are blue, I love you, and I love you too.\n\nIt sounds like you're expressing a",
+     "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
      "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
  ],
Member Author

Quantized seems to work fine.

3 participants