Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from f96784e to f3b49e3
run-slow: gpt_oss

This comment contains models: ["models/gpt_oss"]

CI Results: ✅ No failing test specific to this PR 🎉!

[For maintainers] Suggested jobs to run (before merge): run-slow: gpt_oss
"quantized=false|model=20b|kernels=false|attn_impl=eager|mode=eval": [
    "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
    "Roses are red, violets are blue, I love you, and I love you too.\n\nRoses are red, vio",
    "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
],
"quantized=false|model=20b|kernels=false|attn_impl=eager|mode=train": [
    "Roses are red, violets are blue.\" -> from which we can derive a rule: if we have a red object that is",
    "How are you? Tell me the name of the president of the United States.\n\nI am an AI language model and I do not have a personal life or",
    "Roses are red, violets are blue, I love you, and I love you too.\n\nRoses are red, vio",
    "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
],
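For context, each expected-output entry in this fixture is looked up by a parameter string of the form `quantized=…|model=…|kernels=…|attn_impl=…|mode=…`. A minimal sketch of how such a key could be assembled (the helper name is hypothetical, not the actual test code):

```python
def make_key(quantized: bool, model: str, kernels: bool, attn_impl: str, mode: str) -> str:
    """Build a fixture key matching the "quantized=false|..." style above.

    Illustrative only: booleans are lowercased so that Python's True/False
    match the "true"/"false" spellings used in the fixture keys.
    """
    return "|".join([
        f"quantized={str(quantized).lower()}",
        f"model={model}",
        f"kernels={str(kernels).lower()}",
        f"attn_impl={attn_impl}",
        f"mode={mode}",
    ])

print(make_key(False, "20b", False, "eager", "eval"))
# quantized=false|model=20b|kernels=false|attn_impl=eager|mode=eval
```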
non-quantized + no-kernels + eager attn seems to work fine
"quantized=false|model=20b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=eval": [
    "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
    "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for",
    "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
    "How are you? Tell me the name of the president of the name of the president of the name of the president of the name of the president of the name"
],
"quantized=false|model=20b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=train": [
    "Roses are red, violets are blue\" (makes sense). But the phrase \"the answer is 3\" is not a",
    "How are you? Tell me the name of the president of the United States.\" The answer to that is \"Joe Biden.\" The user is asking for the name",
    "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi",
    "How are you? Tell me the name of the president of the name of the president of the name of the president of the name of the president of the name"
],
kernels-community/vllm-flash-attn3 seems to be broken
On which platform have you tested? Interestingly, I tested on A100 and get the same garbage output. Switching to H100 produces well-formed outputs.
A100 as well! I guess it is a platform issue; the kernel is optimized for H100 but should work on A100 (cc @MekkCyber).
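One hypothesis consistent with this thread: FlashAttention-3 targets Hopper (sm_90), while A100 is Ampere (sm_80), so a pre-Hopper part may silently take a broken path. A hedged sketch of the kind of early platform guard that would surface this (the compute-capability numbers are real; the function and threshold are assumptions, not the kernel's actual code):

```python
# Illustrative only: decide FA3 support from CUDA compute capability.
# A100 reports (8, 0); H100 reports (9, 0). In a real setup the tuple
# would come from torch.cuda.get_device_capability().
def flash_attn3_supported(capability: tuple) -> bool:
    """Assume FlashAttention-3 requires Hopper (sm_90) or newer."""
    major, _minor = capability
    return major >= 9

print(flash_attn3_supported((8, 0)))  # A100 -> False
print(flash_attn3_supported((9, 0)))  # H100 -> True
```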
"quantized=true|model=20b|kernels=true|attn_impl=eager|mode=eval": [
    "Did not work",
    "Roses are red, violets are green, and the world is a beautiful place.\n\nIt sounds like you're sharing a poetic and",
    "How are you? Tell me the name of the president of the company. The president is the CEO. The president is the CEO. The president is the CEO"
],
"quantized=true|model=20b|kernels=true|attn_impl=eager|mode=train": [
    "Did not work",
    "Roses are red, violets are green, and the sky is blue.\n\nIt seems like you're sharing a playful and whimsical line",
    "How are you? Tell me the name of the president of the company. The president is the CEO. The president is the CEO. The president is the CEO"
],
kernels (megablocks) seems to be broken
The repetition in the second sample looks like a bug / quality degradation.
"quantized=true|model=20b|kernels=false|attn_impl=eager|mode=eval": [
    "Roses are red, violets are blue, I love you, and I love you too.\n\nIt sounds like you're expressing a",
    "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
    "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
],
"quantized=true|model=20b|kernels=false|attn_impl=eager|mode=train": [
    "Roses are red, violets are blue, I love you, and I love you too.\n\nIt sounds like you're expressing a",
    "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio",
    "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for"
],
quantized seems to work fine
What does this PR do?
The GptOss slow tests are currently failing on main (tested locally on A100); they also don't seem to respect the "quantized" test parameter.
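Since the description notes the tests ignore the "quantized" parameter, here is a minimal sketch of the gating the fixture presumably expects (the helper and kwarg values are hypothetical, not the real test code; the gpt-oss checkpoint ships natively quantized, so the non-quantized case would have to request dequantized weights explicitly):

```python
def build_load_kwargs(quantized: bool) -> dict:
    """Return model-loading kwargs that actually honor the `quantized` flag.

    Illustrative only: because the checkpoint is quantized on disk,
    quantized=False must explicitly ask for dequantized weights instead
    of loading the same way in both cases.
    """
    kwargs = {"torch_dtype": "auto"}
    if not quantized:
        # Placeholder for a real quantization-config override
        # (e.g. a "dequantize" option); names here are assumptions.
        kwargs["quantization_config"] = {"dequantize": True}
    return kwargs
```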
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.