Add HyperCLOVAX SEED Think 14B by bigshanedogg · Pull Request #44956 · huggingface/transformers

bigshanedogg · 2026-03-23T19:34:30Z

What does this PR do?

Adds native Transformers support for HyperCLOVA X SEED Think 14B, a 14.74B-parameter Korean reasoning LLM developed by NAVER Cloud.

Related issue: Add HyperCLOVA X SEED Think 14B #44957

Architecture

LLaMA-style decoder-only transformer with two modifications:

- Peri-Layer Normalization (use_post_norm): an extra RMSNorm is applied after each sub-layer output (both attention and MLP), in addition to the standard pre-norm.
- Maximal Update Parametrization (μP): four per-config scaling factors replace fixed constants:
  - attention_multiplier: replaces 1/sqrt(head_dim) in attention
  - residual_multiplier: scales each sub-layer output before adding it to the residual stream
  - embedding_multiplier: scales the token embedding output
  - logits_scaling: scales the final logits before softmax / sampling
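The two modifications above can be sketched for a single residual step in plain Python. This is only an illustration under stated assumptions: `rms_norm`, `sublayer_step`, and the toy `double` sub-layer are hypothetical names, and applying the post-norm before the residual_multiplier is an assumed ordering, not taken from the merged code.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one hidden vector given as a list of floats."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def sublayer_step(hidden, sublayer, pre_weight, post_weight,
                  residual_multiplier, use_post_norm):
    """One attention-or-MLP residual step with optional Peri-LN (hypothetical helper)."""
    residual = hidden
    out = sublayer(rms_norm(hidden, pre_weight))  # standard pre-norm
    if use_post_norm:
        # Peri-LN: extra RMSNorm on the sub-layer output (assumed to come before scaling)
        out = rms_norm(out, post_weight)
    # muP: scale the sub-layer output before adding it to the residual stream
    return [r + residual_multiplier * o for r, o in zip(residual, out)]


hidden = [3.0, 4.0]
ones = [1.0, 1.0]
double = lambda v: [2.0 * t for t in v]  # toy stand-in for attention or MLP

plain = sublayer_step(hidden, double, ones, ones, 0.5, use_post_norm=False)
peri = sublayer_step(hidden, double, ones, ones, 0.5, use_post_norm=True)
print(plain, peri)
```

With use_post_norm=True the doubled sub-layer output is renormalized, so the update added to the residual stream is smaller than in the plain case.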

Implementation approach

Following the maintainer's guidance in #44957, this PR uses the modular system (modular_hyperclovax.py) to minimize lines of code and keep the diff easy to review and iterate on. (Roughly 59% of lines are generated rather than manually maintained.)

The maintainer suggested inheriting the decoder layer with post-norms from GLM4. After evaluation, Granite was chosen as the decoder layer base instead, for the following reasons:

- use_post_norm is optional (False by default). GLM4's decoder layer always applies post-norms, so inheriting from it would require logic to conditionally disable post_self_attn_layernorm / post_mlp_layernorm, adding complexity rather than reducing it.
- Granite's decoder layer already provides residual_multiplier (always-active μP). When use_post_norm=False, HyperCLOVAXDecoderLayer is identical to GraniteDecoderLayer, with zero extra code.
- Using GLM4 would require both adding residual_multiplier and conditionally disabling its built-in norms: two changes in opposite directions for no net gain in code reuse.

All other modules (RMSNorm, MLP, Attention, etc.) are inherited from Granite unchanged. The modular file is a few hundred LOC as suggested.

Benchmark validation

Tasks	Metric	vLLM	this PR
hellaswag (non-think)	acc_norm	0.6521	0.6666
gsm8k (non-think)	flexible-extract	0.9151	0.9188

External support

Hugging Face Hub: naver-hyperclovax/HyperCLOVAX-SEED-Think-14B
Technical report: arXiv 2506.22403
vLLM upstream: vllm-project/vllm#37107 (merged 2026-03-16)

Code Agent Policy

I confirm that this is not a pure code agent PR.

A code agent was used for mechanical tasks such as aligning docstrings and comments. The core implementation was written by the submitter directly, who has reviewed every changed line and personally run the tests including benchmark validation.

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  - Add HyperCLOVA X SEED Think 14B #44957
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

@strict

Vendor the HyperCLOVAX Vision config into vLLM to fix transformers v5 compatibility.

The upstream remote code config does not handle empty initialization (text_config=None), which breaks v5's @strict config validation added in huggingface/transformers#41250.

Fixes: vllm-project#38387
TODO: Remove vendored config once HyperCLOVAX is upstreamed to transformers.
Tracking PR: huggingface/transformers#44956
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bigshanedogg · 2026-03-29T21:58:25Z

@zucchini-nlp ,
Following your suggestion, I implemented this in a modular way by inheriting from Granite, incorporated the changes from #44957, and completed benchmark validation.

All CI checks have completed, except for one job that is still pending its status report.
Would it be okay to request a review at this stage?

bigshanedogg

This is a self-review of the key changes in this PR.

bigshanedogg · 2026-03-29T23:51:47Z

+    attention_multiplier: float | None = None
+    residual_multiplier: float | None = None
+    embedding_multiplier: float | None = None
+    logits_scaling: float | None = None


These fields also exist in Granite, but they are redefined here because their default values differ.
Although they are present in config.json, the dynamic defaults set in post_init are not applied unless the fields are explicitly declared here.

This part has been removed based on the modification noted in the comment below, except for attention_multiplier.
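The pattern described here, a field declared as None so that a default can be derived later, might look like the following toy dataclass. `ToyConfig` is a hypothetical stand-in rather than the real config class, and falling back to 1/sqrt(head_dim) is an assumption based on the PR description of attention_multiplier.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a model config; not the real HyperCLOVAXConfig."""

    hidden_size: int = 64
    num_attention_heads: int = 4
    # None means "not provided"; the value is derived in __post_init__.
    attention_multiplier: Optional[float] = None

    def __post_init__(self):
        if self.attention_multiplier is None:
            head_dim = self.hidden_size // self.num_attention_heads
            # Assumed fallback: the usual 1/sqrt(head_dim) attention scale.
            self.attention_multiplier = 1.0 / math.sqrt(head_dim)


derived = ToyConfig()                                 # multiplier derived from head_dim
explicit = ToyConfig(attention_multiplier=0.0078125)  # explicit value is kept as-is
print(derived.attention_multiplier, explicit.attention_multiplier)
```

Declaring the field explicitly is what gives the post-init hook something to check; a value that only lives in config.json would bypass this derivation.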

bigshanedogg · 2026-03-29T23:52:15Z

+        # Peri-Layer Normalization: additional RMSNorm after each sub-layer output
+        if self.use_post_norm:
+            self.post_norm1 = HyperCLOVAXRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
+            self.post_norm2 = HyperCLOVAXRMSNorm(config.hidden_size, eps=config.rms_norm_eps)


When self.use_post_norm is True, separate post-norms are declared for attention and MLP to match the Peri-LN structure.
Because this requires a branch on self.use_post_norm, the layer inherits from Granite instead of GLM4 (Granite's config fields were also a closer match).

zucchini-nlp

Great work on applying modular! I left a few comments on what can be deleted because it's already auto-resolved by modular

Other than that we're fine. After addressing the comments, will request core maintainer review and we'll merge

zucchini-nlp · 2026-04-02T15:50:48Z

+        hidden_states = outputs.last_hidden_state
+        slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep
+        # MuP: multiply logits by logits_scaling (cf. GraniteForCausalLM which divides)
+        logits = self.lm_head(hidden_states[:, slice_indices, :]) * self.config.logits_scaling


can we adjust scaling, so we can copy fully? For ex in config self.logits_scaling = 1 / self.logits_scaling

Good idea!
However, I'm a bit concerned that storing the inverted value in Config.logits_scaling could cause confusion,
since users inspecting config.json would see a different value than what's actually used in the forward pass.
Would it be okay to keep the explicit * self.config.logits_scaling in forward for clarity, even if it means a small override?
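The trade-off in this thread comes down to a simple identity: multiplying by a factor equals dividing by its inverse. A minimal sketch, assuming the divide convention attributed to GraniteForCausalLM in the diff comment; both helper names and the scale value are hypothetical.

```python
import math


def granite_style_logits(raw_logits, logits_scaling):
    """Divide convention, as the diff comment describes for GraniteForCausalLM."""
    return [x / logits_scaling for x in raw_logits]


def hyperclovax_style_logits(raw_logits, logits_scaling):
    """Multiply convention used in this PR's forward pass."""
    return [x * logits_scaling for x in raw_logits]


raw = [2.0, -1.0, 0.5]
scale = 0.125  # hypothetical value, chosen only for illustration

multiplied = hyperclovax_style_logits(raw, scale)
divided_by_inverse = granite_style_logits(raw, 1.0 / scale)
same = all(math.isclose(a, b) for a, b in zip(multiplied, divided_by_inverse))
print(multiplied, same)
```

Storing the inverted value would let the Granite forward be reused unchanged, at the cost of the stored config value no longer matching the value documented for the checkpoint.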

zucchini-nlp · 2026-04-02T15:53:47Z

run-slow: hyperclovax

github-actions · 2026-04-02T15:55:18Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/hyperclovax"]
quantizations: []

HuggingFaceDocBuilderDev · 2026-04-02T16:03:49Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2026-04-02T16:14:23Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	7d1b9113	workflow commit (merge commit)
PR	6aa22bc3	branch commit (from PR)
main	bb803105	base commit (on `main`)

✅ No failing test specific to this PR 🎉 👏 !

bigshanedogg · 2026-04-03T03:02:39Z

@zucchini-nlp,
Thank you for the thorough review!
I've addressed all the feedback and removed quite a few unnecessary lines. For the logits_scaling part, I've left an additional comment as I wasn't sure if it might cause confusion.
The model behavior has been verified to remain unchanged after the edits.

Some of the failed tests appear to be outside the scope of this PR (e.g., VibeVoiceAsrForConditionalGenerationModelTest).
I will investigate the remaining cases related to HyperCLOVAX.

zucchini-nlp

Nice, to fix the CI you need to run make fix-repo. I merged main which will fix unrelated failures, and requested a core maintainer's review

zucchini-nlp · 2026-04-07T13:32:20Z

@@ -0,0 +1,27 @@
+# Copyright 2025 The HuggingFace Team. All rights reserved.


a few files left wrt 2026 😄

zucchini-nlp · 2026-04-07T13:40:23Z

run-slow: hyperclovax

zucchini-nlp

Oke, seeing a bad rebase with unrelated diff 😄 and a tiny change in rope doc. I will pass-over the latest diff after the bad rebase is fixed, and prob a core maintainer will pass over soon

bigshanedogg · 2026-04-10T08:48:31Z

@zucchini-nlp ,
I've incorporated the suggested changes and reverted to your last reviewed commit (c025d918).
Really appreciate you taking the time to look into this!

zucchini-nlp · 2026-04-10T09:16:32Z

@bigshanedogg , one tiny unrelated diff left-out. And vasqu will come to review next week :)

vasqu · 2026-04-21T13:10:37Z

Sorry for all the delays, will be taking a look today!!

vasqu

Only some nits tbh, looks overall super good! Let's sync with main and fixup the last details 🤗

vasqu · 2026-04-21T16:29:50Z

+    @unittest.skip(
+        "In TP mode, Float8 quantization derives scales per shard rather than globally, "
+        "so each TP rank observes different weight magnitudes than the full-weight non-TP "
+        "baseline. HyperCLOVAX's Peri-Layer Normalization (post_norm1/post_norm2) amplifies "
+        "this discrepancy past the 75% token-match threshold. Skipped pending an upstream fix."
+    )
+    @is_tensor_parallel_test
+    def test_tp_generation_quantized(self):
+        pass


Interesting, cc @3outeille @SunMarc just for viz
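The per-shard effect named in the skip message can be illustrated with a toy calculation. This is a sketch under assumptions: absmax scaling (scale = max |w| / 448, where 448 is the largest finite float8 e4m3fn value) is used as a stand-in for whatever scheme the quantizer actually applies, and the weights and the two-way split are made up.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def absmax_scale(values):
    """Scale that maps the largest magnitude onto the FP8 range (assumed scheme)."""
    return max(abs(v) for v in values) / FP8_E4M3_MAX


# Hypothetical weight row with one outlier in the first half.
weights = [0.02, -0.5, 0.1, 3.0, -0.04, 0.08, 0.01, -0.06]

global_scale = absmax_scale(weights)  # non-TP baseline sees the full tensor

tp_size = 2
shard_len = len(weights) // tp_size
shards = [weights[i * shard_len:(i + 1) * shard_len] for i in range(tp_size)]
shard_scales = [absmax_scale(s) for s in shards]  # each TP rank sees only its shard

print(global_scale, shard_scales)
```

Here the shard without the outlier gets a much smaller scale than the global one, so its weights are quantized on a different grid than in the non-TP baseline.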

@strict

Vendor the HyperCLOVAX Vision config into vLLM to fix transformers v5 compatibility. The upstream remote code config does not handle empty initialization (text_config=None), which breaks v5's @strict config validation added in huggingface/transformers#41250.

With the vendored config registered, vLLM uses the local class instead of the broken remote code, so we can lift the max_transformers_version cap that was added in tests/models/registry.py to skip this model on v5.

Also fix the unreachable hidden_size n_embd fallback per gemini-code-assist review: the text_config_attribute_map remap pops n_embd before the fallback would ever be checked. Read hidden_size from the instantiated text_config object instead.

Fixes: vllm-project#38387
TODO: Remove vendored config once HyperCLOVAX is upstreamed to transformers.
Tracking PR: huggingface/transformers#44956
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Fang Han <fhan0520@gmail.com>

bigshanedogg

@vasqu , Thank you for the detailed comments!

I've addressed the points you mentioned in your review.
Please let me know if I've missed anything or if there's anything else you'd like me to address.

vasqu · 2026-05-06T17:17:24Z

hey @bigshanedogg 👋 I will review tomorrow, gh currently has issues so I can't create any reviews or new comments 😢 just wanted to keep you in the loop

bigshanedogg · 2026-05-07T08:21:07Z

Thanks for letting me know! No rush at all — take your time. 🙂
I'm back up to speed on this PR, so whenever you're ready to review, I'll be quick to follow up.
Feel free to flag anything that needs further work.

vasqu

Very nicely done!! 🫡 let me take care of CI slow tests on our side but will merge in a bit

vasqu · 2026-05-07T17:38:43Z

run-slow: hyperclovax

github-actions · 2026-05-07T17:39:39Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, hyperclovax

github-actions · 2026-05-07T17:40:23Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/hyperclovax"]
quantizations: []

github-actions · 2026-05-07T17:55:11Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	eab381a5	workflow commit (merge commit)
PR	4bab7410	branch commit (from PR)
main	ebac5e52	base commit (on `main`)

✅ No failing test specific to this PR 🎉 👏 !

vasqu · 2026-05-07T18:13:44Z

@bigshanedogg congrats on the merge!! 🤗

zucchini-nlp · 2026-05-07T19:25:18Z

Thanks everyone, we will work on the VLM now that the lm backbone is merged!

@bigshanedogg would be great if you could also update the README on the hub, I am seeing that it sets trust_remote_code=True atm
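For reference, loading without remote code could look roughly like the sketch below. It assumes an installed transformers release that already includes this PR; `load_hyperclovax` is a hypothetical helper, and the function is only defined here, not called, because calling it downloads the full checkpoint.

```python
MODEL_ID = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"


def load_hyperclovax(model_id=MODEL_ID):
    """Load tokenizer and model via the in-library implementation (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # No trust_remote_code=True: the architecture is assumed to resolve natively.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```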

bigshanedogg · 2026-05-07T21:35:04Z

@vasqu Thank you for the review and the additional commits on the test code!

@zucchini-nlp Along with the README update you mentioned, I'll also push a minor update to fix chat_template.jinja. I'll get it updated as soon as I have approval and follow up with the #44314 work.

* feat: hyperclovax
* fix: import and doc date
* updated tests

Co-authored-by: vasqu <antonprogamer@gmail.com>

bigshanedogg mentioned this pull request Mar 23, 2026

Add HyperCLOVA X SEED Think 14B #44957

Open

2 tasks

DarkLight1337 mentioned this pull request Mar 28, 2026

[Transformers v5] HCXVisionForCausalLM vllm-project/vllm#38387

Closed

This was referenced Mar 29, 2026

[Transformers v5] Vendor HCXVisionConfig for compatibility vllm-project/vllm#38447

Merged

AutoConfig.register() ignored when trust_remote_code=True and auto_map is present #45093

Closed

bigshanedogg force-pushed the feat/hyperclovax branch from b31ff44 to ef1e73f Compare March 29, 2026 13:51

bigshanedogg marked this pull request as ready for review March 29, 2026 21:57

github-actions Bot requested review from ArthurZucker and Rocketknight1 March 29, 2026 21:57

bigshanedogg changed the title ~~[WIP] Add HyperCLOVAX model~~ Add HyperCLOVAX model Mar 29, 2026

bigshanedogg commented Mar 29, 2026

Rocketknight1 mentioned this pull request Mar 30, 2026

add HyperCLOVA X SEED Vision Instruct 3B #45099

Open

2 tasks

bigshanedogg changed the title ~~Add HyperCLOVAX model~~ Add HyperCLOVAX SEED Think 14B Mar 31, 2026

zucchini-nlp reviewed Apr 2, 2026

bigshanedogg force-pushed the feat/hyperclovax branch from 6aa22bc to a0f82ba Compare April 3, 2026 02:14

bigshanedogg force-pushed the feat/hyperclovax branch from a0f82ba to 9c3fd14 Compare April 4, 2026 01:49

zucchini-nlp approved these changes Apr 7, 2026

zucchini-nlp requested review from vasqu and removed request for ArthurZucker and Rocketknight1 April 7, 2026 13:34

zucchini-nlp reviewed Apr 9, 2026

Comment thread .github/workflows/trl-ci-bot.yml Outdated

Comment thread docs/source/en/internal/rope_utils.md

bigshanedogg force-pushed the feat/hyperclovax branch from 29df799 to 331ed88 Compare April 10, 2026 06:28

zucchini-nlp reviewed Apr 10, 2026

Comment thread src/transformers/models/blip/image_processing_blip.py

bigshanedogg force-pushed the feat/hyperclovax branch from 9600edb to d5a0472 Compare April 12, 2026 06:06

vasqu approved these changes Apr 21, 2026

This was referenced Apr 29, 2026

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42

Open

Cumulative defect fixes from recent Transformers PRs evalstate/transformers#43

Open

feat: hyperclovax

fa3494a

bigshanedogg force-pushed the feat/hyperclovax branch from d5a0472 to fa3494a Compare May 6, 2026 05:06

fix: import and doc date

514df6b

bigshanedogg commented May 6, 2026

tarekziade mentioned this pull request May 7, 2026

Add HyperCLOVAX SEED Think 14B tarekziade/tarekziade-transformers-reviewer-test#15

Open

6 tasks

vasqu approved these changes May 7, 2026

vasqu added the New model label May 7, 2026

updated tests

4bab741

vasqu added this pull request to the merge queue May 7, 2026

Merged via the queue into huggingface:main with commit 0f90c8e May 7, 2026
30 checks passed

louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request May 11, 2026

Add HyperCLOVAX SEED Think 14B (huggingface#44956)

586039b
