fp16 tied embedding + warmdown/LR tuning (val_bpb 1.2197)#42
Merged
0hq merged 1 commit intoopenai:mainfrom Mar 19, 2026
Merged
Conversation
keep tok_emb.weight in fp16 during int8 export (kills the quant gap), shrink MLP hidden to 992 to fit under 16MB, bump warmdown to 3600 and matrix LR to 0.06. tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closed
Contributor
|
This is great! Thanks for being the first challenge submission, adding now with some review. |
Contributor
Author
|
@0hq awesome!! So excited! Waiting for the runpod credits to keep researching and upgrading my model. Had to spend some money out of pocket to test this but so worth it. :) |
0hq
approved these changes
Mar 19, 2026
5 tasks
4 tasks
maxivione
pushed a commit
to maxivione/parameter-golf
that referenced
this pull request
Mar 20, 2026
keep tok_emb.weight in fp16 during int8 export (kills the quant gap), shrink MLP hidden to 992 to fit under 16MB, bump warmdown to 3600 and matrix LR to 0.06. tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
6 tasks
scottspace
pushed a commit
to scottspace/parameter-golf
that referenced
this pull request
Mar 21, 2026
keep tok_emb.weight in fp16 during int8 export (kills the quant gap), shrink MLP hidden to 992 to fit under 16MB, bump warmdown to 3600 and matrix LR to 0.06. tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
nedcut
pushed a commit
to nedcut/parameter-golf
that referenced
this pull request
Mar 26, 2026
keep tok_emb.weight in fp16 during int8 export (kills the quant gap), shrink MLP hidden to 992 to fit under 16MB, bump warmdown to 3600 and matrix LR to 0.06. tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gHashTag
added a commit
to gHashTag/parameter-golf
that referenced
this pull request
Apr 30, 2026
openai#42) Anchor: phi^2 + phi^-2 = 3. Lands the manifest contract that 'tri railway restore' (RW-DR-01 openai#25) and the lock writer (RW-DR-04 openai#28) consume. Files: schemas/restore-fleet.schema.json JSON Schema Draft 2020-12 for the manifest. Strictly closed (additionalProperties: false) so unknown fields fail fast in CI rather than getting silently ignored at restore time. Pattern-bounds on identifiers (project name [A-Za-z0-9._-]+, service name ^[a-z0-9][a-z0-9-]*$ to satisfy Railway DNS), version pinned to const: 1, services minItems: 1. restore-fleet.json Updated to schema v1. Six trainer-seed entries (42, 43, 44, 100, 101, 102) — covers both lane A (champion-fineweb) and lane B (mirror-2). shared_vars includes L_R8_SYNTHETIC_FALLBACK=FORBID per the issue acceptance criteria. schemas/fixtures/{valid_minimal,invalid_v0,invalid_uppercase_service, invalid_empty_services}.json Positive/negative fixtures. Filename prefix (valid_/invalid_) is the ground truth read by the validation workflow. .github/workflows/manifest-validate.yml Runs on every PR and push that touches the manifest, schema, or fixtures. Three checks: 1. The schema itself is a valid Draft 2020-12 schema (Draft202012Validator.check_schema). 2. restore-fleet.json validates against the schema. 3. Each fixture under schemas/fixtures/ behaves as labelled (valid_* must pass, invalid_* must fail). Acceptance criteria from openai#26: [x] restore-fleet.json committed to repo root [x] schema can be parsed by serde_json round-trip without warnings (provable in openai#25 via the manifest_parses_v1 test) [x] shared_vars includes L_R8_SYNTHETIC_FALLBACK=FORBID [x] Mirror-2 account seeds (100/101/102) covered [x] serde_json::from_str round-trips (validated locally with jsonschema; CI now enforces it) Closes openai#26. Refs openai#25 (RW-DR-01), openai#28 (RW-DR-04), openai#143. Co-authored-by: Perplexity Computer <computer@perplexity.ai>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hey! Noticed the tied embedding takes a huge hit from int8 quantization since it doubles as the output head. Keeping it in fp16 during export basically kills the quant gap (~0.007 down to ~0.0005 BPB) for just ~500KB extra.
To fit under 16MB I trimmed the MLP hidden from 1024 to 992 — barely noticeable quality-wise but frees up the space. Also bumped warmdown to 3600 and matrix LR to 0.06 since the default schedule doesn't really line up with the actual step count you get in 10 min.
8xH100 SXM results (RunPod secure):
Also ran 3 seeds on 8xH200 — all land in the 1.216–1.218 range.
Improvement over baseline: ~0.013 nats across runs.
Logs for both seeds included. Happy to answer any questions.