fp16 tied embedding + warmdown/LR tuning (val_bpb 1.2197) #42

Merged

0hq merged 1 commit into openai:main from chonchiog:submission on Mar 19, 2026
Conversation

@chonchiog
Contributor

Hey! Noticed the tied embedding takes a huge hit from int8 quantization since it doubles as the output head. Keeping it in fp16 during export basically kills the quant gap (~0.007 down to ~0.0005 BPB) for just ~500KB extra.

To fit under 16MB I trimmed the MLP hidden from 1024 to 992 — barely noticeable quality-wise but frees up the space. Also bumped warmdown to 3600 and matrix LR to 0.06 since the default schedule doesn't really line up with the actual step count you get in 10 min.
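The mixed-precision export idea above can be sketched roughly like this. This is an illustrative NumPy mock-up of symmetric per-tensor int8 export with the tied embedding exempted; the function and tensor names (`quantize_export`, `tok_emb.weight`, `mlp.c_fc.weight`) are assumptions for the sketch, not the repo's actual export code:

```python
import numpy as np

def quantize_export(state, keep_fp16=("tok_emb.weight",)):
    """Export weights as symmetric per-tensor int8, but keep the tied
    embedding in fp16 (2 bytes/param instead of 1) since it doubles as
    the output head and is the tensor most sensitive to quantization."""
    out = {}
    for name, w in state.items():
        if name in keep_fp16:
            out[name] = w.astype(np.float16)
        else:
            scale = float(np.abs(w).max()) / 127.0  # one scale per tensor
            q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
            out[name] = (q, scale)
    return out
```

Dequantization error for the int8 tensors is bounded by half a quantization step per element, which is exactly the gap the fp16 exemption avoids paying on the output head.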

8xH100 SXM results (RunPod secure):

seed   steps    val_loss   val_bpb   size
1337   13,692   2.0595     1.2197    15.90 MB
42     13,722   2.0600     1.2201    15.90 MB

Also ran 3 seeds on 8xH200 — all land in the 1.216–1.218 range.

Improvement over baseline: ~0.013 nats across runs.

Logs for both seeds included. Happy to answer any questions.
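The scheduling tweak can be pictured as a warmup-stable-warmdown LR multiplier whose warmdown is stretched to 3600 steps so it actually spans the tail of a ~13,700-step run. A minimal sketch, assuming a linear trapezoid schedule; the warmup length and function name are illustrative assumptions, not the repo's exact code:

```python
def lr_multiplier(step, total_steps=13_700, warmup=250, warmdown=3600):
    """Trapezoid schedule: linear warmup, constant plateau, then a
    linear warmdown to zero over the final `warmdown` steps."""
    if step < warmup:
        return (step + 1) / warmup               # linear warmup
    if step > total_steps - warmdown:
        return (total_steps - step) / warmdown   # linear warmdown to 0
    return 1.0                                   # constant plateau
```

If `warmdown` is shorter than the steps a 10-minute run actually completes, the schedule never decays fully before time runs out, which is the mismatch the bump to 3600 addresses.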

keep tok_emb.weight in fp16 during int8 export (kills the quant gap),
shrink MLP hidden to 992 to fit under 16MB, bump warmdown to 3600
and matrix LR to 0.06.

tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@0hq
Contributor

0hq commented Mar 19, 2026

This is great! Thanks for being the first challenge submission, adding now with some review.

@chonchiog
Contributor Author

@0hq awesome!! So excited! Waiting for the runpod credits to keep researching and upgrading my model. Had to spend some money out of pocket to test this but so worth it. :)

@0hq 0hq merged commit a5eb9ed into openai:main Mar 19, 2026
maxivione pushed a commit to maxivione/parameter-golf that referenced this pull request Mar 20, 2026
scottspace pushed a commit to scottspace/parameter-golf that referenced this pull request Mar 21, 2026
nedcut pushed a commit to nedcut/parameter-golf that referenced this pull request Mar 26, 2026
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request Apr 30, 2026
openai#42)

Anchor: phi^2 + phi^-2 = 3.

Lands the manifest contract that 'tri railway restore' (RW-DR-01 openai#25)
and the lock writer (RW-DR-04 openai#28) consume.

Files:

  schemas/restore-fleet.schema.json
    JSON Schema Draft 2020-12 for the manifest. Strictly closed
    (additionalProperties: false) so unknown fields fail fast in CI
    rather than getting silently ignored at restore time. Pattern-bounds
    on identifiers (project name [A-Za-z0-9._-]+, service name
    ^[a-z0-9][a-z0-9-]*$ to satisfy Railway DNS), version pinned to
    const: 1, services minItems: 1.

  restore-fleet.json
    Updated to schema v1. Six trainer-seed entries (42, 43, 44, 100,
    101, 102) — covers both lane A (champion-fineweb) and lane B
    (mirror-2). shared_vars includes L_R8_SYNTHETIC_FALLBACK=FORBID
    per the issue acceptance criteria.

  schemas/fixtures/{valid_minimal,invalid_v0,invalid_uppercase_service,
                    invalid_empty_services}.json
    Positive/negative fixtures. Filename prefix (valid_/invalid_) is
    the ground truth read by the validation workflow.

  .github/workflows/manifest-validate.yml
    Runs on every PR and push that touches the manifest, schema, or
    fixtures. Three checks:
      1. The schema itself is a valid Draft 2020-12 schema
         (Draft202012Validator.check_schema).
      2. restore-fleet.json validates against the schema.
      3. Each fixture under schemas/fixtures/ behaves as labelled
         (valid_* must pass, invalid_* must fail).

Acceptance criteria from openai#26:

  [x] restore-fleet.json committed to repo root
  [x] schema can be parsed by serde_json round-trip without warnings
      (provable in openai#25 via the manifest_parses_v1 test)
  [x] shared_vars includes L_R8_SYNTHETIC_FALLBACK=FORBID
  [x] Mirror-2 account seeds (100/101/102) covered
  [x] serde_json::from_str round-trips (validated locally with
      jsonschema; CI now enforces it)

Closes openai#26. Refs openai#25 (RW-DR-01), openai#28 (RW-DR-04), openai#143.

Co-authored-by: Perplexity Computer <computer@perplexity.ai>