Non-record: Shared-weight transformer with extended warmdown (1.1454 val_bpb)#470
Open
leofeasby wants to merge 1 commit into openai:main from
Conversation
This is a non-record submission to the 16MB track.
We study a shared-weight transformer in which a single transformer block is reused across depth (9 passes), forming a recurrent-style stack with U-Net skip connections.
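The reuse pattern described above can be sketched as follows. This is a hypothetical illustration, not the submission's code: the block here is a stand-in MLP rather than a full attention block, and the skip-merge wiring (concatenate-then-project on the mirrored "up" passes) is an assumption about how the U-Net connections are wired.

```python
import torch
import torch.nn as nn

class SharedWeightStack(nn.Module):
    """One block reused for n_passes, with U-Net skips from the first
    half of passes into the mirrored second half (sketch, not the
    submission's implementation)."""

    def __init__(self, dim: int, n_passes: int = 9):
        super().__init__()
        assert n_passes % 2 == 1, "odd depth: down path, bottleneck, up path"
        self.n_passes = n_passes
        # Stand-in for a full transformer block (attention + MLP);
        # a single parameter set, applied n_passes times.
        self.block = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )
        half = n_passes // 2
        # Learned projections to merge each skip activation on the up path.
        self.skip_proj = nn.ModuleList(
            nn.Linear(2 * dim, dim) for _ in range(half)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        half = self.n_passes // 2
        saved = []
        for _ in range(half):             # "down" path: save activations
            x = x + self.block(x)
            saved.append(x)
        x = x + self.block(x)             # bottleneck pass
        for proj in self.skip_proj:       # "up" path: merge mirrored skips
            x = proj(torch.cat([x, saved.pop()], dim=-1))
            x = x + self.block(x)
        return x
```

Because the block's parameters are shared, depth (9 passes) adds compute but almost no parameters, which is what makes this interesting for a size-constrained track.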
Result:
The model reaches 1.1454 val_bpb after ~2.3 hours on 8×H100, with loss still decreasing at the end of training. Training terminated due to schedule constraints (LR→0), not convergence.
Key observation:
The majority of improvement occurs during extended warmdown. The model continues improving steadily throughout the low-LR phase, with no plateau observed within the explored horizon.
This behaviour is consistent with a regime in which performance is strongly influenced by schedule alignment, potentially more so than parameter capacity for this architecture. We do not claim this as a universal property, but as an observed characteristic of this shared-weight setup.
Notable components:
- Step-based warmdown trigger (WARMDOWN_START_STEP) to decouple the schedule from wallclock

This submission targets long-horizon optimisation behaviour rather than the 10-minute constraint, and aims to highlight differences in convergence dynamics between shared-weight and standard transformers.
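A step-based trigger of this kind might look like the sketch below. All constants here are placeholder values, not the submission's configuration; only the WARMDOWN_START_STEP name comes from the description above, and the warmup/plateau shape is an assumption.

```python
# Placeholder values -- the submission's actual constants live in its config.
WARMUP_STEPS = 256
WARMDOWN_START_STEP = 2000   # step-based trigger, independent of wallclock
TOTAL_STEPS = 8000
BASE_LR = 3e-4

def lr_at(step: int) -> float:
    """Linear warmup, constant plateau, then linear warmdown to zero.

    Keying the warmdown to a step count (rather than elapsed time) makes
    the schedule reproducible across hardware with different throughput.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    if step < WARMDOWN_START_STEP:
        return BASE_LR
    # Extended warmdown: LR decays linearly, reaching 0 at TOTAL_STEPS.
    frac = (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMDOWN_START_STEP)
    return BASE_LR * max(frac, 0.0)
```

Under this shape, the "majority of improvement during extended warmdown" observation corresponds to the interval from WARMDOWN_START_STEP to TOTAL_STEPS, where the LR falls smoothly to zero.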