[WIP] Record: Hybrid architecture 8L 3:1 GDN/Transformer (val_bpb=1.2093)#651

Draft
phulin wants to merge 1 commit into openai:main from phulin:submit1


phulin commented Mar 24, 2026

Still too big. It needs some work on quantization, a lot of hyperparameter optimization, and TTT and some of the other tricks imported from the top leaderboard solutions. But at least I got something up for this class of models.

phulin marked this pull request as draft March 24, 2026 22:13