Non-record: Negative Results — Architecture, TTT Variants, Quantization, and N-gram Cache Illegality #1186

Open
andrewbaggio1 wants to merge 2 commits into openai:main from andrewbaggio1:negative-results-mar31

Conversation

@andrewbaggio1

Summary

This PR documents ~15 experiments that failed or gave only marginal gains on the LeakyReLU(0.5)² stack (PR #518 architecture), so that others don't repeat them. A positive BPB delta is a regression (lower BPB is better).
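For readers comparing the deltas below, here is a minimal sketch of how a bits-per-byte figure is usually derived from a summed cross-entropy loss. It assumes the loss is in nats and is normalized by raw UTF-8 byte count; the function names are hypothetical and this is not taken from the PR's evaluation code.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss (in nats) into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes.
    return total_loss_nats / (math.log(2) * total_bytes)


def bpb_from_mean_token_loss(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Same conversion when only the mean per-token loss is logged."""
    return bits_per_byte(mean_loss_nats * n_tokens, n_bytes)
```

Because the denominator is bytes rather than tokens, the metric stays comparable across tokenizers.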

Architecture:

  • Depth recurrence (Huginn-style): +0.20 BPB — compute overhead dominates
  • TrigramHash: +0.045 BPB — quantization destroys small weights
  • MLP 3.25x: marginal gain but artifact exceeds 16MB
  • XSA-all (11 layers): neutral at MLP 3.0

TTT variants:

  • SGD+momentum: +0.065 BPB worse than AdamW — adaptive LR matters
  • MLP-only TTT: +0.237 BPB — needs meta-learning to work
  • TTT LR floor 0.05: +0.015 BPB — cosine should decay to 0
  • 20 vs 30 epochs: 10 extra epochs worth ~0.03 BPB
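The LR-floor result above can be pictured with a small cosine schedule sketch. It assumes the "floor 0.05" is a fraction of the peak learning rate, which the PR does not state, and `cosine_lr` with its parameters is a hypothetical helper rather than the code used in these runs.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float, floor_frac: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to floor_frac * peak_lr at the last step."""
    if total_steps <= 1:
        return peak_lr
    # Clamp so out-of-range steps hold the first or last value.
    progress = min(max(step, 0), total_steps - 1) / (total_steps - 1)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    min_lr = floor_frac * peak_lr
    return min_lr + (peak_lr - min_lr) * cosine


# Assumed setup: decay fully to zero (floor_frac=0.0) versus holding a 5% floor.
to_zero = [cosine_lr(s, 100, 1e-3) for s in range(100)]
with_floor = [cosine_lr(s, 100, 1e-3, floor_frac=0.05) for s in range(100)]
```

With `floor_frac=0.0` the final updates shrink to nothing, which is the variant the bullet reports as better. With a floor, the last steps keep taking non-trivial updates.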

Quantization:

  • int5 post-training swap: catastrophic — must train with int5 QAT
  • 1xH100 training: not a viable proxy for 8xH100 (8x fewer optimizer steps)
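To illustrate why swapping a trained model to int5 after the fact is so lossy, here is a minimal sketch of symmetric round-to-nearest quantization at 8 and 5 bits. It assumes a single per-tensor scale and synthetic Gaussian weights; the PR's actual quantizer is not shown, and `fake_quantize` is a hypothetical name.

```python
import math
import random


def fake_quantize(weights, bits):
    """Quantize to a signed `bits`-bit grid with one shared scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # 15 for int5, 127 for int8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))  # round to nearest level, clamp
        out.append(q * scale)
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic stand-in weights

err_int8 = rms_error(weights, fake_quantize(weights, 8))
err_int5 = rms_error(weights, fake_quantize(weights, 5))
zeroed_int5 = sum(1 for w in fake_quantize(weights, 5) if w == 0.0)
print(f"int8 RMS error {err_int8:.2e}, int5 RMS error {err_int5:.2e}, "
      f"weights rounded to zero at int5: {zeroed_int5}")
```

Quantization-aware training generally places a round trip like this inside the forward pass, so the weights adapt to the coarse grid during training instead of meeting it only at export time.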

N-gram cache:

Key takeaway

At 16MB, architecture is converged. Eval-time AdamW TTT with cosine LR is the remaining legal lever. Everything else is noise.

Test plan

  • All experiments ran to completion
  • BPB numbers verified from logs
  • Illegal approaches clearly labeled
  • Documentation only — no artifacts or code

🤖 Generated with Claude Code

andrewbaggio1 and others added 2 commits March 25, 2026 11:34
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ments

Documents ~15 experiments that didn't work or were marginal on the
LeakyReLU² stack. Covers depth recurrence, TrigramHash, MLP expansion,
SGD vs AdamW TTT, int5 post-training swap, n-gram cache illegality,
and 1xH100 vs 8xH100 viability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>