
Non-record: 11L PR315 Backout + Native FA3 RunPod (val_bpb=1.1247)#394

Open
greqone wants to merge 1 commit into openai:main from greqone:codex/pr315-backout-fa3-nonrecord

Conversation


@greqone greqone commented Mar 22, 2026

Summary

  • add a non-record 10-minute-track submission folder for a faithful RunPod 8xH100 SXM PR315-style run plus Backout
  • include the exact training log, self-contained train_gpt.py, requirements.txt, submission.json, and README
  • package this as a non-record entry: the current live public frontier is already slightly below this score, and this submission does not include a significance set to support a new record claim

Result

  • exact sliding-window metric: val_bpb = 1.12467423
  • exact sliding-window loss: 1.89896029
  • total artifact bytes in this packaged folder: 15,545,662
  • hardware: 8xH100 SXM on RunPod with native Hopper FlashAttention and torch.compile
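For readers auditing the numbers above: val_bpb is conventionally the mean per-token loss in nats, rescaled to bits and normalized by raw byte count. A minimal sketch of that conversion, assuming the standard convention (the token/byte counts passed in are illustrative, not taken from this run):

```python
import math

def bits_per_byte(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    # total nats over the eval set, re-expressed in bits, per raw byte of text
    return mean_loss_nats * n_tokens / (n_bytes * math.log(2))
```

Under this convention, the reported pair (loss 1.89896029, val_bpb 1.12467423) would imply roughly 0.41 tokens per byte on the eval set.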

Notes

  • the original experiment used a sibling flash_attn_interface.py; for this submission folder that helper is inlined into train_gpt.py so the package is self-contained and closer to the repo guidance that counted code should live in train_gpt.py
  • this is intentionally filed under records/track_non_record_16mb/...

Copilot AI review requested due to automatic review settings March 22, 2026 03:45
Contributor

Copilot AI left a comment


Pull request overview

Adds a new non-record 10-minute / 16MB artifact-cap submission folder under records/track_non_record_16mb, packaging a self-contained train_gpt.py snapshot plus run artifacts for an 8xH100 SXM (RunPod) run using native FlashAttention (FA3) and torch.compile.

Changes:

  • Add a self-contained training script (train_gpt.py) with inlined FlashAttention interface logic, Backout residual, and sliding-window evaluation.
  • Include exact run artifacts (train.log) and metadata (submission.json) for the reported val_bpb=1.12467423.
  • Add reproducibility notes (README.md) and a minimal dependency list (requirements.txt).
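The sliding-window evaluation named in these changes can be sketched as below. `nll` is a hypothetical callable returning the summed negative log-likelihood (nats) of the target tokens given the context; the window/stride values are illustrative, not the submission's settings:

```python
def sliding_window_eval(nll, ids, window=2048, stride=512):
    """Mean per-token loss where each scored token sees up to window - 1
    tokens of left context; only the trailing `stride` positions of each
    window are scored, so every token is scored exactly once."""
    total, count = 0.0, 0
    pos = 1  # the first token has no left context, so it is not scored
    while pos < len(ids):
        end = min(pos + stride, len(ids))
        start = max(0, end - window)
        total += nll(ids[start:pos], ids[pos:end])
        count += end - pos
        pos = end
    return total / count
```

The point of the scheme is that every token (after the first) is conditioned on near-full context while still being scored exactly once, unlike naive chunked evaluation where chunk-initial tokens see little context.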

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
records/track_non_record_16mb/2026-03-22_11L_PR315_Backout_FA3_RunPod/train_gpt.py Self-contained training + export + int6 quant + sliding-window eval script for the submission run.
records/track_non_record_16mb/2026-03-22_11L_PR315_Backout_FA3_RunPod/train.log Captured training/eval log for the submitted run.
records/track_non_record_16mb/2026-03-22_11L_PR315_Backout_FA3_RunPod/submission.json Leaderboard-style metadata for the non-record entry.
records/track_non_record_16mb/2026-03-22_11L_PR315_Backout_FA3_RunPod/requirements.txt Dependencies needed to reproduce locally (per repo guidance).
records/track_non_record_16mb/2026-03-22_11L_PR315_Backout_FA3_RunPod/README.md Run description, artifact accounting, and reproduction command.


num_layers_total = max(
    (int(k.split(".")[1]) for k in state_dict if k.startswith("blocks.")),
    default=0,
) + 1
late_k_layers = set(range(num_layers_total - 2, num_layers_total))

Copilot AI Mar 22, 2026


late_k_layers is computed but never used, which makes the quantization logic harder to follow and suggests a partially removed feature. Consider deleting it or wiring it into the intended “late-K passthrough” behavior so the code matches the stated design.

Suggested change
late_k_layers = set(range(num_layers_total - 2, num_layers_total))

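If the intent was the "late-K passthrough" named in the docstring, one way to wire `late_k_layers` in is as a key filter that exempts the last layers' K projections from int6 quantization. A sketch under assumed key names (`blocks.<i>.attn.k_proj.weight` is hypothetical; the real snapshot's key layout may differ):

```python
def quantize_plan(state_dict, n_late=2):
    """Map each state_dict key to 'int6' or 'fp16' (passthrough)."""
    num_layers = max(
        (int(k.split(".")[1]) for k in state_dict if k.startswith("blocks.")),
        default=0,
    ) + 1
    late_k_layers = set(range(num_layers - n_late, num_layers))
    plan = {}
    for k in state_dict:
        layer = int(k.split(".")[1]) if k.startswith("blocks.") else None
        # keep the last n_late layers' K projections in higher precision
        passthrough = layer in late_k_layers and ".k_proj." in k
        plan[k] = "fp16" if passthrough else "int6"
    return plan
```

This keeps `late_k_layers` load-bearing, so the quantization code matches the design stated in the docstring.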
Comment on lines +2 to +3
train_gpt_submit.py — Submission v2: wider MLP + STE int6 QAT + MTP + seq2048 + NTK RoPE +
fp16 embed + late-K passthrough + sliding window eval.

Copilot AI Mar 22, 2026


The module docstring refers to train_gpt_submit.py and lists features (e.g., “fp16 embed”, “MTP”) that don’t clearly match the actual defaults/implementation in this file. This can be confusing when someone audits the submission; consider updating the docstring to reflect the actual filename and the concrete features enabled in this snapshot (or remove the feature list).

Suggested change
train_gpt_submit.py — Submission v2: wider MLP + STE int6 QAT + MTP + seq2048 + NTK RoPE +
fp16 embed + late-K passthrough + sliding window eval.
Training script for GPT models used in parameter-golf submissions.
This module's behavior is defined by the hyperparameters and options below; refer to
the code and configuration flags instead of this docstring for an authoritative list
of enabled features.

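For context, the "STE int6 QAT" the docstring advertises is typically a fake-quantization with a straight-through gradient estimator. A minimal sketch with per-tensor symmetric scaling — a generic pattern, not the submission's exact implementation:

```python
import torch

def int6_fake_quant(w: torch.Tensor) -> torch.Tensor:
    # per-tensor symmetric scale mapping max |w| onto the int6 limit 31
    scale = w.abs().max().clamp(min=1e-8) / 31.0
    q = torch.clamp(torch.round(w / scale), -32, 31) * scale
    # straight-through estimator: forward returns the quantized value,
    # backward passes gradients through as if quantization were identity
    return w + (q - w).detach()
```

Training with this in the forward pass lets the weights adapt to the int6 grid, so the post-training export to real int6 loses little accuracy.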
Comment on lines +41 to +63
def _load_system_flash_attn_interface():
for entry in sys.path:
if not entry:
continue
try:
resolved = Path(entry).resolve()
except OSError:
continue
candidate = resolved / "flash_attn_interface.py"
if not candidate.exists() or candidate.resolve() == here:
continue
if repo_root in candidate.resolve().parents:
continue
spec = importlib.util.spec_from_file_location("_system_flash_attn_interface", candidate)
if spec is None or spec.loader is None:
continue
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)
fn = getattr(module, "flash_attn_func", None)
if callable(fn):
return fn
return None

Copilot AI Mar 22, 2026


_load_system_flash_attn_interface() dynamically locates and executes an arbitrary flash_attn_interface.py from sys.path. This is a code-execution footgun (and can make runs non-reproducible if sys.path differs). Consider removing this path-walk entirely, or gating it behind an explicit env var that points to a known file and validating it’s in an expected location (e.g., site-packages) before importing.

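The env-var-gated alternative suggested above could look like the following sketch. `FLASH_ATTN_INTERFACE` is a hypothetical variable name, and the site-packages check is one possible location policy; the point is that nothing is executed unless the operator explicitly names a file in an expected install location:

```python
import importlib.util
import os
import sys
from pathlib import Path

def load_flash_attn_func():
    """Load flash_attn_func only from an explicitly named helper file."""
    path = os.environ.get("FLASH_ATTN_INTERFACE")
    if path is None:
        return None  # no implicit sys.path walk, no surprise execution
    candidate = Path(path).resolve()
    # accept only files under site-packages-style install locations
    allowed = [Path(p).resolve() for p in sys.path if "site-packages" in p]
    if not candidate.is_file() or not any(a in candidate.parents for a in allowed):
        return None
    spec = importlib.util.spec_from_file_location("_flash_attn_interface", candidate)
    if spec is None or spec.loader is None:
        return None
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    fn = getattr(module, "flash_attn_func", None)
    return fn if callable(fn) else None
```

This also makes runs reproducible: the helper's location is pinned in the environment rather than depending on incidental sys.path ordering.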
except OSError:
continue
candidate = resolved / "flash_attn_interface.py"
if not candidate.exists() or candidate.resolve() == here:

Copilot AI Mar 22, 2026


In _load_system_flash_attn_interface, the check candidate.resolve() == here will never be true because candidate is flash_attn_interface.py while here is train_gpt.py. If the intent is to avoid importing a repo-local helper, consider removing this condition (the subsequent repo_root parent check already covers it) or comparing against the actual helper path.

Suggested change
if not candidate.exists() or candidate.resolve() == here:
if not candidate.exists():

