Agent runs are non-deterministic. The published code is from one representative run. The specific model and hyperparameters may differ across runs, but the overall approach is consistent. Within each run, the agent may either version files (e.g., train.py → train_v2.py) or modify them in place, so versioned files do not necessarily represent the complete iteration history. Only Python source files are included; model weights and submission files are not published.
Five competitions from our runs, including AI4Code:
exp-examples/AI4Code
exp-examples/imet-2020-fgvc7
exp-examples/billion-word-imputation
exp-examples/nfl-player-contact-detection
exp-examples/uw-madison-gi-tract-image-segmentation
Define the following tool for use in the agent:
from pathlib import Path
from typing import Any

# ctx, SETTING, and log_event are expected to be defined in the enclosing scope.
def validate_submission(
    submission_file: str = "submission.csv",
    competition: str | None = None,
) -> dict[str, Any]:
    """Validate a submission CSV file format using MLE-Bench.

    Args:
        submission_file: Name of the submission file in /workspace/ (default: "submission.csv").
            Examples: "submission.csv", "submission_v2.csv", "ensemble_pred.csv"
        competition: Override competition name (default: uses ctx.competition).

    Returns:
        dict with 'valid' (bool) and 'message' (str).
    """
    # Fail early if the file is missing from the agent's workspace.
    submission_path = ctx.workspace_root / submission_file
    if not submission_path.exists():
        return {"valid": False, "message": f"{submission_file} not found in {ctx.workspace_root}"}

    comp = competition or ctx.competition
    local_data_path = SETTING.local_data_path
    if not local_data_path:
        return {"valid": False, "message": "SETTING.local_data_path is empty"}

    # The workspace is mounted read-only at this path inside the grading environment.
    workspace_mount_path = "/workspace/submission"
    if ctx.env_type == "conda":
        submission_csv_path = f".volumes/workspace/submission/{submission_file}"
    else:
        submission_csv_path = f"{workspace_mount_path}/{submission_file}"

    command = f"mlebench grade-sample {submission_csv_path} {comp} --data-dir ./zip_files"
    try:
        ctx.mleb_env.check_output(
            entry=command,
            local_path=local_data_path,
            running_extra_volume={
                str(ctx.workspace_root): {"bind": workspace_mount_path, "mode": "ro"},
                str(Path("~/.kaggle").expanduser().absolute()): "/root/.kaggle",
            },
            disable_cache=True,
            disable_chmod=True,
        )
    except Exception as exc:
        # Any failure of the grading command is reported to the agent as an invalid submission.
        return {"valid": False, "message": f"Submission invalid: {exc}"}

    log_event(ctx, "validate_submission", {"submission_file": submission_file, "valid": True})
    return {"valid": True, "message": "Submission is valid."}
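The path selection and command construction in the tool above can be isolated into a small pure function, which makes that part easy to check without a running environment. The following is a minimal sketch, not part of the published code: `build_grade_command` and `WORKSPACE_MOUNT_PATH` are hypothetical names, and the use of `shlex.join` to quote arguments is an added assumption (the original builds the command with an f-string).

```python
import shlex

# Mount point copied from the tool above.
WORKSPACE_MOUNT_PATH = "/workspace/submission"


def build_grade_command(submission_file: str, competition: str, env_type: str) -> str:
    """Hypothetical helper: build the `mlebench grade-sample` command string."""
    if env_type == "conda":
        # Relative path used by the tool above for conda environments.
        submission_csv_path = f".volumes/workspace/submission/{submission_file}"
    else:
        # Any other environment type uses the mounted workspace path.
        submission_csv_path = f"{WORKSPACE_MOUNT_PATH}/{submission_file}"

    # Assumption: shlex.join quotes each argument, so file names containing
    # spaces or shell metacharacters are passed through as a single token.
    return shlex.join(
        [
            "mlebench",
            "grade-sample",
            submission_csv_path,
            competition,
            "--data-dir",
            "./zip_files",
        ]
    )


if __name__ == "__main__":
    print(build_grade_command("submission.csv", "AI4Code", "conda"))
    print(build_grade_command("submission_v2.csv", "AI4Code", "docker"))
```

For file names made only of letters, digits, dots, and underscores, this produces the same string as the f-string in the original tool; the quoting only changes the result when an argument contains characters the shell would otherwise interpret.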