GitHub - dalphakr/dalpha-MLE-Bench

Dalpha MLE Bench Artifact Repo

Workspace Artifact

Agent runs are non-deterministic. The published code is from one representative run. The specific model and hyperparameters may differ across runs, but the overall approach is consistent. Within each run, the agent may either version files (e.g., train.py → train_v2.py) or modify them in place, so versioned files do not necessarily represent the complete iteration history. Only Python source files are included; model weights and submission files are not published.

Code from five competitions in our runs is included, AI4Code among them:

exp-examples/AI4Code
exp-examples/imet-2020-fgvc7
exp-examples/billion-word-imputation
exp-examples/nfl-player-contact-detection
exp-examples/uw-madison-gi-tract-image-segmentation
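Because the agent sometimes versions files (e.g., train.py → train_v2.py) and sometimes edits them in place, it can help to group the published files by base name before reading them. The helper below is only a sketch: `group_versions` is a hypothetical name, and the `<stem>_v<N>.py` pattern is an assumption generalized from the single example above, not a convention the repository guarantees.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: "<stem>_v<N>.py" is version N of "<stem>.py".
_VERSION_RE = re.compile(r"^(?P<stem>.+?)_v(?P<num>\d+)$")


def group_versions(filenames):
    """Group Python file names by base stem, ordered by version number."""
    groups = defaultdict(list)
    for name in filenames:
        path = Path(name)
        if path.suffix != ".py":
            continue  # only Python sources are published
        match = _VERSION_RE.match(path.stem)
        if match:
            stem, version = match.group("stem"), int(match.group("num"))
        else:
            stem, version = path.stem, 1  # an unversioned file counts as version 1
        groups[stem].append((version, path.name))
    # Sort numerically so that _v10 comes after _v2.
    return {stem: [n for _, n in sorted(items)] for stem, items in groups.items()}
```

For example, `group_versions(p.name for p in Path("exp-examples/AI4Code").rglob("*.py"))` would list each script in that directory together with its later versions. Files modified in place leave no trace, so the grouping may understate how many iterations a run went through.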

Valid Submission Tool

Define the following tool for use in the agent:

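from pathlib import Path
from typing import Any

# Note: ctx, SETTING, and log_event are not defined in this snippet; they are
# expected to come from the surrounding agent framework.


def validate_submission(
    submission_file: str = "submission.csv",
    competition: str | None = None,
) -> dict[str, Any]: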
    """Validate a submission CSV file format using MLE-Bench.

    Args:
        submission_file: Name of the submission file in /workspace/ (default: "submission.csv").
                            Examples: "submission.csv", "submission_v2.csv", "ensemble_pred.csv"
        competition: Override competition name (default: uses ctx.competition).

    Returns:
        dict with 'valid' (bool) and 'message' (str).
    """
    submission_path = ctx.workspace_root / submission_file
    if not submission_path.exists():
        return {"valid": False, "message": f"{submission_file} not found in {ctx.workspace_root}"}

    comp = competition or ctx.competition
    local_data_path = SETTING.local_data_path
    if not local_data_path:
        return {"valid": False, "message": "SETTING.local_data_path is empty"}

    workspace_mount_path = "/workspace/submission"
    if ctx.env_type == "conda":
        submission_csv_path = f".volumes/workspace/submission/{submission_file}"
    else:
        submission_csv_path = f"{workspace_mount_path}/{submission_file}"
    command = f"mlebench grade-sample {submission_csv_path} {comp} --data-dir ./zip_files"

    try:
        ctx.mleb_env.check_output(
            entry=command,
            local_path=local_data_path,
            running_extra_volume={
                str(ctx.workspace_root): {"bind": workspace_mount_path, "mode": "ro"},
                str(Path("~/.kaggle").expanduser().absolute()): "/root/.kaggle",
            },
            disable_cache=True,
            disable_chmod=True,
        )
    except Exception as exc:
        return {"valid": False, "message": f"Submission invalid: {exc}"}

    log_event(ctx, "validate_submission", {"submission_file": submission_file, "valid": True})
    return {"valid": True, "message": "Submission is valid."}
