
Ensure external does not have a revision #107

Merged
KennethEnevoldsen merged 9 commits into main from examining-odd-filtering
Feb 5, 2025
KennethEnevoldsen commented Feb 4, 2025

Contributor

External models had a revision. This led to duplicate scores (two results from the same model on the same revision).

I have replaced the revision with "no_revision_available". I also think this more accurately reflects reality (we don't know which version they used when they ran the model).
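
To illustrate the duplicate-score problem described above, here is a minimal sketch of how such collisions could be detected. It assumes each stored result can be reduced to a `(model_name, revision, task_name)` tuple; the record layout, the sample values, and the function name `find_duplicate_scores` are hypothetical and not taken from the repository.

```python
from collections import Counter

# Hypothetical records: one (model_name, revision, task_name) tuple per stored result.
results = [
    ("org/model-a", "abc123", "TaskX"),  # e.g. a result from a local run
    ("org/model-a", "abc123", "TaskX"),  # e.g. an external result stored under the same revision
    ("org/model-b", "no_revision_available", "TaskX"),
]


def find_duplicate_scores(records):
    """Return every (model, revision, task) key that occurs more than once."""
    counts = Counter(records)
    return sorted(key for key, n in counts.items() if n > 1)


duplicates = find_duplicate_scores(results)
```

Under this assumption, storing external results with "no_revision_available" keeps them from sharing a key with results produced on a known revision.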

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the results files checker make pre-push.


Samoed commented Feb 4, 2025

Member

Can you update the load external script too?

@KennethEnevoldsen

Contributor Author

I think we should avoid using load_external (model submissions should happen here), so I have actually deleted it.


KennethEnevoldsen commented Feb 4, 2025

Contributor Author

Let me know if you agree

I also added a few extra tests; these should make the revision much more consistent in the future (they also disallow "external").

Essentially, you are allowed to use a SHA-1 revision ID or an integer (1, 2, 3) in the case of APIs.
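
A minimal sketch of the kind of revision check described above, not the actual tests added in this PR. The function name `is_valid_revision` is hypothetical, and two things are assumptions: that a SHA-1 revision ID is a 40-character lowercase hex string, and that the "no_revision_available" placeholder is also accepted.

```python
import re

# Assumption: a SHA-1 revision ID is 40 lowercase hex characters (a git commit hash).
SHA1_PATTERN = re.compile(r"[0-9a-f]{40}")
# Placeholder used in this PR for results whose revision is unknown.
NO_REVISION = "no_revision_available"


def is_valid_revision(revision: str) -> bool:
    """Hypothetical check: accept a SHA-1 ID, an integer version, or the placeholder."""
    if revision == NO_REVISION:
        return True
    if SHA1_PATTERN.fullmatch(revision):
        return True
    # Integer versions such as "1", "2", "3", used for API models.
    return revision.isascii() and revision.isdigit()
```

With this sketch, `is_valid_revision("external")` returns `False`, while `is_valid_revision("2")` and a full 40-character commit hash both return `True`.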

@KennethEnevoldsen

Contributor Author

(@x-tabdeveloping just so that you see this)


Samoed commented Feb 4, 2025

Member

Yes, I agree that with the new leaderboard this script can be deleted.

KennethEnevoldsen enabled auto-merge (squash) February 5, 2025 07:56
KennethEnevoldsen merged commit 7bfb6d9 into main Feb 5, 2025
Samoed deleted the examining-odd-filtering branch December 24, 2025 08:10
