Integrate Birco V2 by AdnanElAssadi56 · Pull Request #2022 · embeddings-benchmark/mteb

AdnanElAssadi56 · 2025-02-08T22:12:54Z

Added a new task class BIRCOReRankingTask, extending the AbsTask base class.
Implemented the necessary methods for task initialization, evaluation, and metadata retrieval.
Integrated BIRCO task logic to support reranking evaluations.

Closes #818

AdnanElAssadi56 · 2025-02-08T22:14:36Z

@orionw Can you please check this?

Samoed · 2025-02-08T22:50:41Z

I've changed your base branch to v2, because your history contained a lot of PRs from there.

AdnanElAssadi56 · 2025-02-08T22:54:04Z

I've changed your base branch to v2, because your history contained a lot of PRs from there.

Thanks, I was just about to do that. Feel free to also give it a look. It is quite different from the previous version because v2 already has many of the implementations.

Samoed · 2025-02-08T23:05:20Z

Are this pull request and #1947 adding the same tasks? If so, maybe the old pull request should be closed.

Samoed

Can you run the tasks to make sure that their scores match the authors' implementation?

AdnanElAssadi56 · 2025-02-11T23:33:48Z

Hi @Samoed

I've refactored the datasets. I preprocessed them externally, and now they look like the retrieval/reranking datasets on HF. Hence, I've deleted all the transforming/loading. I also fixed formatting like citations and other points you mentioned.

They should work now.

For some reason, I cannot test them locally. The first time I ran the tests, they failed because they expected corpus, queries, and qrels as configs instead of default, so I did that.

When I try to run tests now, it says it expects default as a config, so I am not sure.

Samoed · 2025-02-11T23:43:35Z

I think the issue is with the eval splits. In TaskMetadata you've declared that the tasks use the "test" split, but on Hugging Face you've uploaded "train".
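
A minimal sketch of the mismatch described here, assuming the declared splits come from the task's metadata and the available splits from the uploaded dataset. The helper name `missing_eval_splits` is hypothetical and not part of mteb; in practice the available list could be fetched with `datasets.get_dataset_split_names`.

```python
def missing_eval_splits(declared, available):
    """Return the declared eval splits that the uploaded dataset does not provide."""
    available_set = set(available)
    return [split for split in declared if split not in available_set]


# The situation in this thread: the metadata declares "test", the upload only has "train".
print(missing_eval_splits(["test"], ["train"]))  # ['test']

# Once the data is uploaded under a "test" split, nothing is missing.
print(missing_eval_splits(["test"], ["test"]))  # []
```

An empty result means every split named in the metadata can actually be loaded.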

Samoed

Can you please restore the .vscode folder and upload the dev and test splits for the datasets? It is currently not clear what "train" is, as it was not in the original dataset.

AdnanElAssadi56 · 2025-02-14T20:53:47Z

I added the vscode folder.

There is only a test split for the dataset as far as I am aware because we are just evaluating.
It now shows the split as test on huggingface.

Can you please run it from your side and tell me if it works?

Samoed · 2025-02-14T20:56:46Z

Yes, I'll try to run it. Can you resolve the merge conflict?

Samoed · 2025-02-14T21:55:49Z

Ah, yes. You should also call task.calculate_metadata_metrics() for each of the tasks.

AdnanElAssadi56 · 2025-02-14T22:52:41Z

Do I add them in the classes for each task? And under which function?
Why isn't the calculation done in the abstract class, or is it just a one time thing?

Samoed · 2025-02-15T08:12:30Z

You should call this function for each of the tasks. It will produce a JSON file in the descriptive_stats folder with info about the dataset.
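
A minimal sketch of doing this for several tasks in one go, assuming each task object exposes the `calculate_metadata_metrics()` method named above. The helper `generate_descriptive_stats` is hypothetical and not part of mteb; it only loops over whatever task objects it is given and records which ones fail.

```python
def generate_descriptive_stats(tasks):
    """Call calculate_metadata_metrics() on every task and collect failures.

    Returns a dict mapping each failing task's class name to the exception raised.
    """
    failures = {}
    for task in tasks:
        try:
            task.calculate_metadata_metrics()
        except Exception as exc:
            # Keep going so one broken dataset does not hide problems in the others.
            failures[type(task).__name__] = exc
    return failures
```

With mteb installed, the task objects could be obtained with `mteb.get_tasks(tasks=[...])`, using the task names registered in this PR, and passed to the helper; a one-off script is enough, since the statistics only need to be generated once per task.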

AdnanElAssadi56 · 2025-02-15T16:04:42Z

I am getting this error when I try to calculate metrics:
ValueError: BuilderConfig 'default' not found. Available: ['corpus', 'queries', 'qrels']

Isn't it expected to have corpus, queries, qrels as configs?

AdnanElAssadi56 · 2025-02-15T19:02:58Z

I tried changing qrels to default because apparently some datasets have it this way, but it is giving me this:
ValueError: Couldn't find cache for mteb/BIRCO-DorisMae-Test for config 'default'
Available configs in the cache: ['corpus', 'queries']

Can you please tell me what structure is the correct one?

Samoed · 2025-02-15T19:26:36Z

I can download BIRCO-DorisMae without issues at revision 010fedd0de72e8b77388ea5b1d563d638d5900e5
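
A hedged sketch of pinning a download to one revision with `datasets.load_dataset`. The repository id is taken from the error message quoted earlier in the thread; pairing it with this revision hash, the "default" config, and the "test" split is an assumption. Both helper names are hypothetical.

```python
def pinned_load_kwargs(repo_id, config, revision, split="test"):
    """Build keyword arguments for datasets.load_dataset pinned to one commit."""
    return {"path": repo_id, "name": config, "revision": revision, "split": split}


def load_pinned(repo_id, config, revision, split="test"):
    """Download one config of a Hub dataset at an exact revision."""
    from datasets import load_dataset  # lazy import; requires the `datasets` package

    return load_dataset(**pinned_load_kwargs(repo_id, config, revision, split))


# Assumed pairing of repo id and revision, for illustration only.
kwargs = pinned_load_kwargs(
    "mteb/BIRCO-DorisMae-Test",
    "default",
    "010fedd0de72e8b77388ea5b1d563d638d5900e5",
)
print(kwargs["revision"])
```

Pinning the revision means a local run and a reviewer's run resolve the same commit, which helps when the Hub copy has changed but the task file has not been updated yet.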

Samoed · 2025-02-15T19:29:42Z

+        date=("2024-01-01", "2024-12-31"),
+        domains=["Academic"],  # Valid combination
+        task_subtypes=["Scientific Reranking"],  # MTEB-approved subtype
+        license="https://creativecommons.org/licenses/by/4.0/",  # Full URL


Suggested change

-        license="https://creativecommons.org/licenses/by/4.0/",  # Full URL
+        license="cc-by-4.0",

AdnanElAssadi56 · 2025-02-15T19:47:16Z

I can download BIRCO-DorisMae without issues at revision 010fedd0de72e8b77388ea5b1d563d638d5900e5

I still haven't committed that revision. I've only made the change on HF. This revision has default as a config instead of qrels. Is this how it's supposed to be?

Samoed · 2025-02-15T19:51:15Z

Yes, that dataset has the right split format.

AdnanElAssadi56 · 2025-02-16T00:03:15Z

Fixed the license, and now all datasets have default, queries, and corpus as configs.
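
A minimal sketch of checking a dataset against this layout, assuming "default" is the config that holds the qrels (as in the revision discussed above). The helper `missing_configs` is hypothetical; the list of available configs could come from `datasets.get_dataset_config_names`.

```python
EXPECTED_CONFIGS = ("default", "queries", "corpus")  # "default" holds the qrels


def missing_configs(available, expected=EXPECTED_CONFIGS):
    """Return the expected config names that the dataset does not provide."""
    available_set = set(available)
    return [name for name in expected if name not in available_set]


# The layout that raised "BuilderConfig 'default' not found" earlier in the thread.
print(missing_configs(["corpus", "queries", "qrels"]))  # ['default']

# The layout after the fix.
print(missing_configs(["default", "queries", "corpus"]))  # []
```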

AdnanElAssadi56 · 2025-02-16T17:29:53Z

Hi @Samoed

Another dataset, "Namaa," seems to be causing the tests to fail. Can you please confirm?

Samoed · 2025-02-16T17:41:06Z

Yes, but that's a bit weird, because I can find it on HF.

Samoed

Great work!

AdnanElAssadi56 added 2 commits February 8, 2025 15:37

Birco Datasets Added to V2

d24db98

Birco added to INIT files

6a063be

Samoed changed the base branch from main to v2.0.0 February 8, 2025 22:49

Samoed reviewed Feb 8, 2025

AdnanElAssadi56 added 8 commits February 9, 2025 01:20

Fixed ndcg typo

a2353e3

Changed Class to Function + Instructions Format Fixed

c3681d5

MetaData Fix + Test Skeleton

44d7898

Instruction Structure inside metadata

10cf602

Ran Lint

cc4a3fd

Removed Birco Base Class

34705e0

More Formatting

0ce4feb

Separate Birco Classes

89833e2

Samoed reviewed Feb 10, 2025

Comment thread .vscode/settings.json

Samoed reviewed Feb 10, 2025

Comment thread mteb/tasks/Reranking/eng/BIRCOArguAnaReranking.py Outdated

Samoed mentioned this pull request Feb 10, 2025

Integrate BIRCO Benchmark Datasets into MTEB with Graded Evaluation Support #1947

Closed

Samoed added the v2 label Feb 10, 2025

Samoed reviewed Feb 10, 2025

Comment thread mteb/tasks/Reranking/eng/BIRCOArguAnaReranking.py Outdated

Samoed reviewed Feb 10, 2025

Comment thread mteb/tasks/Reranking/eng/BIRCODorisMaeReranking.py Outdated

Finalized BIRCO Datasets Structures

c4d93e0

test split typo fix

cdcabf7

Samoed reviewed Feb 13, 2025

AdnanElAssadi56 added 2 commits February 14, 2025 12:46

Fixed Train->Test

1083775

Added .vscode folder

87083ad

Merge branch 'v2.0.0' into integrate_birco_v2

2744b5a

AdnanElAssadi56 added 2 commits February 14, 2025 18:13

Added more info in HF readme

daadb06

Removed function added by mistake

11756ad

Samoed reviewed Feb 15, 2025

Qrels config changed to Default

294caa4

AdnanElAssadi56 and others added 3 commits February 15, 2025 16:12

Ran calculate_metadata

959bca0

Removed Unnecessary Files

fd98e2e

Ran make lint

3086be4

Samoed approved these changes Feb 16, 2025

Samoed merged commit 9461759 into embeddings-benchmark:v2.0.0 Feb 16, 2025
