Skip to content

Integrate Birco V2#2022

Merged
Samoed merged 21 commits into
embeddings-benchmark:v2.0.0from
AdnanElAssadi56:integrate_birco_v2
Feb 16, 2025
Merged

Integrate Birco V2#2022
Samoed merged 21 commits into
embeddings-benchmark:v2.0.0from
AdnanElAssadi56:integrate_birco_v2

Conversation

@AdnanElAssadi56

@AdnanElAssadi56 AdnanElAssadi56 commented Feb 8, 2025

Copy link
Copy Markdown
Contributor
  • Added a new task class BIRCOReRankingTask, extending the AbsTask base class.
  • Implemented the necessary methods for task initialization, evaluation, and metadata retrieval.
  • Integrated BIRCO task logic to support reranking evaluations.

Closes #818

@AdnanElAssadi56

Copy link
Copy Markdown
Contributor Author

@orionw Can you please check this?

@Samoed Samoed changed the base branch from main to v2.0.0 February 8, 2025 22:49
@Samoed

Samoed commented Feb 8, 2025

Copy link
Copy Markdown
Member

I've changed yor base branch to v2, because in your history was a lot of PRs from there.

@AdnanElAssadi56

Copy link
Copy Markdown
Contributor Author

I've changed yor base branch to v2, because in your history was a lot of PRs from there.

Thanks, I was just about to do that. Feel free to also give it a look. It is quite different from the previous version because v2 already has many of the implementations.

@Samoed

Samoed commented Feb 8, 2025

Copy link
Copy Markdown
Member

This pull request and #1947 are adding the same tasks? If so, maybe the old pull request should be closed?

@Samoed Samoed left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run tasks to make sure that score of tasks is matching with author's implementation?

Comment thread mteb/tasks/Reranking/eng/BIRCO_Reranking.py Outdated
Comment thread mteb/tasks/Reranking/eng/BIRCO_Reranking.py Outdated
Comment thread mteb/tasks/Reranking/eng/BIRCO_Reranking.py Outdated
Comment thread mteb/tasks/Reranking/eng/BIRCO_Reranking.py Outdated
Comment thread mteb/tasks/Reranking/eng/BIRCO_Reranking.py Outdated
Comment thread .vscode/settings.json
Comment thread mteb/tasks/Reranking/eng/BIRCOArguAnaReranking.py Outdated
Comment thread mteb/tasks/Reranking/eng/BIRCOArguAnaReranking.py Outdated
Comment thread mteb/tasks/Reranking/eng/BIRCODorisMaeReranking.py Outdated
@AdnanElAssadi56

Copy link
Copy Markdown
Contributor Author

Hi @Samoed

I've refactored the datasets. I preprocessed them externally, and now they look like the retrieval/reranking datasets on HF. Hence, I've deleted all the transforming/loading. I also fixed formatting like citations and other points you mentioned.

They should work now.

For some reason, I cannot test them locally. The first time I ran the tests, they failed because they expected corpus, queries, and qrels as configs instead of default, so I did that.

When I try to run tests now, it says it expects default as a config, so I am not sure.

@Samoed

Samoed commented Feb 11, 2025

Copy link
Copy Markdown
Member

I think issue with eval splits. In TaskMetadata you've annotated that they will use "test" split, but on huggingface you've uploaded train

@Samoed Samoed left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please get back the .vscode folder and upload the dev and test splits for the datasets, as it is not currently clear what "train" is, as it was not in the original dataset?

@AdnanElAssadi56

Copy link
Copy Markdown
Contributor Author

I added the vscode folder.

There is only a test split for the dataset as far as I am aware because we are just evaluating.
It now shows the split as test on huggingface.

Can you please run it from your side and tell me if it works?

@Samoed

Samoed commented Feb 14, 2025

Copy link
Copy Markdown
Member

Yes, I'll try to run it. Can you resolve merge conflict?

@Samoed

Samoed commented Feb 14, 2025

Copy link
Copy Markdown
Member

Ah, yes. You should also calculate task.calculate_metadata_metrics() for each of the task

@AdnanElAssadi56

Copy link
Copy Markdown
Contributor Author

Do I add them in the classes for each task? And under which function?
Why isn't the calculation done in the abstract class, or is it just a one time thing?

@Samoed

Samoed commented Feb 15, 2025

Copy link
Copy Markdown
Member

You should call this function for each of the tasks. This function will produce json in descriptive_stats folder with info about dataset.

@AdnanElAssadi56

Copy link
Copy Markdown
Contributor Author

I am getting this error when I try to calculate metrics:
ValueError: BuilderConfig 'default' not found. Available: ['corpus', 'queries', 'qrels']

Isn't it expected to have corpus, queries, qrels as configs?

@AdnanElAssadi56

Copy link
Copy Markdown
Contributor Author

I tried changing qrels to default because apparently some datasets have it this way, but it is giving me this:
ValueError: Couldn't find cache for mteb/BIRCO-DorisMae-Test for config 'default'
Available configs in the cache: ['corpus', 'queries']

Can you please tell me what structure is the correct one?

@Samoed

Samoed commented Feb 15, 2025

Copy link
Copy Markdown
Member

I can download BIRCO-DorisMae without issues with 010fedd0de72e8b77388ea5b1d563d638d5900e5 revision

date=("2024-01-01", "2024-12-31"),
domains=["Academic"], # Valid combination
task_subtypes=["Scientific Reranking"], # MTEB-approved subtype
license="https://creativecommons.org/licenses/by/4.0/", # Full URL

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
license="https://creativecommons.org/licenses/by/4.0/", # Full URL
license="cc-by-4.0",

@AdnanElAssadi56

Copy link
Copy Markdown
Contributor Author

I can download BIRCO-DorisMae without issues with 010fedd0de72e8b77388ea5b1d563d638d5900e5 revision

I still haven't commited that revision. I've only made the change on HF. This revision has default as a config instead of qrels. Is this how it's supposed to be?

@Samoed

Samoed commented Feb 15, 2025

Copy link
Copy Markdown
Member

Yes, that dataset has right split format

@AdnanElAssadi56

Copy link
Copy Markdown
Contributor Author

Fixed License and now all datasets have default, queries, and corpus as configs.

@AdnanElAssadi56

Copy link
Copy Markdown
Contributor Author

Hi @Samoed

Another dataset seems to be causing the tests to fail "Namaa." Can you please confirm?

@Samoed

Samoed commented Feb 16, 2025

Copy link
Copy Markdown
Member

Yes, but that a bit weird, because I can find it on HF

@Samoed Samoed left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@Samoed Samoed merged commit 9461759 into embeddings-benchmark:v2.0.0 Feb 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants