[v2] Refactor text tasks to use DataLoader by Samoed · Pull Request #2198 · embeddings-benchmark/mteb

Samoed · 2025-02-28T19:15:50Z

Models' encode functions will now receive a DataLoader whose batches look like this:

{
   "text": [...],  # default text
   "image": [...], 
   "audio": [...], 
   "body": [...], # models are allowed to construct the text from the body + title if they wish
   "title": [...],
}
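A minimal sketch of what consuming these batches might look like on the model side. Plain Python stands in for a torch DataLoader here: the hypothetical `iter_batches` helper mimics how default collation turns a list of row dicts into one dict of lists per batch, and `encode` and `embed_text` are hypothetical placeholders, not the mteb encoder interface. The `image` and `audio` keys are left out of the toy rows.

```python
from typing import Iterable, Iterator


def iter_batches(rows: list[dict], batch_size: int) -> Iterator[dict[str, list]]:
    """Group row dicts into column-oriented batches (dict of lists), like default collation."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start : start + batch_size]
        yield {key: [row[key] for row in chunk] for key in chunk[0]}


def embed_text(text: str) -> list[float]:
    """Hypothetical stand-in for a real encoder: [character count, word count]."""
    return [float(len(text)), float(len(text.split()))]


def encode(inputs: Iterable[dict[str, list]]) -> list[list[float]]:
    """Hypothetical encode function that only reads the default "text" column."""
    embeddings: list[list[float]] = []
    for batch in inputs:
        embeddings.extend(embed_text(text) for text in batch["text"])
    return embeddings


# Toy rows: "text" already holds title + body, while the raw parts stay available.
rows = [
    {"text": "Cats a cat", "title": "Cats", "body": "a cat"},
    {"text": "Dogs two dogs run", "title": "Dogs", "body": "two dogs run"},
    {"text": "Birds birds", "title": "Birds", "body": "birds"},
]
embeddings = encode(iter_batches(rows, batch_size=2))
print(embeddings)  # [[10.0, 3.0], [17.0, 4.0], [11.0, 2.0]]
```

A model that wants its own title/body layout would read `batch["title"]` and `batch["body"]` instead of `batch["text"]`.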

Code Quality

Code Formatted: Format the code using make lint to maintain consistent style.

Samoed · 2025-03-01T12:25:24Z

Right now it is very much a quick wrapper. Wouldn't we prefer working directly with the Dataset objects from datasets? (I know that this means more code to write.)

It's not easy, because datasets have different column names and most of them require encoding two columns, and I don't have a clear solution for handling that. Also, in most tasks a list of sentences is passed to the evaluators, where datasets can't be used for now, though we can change that. Additionally, some datasets return a dictionary instead of a dataset, and pair classification expects all data to be in the first row (as I recall). I could pass the dataset directly and select columns, but that would be much the same approach as using a wrapper.
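The "two columns" problem can be sketched as follows, assuming pair-style rows with hypothetical column names `sentence1` and `sentence2`: each column is projected onto the standard `"text"` key and batched as its own stream, so the model only ever sees one schema. The helper names are illustrative, not mteb code.

```python
from typing import Iterator


def column_as_text(rows: list[dict], column: str) -> list[dict[str, str]]:
    """Project one task-specific column onto the standard "text" key."""
    return [{"text": row[column]} for row in rows]


def iter_batches(rows: list[dict], batch_size: int) -> Iterator[dict[str, list]]:
    """Column-oriented batches (dict of lists), as a DataLoader with default collation yields."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start : start + batch_size]
        yield {key: [row[key] for row in chunk] for key in chunk[0]}


pairs = [
    {"sentence1": "A man eats.", "sentence2": "Someone is eating.", "label": 1},
    {"sentence1": "It rains.", "sentence2": "The sun is out.", "label": 0},
]

# One stream per column; an evaluator would then compare embedding i of `left` with embedding i of `right`.
left = list(iter_batches(column_as_text(pairs, "sentence1"), batch_size=8))
right = list(iter_batches(column_as_text(pairs, "sentence2"), batch_size=8))
```

The cost of this design is one extra pass per column, which is part of why passing the dataset directly and selecting columns ends up close to the wrapper approach.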

KennethEnevoldsen

So I would really like to see what a DataLoader-native AbsTask would look like. Can we try it with just Classification?

I am also worried about how much this affects throughput. Can we do a quick test, e.g. using minishlab models?

It is a bit annoying that we have to convert everything in the encode functions (though it might be the right solution). We could consider whether it is better to just hand off the Dataset object to the model (but I assume that does not work for images?).

KennethEnevoldsen · 2025-03-02T12:50:26Z

-        if isinstance(queries[0], list):
+        # Encode only unique queries using the dataloader
+        if isinstance(query_list[0], list):
+            # For conversations, still use the original encode_conversations method


Hmm, don't we want to standardize everything?

We do, but I still don't know what to do with them, because we don't have an implementation for any model (#1330).

I pinged him. Can't we just convert it to text and keep the conversation in a column as well?

Yes, I can change it like that.

I've tried to standardize it in bb2a897, but it is hard to tell whether it is correct, because I don't know any conversational datasets to check the results against.
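One way to read the suggestion to convert conversations to text while keeping the turns in their own column is sketched below. It assumes each conversation is a list of turn strings stored under `"text"` (mirroring the `isinstance(query_list[0], list)` check in the review diff), joins turns with a newline, and stores the raw turns under a `conversation` column; the separator, column name, and helper are assumptions, not necessarily what bb2a897 does.

```python
def add_text_for_conversations(rows: list[dict], sep: str = "\n") -> list[dict]:
    """Return copies of rows where conversations also get a flat "text" field.

    A row whose "text" is a list of turns is treated as a conversation: the turns are
    kept under "conversation" and joined with `sep` into "text". Other rows are copied as-is.
    """
    converted = []
    for row in rows:
        value = row["text"]
        if isinstance(value, list):
            converted.append({**row, "conversation": list(value), "text": sep.join(value)})
        else:
            converted.append(dict(row))
    return converted


queries = [
    {"id": "q1", "text": ["Hi, I need a laptop.", "Something light, under 1 kg."]},
    {"id": "q2", "text": "cheap travel laptop"},
]
converted = add_text_for_conversations(queries)
```

Models that do not understand conversations can keep reading `"text"`, while conversation-aware models can use the extra column.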

Samoed · 2025-03-02T20:49:32Z

I've updated the clustering and classification tasks to use Dataset more natively.

KennethEnevoldsen

Looking better.

I added a few comments on the classification.

We should also update the documentation (on how to implement a custom encoder) to match.

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

Samoed · 2025-03-04T08:33:56Z

I've updated the Classification evaluator and removed create_dataloader. What else would you like to change?

KennethEnevoldsen

A few more minor things.

Would love @isaac-chung's opinion on this as well.

(I would love to see more adaptation of tasks to avoid the many dataset transformations.)

KennethEnevoldsen · 2025-03-04T11:34:39Z

        rng_state = np.random.default_rng(self.seed)
        rng_state.shuffle(idxs)


Suggested change

-        rng_state = np.random.default_rng(self.seed)
-        rng_state.shuffle(idxs)
+        self.rng_state.shuffle(idxs)

test and believe they should be equal.

In the first experiment they're equal, but in the others they're different. I think this is because we're recreating rng_state in each experiment.
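The behaviour described above can be reproduced with a small self-contained sketch (the `Sampler` class and its method names are illustrative, not the mteb classification code). Re-creating `np.random.default_rng(self.seed)` inside each experiment yields the same permutation every time, whereas one generator stored on `self` keeps advancing, so only the first experiment's shuffle matches.

```python
import numpy as np


class Sampler:
    """Illustrative only: contrasts a per-experiment RNG with one shared RNG."""

    def __init__(self, seed: int) -> None:
        self.seed = seed
        self.rng_state = np.random.default_rng(seed)  # created once, advances on every use

    def shuffle_recreated(self, n: int) -> np.ndarray:
        idxs = np.arange(n)
        rng_state = np.random.default_rng(self.seed)  # fresh generator: same permutation each call
        rng_state.shuffle(idxs)
        return idxs

    def shuffle_shared(self, n: int) -> np.ndarray:
        idxs = np.arange(n)
        self.rng_state.shuffle(idxs)  # continues the shared stream: new permutation each call
        return idxs


sampler = Sampler(seed=42)
recreated = [sampler.shuffle_recreated(20) for _ in range(3)]
shared = [sampler.shuffle_shared(20) for _ in range(3)]
print([np.array_equal(r, s) for r, s in zip(recreated, shared)])  # [True, False, False]
```

Which behaviour is preferable depends on whether each experiment is meant to draw a different subsample; both are reproducible for a fixed seed.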

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

Samoed · 2025-03-04T13:37:49Z

@orionw It would be great if you could review this PR!

orionw · 2025-03-04T13:57:31Z

Dataloader seems fine to me if it helps make things easier. I am not sure what the benefits are offhand, but I am not opposed. EDIT: wait, maybe I missed that the inputs are now dataloaders. I would probably be hesitant to make large changes then. What's the benefit of doing so?

Re: allowing models to choose how to combine passage and title: I like the motivation, but this is a fairly large change (it touches every single model that anyone uses).

Could we instead define a custom function with a default that can be overridden, something like "combine_passage_and_title"? I am hesitant to make such a large API change. We already have a custom function that can be overridden for combine_query_and_instruction.
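A minimal sketch of the overridable hook proposed above. `combine_passage_and_title` is only the name floated in this thread, and the default shown (title and body joined by a single space, then stripped) is an assumption about the pre-v2 behaviour rather than a confirmed mteb implementation; `BaseEncoder`, `texts_from_batch`, and `BodyFirstEncoder` are hypothetical.

```python
from typing import Optional


class BaseEncoder:
    """Hypothetical base class with an overridable title/body combination hook."""

    def combine_passage_and_title(self, title: Optional[str], body: str) -> str:
        # Assumed default: "title body", or just the body when the title is missing/empty.
        if not title:
            return body.strip()
        return f"{title} {body}".strip()

    def texts_from_batch(self, batch: dict[str, list]) -> list[str]:
        # Batches without a "title" column are treated as untitled passages.
        titles = batch.get("title") or [None] * len(batch["body"])
        return [self.combine_passage_and_title(t, b) for t, b in zip(titles, batch["body"])]


class BodyFirstEncoder(BaseEncoder):
    """Hypothetical model that prefers the body before the title."""

    def combine_passage_and_title(self, title: Optional[str], body: str) -> str:
        return f"{body} {title}".strip() if title else body.strip()


batch = {"title": ["Cats", ""], "body": ["a cat", "no title here"]}
print(BaseEncoder().texts_from_batch(batch))       # ['Cats a cat', 'no title here']
print(BodyFirstEncoder().texts_from_batch(batch))  # ['a cat Cats', 'no title here']
```

Because the hook lives on the model with a default, existing models keep their current behaviour and only models that need a different layout override it, which is the smaller API change being argued for here.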

Samoed · 2025-03-04T14:28:37Z

Now, the title and passage are combined into the text field the same way as before, so this won't break anything. However, we could add a function to allow overriding that if needed.

The main benefit of dataloaders is standardizing the input, especially since we now have images and audio, which are difficult to handle otherwise. You can check the discussion in this thread.

orionw · 2025-03-04T14:44:58Z

That makes sense, and it seems good to have the input change happen with v2 then. It is a lot of changes, but they shouldn't change anything of substance.

Now, the title and passage are computed the same way as before in text field, so this won't break anything. However, we could add a function to allow overriding if needed.

Seems great then. We can add it as an extension if we want, but it's not a high priority.


Samoed · 2025-03-07T14:33:18Z

@KennethEnevoldsen Should we wait for more reviews, or can we merge this?

KennethEnevoldsen · 2025-03-07T15:15:49Z

Good to merge!

update text tasks except retrieval

1e30543

Samoed requested review from KennethEnevoldsen and isaac-chung February 28, 2025 19:15

Samoed added the v2 label Feb 28, 2025

Samoed added 6 commits March 1, 2025 00:32

update retrieval

92f195c

fix mock models

d843190

remove change to model card

245ca83

fix tests

390453a

fix tests

4228d95

fix tests

40b2e24

Samoed changed the title ~~update text tasks except retrieval~~ [v2] Refactor text tasks to use DataLoader Mar 1, 2025

change loaders to batches

493c268

Samoed marked this pull request as ready for review March 1, 2025 15:21

KennethEnevoldsen reviewed Mar 2, 2025


Samoed added 3 commits March 2, 2025 23:31

update review comments

ac98600

update clustering

f0acebf

update classification

81e44b7

KennethEnevoldsen reviewed Mar 4, 2025


Comment thread mteb/create_dataloaders.py Outdated

Comment thread mteb/evaluation/evaluators/ClassificationEvaluator.py Outdated

Comment thread mteb/evaluation/evaluators/ClassificationEvaluator.py

Comment thread mteb/abstasks/AbsTaskClassification.py Outdated

Samoed added 2 commits March 4, 2025 10:54

use datasets

071be17

lint

17aeb54

KennethEnevoldsen reviewed Mar 4, 2025


Comment thread mteb/abstasks/AbsTaskClassification.py Outdated

Samoed and others added 2 commits March 4, 2025 11:01

Update mteb/abstasks/AbsTaskClassification.py

698a951

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

update evaluators

ac52f26

update clustering test

1e1e69a

KennethEnevoldsen approved these changes Mar 4, 2025


Samoed and others added 2 commits March 4, 2025 14:47

Update mteb/abstasks/AbsTaskClassification.py

7758d04

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

update clustering and multilabel classification

f99e209

integrate conversations

bb2a897

Samoed requested a review from orionw March 4, 2025 13:36

add conversation type to annotations

0fe35a3

lint

27d22d5

Merge branch 'refs/heads/v2.0.0' into integrate_dataloaders

4c9214b

# Conflicts: # mteb/encoder_interface.py # mteb/evaluation/evaluators/ClassificationEvaluator.py

Samoed merged commit bd33a33 into v2.0.0 Mar 7, 2025

Samoed deleted the integrate_dataloaders branch March 7, 2025 15:26



3 participants
