
fleurs retrieval tasks#2976

Merged
isaac-chung merged 3 commits into embeddings-benchmark:maeb from hepengfe:fleurs on Aug 4, 2025

hepengfe commented Aug 3, 2025

Member

Fixes #2681

  • I have outlined why this dataset is filling an existing gap in mteb
  • I have tested that the dataset runs with the mteb package.

An easy way to test it is using:

import mteb
# sample model:
model = mteb.get_model("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

task = mteb.get_task("{name of your task}")
evaluation = mteb.MTEB(tasks=[task])
evaluation.run(model)
  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command.
    • laion/clap-htsat-unfused
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks)

hepengfe commented Aug 3, 2025

Member Author

Results for laion/clap-htsat-unfused on the FleursT2ARetrieval task:

{
  "dataset_revision": "d7c758a6dceecd54a98cac43404d3d576e721f07",
  "task_name": "FleursT2ARetrieval",
  "mteb_version": "1.21.3",
  "scores": {
    "test": [
      {
        "ndcg_at_1": 0.00758,
        "ndcg_at_3": 0.00997,
        "ndcg_at_5": 0.01469,
        "ndcg_at_10": 0.02191,
        "ndcg_at_20": 0.03777,
        "ndcg_at_100": 0.09088,
        "ndcg_at_1000": 0.16946,
        "map_at_1": 0.00758,
        "map_at_3": 0.00947,
        "map_at_5": 0.01212,
        "map_at_10": 0.01502,
        "map_at_20": 0.01914,
        "map_at_100": 0.02519,
        "map_at_1000": 0.0287,
        "recall_at_1": 0.00758,
        "recall_at_3": 0.01136,
        "recall_at_5": 0.02273,
        "recall_at_10": 0.04545,
        "recall_at_20": 0.10985,
        "recall_at_100": 0.41667,
        "recall_at_1000": 1.0,
        "cv_recall_at_1": 0.00758,
        "cv_recall_at_3": 0.01136,
        "cv_recall_at_5": 0.02273,
        "cv_recall_at_10": 0.04545,
        "cv_recall_at_20": 0.10985,
        "cv_recall_at_100": 0.41667,
        "cv_recall_at_1000": 1.0,
        "precision_at_1": 0.00758,
        "precision_at_3": 0.00379,
        "precision_at_5": 0.00455,
        "precision_at_10": 0.00455,
        "precision_at_20": 0.00549,
        "precision_at_100": 0.00417,
        "precision_at_1000": 0.001,
        "mrr_at_1": 0.007576,
        "mrr_at_3": 0.00947,
        "mrr_at_5": 0.012121,
        "mrr_at_10": 0.015018,
        "mrr_at_20": 0.019141,
        "mrr_at_100": 0.025192,
        "mrr_at_1000": 0.028695,
        "nauc_ndcg_at_1_max": 0.12624,
        "nauc_ndcg_at_1_std": 0.251132,
        "nauc_ndcg_at_1_diff1": 0.661316,
        "nauc_ndcg_at_3_max": 0.173338,
        "nauc_ndcg_at_3_std": 0.091751,
        "nauc_ndcg_at_3_diff1": 0.365446,
        "nauc_ndcg_at_5_max": 0.294654,
        "nauc_ndcg_at_5_std": 0.270838,
        "nauc_ndcg_at_5_diff1": 0.173733,
        "nauc_ndcg_at_10_max": 0.278814,
        "nauc_ndcg_at_10_std": 0.257203,
        "nauc_ndcg_at_10_diff1": 0.140568,
        "nauc_ndcg_at_20_max": 0.18475,
        "nauc_ndcg_at_20_std": 0.12298,
        "nauc_ndcg_at_20_diff1": 0.112796,
        "nauc_ndcg_at_100_max": 0.164302,
        "nauc_ndcg_at_100_std": 0.149342,
        "nauc_ndcg_at_100_diff1": 0.086863,
        "nauc_ndcg_at_1000_max": 0.18502,
        "nauc_ndcg_at_1000_std": 0.162075,
        "nauc_ndcg_at_1000_diff1": 0.123474,
        "nauc_map_at_1_max": 0.12624,
        "nauc_map_at_1_std": 0.251132,
        "nauc_map_at_1_diff1": 0.661316,
        "nauc_map_at_3_max": 0.165519,
        "nauc_map_at_3_std": 0.118211,
        "nauc_map_at_3_diff1": 0.414565,
        "nauc_map_at_5_max": 0.254465,
        "nauc_map_at_5_std": 0.231471,
        "nauc_map_at_5_diff1": 0.273293,
        "nauc_map_at_10_max": 0.25408,
        "nauc_map_at_10_std": 0.225778,
        "nauc_map_at_10_diff1": 0.232776,
        "nauc_map_at_20_max": 0.214551,
        "nauc_map_at_20_std": 0.16687,
        "nauc_map_at_20_diff1": 0.207065,
        "nauc_map_at_100_max": 0.198771,
        "nauc_map_at_100_std": 0.176006,
        "nauc_map_at_100_diff1": 0.185072,
        "nauc_map_at_1000_max": 0.202866,
        "nauc_map_at_1000_std": 0.177871,
        "nauc_map_at_1000_diff1": 0.191672,
        "nauc_recall_at_1_max": 0.12624,
        "nauc_recall_at_1_std": 0.251132,
        "nauc_recall_at_1_diff1": 0.661316,
        "nauc_recall_at_3_max": 0.191704,
        "nauc_recall_at_3_std": 0.029596,
        "nauc_recall_at_3_diff1": 0.250064,
        "nauc_recall_at_5_max": 0.360509,
        "nauc_recall_at_5_std": 0.344919,
        "nauc_recall_at_5_diff1": 0.010287,
        "nauc_recall_at_10_max": 0.301164,
        "nauc_recall_at_10_std": 0.296879,
        "nauc_recall_at_10_diff1": 0.044946,
        "nauc_recall_at_20_max": 0.1512,
        "nauc_recall_at_20_std": 0.08099,
        "nauc_recall_at_20_diff1": 0.04982,
        "nauc_recall_at_100_max": 0.14774,
        "nauc_recall_at_100_std": 0.135561,
        "nauc_recall_at_100_diff1": 0.044575,
        "nauc_recall_at_1000_max": NaN,
        "nauc_recall_at_1000_std": NaN,
        "nauc_recall_at_1000_diff1": NaN,
        "nauc_precision_at_1_max": 0.12624,
        "nauc_precision_at_1_std": 0.251132,
        "nauc_precision_at_1_diff1": 0.661316,
        "nauc_precision_at_3_max": 0.191704,
        "nauc_precision_at_3_std": 0.029596,
        "nauc_precision_at_3_diff1": 0.250064,
        "nauc_precision_at_5_max": 0.360509,
        "nauc_precision_at_5_std": 0.344919,
        "nauc_precision_at_5_diff1": 0.010287,
        "nauc_precision_at_10_max": 0.301164,
        "nauc_precision_at_10_std": 0.296879,
        "nauc_precision_at_10_diff1": 0.044946,
        "nauc_precision_at_20_max": 0.1512,
        "nauc_precision_at_20_std": 0.08099,
        "nauc_precision_at_20_diff1": 0.04982,
        "nauc_precision_at_100_max": 0.14774,
        "nauc_precision_at_100_std": 0.135561,
        "nauc_precision_at_100_diff1": 0.044575,
        "nauc_precision_at_1000_max": NaN,
        "nauc_precision_at_1000_std": NaN,
        "nauc_precision_at_1000_diff1": NaN,
        "nauc_cv_recall_at_1_max": 0.12624,
        "nauc_cv_recall_at_1_std": 0.251132,
        "nauc_cv_recall_at_1_diff1": 0.661316,
        "nauc_cv_recall_at_3_max": 0.191704,
        "nauc_cv_recall_at_3_std": 0.029596,
        "nauc_cv_recall_at_3_diff1": 0.250064,
        "nauc_cv_recall_at_5_max": 0.360509,
        "nauc_cv_recall_at_5_std": 0.344919,
        "nauc_cv_recall_at_5_diff1": 0.010287,
        "nauc_cv_recall_at_10_max": 0.301164,
        "nauc_cv_recall_at_10_std": 0.296879,
        "nauc_cv_recall_at_10_diff1": 0.044946,
        "nauc_cv_recall_at_20_max": 0.1512,
        "nauc_cv_recall_at_20_std": 0.08099,
        "nauc_cv_recall_at_20_diff1": 0.04982,
        "nauc_cv_recall_at_100_max": 0.14774,
        "nauc_cv_recall_at_100_std": 0.135561,
        "nauc_cv_recall_at_100_diff1": 0.044575,
        "nauc_cv_recall_at_1000_max": NaN,
        "nauc_cv_recall_at_1000_std": NaN,
        "nauc_cv_recall_at_1000_diff1": NaN,
        "nauc_mrr_at_1_max": 0.12624,
        "nauc_mrr_at_1_std": 0.251132,
        "nauc_mrr_at_1_diff1": 0.661316,
        "nauc_mrr_at_3_max": 0.165519,
        "nauc_mrr_at_3_std": 0.118211,
        "nauc_mrr_at_3_diff1": 0.414565,
        "nauc_mrr_at_5_max": 0.254465,
        "nauc_mrr_at_5_std": 0.231471,
        "nauc_mrr_at_5_diff1": 0.273293,
        "nauc_mrr_at_10_max": 0.25408,
        "nauc_mrr_at_10_std": 0.225778,
        "nauc_mrr_at_10_diff1": 0.232776,
        "nauc_mrr_at_20_max": 0.214551,
        "nauc_mrr_at_20_std": 0.16687,
        "nauc_mrr_at_20_diff1": 0.207065,
        "nauc_mrr_at_100_max": 0.198771,
        "nauc_mrr_at_100_std": 0.176006,
        "nauc_mrr_at_100_diff1": 0.185072,
        "nauc_mrr_at_1000_max": 0.202866,
        "nauc_mrr_at_1000_std": 0.177871,
        "nauc_mrr_at_1000_diff1": 0.191672,
        "main_score": 0.02273,
        "hf_subset": "af_za",
        "languages": [
          "afr-Latn"
        ]
      }
    ]
  },
  "evaluation_time": 26.83681845664978,
  "kg_co2_emissions": null
}
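
As a rough sanity check on the "neither trivial nor random" item in the checklist, the reported recall can be compared with what a uniformly random ranker would be expected to score. The sketch below is illustrative only: `random_recall_at_k` and `lift_over_random` are hypothetical helpers, not part of mteb, and the corpus size is an assumption. It is not stated in this thread; 264 is a guess based on the reported recalls all being close to multiples of 1/264, with the further assumption that the corpus has as many items as there are queries.

```python
def random_recall_at_k(k: int, corpus_size: int) -> float:
    """Expected recall@k when candidates are ranked uniformly at random.

    Every relevant item lands in the top k with probability k / corpus_size,
    so the expectation does not depend on how many items are relevant.
    """
    if k <= 0 or corpus_size <= 0:
        raise ValueError("k and corpus_size must be positive")
    return min(k, corpus_size) / corpus_size


def lift_over_random(reported: dict[int, float], corpus_size: int) -> dict[int, float]:
    """Ratio of reported recall@k to the random expectation (1.0 means chance level)."""
    return {k: r / random_recall_at_k(k, corpus_size) for k, r in reported.items()}


# recall@k values copied from the FleursT2ARetrieval (af_za) results above.
reported_recall = {1: 0.00758, 5: 0.02273, 10: 0.04545, 100: 0.41667}

# ASSUMPTION: the corpus size is not given in this thread; 264 is a guess.
ASSUMED_CORPUS_SIZE = 264

for k, lift in lift_over_random(reported_recall, ASSUMED_CORPUS_SIZE).items():
    print(f"recall@{k}: {lift:.2f}x the random expectation")
```

A ratio close to 1 would indicate chance-level ranking under the assumed corpus size; the real corpus size from the task metadata should replace the guess before drawing conclusions.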

hepengfe commented Aug 3, 2025

Member Author

Results for laion/clap-htsat-unfused on the FleursA2TRetrieval task:

{
  "dataset_revision": "d7c758a6dceecd54a98cac43404d3d576e721f07",
  "task_name": "FleursA2TRetrieval",
  "mteb_version": "1.21.3",
  "scores": {
    "test": [
      {
        "ndcg_at_1": 0.00309,
        "ndcg_at_3": 0.00309,
        "ndcg_at_5": 0.00369,
        "ndcg_at_10": 0.00802,
        "ndcg_at_20": 0.01294,
        "ndcg_at_100": 0.03386,
        "ndcg_at_1000": 0.13494,
        "map_at_1": 0.00309,
        "map_at_3": 0.00309,
        "map_at_5": 0.0034,
        "map_at_10": 0.00509,
        "map_at_20": 0.00637,
        "map_at_100": 0.00874,
        "map_at_1000": 0.01162,
        "recall_at_1": 0.00309,
        "recall_at_3": 0.00309,
        "recall_at_5": 0.00464,
        "recall_at_10": 0.01855,
        "recall_at_20": 0.03864,
        "recall_at_100": 0.1592,
        "recall_at_1000": 1.0,
        "cv_recall_at_1": 0.00309,
        "cv_recall_at_3": 0.00309,
        "cv_recall_at_5": 0.00618,
        "cv_recall_at_10": 0.01855,
        "cv_recall_at_20": 0.03709,
        "cv_recall_at_100": 0.1592,
        "cv_recall_at_1000": 1.0,
        "precision_at_1": 0.00309,
        "precision_at_3": 0.00103,
        "precision_at_5": 0.00093,
        "precision_at_10": 0.00185,
        "precision_at_20": 0.00193,
        "precision_at_100": 0.00159,
        "precision_at_1000": 0.001,
        "mrr_at_1": 0.003091,
        "mrr_at_3": 0.003091,
        "mrr_at_5": 0.003709,
        "mrr_at_10": 0.005107,
        "mrr_at_20": 0.006296,
        "mrr_at_100": 0.008736,
        "mrr_at_1000": 0.011621,
        "nauc_ndcg_at_1_max": 0.493771,
        "nauc_ndcg_at_1_std": -0.053629,
        "nauc_ndcg_at_1_diff1": 0.464816,
        "nauc_ndcg_at_3_max": 0.493771,
        "nauc_ndcg_at_3_std": -0.053629,
        "nauc_ndcg_at_3_diff1": 0.464816,
        "nauc_ndcg_at_5_max": 0.575818,
        "nauc_ndcg_at_5_std": 0.117139,
        "nauc_ndcg_at_5_diff1": 0.496999,
        "nauc_ndcg_at_10_max": 0.25802,
        "nauc_ndcg_at_10_std": 0.006896,
        "nauc_ndcg_at_10_diff1": 0.291058,
        "nauc_ndcg_at_20_max": 0.137024,
        "nauc_ndcg_at_20_std": 0.006043,
        "nauc_ndcg_at_20_diff1": 0.127678,
        "nauc_ndcg_at_100_max": 0.112731,
        "nauc_ndcg_at_100_std": 0.044239,
        "nauc_ndcg_at_100_diff1": 0.100306,
        "nauc_ndcg_at_1000_max": 0.136365,
        "nauc_ndcg_at_1000_std": 0.037785,
        "nauc_ndcg_at_1000_diff1": 0.136663,
        "nauc_map_at_1_max": 0.493771,
        "nauc_map_at_1_std": -0.053629,
        "nauc_map_at_1_diff1": 0.464816,
        "nauc_map_at_3_max": 0.493771,
        "nauc_map_at_3_std": -0.053629,
        "nauc_map_at_3_diff1": 0.464816,
        "nauc_map_at_5_max": 0.539791,
        "nauc_map_at_5_std": 0.042155,
        "nauc_map_at_5_diff1": 0.482868,
        "nauc_map_at_10_max": 0.349119,
        "nauc_map_at_10_std": 0.007151,
        "nauc_map_at_10_diff1": 0.360554,
        "nauc_map_at_20_max": 0.265947,
        "nauc_map_at_20_std": 0.0036,
        "nauc_map_at_20_diff1": 0.256618,
        "nauc_map_at_100_max": 0.22163,
        "nauc_map_at_100_std": 0.026677,
        "nauc_map_at_100_diff1": 0.220486,
        "nauc_map_at_1000_max": 0.221758,
        "nauc_map_at_1000_std": 0.02602,
        "nauc_map_at_1000_diff1": 0.220519,
        "nauc_recall_at_1_max": 0.493771,
        "nauc_recall_at_1_std": -0.053629,
        "nauc_recall_at_1_diff1": 0.464816,
        "nauc_recall_at_3_max": 0.493771,
        "nauc_recall_at_3_std": -0.053629,
        "nauc_recall_at_3_diff1": 0.464816,
        "nauc_recall_at_5_max": 0.662514,
        "nauc_recall_at_5_std": 0.29758,
        "nauc_recall_at_5_diff1": 0.531005,
        "nauc_recall_at_10_max": 0.166372,
        "nauc_recall_at_10_std": -0.003367,
        "nauc_recall_at_10_diff1": 0.219578,
        "nauc_recall_at_20_max": 0.049888,
        "nauc_recall_at_20_std": 0.00442,
        "nauc_recall_at_20_diff1": 0.038989,
        "nauc_recall_at_100_max": 0.084675,
        "nauc_recall_at_100_std": 0.048286,
        "nauc_recall_at_100_diff1": 0.061122,
        "nauc_recall_at_1000_max": NaN,
        "nauc_recall_at_1000_std": NaN,
        "nauc_recall_at_1000_diff1": NaN,
        "nauc_precision_at_1_max": 0.493771,
        "nauc_precision_at_1_std": -0.053629,
        "nauc_precision_at_1_diff1": 0.464816,
        "nauc_precision_at_3_max": 0.493771,
        "nauc_precision_at_3_std": -0.053629,
        "nauc_precision_at_3_diff1": 0.464816,
        "nauc_precision_at_5_max": 0.662514,
        "nauc_precision_at_5_std": 0.29758,
        "nauc_precision_at_5_diff1": 0.531005,
        "nauc_precision_at_10_max": 0.166372,
        "nauc_precision_at_10_std": -0.003367,
        "nauc_precision_at_10_diff1": 0.219578,
        "nauc_precision_at_20_max": 0.049888,
        "nauc_precision_at_20_std": 0.00442,
        "nauc_precision_at_20_diff1": 0.038989,
        "nauc_precision_at_100_max": 0.084675,
        "nauc_precision_at_100_std": 0.048286,
        "nauc_precision_at_100_diff1": 0.061122,
        "nauc_precision_at_1000_max": 1.0,
        "nauc_precision_at_1000_std": 1.0,
        "nauc_precision_at_1000_diff1": 1.0,
        "nauc_cv_recall_at_1_max": 0.493771,
        "nauc_cv_recall_at_1_std": -0.053629,
        "nauc_cv_recall_at_1_diff1": 0.464816,
        "nauc_cv_recall_at_3_max": 0.493771,
        "nauc_cv_recall_at_3_std": -0.053629,
        "nauc_cv_recall_at_3_diff1": 0.464816,
        "nauc_cv_recall_at_5_max": 0.35316,
        "nauc_cv_recall_at_5_std": 0.07946,
        "nauc_cv_recall_at_5_diff1": 0.346792,
        "nauc_cv_recall_at_10_max": 0.166372,
        "nauc_cv_recall_at_10_std": -0.003367,
        "nauc_cv_recall_at_10_diff1": 0.219578,
        "nauc_cv_recall_at_20_max": 0.060544,
        "nauc_cv_recall_at_20_std": -0.023036,
        "nauc_cv_recall_at_20_diff1": -0.001053,
        "nauc_cv_recall_at_100_max": 0.084675,
        "nauc_cv_recall_at_100_std": 0.048286,
        "nauc_cv_recall_at_100_diff1": 0.061122,
        "nauc_cv_recall_at_1000_max": NaN,
        "nauc_cv_recall_at_1000_std": NaN,
        "nauc_cv_recall_at_1000_diff1": NaN,
        "nauc_mrr_at_1_max": 0.493771,
        "nauc_mrr_at_1_std": -0.053629,
        "nauc_mrr_at_1_diff1": 0.464816,
        "nauc_mrr_at_3_max": 0.493771,
        "nauc_mrr_at_3_std": -0.053629,
        "nauc_mrr_at_3_diff1": 0.464816,
        "nauc_mrr_at_5_max": 0.4469,
        "nauc_mrr_at_5_std": -0.009266,
        "nauc_mrr_at_5_diff1": 0.425475,
        "nauc_mrr_at_10_max": 0.341201,
        "nauc_mrr_at_10_std": -0.009338,
        "nauc_mrr_at_10_diff1": 0.348687,
        "nauc_mrr_at_20_max": 0.264453,
        "nauc_mrr_at_20_std": -0.018609,
        "nauc_mrr_at_20_diff1": 0.238235,
        "nauc_mrr_at_100_max": 0.217288,
        "nauc_mrr_at_100_std": 0.016315,
        "nauc_mrr_at_100_diff1": 0.212872,
        "nauc_mrr_at_1000_max": 0.217412,
        "nauc_mrr_at_1000_std": 0.015725,
        "nauc_mrr_at_1000_diff1": 0.212734,
        "main_score": 0.00618,
        "hf_subset": "en_us",
        "languages": [
          "eng-Latin"
        ]
      }
    ]
  },
  "evaluation_time": 63.66722893714905,
  "kg_co2_emissions": null
}
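
Both result dumps contain bare NaN values, which strict JSON parsers reject, although Python's `json` module reads them by default. The following is a minimal sketch of loading such a result and listing the undefined metrics. The helper names `load_scores` and `nan_metrics` are hypothetical, and the embedded string is a heavily trimmed copy of the FleursA2TRetrieval output above, not a full result file.

```python
import json
import math


def load_scores(raw: str, split: str = "test") -> list[dict]:
    """Parse a result string and return the per-subset score dicts for one split."""
    # json.loads accepts the non-standard NaN literal by default.
    return json.loads(raw)["scores"][split]


def nan_metrics(scores: dict) -> list[str]:
    """Names of metrics whose value is NaN (undefined for this run)."""
    return sorted(
        name for name, value in scores.items()
        if isinstance(value, float) and math.isnan(value)
    )


# Trimmed excerpt of the result shown above; most metrics are omitted.
raw_result = """
{
  "task_name": "FleursA2TRetrieval",
  "scores": {
    "test": [
      {
        "recall_at_1000": 1.0,
        "nauc_recall_at_1000_max": NaN,
        "main_score": 0.00618,
        "hf_subset": "en_us"
      }
    ]
  }
}
"""

for subset_scores in load_scores(raw_result):
    print(subset_scores["hf_subset"], subset_scores["main_score"])
    print("undefined metrics:", nan_metrics(subset_scores))
```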

hepengfe commented Aug 3, 2025

Member Author

@isaac-chung I noticed that the checklist asks for the dataset size to be reduced. Any pointers on how I should do that?

@hepengfe hepengfe marked this pull request as ready for review August 3, 2025 06:37
@hepengfe hepengfe self-assigned this Aug 3, 2025
@hepengfe hepengfe requested a review from isaac-chung August 3, 2025 06:37
@hepengfe hepengfe changed the title from "fleurs first commit" to "fleurs dataset" Aug 3, 2025
@isaac-chung isaac-chung linked an issue Aug 3, 2025 that may be closed by this pull request
@hepengfe hepengfe changed the title from "fleurs dataset" to "fleurs retrieval tasks" Aug 3, 2025
isaac-chung commented

Collaborator

How big is the entire dataset now (test split only)? For retrieval tasks we could do some negative mining, but I think we can leave it as is for the first run.

@KennethEnevoldsen what do you think of the plan?

KennethEnevoldsen commented Aug 4, 2025

Contributor

I would say leave it as is - we can create a downsampled version later
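
For reference, one possible shape for such a downsampled version is sketched below. This is not how mteb or this PR does it: `downsample_retrieval` is a hypothetical helper, the plain-dict layout for queries, corpus, and relevance judgments is an assumption, and the default cap of 2048 is simply the figure mentioned in the PR checklist.

```python
import random


def downsample_retrieval(queries, corpus, qrels, max_queries=2048, seed=42):
    """Keep at most `max_queries` queries plus the corpus items they are judged against.

    Assumed (hypothetical) layout:
      queries: {query_id: query}
      corpus:  {doc_id: document}
      qrels:   {query_id: {doc_id: relevance}}
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    query_ids = sorted(queries)  # sort first so sampling does not depend on dict order
    if len(query_ids) > max_queries:
        query_ids = rng.sample(query_ids, max_queries)

    kept_queries = {qid: queries[qid] for qid in query_ids}
    kept_qrels = {qid: dict(qrels[qid]) for qid in query_ids if qid in qrels}

    # Keep only corpus items that are judged for at least one kept query.
    needed_docs = {doc_id for rels in kept_qrels.values() for doc_id in rels}
    kept_corpus = {doc_id: corpus[doc_id] for doc_id in needed_docs if doc_id in corpus}
    return kept_queries, kept_corpus, kept_qrels


# Toy usage: 10 queries, each paired with exactly one corpus item.
toy_queries = {f"q{i}": f"query {i}" for i in range(10)}
toy_corpus = {f"d{i}": f"document {i}" for i in range(10)}
toy_qrels = {f"q{i}": {f"d{i}": 1} for i in range(10)}

small_q, small_c, small_r = downsample_retrieval(
    toy_queries, toy_corpus, toy_qrels, max_queries=4
)
print(len(small_q), len(small_c), len(small_r))
```

Note that this sketch drops every corpus item that is not relevant to a kept query, which also removes distractors and can make the task easier. Negative mining, as mentioned above, would instead retain a selection of non-relevant items alongside the relevant ones.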

@isaac-chung isaac-chung merged commit d841b33 into embeddings-benchmark:maeb Aug 4, 2025
8 checks passed

Development

Successfully merging this pull request may close these issues.

Add FLEURS dataset for audio retrieval

3 participants