New mteb/api subpackage exposes the leaderboard data as a FastAPI service backed by ResultCache + the existing polars summary builders. Routes mirror the SvelteKit frontend's data needs: benchmark menu, benchmark detail, and prerendered summary tables. CORS origins, preload, and cache locations come from settings. Dockerfile clones mteb@api, installs .[api], and serves uvicorn on :7860 as UID 1000 — drop-in for a Hugging Face Space. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pydantic-settings' EnvSettingsSource tries to json.loads any field it considers complex *before* invoking field_validators, which made the documented comma-separated MTEB_API_CORS_ORIGINS format crash with JSONDecodeError at app startup inside the HF Space. NoDecode skips that pre-parse step and lets the existing field_validator split on commas as advertised. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
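A minimal, stdlib-only sketch of the failure mode described in this commit. It does not import pydantic-settings; it only shows why an eager `json.loads` on a comma-separated value fails and what a comma-splitting validator would return once it sees the raw string. The function name `parse_cors_origins` and the example origins are hypothetical. In pydantic-settings itself, the fix referred to here is annotating the field with `NoDecode` (for example `Annotated[list[str], NoDecode]`) so the raw string reaches the validator.

```python
import json


def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origins string and drop empty items.

    Hypothetical stand-in for the field validator the commit mentions.
    """
    return [item.strip() for item in raw.split(",") if item.strip()]


# Hypothetical value for MTEB_API_CORS_ORIGINS in the documented format.
raw = "https://a.example, https://b.example"

# An eager JSON pre-parse rejects the value: it is not valid JSON.
try:
    json.loads(raw)
    json_parse_failed = False
except json.JSONDecodeError:
    json_parse_failed = True

# With the pre-parse skipped, the validator receives the raw string.
origins = parse_cors_origins(raw)
```

The same split also tolerates stray whitespace and trailing commas, which is a common convenience for values set through a hosting UI.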
`RUN git clone` always produces the same layer hash because the command string never changes, so HF Spaces was rebuilding the image on top of a stale checkout — the cors_origins NoDecode fix never made it into the running container. Pull the latest commit SHA from GitHub via ADD just before the clone; ADD invalidates the layer whenever the response body changes, which forces a fresh clone per push. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# Conflicts: # mteb/benchmarks/benchmark.py
The api module needed only this one-line helper from mteb.leaderboard.app, but importing it pulled in gradio, pandas, and cachetools — none of which belong in the [api] extra. Promoting it to a property on ResultCache lets every consumer (api, leaderboard, bench script) reach the path without dragging the Gradio stack into the API container. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops the cold-start cost of cloning the GitHub results repo on first request by pulling the same data from huggingface.co/datasets/mteb/results during image build. Goes into the default huggingface_hub cache under HF_HOME so callers reach it via the standard hub APIs. The download is guarded with `|| true` so it stays non-fatal while the dataset is still being populated upstream — the API just falls back to the GitHub clone on first request. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The results-repo sync now pushes one HF dataset config per benchmark (plus a ``default`` config holding every result, deduped). Rewires the API consumer to match:

* ``_load_from_hub`` enumerates configs and ``load_dataset(name=cfg, split='train')`` each. A failure on one config no longer poisons the whole load.
* ``_load_per_benchmark_frames`` collapses to two paths (hub or cold rebuild) and returns a ``(per_benchmark, all_results)`` tuple instead of the ``_LoadedFrames`` dataclass. The two named wrappers (``get_all_benchmark_frames`` / ``get_all_results_df``) go away; callers destructure inline.
* Hub-supplied ``default`` config short-circuits the per-benchmark concat for the unified view.

Other follow-ups:

* ``BenchmarkResults`` gains ``load_leaderboard_frame`` and ``split_leaderboard_frame`` so loading the raw combined frame can be decoupled from splitting it. The new ``_split_by_benchmark_tasks`` filters via an inner join on ``(task_name, split, subset)`` tuples; off-spec subsets/splits no longer leak through to ``_create_summary_table``'s ``group_by(model_name, task_name).mean()``.
* ``MTEB_API_CACHE_REPO`` moves to ``Settings`` alongside ``cors_origins`` / ``preload``; consumers go through ``settings.cache_repo()``.
* /robots.txt added to silence Space probes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
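A rough sketch of why filtering on ``(task_name, split, subset)`` matters before the per-task mean. This is plain Python, not the polars code in the PR: a set-membership filter stands in for the inner join (equivalent when the spec keys are unique), and a dict-based grouping stands in for ``group_by(...).mean()``. The rows, the spec, and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows: one score per (model, task, split, subset).
rows = [
    {"model_name": "m1", "task_name": "T1", "split": "test", "subset": "en", "score": 0.75},
    {"model_name": "m1", "task_name": "T1", "split": "test", "subset": "de", "score": 0.25},
    # Off-spec rows: a split and a subset the benchmark does not include.
    {"model_name": "m1", "task_name": "T1", "split": "dev", "subset": "en", "score": 0.0},
    {"model_name": "m1", "task_name": "T1", "split": "test", "subset": "fr", "score": 0.0},
]

# Hypothetical benchmark spec: the only (task, split, subset) keys that count.
spec = {("T1", "test", "en"), ("T1", "test", "de")}


def split_by_benchmark_tasks(rows, spec):
    """Keep only rows whose (task_name, split, subset) key is in the spec."""
    return [r for r in rows if (r["task_name"], r["split"], r["subset"]) in spec]


def mean_by_model_task(rows):
    """Average scores per (model_name, task_name)."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[(r["model_name"], r["task_name"])].append(r["score"])
    return {key: mean(scores) for key, scores in grouped.items()}


unfiltered_mean = mean_by_model_task(rows)[("m1", "T1")]
filtered_mean = mean_by_model_task(split_by_benchmark_tasks(rows, spec))[("m1", "T1")]
```

Without the filter, the two off-spec rows pull the task mean down from 0.5 to 0.25, which is the kind of leak the commit describes.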
Adds the MVEB (Massive Video Embedding Benchmark) benchmark objects to main so the leaderboard and get_benchmark() can resolve them. The underlying tasks are already on main; this adds only the curated benchmark groupings and their registration.

- benchmarks.py: MVEB (23 tasks), MVEB(text, video) (19), MVEB(video) (9), MVEB(beta, extended) (184, alias MVEB(extended)).
- benchmarks/__init__.py: import + __all__ registration.
- _leaderboard_menu.py: new "Video" group under General Purpose.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
I am just not sure this is something I want to maintain going forward. It is currently so slow. Maybe we should make a simplified version that (without gradio) just constructs the leaderboard table and displays it. It could just write to an HTML file. This is probably a separate issue though (not for this PR).
# Conflicts: # pyproject.toml
KennethEnevoldsen
left a comment
I think we can merge this in
I think we should create some issues. I suspect we are at a point where a refactor into mteb-core, mteb-models and mteb-api might be a thing we need to look into.
Do we want to document the API somewhere? (Fine to not do it now; I suspect this will be a feature bump, so I would do it in an upcoming PR if we want to.)
# Conflicts: # pyproject.toml
What kind of docs do you want?
I could imagine some people might be interested in fetching data from the API. I am not sure we want to encourage this though.
I have
KennethEnevoldsen
left a comment
Ah perfect, missed that one.
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
A FastAPI service that powers the leaderboard frontend. New package under `mteb/api/`.

Module layout (`mteb/api/`)

- `app.py`: `router` under `/v1` and `infra_router` at root, lifespan-driven warmup, `/og` static mount with 1-day `Cache-Control`.
- `routes.py`: `_cached_json` (handles 304, gzip negotiation, `Cache-Control`); uncached ones return pydantic schemas.
- `schemas.py`: `snake_case` in Python, `camelCase` over the wire (matches `leaderboardv2/src/lib/types.ts`).
- `adapters.py`: `Schema.from_*` constructors so each benchmark/task/model pays construction cost once.
- `aggregators.py`: (`build_benchmark_summary`, `build_benchmark_per_language`, `build_benchmark_leaders`, `build_model_scores`, `build_task_scores`) turn long polars frames into schema objects.
- `frames.py`: `cache` so `aggregators` can depend on it without dragging in the bytes cache.
- `cache.py`: `CacheLayer` generic: single-flight per-key locks + LRU store + Prometheus labels. Holds the warm serialised bytes routes hand out.
- `serialization.py`
- `warmup.py`
- `metrics.py`: `/metrics` renderer.
- `otel.py`: `traceparent` propagation. No-op unless `OTEL_EXPORTER_OTLP_ENDPOINT` is set.
- `icons.py`
- `settings.py`: `pydantic-settings` knobs: `CORS_ORIGINS`, `PRELOAD`, `CACHE_REPO`, `OG_DIR`, `PREWARM_MAX_WORKERS`, `PRELOAD_CONCURRENCY`, `HTTP_MAX_AGE`, `DISK_CACHE`, log level, OTEL vars.
- `static/favicon.png`

Endpoint map

Data routes under `/v1`, infra at root.

Request flow

Long-frame source of truth lives in `frames.py` (loaded once at startup or first request; persisted to `~/.cache/mteb/leaderboard/`, invalidated by HF dataset commit SHA).
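The `cache.py` entry in the module layout mentions single-flight per-key locks over an LRU store. The following is a minimal sketch of that general pattern using only the standard library; it is not the PR's `CacheLayer`. The class name `SingleFlightLRU`, its method names, and the example key are hypothetical, and the Prometheus labels are left out.

```python
import threading
from collections import OrderedDict
from typing import Callable


class SingleFlightLRU:
    """Hypothetical illustration: build each key at most once at a time, keep the N most recent."""

    def __init__(self, max_entries: int = 128) -> None:
        self._max_entries = max_entries
        self._store: "OrderedDict[str, bytes]" = OrderedDict()
        self._store_lock = threading.Lock()  # guards _store and _key_locks
        self._key_locks: "dict[str, threading.Lock]" = {}

    def _lookup(self, key: str):
        """Return the cached value and mark it most recently used, or None on a miss."""
        if key in self._store:
            self._store.move_to_end(key)
            return self._store[key]
        return None

    def get_or_build(self, key: str, build: Callable[[], bytes]) -> bytes:
        with self._store_lock:
            hit = self._lookup(key)
            if hit is not None:
                return hit
            key_lock = self._key_locks.setdefault(key, threading.Lock())

        # Only one caller per key runs build(); the others wait here.
        with key_lock:
            with self._store_lock:
                hit = self._lookup(key)  # another caller may have filled it
                if hit is not None:
                    return hit
            value = build()
            with self._store_lock:
                self._store[key] = value
                while len(self._store) > self._max_entries:
                    self._store.popitem(last=False)  # evict least recently used
            return value


calls = []


def build_payload() -> bytes:
    calls.append(1)  # count how often the expensive build actually runs
    return b'{"ok": true}'


cache = SingleFlightLRU(max_entries=2)
threads = [
    threading.Thread(target=cache.get_or_build, args=("summary", build_payload))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight concurrent requests for the same key trigger a single build. The sketch never removes per-key locks, which is acceptable for a bounded key space but would need cleanup if keys were unbounded.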