
feat: Add mteb/api FastAPI service for new leaderboard #4760

Merged: Samoed merged 108 commits into main from api on Jun 21, 2026.



Samoed (Member) commented Jun 2, 2026

Adds a FastAPI service that powers the new leaderboard frontend, as a new package under mteb/api/.

Module layout (mteb/api/)

| Module | Role |
|---|---|
| app.py | App factory: GZip, CORS, and Prometheus middleware; mounts router under /v1 and infra_router at root; lifespan-driven warmup; /og static mount with a 1-day Cache-Control. |
| routes.py | All HTTP endpoints. Cached endpoints serve pre-built bytes via _cached_json (handles 304, gzip negotiation, and Cache-Control); uncached ones return pydantic schemas. |
| schemas.py | Pydantic response models: snake_case in Python, camelCase over the wire (matching leaderboardv2/src/lib/types.ts). |
| adapters.py | Memoised wrappers around the Schema.from_* constructors, so each benchmark/task/model pays the construction cost only once. |
| aggregators.py | Pure builders (build_benchmark_summary, build_benchmark_per_language, build_benchmark_leaders, build_model_scores, build_task_scores) that turn long polars frames into schema objects. |
| frames.py | Process-wide long-format results polars frames, loaded once. Sits below cache.py so aggregators can depend on it without pulling in the bytes cache. |
| cache.py | Generic CacheLayer: single-flight per-key locks, LRU store, and Prometheus labels. Holds the warm serialised bytes that routes hand out. |
| serialization.py | Off-thread schema → JSON bytes + gzip pair, used by the bytes cache. |
| warmup.py | Lifespan-time warmup: builds frames, prewarms schema caches, and optionally preloads every summary in the background (semaphore-capped). |
| metrics.py | Prometheus middleware; counters, histograms, and in-flight gauge; cache-outcome counter; /metrics renderer. |
| otel.py | OTLP HTTP tracing setup: one span per request, W3C traceparent propagation. No-op unless OTEL_EXPORTER_OTLP_ENDPOINT is set. |
| icons.py | Benchmark icon proxy + cache, so the browser gets an immutable 1-year cache instead of upstream's 5-minute one. |
| settings.py | pydantic-settings knobs: CORS_ORIGINS, PRELOAD, CACHE_REPO, OG_DIR, PREWARM_MAX_WORKERS, PRELOAD_CONCURRENCY, HTTP_MAX_AGE, DISK_CACHE, log level, OTEL vars. |
| static/favicon.png | Shipped as package data. |
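The cache.py and warmup.py rows describe two concurrency patterns: single-flight per-key locks in front of an LRU store, and a semaphore-capped background preload. A minimal asyncio sketch of both follows, assuming an in-process cache keyed by strings. `SingleFlightLRU` and `preload_all` are hypothetical names; this is not the actual CacheLayer, which additionally carries Prometheus labels and stores serialised bytes.

```python
import asyncio
from collections import OrderedDict
from typing import Awaitable, Callable, Dict, Generic, Iterable, TypeVar

T = TypeVar("T")


class SingleFlightLRU(Generic[T]):
    """Hypothetical single-flight LRU cache (a sketch, not mteb's CacheLayer)."""

    def __init__(self, maxsize: int = 128) -> None:
        self.maxsize = maxsize
        self._store: "OrderedDict[str, T]" = OrderedDict()
        # One lock per key. Never pruned in this sketch; a real cache would bound it.
        self._locks: Dict[str, asyncio.Lock] = {}

    def _hit(self, key: str) -> T:
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    async def get_or_build(self, key: str, build: Callable[[], Awaitable[T]]) -> T:
        if key in self._store:
            return self._hit(key)
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Concurrent misses queue on the same lock; everyone who wakes up
            # after the first builder finds the value already stored.
            if key in self._store:
                return self._hit(key)
            value = await build()
            self._store[key] = value
            if len(self._store) > self.maxsize:
                self._store.popitem(last=False)  # evict least recently used
            return value


async def preload_all(
    cache: SingleFlightLRU[T],
    keys: Iterable[str],
    build_for: Callable[[str], Awaitable[T]],
    concurrency: int = 4,
) -> None:
    """Warm every key, with at most `concurrency` builds in flight at once."""
    sem = asyncio.Semaphore(concurrency)

    async def warm(key: str) -> None:
        async with sem:
            await cache.get_or_build(key, lambda: build_for(key))

    await asyncio.gather(*(warm(k) for k in keys))
```

The re-check inside the lock is what makes it single-flight: many concurrent requests for the same cold key trigger one build, and the rest are served from the store once the lock is released. The semaphore keeps a full preload from saturating the process while it also serves traffic.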

Endpoint map

Data routes under /v1, infra at root.

/health                                   GET   liveness
/metrics                                  GET   prometheus scrape
/robots.txt, /favicon.ico, /og/*          GET   static / proxied assets
/v1/icon/{name:path}                      GET   benchmark icon proxy

/v1/benchmarks                            GET   flat list
/v1/benchmarks/menu                       GET   nested menu tree
/v1/benchmarks/{name}                     GET   single benchmark metadata
/v1/benchmarks/{name}/scores              GET   full summary (legacy alias: /summary)
/v1/benchmarks/{name}/per-language        GET   per-language rows
/v1/benchmarks/{name}/leaders             GET   slim home-page leader tiles

/v1/tasks                                 GET   flat list
/v1/tasks/{name}                          GET   single task metadata
/v1/tasks/{name}/scores                   GET   per-task scores

/v1/models                                GET   flat list
/v1/models/{name}                         GET   single model metadata
/v1/models/{name}/scores                  GET   per-model scores

Request flow

request → CORSMiddleware → GZipMiddleware → PrometheusMiddleware → router
            │
            ├── cached endpoints ──→ cache.CacheLayer (single-flight)
            │                          │ miss → aggregators.build_* → serialize_schema (off-thread)
            │                          └ hit  → cached (bytes, gzip, etag)
            │                                    → Response with 304 / Vary / Cache-Control
            │
            └── uncached endpoints → adapters.*_to_schema → pydantic-core JSON
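The cached branch above ends in a (bytes, gzip, etag) triple and a response carrying 304 / Vary / Cache-Control. A stdlib-only sketch of that negotiation, assuming one strong ETag shared by both encodings and a simplified Accept-Encoding parser. `serialize`, `respond`, and the `max_age` default (a stand-in for HTTP_MAX_AGE) are hypothetical and are not the signatures of routes._cached_json or serialization.py. In the service the serialisation step runs off-thread (for example via `asyncio.to_thread`), which this sketch omits.

```python
import gzip
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class CachedPayload:
    raw: bytes  # identity-encoded JSON
    gz: bytes  # pre-compressed copy, built once per cache fill
    etag: str  # one strong ETag shared by both encodings


def serialize(obj: Any) -> CachedPayload:
    """Build the (bytes, gzip, etag) triple once, so requests only pick a variant."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    etag = '"' + hashlib.sha256(raw).hexdigest()[:16] + '"'
    return CachedPayload(raw=raw, gz=gzip.compress(raw), etag=etag)


def accepts_gzip(accept_encoding: str) -> bool:
    """Simplified check: the first gzip or * entry decides; q=0 means refused."""
    for part in accept_encoding.split(","):
        coding, _, params = part.partition(";")
        if coding.strip().lower() not in ("gzip", "*"):
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def respond(
    payload: CachedPayload,
    if_none_match: Optional[str],
    accept_encoding: Optional[str],
    max_age: int = 300,
) -> Tuple[int, Dict[str, str], bytes]:
    headers = {
        "ETag": payload.etag,
        "Vary": "Accept-Encoding",  # caches must key on the negotiated encoding
        "Cache-Control": f"public, max-age={max_age}",
    }
    if if_none_match:
        tags = {t.strip() for t in if_none_match.split(",")}
        if "*" in tags or payload.etag in tags:
            return 304, headers, b""  # client copy is current; send no body
    if accepts_gzip(accept_encoding or ""):
        headers["Content-Encoding"] = "gzip"
        return 200, headers, payload.gz
    return 200, headers, payload.raw
```

Because both encodings are precomputed at fill time, a cache hit costs only a header comparison and a pointer hand-off, with no per-request compression.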

The long-format results frames, which are the source of truth, live in frames.py (loaded once at startup or on first request; persisted to ~/.cache/mteb/leaderboard/ and invalidated by the HF dataset commit SHA).
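Persisting the frames and invalidating them by dataset commit SHA is a small, reusable pattern. A minimal sketch, assuming JSON files as a stand-in for whatever on-disk format frames.py actually uses; `load_frames_cached` and the meta.json / frames.json file names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def load_frames_cached(
    cache_dir: Path, current_sha: str, rebuild: Callable[[], Dict[str, Any]]
) -> Dict[str, Any]:
    """Reuse the on-disk frames only if they were built from `current_sha`."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    meta = cache_dir / "meta.json"
    data = cache_dir / "frames.json"
    if meta.exists() and data.exists():
        if json.loads(meta.read_text()).get("sha") == current_sha:
            return json.loads(data.read_text())  # hit: same dataset commit
    frames = rebuild()  # cold path: rebuild from the results dataset
    meta.unlink(missing_ok=True)  # drop the SHA marker before touching the data...
    data.write_text(json.dumps(frames))
    meta.write_text(json.dumps({"sha": current_sha}))  # ...and write it back last
    return frames
```

Removing the marker first and writing it last means a crash mid-write leaves no valid marker, so the next start rebuilds instead of trusting a half-written file.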

Samoed and others added 18 commits May 22, 2026 18:34
New mteb/api subpackage exposes the leaderboard data as a FastAPI
service backed by ResultCache + the existing polars summary builders.
Routes mirror the SvelteKit frontend's data needs: benchmark menu,
benchmark detail, and prerendered summary tables. CORS origins,
preload, and cache locations come from settings.

Dockerfile clones mteb@api, installs .[api], and serves uvicorn on
:7860 as UID 1000 — drop-in for a Hugging Face Space.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Base automatically changed from lb_parquet to main June 2, 2026 12:56
Samoed and others added 11 commits June 2, 2026 15:59
pydantic-settings' EnvSettingsSource tries to json.loads any field it
considers complex *before* invoking field_validators, which made the
documented comma-separated MTEB_API_CORS_ORIGINS format crash with
JSONDecodeError at app startup inside the HF Space. NoDecode skips
that pre-parse step and lets the existing field_validator split on
commas as advertised.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
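The bug in this commit is an ordering problem: the env source calls `json.loads` before any field_validator runs, and a comma-separated string is not valid JSON. A stdlib sketch of both halves, using hypothetical helper names. In pydantic-settings itself the fix is to annotate the field with `NoDecode` (e.g. `Annotated[list[str], NoDecode]`) and do the split in a `mode="before"` field_validator; the code below only mimics that behaviour.

```python
import json
from typing import Iterable, List, Union

# Illustrative value in the documented comma-separated format.
RAW_ORIGINS = "https://example.com, http://localhost:5173"


def json_preparse_fails(value: str) -> bool:
    """Mimic the pre-parse step that ran before validators."""
    try:
        json.loads(value)
    except json.JSONDecodeError:
        return True
    return False


def split_origins(value: Union[str, Iterable[str]]) -> List[str]:
    """What the before-mode validator does once the pre-parse is skipped."""
    if isinstance(value, str):
        return [origin.strip() for origin in value.split(",") if origin.strip()]
    return list(value)
```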
`RUN git clone` always produces the same layer hash because the command
string never changes, so HF Spaces was rebuilding the image on top of a
stale checkout — the cors_origins NoDecode fix never made it into the
running container. Pull the latest commit SHA from GitHub via ADD just
before the clone; ADD invalidates the layer whenever the response body
changes, which forces a fresh clone per push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# Conflicts:
#	mteb/benchmarks/benchmark.py
The api module needed only this one-line helper from
mteb.leaderboard.app, but importing it pulled in gradio, pandas, and
cachetools — none of which belong in the [api] extra. Promoting it to
a property on ResultCache lets every consumer (api, leaderboard,
bench script) reach the path without dragging the Gradio stack into
the API container.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops the cold-start cost of cloning the GitHub results repo on first
request by pulling the same data from huggingface.co/datasets/mteb/results
during image build. Goes into the default huggingface_hub cache under
HF_HOME so callers reach it via the standard hub APIs. The download is
guarded with `|| true` so it stays non-fatal while the dataset is still
being populated upstream — the API just falls back to the GitHub clone
on first request.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The results-repo sync now pushes one HF dataset config per benchmark
(plus a ``default`` config holding every result, deduped). Rewires the
API consumer to match:

* ``_load_from_hub`` enumerates configs and ``load_dataset(name=cfg,
  split='train')`` each. A failure on one config no longer poisons the
  whole load.
* ``_load_per_benchmark_frames`` collapses to two paths — hub or cold
  rebuild — and returns a ``(per_benchmark, all_results)`` tuple
  instead of the ``_LoadedFrames`` dataclass. The two named wrappers
  (``get_all_benchmark_frames`` / ``get_all_results_df``) go away;
  callers destructure inline.
* Hub-supplied ``default`` config short-circuits the per-benchmark
  concat for the unified view.
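The per-config loading described in the list above can be sketched as a loop with one try/except per config. `load_one` is a hypothetical callable standing in for the `load_dataset(name=cfg, split='train')` call, and the config names in the test are illustrative. The sketch mirrors the `(per_benchmark, all_results)` return shape described in the commit but is not the actual `_load_from_hub`.

```python
import logging
from typing import Callable, Dict, Iterable, Optional, Tuple, TypeVar

F = TypeVar("F")  # stands in for a polars DataFrame
log = logging.getLogger(__name__)


def load_per_config(
    configs: Iterable[str], load_one: Callable[[str], F]
) -> Tuple[Dict[str, F], Optional[F]]:
    """Load each HF config on its own so one bad config can't sink the rest."""
    per_benchmark: Dict[str, F] = {}
    all_results: Optional[F] = None
    for cfg in configs:
        try:
            frame = load_one(cfg)
        except Exception as exc:  # isolate the failure to this config
            log.warning("skipping config %r: %s", cfg, exc)
            continue
        if cfg == "default":
            all_results = frame  # hub-supplied unified view, no concat needed
        else:
            per_benchmark[cfg] = frame
    return per_benchmark, all_results
```

In this sketch, if the `default` config itself fails, `all_results` comes back as `None`, and a caller would fall back to concatenating the per-benchmark frames.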

Other follow-ups:

* ``BenchmarkResults`` gains ``load_leaderboard_frame`` and
  ``split_leaderboard_frame`` so loading the raw combined frame can be
  decoupled from splitting it. The new
  ``_split_by_benchmark_tasks`` filters via an inner join on
  ``(task_name, split, subset)`` tuples — off-spec subsets/splits no
  longer leak through to ``_create_summary_table``'s
  ``group_by(model_name, task_name).mean()``.
* ``MTEB_API_CACHE_REPO`` moves to ``Settings`` alongside
  ``cors_origins`` / ``preload``; consumers go through
  ``settings.cache_repo()``.
* /robots.txt added to silence Space probes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
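Plain dicts are enough to show why the inner join on `(task_name, split, subset)` matters: an off-spec split that survives into `group_by(model_name, task_name).mean()` drags the score. The helpers and the task name below are hypothetical. In polars the filter would be a `DataFrame.join(spec, on=[...], how="inner")` against the spec keys; `how="semi"` is an alternative that keeps only the left-hand columns and cannot duplicate rows if the spec keys ever repeat.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Key = Tuple[str, str, str]  # (task_name, split, subset)


def filter_to_spec(rows: Iterable[Row], spec: Iterable[Key]) -> List[Row]:
    """Keep rows whose (task_name, split, subset) appears in the benchmark spec.

    Against a de-duplicated key set this matches an inner join on those columns.
    """
    allowed = set(spec)
    return [r for r in rows if (r["task_name"], r["split"], r["subset"]) in allowed]


def mean_by_model_task(rows: Iterable[Row]) -> Dict[Tuple[str, str], float]:
    """Plain-Python analogue of group_by(model_name, task_name).mean()."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for r in rows:
        groups[(r["model_name"], r["task_name"])].append(r["score"])
    return {key: mean(scores) for key, scores in groups.items()}
```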
Adds the MVEB (Massive Video Embedding Benchmark) benchmark objects to
main so the leaderboard and get_benchmark() can resolve them. The
underlying tasks are already on main; this adds only the curated
benchmark groupings and their registration.

- benchmarks.py: MVEB (23 tasks), MVEB(text, video) (19), MVEB(video)
  (9), MVEB(beta, extended) (184, alias MVEB(extended)).
- benchmarks/__init__.py: import + __all__ registration.
- _leaderboard_menu.py: new "Video" group under General Purpose.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Samoed changed the title from "Add mteb/api FastAPI service and HF Space Dockerfile" to "Add mteb/api FastAPI service for new leaderboard" on Jun 18, 2026
KennethEnevoldsen (Contributor) commented

> I think we should keep the old leaderboard. Some people may fork mteb to create their own proprietary benchmarks, and they could use the Gradio implementation to see the scores. I don't think it is possible to run the new leaderboard from our package, because it's a JS app.

I am just not sure this is something I want to maintain going forward. It is currently so slow. Maybe we should make a simplified version (without Gradio) that just constructs the leaderboard table and displays it. It could just write to an HTML file.

This is probably a separate issue, though (not for this PR).

KennethEnevoldsen (Contributor) left a comment

I think we can merge this in

I think I'll create some issues. I suspect we are at a point where a refactor into mteb-core, mteb-models and mteb-api might be something we need to look into.

Do we want to document the API somewhere? (Fine not to do it now; I suspect this will be a feature bump, so I would do it in an upcoming PR if we want to.)

Comment thread mteb/abstasks/classification.py
Samoed (Member, Author) commented Jun 20, 2026

> Do we want to document the API somewhere

What kind of docs do you want?

KennethEnevoldsen (Contributor) commented

> What kind of docs do you want?

I could imagine some people might be interested in fetching data from the API. I am not sure we want to encourage this, though.

Samoed (Member, Author) commented Jun 20, 2026

I have a README.md inside the API folder, and the docstrings will be visible in the Swagger docs.

KennethEnevoldsen (Contributor) left a comment

Ah, perfect. Missed that one.

Comment thread mteb/api/README.md Outdated
KennethEnevoldsen changed the title from "Add mteb/api FastAPI service for new leaderboard" to "feat: Add mteb/api FastAPI service for new leaderboard" on Jun 21, 2026
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Samoed enabled auto-merge (squash) June 21, 2026 11:55
Samoed merged commit a3dfc9d into main Jun 21, 2026
14 of 15 checks passed
Samoed deleted the api branch June 21, 2026 12:01

4 participants