add typos check by Samoed · Pull Request #8 · tomaarsen/sentence-transformers

Samoed · 2026-04-01T17:03:04Z

Add typos to pre-commit config. A bit breaking changes:

compute_metrices -> compute_metrics, but there was old function with same name and I deleted it... Not sure what to do with it
eucledian_distance -> eucledian_distance

tomaarsen

I've noticed these in the past, but they're breaking as you can imagine. I can resolve them, but only if I additionally introduce some backwards compatibility. Thanks for opening this!

P.s. I'm getting close to the release now. I intend to make integration pull requests for these models before release: https://huggingface.co/collections/tomaarsen/sentence-transformers-v54-integrations

And after release, I aim to make PRs for some other models as well, so that ST can be a convenient interface for every modality. It should be flexible enough, especially with trust_remote_code=True.

tomaarsen · 2026-04-02T08:20:19Z

+[tool.typos.default.extend-words]
+NCE = "NCE"
+nce = "nce"
+Ein = "Ein"
+ein = "ein"
+ist = "ist"
+arange = "arange"
+
+
+[tool.typos.files]
+extend-exclude = [
+    "*.json",
+    "*.ipynb",
+    "examples/sentence_transformer/unsupervised_learning/query_generation/3_programming_semantic_search.py",
+    "examples/sentence_transformer/unsupervised_learning/query_generation/1_programming_query_generation.py",
+    "sentence_transformers/sentence_transformer/modules/tokenizer/word.py",
+]
+
+[tool.typos.type.py.extend-words]
+thw = "thw"
+nd = "nd"
+Evangelia = "Evangelia"
+aas = "aas"
+sur = "sur"
+paria = "paria"
+ende = "ende"


What distinguishes these two types of extend-words from each other?

In tool.typos.default.extend-words would work on any file type, tool.typos.type.py.extend-words would work only on python files

Samoed · 2026-04-02T08:39:29Z

I made them compatible by name.

I'll take a closer look at your changes. Let me know how I can help! I could probably try to help with more ruff rules integration and mypy checking, but they can cause some incompatibilities, e.g. number of args flagged by pylint, or mismatched interfaces caught by mypy, so this would be more like v6 version, rather v5.4. Let me know what you think!

I think we can stick with using typos through pre-commit.

Copilot

Pull request overview

This PR introduces automated typo checking via typos and applies spelling-related renames/fixes across evaluators, losses, tests, and docs/examples.

Changes:

Add typos to pre-commit and configure it via pyproject.toml.
Rename misspelled public APIs (compute_metrices → compute_metrics, eucledian_distance → euclidean_distance) and add backward-compatibility aliases in several places.
Fix various typos in tests, docs, READMEs, and notebooks.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/sentence_transformer/test_model.py	Fix typo in a test comment.
tests/sentence_transformer/evaluation/test_information_retrieval_evaluator.py	Rename test function to `test_metrics`.
sentence_transformers/sparse_encoder/evaluation/sparse_reranking.py	Remove redundant `compute_metrices*` passthrough methods.
sentence_transformers/sparse_encoder/evaluation/sparse_information_retrieval.py	Remove redundant `compute_metrices` passthrough method.
sentence_transformers/sparse_encoder/evaluation/sparse_binary_classification.py	Remove redundant `compute_metrices` passthrough method.
sentence_transformers/sentence_transformer/losses/batch_semi_hard_triplet.py	Switch default distance metric to `euclidean_distance`.
sentence_transformers/sentence_transformer/losses/batch_hard_triplet.py	Rename distance function + add backwards-compat alias.
sentence_transformers/sentence_transformer/losses/batch_hard_soft_margin_triplet.py	Switch default distance metric to `euclidean_distance`.
sentence_transformers/sentence_transformer/losses/batch_all_triplet.py	Switch default distance metric to `euclidean_distance`.
sentence_transformers/sentence_transformer/evaluation/reranking.py	Rename `compute_metrices` → `compute_metrics` + add aliases.
sentence_transformers/sentence_transformer/evaluation/information_retrieval.py	Rename `compute_metrices` → `compute_metrics` + add alias; remove old helper.
sentence_transformers/sentence_transformer/evaluation/binary_classification.py	Rename `compute_metrices` → `compute_metrics` + add alias.
pyproject.toml	Add `typos` configuration (words/excludes/back-compat identifiers).
examples/sparse_encoder/training/sts/README.md	Fix “Overview” spelling.
examples/sparse_encoder/applications/retrieve_rerank/retrieve_rerank_simple_wikipedia.ipynb	Fix typo in markdown cell.
examples/sentence_transformer/training/sts/README.md	Fix “Overview” spelling.
examples/sentence_transformer/training/data_augmentation/train_sts_qqp_crossdomain.py	Fix comment typo (“question”).
examples/sentence_transformer/applications/image-search/image_search_multilingual.ipynb	Fix typo in markdown cell.
examples/sentence_transformer/applications/image-search/Image_Classification.ipynb	Fix typos in markdown cell.
examples/cross_encoder/training/sts/README.md	Fix “Overview” spelling.
examples/cross_encoder/training/ms_marco/README.md	Fix “mileage” spelling.
docs/sparse_encoder/usage/efficiency.rst	Fix “mileage” spelling.
docs/sentence_transformer/usage/mteb_evaluation.md	Fix grammar (“existing models”).
docs/sentence_transformer/usage/efficiency.rst	Fix “mileage” spelling.
docs/cross_encoder/usage/efficiency.rst	Fix “mileage” spelling.
.pre-commit-config.yaml	Add `typos` pre-commit hook.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-03T15:16:56Z

        self.store_metrics_in_model_card_data(model, metrics, epoch, steps)
        return metrics

-    def compute_metrices(
+    def compute_metrics(
        self,
        model: SentenceTransformer,
        corpus_model=None,


Renaming compute_metrices to compute_metrics introduces a name collision with the internal scoring helper that this class still calls later (self.compute_metrics(queries_result_list[...])). Since the old compute_metrics(queries_result_list) implementation was removed, that call will now recurse back into this method with the wrong argument type and fail at runtime. Please reintroduce the results-list scoring helper under a distinct name (e.g. _compute_ir_metrics / compute_metrics_from_results) and call that where you currently compute per-query metrics, keeping compute_metrics(model, ...) as the public API (and compute_metrices as an alias).

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

sentence_transformers/sentence_transformer/evaluation/information_retrieval.py:306

The return type for compute_all_metrics is annotated as dict[str, float], but the function returns a nested structure keyed by score-function name and metric name (e.g. scores[name]["ndcg@k"][k]). This mismatch can break type-checking and mislead API consumers; please update the annotation to reflect the nested dict shape (or use a TypedDict / dict[str, Any] if you want to avoid an overly complex type).

    def compute_all_metrics(
        self,
        model: SentenceTransformer,
        corpus_model=None,
        corpus_embeddings: Tensor | None = None,
        output_path: str | None = None,
    ) -> dict[str, float]:
        if corpus_model is None:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tomaarsen · 2026-04-03T15:37:50Z

I think this is looking good. Thanks for the contribution! I'll merge it now.

I'll take a closer look at your changes. Let me know how I can help! I could probably try to help with more ruff rules integration and mypy checking, but they can cause some incompatibilities, e.g. number of args flagged by pylint, or mismatched interfaces caught by mypy, so this would be more like v6 version, rather v5.4. Let me know what you think!

I'm curious about ruff, but pretty happy with their defaults. For type checking: I'm curious about ty. I think transformers is also using it, and I've been using it in VS Code for a few months. I'll likely follow what transformers does and slowly introduce it over time.

Tom Aarsen

Samoed · 2026-04-03T15:42:58Z

I tried ruff, when v0.0.1 released, but in my cases mypy was more accurate

tomaarsen · 2026-04-03T16:29:41Z

Fair enough, I've only used mypy many years ago, I'm not sure about its current state.

Tom Aarsen

add typos check

d337572

tomaarsen reviewed Apr 2, 2026

View reviewed changes

make backward compatible

07eca77

Samoed requested a review from tomaarsen April 2, 2026 10:43

Merge branch 'refactor/multimodal' of into pr-8

39b5b0d

tomaarsen reviewed Apr 3, 2026

View reviewed changes

Comment thread docs/sentence_transformer/usage/mteb_evaluation.md Outdated

tomaarsen and others added 2 commits April 3, 2026 16:57

Add comma

d946130

Simplify backwards-compat aliases and remove dependency group

085407f

I think we can stick with using typos through pre-commit.

tomaarsen requested a review from Copilot April 3, 2026 15:12

Copilot started reviewing on behalf of tomaarsen April 3, 2026 15:12 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

tomaarsen and others added 2 commits April 3, 2026 17:25

Revert method deletion from d337572

5e13e02

Indent yaml more

bd16749

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

tomaarsen requested a review from Copilot April 3, 2026 15:26

Copilot started reviewing on behalf of tomaarsen April 3, 2026 15:26 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

Comment thread sentence_transformers/sentence_transformer/evaluation/reranking.py

Comment thread sentence_transformers/sentence_transformer/evaluation/binary_classification.py

Comment thread sentence_transformers/sentence_transformer/evaluation/information_retrieval.py

tomaarsen merged commit 1c6d463 into tomaarsen:refactor/multimodal Apr 3, 2026
4 checks passed

Samoed deleted the fix_typos branch April 3, 2026 15:41

Conversation

Samoed commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tomaarsen left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tomaarsen Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Samoed Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Samoed commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tomaarsen commented Apr 3, 2026

Uh oh!

Uh oh!

Samoed commented Apr 3, 2026

Uh oh!

tomaarsen commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Samoed commented Apr 1, 2026 •

edited

Loading

tomaarsen left a comment •

edited

Loading

Samoed commented Apr 2, 2026 •

edited

Loading