add typos check #8

Merged
tomaarsen merged 7 commits into tomaarsen:refactor/multimodal from Samoed:fix_typos
Apr 3, 2026

Conversation

@Samoed Samoed commented Apr 1, 2026

Add typos to the pre-commit config. This includes a couple of breaking changes:

  • compute_metrices -> compute_metrics. There was already an old function with the same name, which I deleted; I'm not sure what to do about it.
  • eucledian_distance -> euclidean_distance

Owner

@tomaarsen tomaarsen left a comment

I've noticed these in the past, but they're breaking as you can imagine. I can resolve them, but only if I additionally introduce some backwards compatibility. Thanks for opening this!
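One common way to provide the kind of backwards compatibility mentioned here is to keep the misspelled name as a thin wrapper that warns and forwards to the corrected one. The following is only a minimal sketch under stated assumptions: the function body works on plain Python lists as a stand-in for the library's actual distance function, and the thread does not say whether the merged change emits a deprecation warning.

```python
import math
import warnings


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Correctly spelled function (stand-in implementation on plain lists)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def eucledian_distance(a: list[float], b: list[float]) -> float:
    """Misspelled legacy name, kept so that existing callers do not break."""
    # Assumption: warning on use is one possible policy, not confirmed by the PR.
    warnings.warn(
        "eucledian_distance is deprecated, use euclidean_distance instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return euclidean_distance(a, b)
```

Existing code that calls the old name keeps working, while the warning points users at the corrected spelling.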

P.s. I'm getting close to the release now. I intend to make integration pull requests for these models before release: https://huggingface.co/collections/tomaarsen/sentence-transformers-v54-integrations

And after release, I aim to make PRs for some other models as well, so that ST can be a convenient interface for every modality. It should be flexible enough, especially with trust_remote_code=True.

Comment thread pyproject.toml Outdated
Comment on lines +125 to +150
[tool.typos.default.extend-words]
NCE = "NCE"
nce = "nce"
Ein = "Ein"
ein = "ein"
ist = "ist"
arange = "arange"


[tool.typos.files]
extend-exclude = [
"*.json",
"*.ipynb",
"examples/sentence_transformer/unsupervised_learning/query_generation/3_programming_semantic_search.py",
"examples/sentence_transformer/unsupervised_learning/query_generation/1_programming_query_generation.py",
"sentence_transformers/sentence_transformer/modules/tokenizer/word.py",
]

[tool.typos.type.py.extend-words]
thw = "thw"
nd = "nd"
Evangelia = "Evangelia"
aas = "aas"
sur = "sur"
paria = "paria"
ende = "ende"
Owner

What distinguishes these two types of extend-words from each other?

Author

Entries in tool.typos.default.extend-words apply to every file type, while entries in tool.typos.type.py.extend-words apply only to Python files.
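The scoping described in this answer can be modelled in a few lines of Python. This is purely an illustrative model, not how typos itself is implemented: the dictionaries hold a subset of the entries from the config above, and the function name and the file-type keys are hypothetical.

```python
# Illustrative model only; this is not the typos implementation.
# Mapping a word to itself marks it as an accepted spelling.
DEFAULT_EXTEND_WORDS = {"NCE": "NCE", "arange": "arange"}  # [tool.typos.default.extend-words]
TYPE_EXTEND_WORDS = {"py": {"thw": "thw", "nd": "nd"}}  # [tool.typos.type.py.extend-words]


def allowed_words(file_type: str) -> set[str]:
    """Return the extra accepted words for a file of the given (hypothetical) type key."""
    words = set(DEFAULT_EXTEND_WORDS)  # default entries apply everywhere
    words.update(TYPE_EXTEND_WORDS.get(file_type, {}))  # type entries only for that type
    return words
```

Under this model, "thw" is accepted in Python files only, while "NCE" is accepted in any file.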

Author

Samoed commented Apr 2, 2026

I made them compatible by name.

I'll take a closer look at your changes. Let me know how I can help! I could probably help with integrating more ruff rules and mypy checking, but they can cause some incompatibilities, e.g. the number of args flagged by pylint, or mismatched interfaces caught by mypy, so this would be more of a v6 change rather than v5.4. Let me know what you think!

@Samoed Samoed requested a review from tomaarsen April 2, 2026 10:43
Comment thread docs/sentence_transformer/usage/mteb_evaluation.md Outdated
tomaarsen and others added 2 commits April 3, 2026 16:57
I think we can stick with using typos through pre-commit.

Copilot AI left a comment

Pull request overview

This PR introduces automated typo checking via typos and applies spelling-related renames/fixes across evaluators, losses, tests, and docs/examples.

Changes:

  • Add typos to pre-commit and configure it via pyproject.toml.
  • Rename misspelled public APIs (compute_metrices -> compute_metrics, eucledian_distance -> euclidean_distance) and add backward-compatibility aliases in several places.
  • Fix various typos in tests, docs, READMEs, and notebooks.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/sentence_transformer/test_model.py | Fix typo in a test comment. |
| tests/sentence_transformer/evaluation/test_information_retrieval_evaluator.py | Rename test function to test_metrics. |
| sentence_transformers/sparse_encoder/evaluation/sparse_reranking.py | Remove redundant compute_metrices* passthrough methods. |
| sentence_transformers/sparse_encoder/evaluation/sparse_information_retrieval.py | Remove redundant compute_metrices passthrough method. |
| sentence_transformers/sparse_encoder/evaluation/sparse_binary_classification.py | Remove redundant compute_metrices passthrough method. |
| sentence_transformers/sentence_transformer/losses/batch_semi_hard_triplet.py | Switch default distance metric to euclidean_distance. |
| sentence_transformers/sentence_transformer/losses/batch_hard_triplet.py | Rename distance function + add backwards-compat alias. |
| sentence_transformers/sentence_transformer/losses/batch_hard_soft_margin_triplet.py | Switch default distance metric to euclidean_distance. |
| sentence_transformers/sentence_transformer/losses/batch_all_triplet.py | Switch default distance metric to euclidean_distance. |
| sentence_transformers/sentence_transformer/evaluation/reranking.py | Rename compute_metrices* -> compute_metrics* + add aliases. |
| sentence_transformers/sentence_transformer/evaluation/information_retrieval.py | Rename compute_metrices -> compute_metrics + add alias; remove old helper. |
| sentence_transformers/sentence_transformer/evaluation/binary_classification.py | Rename compute_metrices -> compute_metrics + add alias. |
| pyproject.toml | Add typos configuration (words/excludes/back-compat identifiers). |
| examples/sparse_encoder/training/sts/README.md | Fix “Overview” spelling. |
| examples/sparse_encoder/applications/retrieve_rerank/retrieve_rerank_simple_wikipedia.ipynb | Fix typo in markdown cell. |
| examples/sentence_transformer/training/sts/README.md | Fix “Overview” spelling. |
| examples/sentence_transformer/training/data_augmentation/train_sts_qqp_crossdomain.py | Fix comment typo (“question”). |
| examples/sentence_transformer/applications/image-search/image_search_multilingual.ipynb | Fix typo in markdown cell. |
| examples/sentence_transformer/applications/image-search/Image_Classification.ipynb | Fix typos in markdown cell. |
| examples/cross_encoder/training/sts/README.md | Fix “Overview” spelling. |
| examples/cross_encoder/training/ms_marco/README.md | Fix “mileage” spelling. |
| docs/sparse_encoder/usage/efficiency.rst | Fix “mileage” spelling. |
| docs/sentence_transformer/usage/mteb_evaluation.md | Fix grammar (“existing models”). |
| docs/sentence_transformer/usage/efficiency.rst | Fix “mileage” spelling. |
| docs/cross_encoder/usage/efficiency.rst | Fix “mileage” spelling. |
| .pre-commit-config.yaml | Add typos pre-commit hook. |

Comment thread .pre-commit-config.yaml Outdated
Comment on lines 296 to 302
         self.store_metrics_in_model_card_data(model, metrics, epoch, steps)
         return metrics

-    def compute_metrices(
+    def compute_metrics(
         self,
         model: SentenceTransformer,
         corpus_model=None,
Copilot AI Apr 3, 2026

Renaming compute_metrices to compute_metrics introduces a name collision with the internal scoring helper that this class still calls later (self.compute_metrics(queries_result_list[...])). Since the old compute_metrics(queries_result_list) implementation was removed, that call will now recurse back into this method with the wrong argument type and fail at runtime. Please reintroduce the results-list scoring helper under a distinct name (e.g. _compute_ir_metrics / compute_metrics_from_results) and call that where you currently compute per-query metrics, keeping compute_metrics(model, ...) as the public API (and compute_metrices as an alias).
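The fix suggested in this comment can be sketched as follows. Everything here is a hypothetical stand-in rather than the library's code: the class, the placeholder results, the "mean_top_score" metric, and the helper name _compute_metrics_from_results (one of several names the comment floats) are assumptions for illustration.

```python
class Evaluator:
    """Hypothetical stand-in for the evaluator discussed in the review."""

    def compute_metrics(self, model: object) -> dict[str, float]:
        # Public API: takes a model, produces per-query results, then scores them.
        results = [[0.9, 0.1], [0.4, 0.6]]  # placeholder for real retrieval results
        return self._compute_metrics_from_results(results)

    def _compute_metrics_from_results(self, results: list[list[float]]) -> dict[str, float]:
        # Scoring helper under its own name, so it cannot collide with the public method.
        top_scores = [max(scores) for scores in results]
        return {"mean_top_score": sum(top_scores) / len(top_scores)}

    # Backwards-compatible alias for the old misspelled name.
    compute_metrices = compute_metrics
```

Because the helper has a distinct name, calling it from compute_metrics can no longer re-enter the public method with a results list in place of a model.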

tomaarsen and others added 2 commits April 3, 2026 17:25
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

sentence_transformers/sentence_transformer/evaluation/information_retrieval.py:306

  • The return type for compute_all_metrics is annotated as dict[str, float], but the function returns a nested structure keyed by score-function name and metric name (e.g. scores[name]["ndcg@k"][k]). This mismatch can break type-checking and mislead API consumers; please update the annotation to reflect the nested dict shape (or use a TypedDict / dict[str, Any] if you want to avoid an overly complex type).
    def compute_all_metrics(
        self,
        model: SentenceTransformer,
        corpus_model=None,
        corpus_embeddings: Tensor | None = None,
        output_path: str | None = None,
    ) -> dict[str, float]:
        if corpus_model is None:
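A nested annotation along the lines the suppressed comment suggests might look like the sketch below. The exact shape is an assumption inferred from the scores[name]["ndcg@k"][k] access pattern quoted above; the alias name, the stub function, and the placeholder value are hypothetical and not taken from the library.

```python
# Assumed shape: score-function name -> metric name -> cutoff k -> value.
NestedScores = dict[str, dict[str, dict[int, float]]]


def compute_all_metrics_stub() -> NestedScores:
    """Hypothetical stub returning data in the nested shape the review describes."""
    return {"cosine": {"ndcg@k": {10: 0.42}}}  # placeholder value, not a real result


scores = compute_all_metrics_stub()
value = scores["cosine"]["ndcg@k"][10]  # mirrors scores[name]["ndcg@k"][k]
```

A type checker can then verify each level of indexing, which a flat dict[str, float] annotation would reject.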

Comment thread sentence_transformers/sentence_transformer/evaluation/reranking.py
@tomaarsen
Owner

I think this is looking good. Thanks for the contribution! I'll merge it now.

> I'll take a closer look at your changes. Let me know how I can help! I could probably help with integrating more ruff rules and mypy checking, but they can cause some incompatibilities, e.g. the number of args flagged by pylint, or mismatched interfaces caught by mypy, so this would be more of a v6 change rather than v5.4. Let me know what you think!

I'm curious about ruff, but pretty happy with their defaults. For type checking: I'm curious about ty. I think transformers is also using it, and I've been using it in VS Code for a few months. I'll likely follow what transformers does and slowly introduce it over time.

  • Tom Aarsen

@tomaarsen tomaarsen merged commit 1c6d463 into tomaarsen:refactor/multimodal Apr 3, 2026
4 checks passed
@Samoed Samoed deleted the fix_typos branch April 3, 2026 15:41
Author

Samoed commented Apr 3, 2026

I tried ruff when v0.0.1 was released, but in my case mypy was more accurate.

@tomaarsen
Owner

Fair enough. I last used mypy many years ago, so I'm not sure about its current state.

  • Tom Aarsen

3 participants