Conversation
There was a problem hiding this comment.
I've noticed these in the past, but they're breaking as you can imagine. I can resolve them, but only if I additionally introduce some backwards compatibility. Thanks for opening this!
P.s. I'm getting close to the release now. I intend to make integration pull requests for these models before release: https://huggingface.co/collections/tomaarsen/sentence-transformers-v54-integrations
And after release, I aim to make PRs for some other models as well, so that ST can be a convenient interface for every modality. It should be flexible enough, especially with trust_remote_code=True.
| [tool.typos.default.extend-words] | ||
| NCE = "NCE" | ||
| nce = "nce" | ||
| Ein = "Ein" | ||
| ein = "ein" | ||
| ist = "ist" | ||
| arange = "arange" | ||
|
|
||
|
|
||
| [tool.typos.files] | ||
| extend-exclude = [ | ||
| "*.json", | ||
| "*.ipynb", | ||
| "examples/sentence_transformer/unsupervised_learning/query_generation/3_programming_semantic_search.py", | ||
| "examples/sentence_transformer/unsupervised_learning/query_generation/1_programming_query_generation.py", | ||
| "sentence_transformers/sentence_transformer/modules/tokenizer/word.py", | ||
| ] | ||
|
|
||
| [tool.typos.type.py.extend-words] | ||
| thw = "thw" | ||
| nd = "nd" | ||
| Evangelia = "Evangelia" | ||
| aas = "aas" | ||
| sur = "sur" | ||
| paria = "paria" | ||
| ende = "ende" No newline at end of file |
There was a problem hiding this comment.
What distinguishes these two types of extend-words from each other?
There was a problem hiding this comment.
In tool.typos.default.extend-words would work on any file type, tool.typos.type.py.extend-words would work only on python files
|
I made them compatible by name. I'll take a closer look at your changes. Let me know how I can help! I could probably try to help with more ruff rules integration and mypy checking, but they can cause some incompatibilities, e.g. number of args flagged by pylint, or mismatched interfaces caught by mypy, so this would be more like |
I think we can stick with using typos through pre-commit.
There was a problem hiding this comment.
Pull request overview
This PR introduces automated typo checking via typos and applies spelling-related renames/fixes across evaluators, losses, tests, and docs/examples.
Changes:
- Add
typosto pre-commit and configure it viapyproject.toml. - Rename misspelled public APIs (
compute_metrices→compute_metrics,eucledian_distance→euclidean_distance) and add backward-compatibility aliases in several places. - Fix various typos in tests, docs, READMEs, and notebooks.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/sentence_transformer/test_model.py | Fix typo in a test comment. |
| tests/sentence_transformer/evaluation/test_information_retrieval_evaluator.py | Rename test function to test_metrics. |
| sentence_transformers/sparse_encoder/evaluation/sparse_reranking.py | Remove redundant compute_metrices* passthrough methods. |
| sentence_transformers/sparse_encoder/evaluation/sparse_information_retrieval.py | Remove redundant compute_metrices passthrough method. |
| sentence_transformers/sparse_encoder/evaluation/sparse_binary_classification.py | Remove redundant compute_metrices passthrough method. |
| sentence_transformers/sentence_transformer/losses/batch_semi_hard_triplet.py | Switch default distance metric to euclidean_distance. |
| sentence_transformers/sentence_transformer/losses/batch_hard_triplet.py | Rename distance function + add backwards-compat alias. |
| sentence_transformers/sentence_transformer/losses/batch_hard_soft_margin_triplet.py | Switch default distance metric to euclidean_distance. |
| sentence_transformers/sentence_transformer/losses/batch_all_triplet.py | Switch default distance metric to euclidean_distance. |
| sentence_transformers/sentence_transformer/evaluation/reranking.py | Rename compute_metrices* → compute_metrics* + add aliases. |
| sentence_transformers/sentence_transformer/evaluation/information_retrieval.py | Rename compute_metrices → compute_metrics + add alias; remove old helper. |
| sentence_transformers/sentence_transformer/evaluation/binary_classification.py | Rename compute_metrices → compute_metrics + add alias. |
| pyproject.toml | Add typos configuration (words/excludes/back-compat identifiers). |
| examples/sparse_encoder/training/sts/README.md | Fix “Overview” spelling. |
| examples/sparse_encoder/applications/retrieve_rerank/retrieve_rerank_simple_wikipedia.ipynb | Fix typo in markdown cell. |
| examples/sentence_transformer/training/sts/README.md | Fix “Overview” spelling. |
| examples/sentence_transformer/training/data_augmentation/train_sts_qqp_crossdomain.py | Fix comment typo (“question”). |
| examples/sentence_transformer/applications/image-search/image_search_multilingual.ipynb | Fix typo in markdown cell. |
| examples/sentence_transformer/applications/image-search/Image_Classification.ipynb | Fix typos in markdown cell. |
| examples/cross_encoder/training/sts/README.md | Fix “Overview” spelling. |
| examples/cross_encoder/training/ms_marco/README.md | Fix “mileage” spelling. |
| docs/sparse_encoder/usage/efficiency.rst | Fix “mileage” spelling. |
| docs/sentence_transformer/usage/mteb_evaluation.md | Fix grammar (“existing models”). |
| docs/sentence_transformer/usage/efficiency.rst | Fix “mileage” spelling. |
| docs/cross_encoder/usage/efficiency.rst | Fix “mileage” spelling. |
| .pre-commit-config.yaml | Add typos pre-commit hook. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| self.store_metrics_in_model_card_data(model, metrics, epoch, steps) | ||
| return metrics | ||
|
|
||
| def compute_metrices( | ||
| def compute_metrics( | ||
| self, | ||
| model: SentenceTransformer, | ||
| corpus_model=None, |
There was a problem hiding this comment.
Renaming compute_metrices to compute_metrics introduces a name collision with the internal scoring helper that this class still calls later (self.compute_metrics(queries_result_list[...])). Since the old compute_metrics(queries_result_list) implementation was removed, that call will now recurse back into this method with the wrong argument type and fail at runtime. Please reintroduce the results-list scoring helper under a distinct name (e.g. _compute_ir_metrics / compute_metrics_from_results) and call that where you currently compute per-query metrics, keeping compute_metrics(model, ...) as the public API (and compute_metrices as an alias).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
sentence_transformers/sentence_transformer/evaluation/information_retrieval.py:306
- The return type for
compute_all_metricsis annotated asdict[str, float], but the function returns a nested structure keyed by score-function name and metric name (e.g.scores[name]["ndcg@k"][k]). This mismatch can break type-checking and mislead API consumers; please update the annotation to reflect the nested dict shape (or use a TypedDict /dict[str, Any]if you want to avoid an overly complex type).
def compute_all_metrics(
self,
model: SentenceTransformer,
corpus_model=None,
corpus_embeddings: Tensor | None = None,
output_path: str | None = None,
) -> dict[str, float]:
if corpus_model is None:
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
I think this is looking good. Thanks for the contribution! I'll merge it now.
I'm curious about
|
|
I tried ruff, when v0.0.1 released, but in my cases mypy was more accurate |
|
Fair enough, I've only used mypy many years ago, I'm not sure about its current state.
|
Add
typosto pre-commit config. A bit breaking changes:@tomaarsen