Skip to content

[feat] Per-call processing_kwargs override in Transformer.preprocess#3753

Merged
tomaarsen merged 1 commit into
huggingface:mainfrom
tomaarsen:feat/per-call-processing-kwargs
May 7, 2026
Merged

[feat] Per-call processing_kwargs override in Transformer.preprocess#3753
tomaarsen merged 1 commit into
huggingface:mainfrom
tomaarsen:feat/per-call-processing-kwargs

Conversation

@tomaarsen

@tomaarsen tomaarsen commented May 6, 2026

Copy link
Copy Markdown
Member

Hello!

Pull Request overview

  • Add per-call processing_kwargs override to Transformer.preprocess
  • Allowlist it through encode/predict and document it on the entry points

Details

A dependent project wanted to pass different processing_kwargs on each call rather than only at construction time. I've added a processing_kwargs parameter to Transformer.preprocess that merges on top of the instance-level self.processing_kwargs with shallow per-modality semantics, so individual settings (e.g. only max_length) can be overridden without replacing the whole modality dict.

The merged dict is also threaded through _call_processor to _process_chat_messages, since the chat_template slot was read directly from self.processing_kwargs rather than via the existing modality_kwargs/common_kwargs plumbing. Without this, per-call {"chat_template": {...}} overrides would silently no-op. SentenceTransformer.encode and SparseEncoder.encode now allowlist processing_kwargs alongside task; CrossEncoder.predict and BaseModel.preprocess already forward **kwargs through.

cc @NohTow

  • Tom Aarsen

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a per-call processing_kwargs override path for Transformer.preprocess, and threads the merged kwargs through the processor call chain so chat_template overrides reliably reach apply_chat_template. It also allowlists processing_kwargs at the main inference entry points and adds regression tests to ensure the kwarg isn’t silently dropped.

Changes:

  • Add processing_kwargs parameter to Transformer.preprocess with shallow per-top-level-key merge semantics over self.processing_kwargs.
  • Plumb merged chat_template kwargs through _call_processor_process_chat_messages so per-call overrides work for message inputs.
  • Allowlist and document processing_kwargs on SentenceTransformer.encode / SparseEncoder.encode (and document on CrossEncoder.predict), with test coverage across all three model types.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/sparse_encoder/test_model.py Adds integration test verifying SparseEncoder.encode(..., processing_kwargs=...) affects preprocessing (truncation) and output.
tests/sentence_transformer/test_model.py Adds integration test verifying SentenceTransformer.encode(..., processing_kwargs=...) affects tokenization and embeddings.
tests/cross_encoder/test_model.py Adds integration test verifying CrossEncoder.predict(..., processing_kwargs=...) affects preprocessing and prediction scores.
tests/base/modules/test_transformer.py Adds unit tests for per-call merge behavior, warning on unknown keys, non-mutation, and per-call chat_template override routing.
sentence_transformers/sparse_encoder/model.py Allowlists processing_kwargs in SparseEncoder.encode kwarg validation and documents it.
sentence_transformers/sentence_transformer/model.py Allowlists processing_kwargs in SentenceTransformer.encode kwarg validation and documents it.
sentence_transformers/cross_encoder/model.py Documents processing_kwargs as a forwarded kwarg for predict.
sentence_transformers/base/modules/transformer.py Implements per-call processing_kwargs merge and threads chat_template_kwargs through processor dispatch.
sentence_transformers/base/model.py Documents that processing_kwargs is forwarded to the input module preprocess (notably Transformer).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1218 to +1223
text = "this sentence is much longer than four tokens for sure"
truncated = model.encode(
[text],
convert_to_tensor=True,
processing_kwargs={"text": {"max_length": 4, "truncation": True}},
)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it can be used for actual models from this PR onwards.

@tomaarsen tomaarsen merged commit 9b1e596 into huggingface:main May 7, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants