
Embedding / bi-encoder fine-tuning with Unsloth + sentence-transformers#3718

Closed

chiggly007 wants to merge 7 commits into unslothai:main from chiggly007:feat/embedding-sentence-transformers

Conversation

@chiggly007

Adds docs + runnable example for fine-tuning embedding models with FastModel + LoRA + sentence-transformers. Includes guidance to disable fast-generation for non-causal encoders.

@gemini-code-assist
Contributor

Summary of Changes

Hello @chiggly007, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands Unsloth's capabilities by enabling and thoroughly documenting the fine-tuning of embedding and bi-encoder models. It provides a comprehensive guide and a runnable example demonstrating how to leverage Unsloth's FastModel and LoRA with the sentence-transformers library. The changes specifically address the unique requirements of non-causal encoder models, such as the necessity to disable fast-generation kernels, allowing users to efficiently fine-tune a broader range of models for tasks like semantic search and information retrieval.

Highlights

  • Embedding Model Fine-tuning Support: Introduced comprehensive support for fine-tuning embedding and bi-encoder models (e.g., BERT, E5, Arctic-Embed) using Unsloth's FastModel API and LoRA, integrated with the sentence-transformers training ecosystem.
  • New Documentation Guide: Added a detailed guide (docs/basics/embedding-model-fine-tuning.md) outlining the step-by-step process for embedding model fine-tuning, covering aspects like disabling fast-generation kernels for non-causal encoders, model loading, LoRA adapter configuration, SentenceTransformer wrapping, and training procedures.
  • Runnable Example Script: Provided a new runnable Python example script (examples/embedding_sentence_transformers.py) that demonstrates an end-to-end workflow for fine-tuning an embedding model, including dataset preparation and training with MultipleNegativesRankingLoss.
  • README Update: The README.md has been updated with a new section dedicated to embedding/bi-encoder fine-tuning, including direct links to the new example script and the comprehensive guide.
  • Troubleshooting Guidance: Enhanced the troubleshooting documentation (docs/basics/troubleshooting-and-faqs.md) by adding specific guidance on disabling fast generation for embedding models when encountering CUDA-related runtime errors.
  • Dataset Shuffling Callback: Implemented a ShuffleDatasetCallback in the example script to ensure reproducible per-epoch randomization of the training dataset, which is crucial for robust model training.
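The training objective named above, MultipleNegativesRankingLoss, treats each (anchor, positive) pair in a batch as the correct match and every other positive in the same batch as a negative. A self-contained plain-Python sketch of that objective (toy 2-D embeddings, not code from this PR; the scale of 20.0 follows the sentence-transformers default for this loss):

```python
# Sketch of in-batch-negatives ranking loss: cross-entropy over scaled
# cosine similarities, where row i's correct "class" is positive i.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mnrl(anchors, positives, scale=20.0):
    """Mean cross-entropy loss; anchor i should score highest against positive i."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cosine(a, p) for p in positives]
        log_z = math.log(sum(math.exp(l) for l in logits))  # log-sum-exp normalizer
        total += log_z - logits[i]                          # -log softmax(logits)[i]
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
# Matched pairs yield a much lower loss than mismatched pairs.
print(mnrl(anchors, positives) < mnrl(anchors, positives[::-1]))  # True
```

The production loss in sentence-transformers batches this on GPU tensors, but the scoring and normalization logic is the same.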
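The goal of the ShuffleDatasetCallback mentioned above, reproducible per-epoch randomization, can be illustrated in a few lines of plain Python (this is a hypothetical sketch of the idea, not the callback shipped in the example script; the base seed 3407 is an arbitrary assumption):

```python
# Derive each epoch's shuffle order from a fixed base seed plus the
# epoch number: the order differs across epochs but is identical
# across runs, which keeps training reproducible.
import random

def epoch_shuffle(dataset, epoch, base_seed=3407):
    """Return a reproducibly shuffled copy of `dataset` for the given epoch."""
    order = list(range(len(dataset)))
    random.Random(base_seed + epoch).shuffle(order)
    return [dataset[i] for i in order]

data = ["a", "b", "c", "d"]
# Same epoch always produces the same order across runs.
print(epoch_shuffle(data, 0) == epoch_shuffle(data, 0))  # True
```

A real trainer callback would apply this at each epoch start, e.g. by reassigning the shuffled dataset to the trainer.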
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces documentation and a runnable example for fine-tuning embedding models using Unsloth and sentence-transformers. The additions are comprehensive and will be very helpful for users. My review focuses on improving the model saving process for correctness and providing a more complete example, along with a minor code style suggestion.

Comment on lines +180 to +181

```python
sbert_model.save_pretrained("embeddings_merged")
model.save_pretrained_merged("embeddings_merged", tokenizer)
```
high

The current saving method is incorrect as model.save_pretrained_merged will overwrite files created by sbert_model.save_pretrained (like config.json), resulting in a corrupted model directory that cannot be loaded as a SentenceTransformer.

To correctly save the merged SentenceTransformer model, you should first merge the LoRA adapters in-place and then save the sbert_model, which now contains the merged weights. This ensures the entire pipeline (including pooling and normalization layers) is saved correctly.

Suggested change

```diff
-sbert_model.save_pretrained("embeddings_merged")
-model.save_pretrained_merged("embeddings_merged", tokenizer)
+model.merge_and_unload()
+sbert_model.save_pretrained("embeddings_merged")
```

`sentence-transformers` expects its own `Transformer` module. You can reuse the Unsloth-loaded model/tokenizer by injecting them into that module.

```python
import sentence_transformers
```

medium

The sentence_transformers library is imported locally within the get_st_unsloth_wrapper function. It's better to move all imports to the top of the script for clarity and to follow standard Python conventions. Since sentence_transformers is already imported in the example code block in section 4, this line is redundant and can be removed.

```python
tokenizer,
base_model_id = BASE_MODEL_ID,
pooling_mode = "cls",
max_seq_length = MAX_SEQ_LENGTH,
```
medium

The sentence_transformers library is imported locally within the get_st_unsloth_wrapper function, but it's already imported at the top of the file (lines 24-28). This local import is redundant and should be removed to adhere to Python best practices (PEP 8).

```python
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(
    f"{round(trainer_stats.metrics['train_runtime'] / 60, 2)} minutes used for training."
)
```
medium

model.save_pretrained_merged saves the merged Hugging Face model, but it doesn't save the full SentenceTransformer pipeline (which includes pooling and normalization layers). To provide a more complete and user-friendly example, you can save the entire SentenceTransformer model after merging the LoRA adapters. This allows users to load the complete model with a single command and is more consistent with the goal of fine-tuning a sentence-transformer.

Suggested change

```diff
 )
+print("Merging LoRA adapters...")
+model.merge_and_unload()
+print(f"Saving merged SentenceTransformer model to {name}_{run}_merged...")
+sbert_model.save_pretrained(f"{name}_{run}_merged")
+print("Done.")
```

@danielhanchen
Contributor

@chiggly007 Oh nice work! Would you be interested in making this inside of Unsloth Docs and a notebook as well?

@shimmyshimmer
Collaborator

Closed because duplicate of: #3719
