Embedding / bi-encoder fine-tuning with Unsloth + sentence-transformers#3718
chiggly007 wants to merge 7 commits into unslothai:main from
Conversation
Summary of Changes

Hello @chiggly007, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands Unsloth's capabilities by enabling and thoroughly documenting the fine-tuning of embedding and bi-encoder models. It provides a comprehensive guide and a runnable example demonstrating how to leverage Unsloth's

Highlights
Code Review
This pull request introduces documentation and a runnable example for fine-tuning embedding models using Unsloth and sentence-transformers. The additions are comprehensive and will be very helpful for users. My review focuses on improving the model saving process for correctness and providing a more complete example, along with a minor code style suggestion.
```python
sbert_model.save_pretrained("embeddings_merged")
model.save_pretrained_merged("embeddings_merged", tokenizer)
```

The current saving method is incorrect: `model.save_pretrained_merged` will overwrite files created by `sbert_model.save_pretrained` (such as `config.json`), resulting in a corrupted model directory that cannot be loaded as a `SentenceTransformer`.

To correctly save the merged `SentenceTransformer` model, first merge the LoRA adapters in-place and then save `sbert_model`, which now contains the merged weights. This ensures the entire pipeline (including pooling and normalization layers) is saved correctly.

```diff
- sbert_model.save_pretrained("embeddings_merged")
- model.save_pretrained_merged("embeddings_merged", tokenizer)
+ model.merge_and_unload()
+ sbert_model.save_pretrained("embeddings_merged")
```
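To make the failure mode concrete, here is a self-contained toy sketch (pure stdlib; the two save functions are hypothetical stand-ins, not the real Unsloth or sentence-transformers APIs) of how the second save clobbers the first one's `config.json`:

```python
import json
import os
import tempfile

def save_sbert_style(path):
    # Stand-in for sbert_model.save_pretrained: writes a
    # SentenceTransformer-flavored config (contents are made up).
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "config.json"), "w") as f:
        json.dump({"architectures": ["SentenceTransformer"], "pooling": "cls"}, f)

def save_hf_style(path):
    # Stand-in for model.save_pretrained_merged: writes a plain HF-style
    # config into the SAME directory, overwriting the file above.
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "config.json"), "w") as f:
        json.dump({"architectures": ["BertModel"]}, f)

out = os.path.join(tempfile.mkdtemp(), "embeddings_merged")
save_sbert_style(out)
save_hf_style(out)  # second save wins: the sbert-style config is gone

with open(os.path.join(out, "config.json")) as f:
    cfg = json.load(f)
print(cfg["architectures"])  # → ['BertModel']
```

The same last-writer-wins behavior is why merging in-place and saving once through `sbert_model` avoids the corruption.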
`sentence-transformers` expects its own `Transformer` module. You can reuse the Unsloth‑loaded model/tokenizer by injecting them into that module.

```python
import sentence_transformers
```
The sentence_transformers library is imported locally within the get_st_unsloth_wrapper function. It's better to move all imports to the top of the script for clarity and to follow standard Python conventions. Since sentence_transformers is already imported in the example code block in section 4, this line is redundant and can be removed.
```python
tokenizer,
base_model_id = BASE_MODEL_ID,
pooling_mode = "cls",
max_seq_length = MAX_SEQ_LENGTH,
```
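For context on the `pooling_mode = "cls"` argument above: a pooling layer collapses per-token embeddings into a single sentence vector. A minimal numpy sketch of "cls" vs. "mean" pooling (array shapes and function names are illustrative, not the library's API):

```python
import numpy as np

def cls_pool(token_embeddings):
    # "cls" pooling: take the first token's embedding as the sentence vector.
    return token_embeddings[:, 0, :]

def mean_pool(token_embeddings, attention_mask):
    # "mean" pooling: average token embeddings, ignoring padding positions.
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# Toy batch: 2 sentences, 3 tokens, 4-dim embeddings.
emb = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
mask = np.array([[1, 1, 0], [1, 1, 1]])  # first sentence has one padding token
print(cls_pool(emb).shape, mean_pool(emb, mask).shape)  # → (2, 4) (2, 4)
```

Both modes produce one vector per sentence; which works better depends on how the base encoder was pretrained.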
```python
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(
    f"{round(trainer_stats.metrics['train_runtime'] / 60, 2)} minutes used for training."
)
```
`model.save_pretrained_merged` saves the merged Hugging Face model, but it doesn't save the full `SentenceTransformer` pipeline (which includes pooling and normalization layers). To provide a more complete and user-friendly example, you can save the entire `SentenceTransformer` model after merging the LoRA adapters. This allows users to load the complete model with a single command and is more consistent with the goal of fine-tuning a sentence-transformer.
```diff
  )
+ print("Merging LoRA adapters...")
+ model.merge_and_unload()
+ print(f"Saving merged SentenceTransformer model to {name}_{run}_merged...")
+ sbert_model.save_pretrained(f"{name}_{run}_merged")
+ print("Done.")
```
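For intuition on the suggestion above: `merge_and_unload()` folds the low-rank LoRA update into the base weight, roughly W' = W + (alpha/r)·BA, after which the adapter path can be dropped. A toy numpy sketch of that equivalence (shapes and values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4        # hidden size, LoRA rank, LoRA alpha (made up)
W = rng.normal(size=(d, d))  # frozen base weight
A = rng.normal(size=(r, d))  # LoRA down-projection
B = rng.normal(size=(d, r))  # LoRA up-projection (nonzero here so the check is meaningful;
                             # real LoRA initializes B to zero before training)

scale = alpha / r
W_merged = W + scale * (B @ A)  # what merging folds into the base layer

x = rng.normal(size=(d,))
y_adapter = W @ x + scale * (B @ (A @ x))  # base path + adapter path
y_merged = W_merged @ x                    # single merged matmul
print(np.allclose(y_adapter, y_merged))    # → True
```

Because the merged weight reproduces the adapter path exactly, saving `sbert_model` afterward captures the fine-tuned behavior with no PEFT dependency at load time.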
|
@chiggly007 Oh nice work! Would you be interested in making this inside of Unsloth Docs and a notebook as well?
|
Closed because duplicate of: #3719
Adds docs + runnable example for fine-tuning embedding models with FastModel + LoRA + sentence-transformers. Includes guidance to disable fast-generation for non-causal encoders.
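One detail worth keeping in mind when reusing a fine-tuned bi-encoder: embedding pipelines often end with an L2-normalization module, so cosine similarity between sentences reduces to a plain dot product. A minimal numpy sketch (not the library's implementation):

```python
import numpy as np

def l2_normalize(embeddings, eps=1e-12):
    # Scale each row to unit L2 norm, like the final normalization
    # module in an embedding pipeline.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)

emb = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = l2_normalize(emb)           # each row now has norm 1
# With unit vectors, cosine similarity is just a dot product:
print(unit[0] @ unit[1])           # → 0.8
```

This is why downstream retrieval code can often skip an explicit cosine step when the saved pipeline already normalizes its outputs.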