We ❤️ Open Source

A community education resource

June 9, 2026


Fine-tuning is no longer just an AI research process. It’s becoming a developer workflow.

Open source gave you the weights. Now it gives you the tools to make them yours.

By Nihal Kaul

Sandstone canyon with sun beam on the ground — Image by Pexels from Pixabay

Many teams may not need a smarter general-purpose model. What they probably do want is one that knows their product, their support tickets, their documentation, their internal labels, and the conventions of their industry.

Prompting helps with instructions. RAG helps with external information. Fine-tuning helps with something else: behavior. That means classification patterns, tone, decision boundaries, structured output formats, and repetitive tasks. These are hard to solve reliably by writing a better prompt or attaching a vector database. They’re solved by teaching the model how your specific domain operates.

Until recently, that type of instruction required a research lab. That is no longer true.

Why LLM fine-tuning was so daunting for most developers

There were several obstacles. Costly GPUs. Hard-to-manage training scripts. Fragile datasets. Unclear evaluation metrics. Confusing model formats. A deployment step that felt like a separate development project entirely. Underneath all of that was the worry of “destroying” an existing base model by training it incorrectly.

This worry was valid. Fine-tuning did require extensive machine learning expertise, expensive compute, and a tolerance for models that behaved in unpredictable, hard-to-debug ways.

What changed is that the open source community began wrapping that complexity in simple, developer-friendly workflows. Unsloth runs fine-tuning on a free Google Colab T4 GPU and reports roughly 2x faster training with 70-80% less memory than standard approaches. Axolotl drives the entire fine-tuning process from a YAML file. The barrier dropped sooner than most developers realize.

The open source LLM fine-tuning stack: Tools for every layer

The best way to think about the current tools is not as individual products but as layers in a pipeline. Each layer addresses a distinct challenge.

Dataset preparation is where most fine-tuning projects succeed or fail. Hugging Face Datasets supplies the tools for loading, filtering, and preparing your data for fine-tuning.

However, most of the work is domain-level: cleaning your labeled examples, defining label categories that make sense, and splitting the examples into training and test sets that reflect how you actually plan to use the model.
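One common way to do the train/test split is to stratify by label, so every category appears in both sets. The sketch below is a minimal illustration using only the standard library. The function name `stratified_split`, the `label` key, and the 20% test fraction are assumptions for this example, not something the tools above prescribe.

```python
import random
from collections import defaultdict


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split labeled examples so each label shows up in both train and test.

    `examples` is a list of dicts that each carry a "label" key (an assumed
    schema for this sketch).
    """
    by_label = defaultdict(list)
    for example in examples:
        by_label[example["label"]].append(example)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        # Hold out at least one example per label for testing.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


tickets = (
    [{"text": f"invoice question {i}", "label": "billing"} for i in range(10)]
    + [{"text": f"crash report {i}", "label": "bug"} for i in range(5)]
)
train_set, test_set = stratified_split(tickets)
print(len(train_set), len(test_set))
```

A label with only one example ends up entirely in the test set here, which is usually a sign that the category needs more data before training.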

Efficient fine-tuning is where the cost dropped dramatically. LoRA (Low-Rank Adaptation) freezes the base model weights and instead trains small low-rank adapter matrices whose product is added to those weights, reducing trainable parameters by roughly 90% or more while maintaining strong performance.

QLoRA builds on LoRA by adding 4-bit quantization of the frozen base model, so models that would normally require enterprise-class hardware can be trained on a consumer-grade GPU. Hugging Face PEFT is the canonical library for both, describing LoRA as a method that “drastically reduces the number of parameters that need to be fine-tuned.”
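To make the mechanism concrete, here is a toy sketch of a LoRA layer in plain Python. It is not how PEFT implements it. It only illustrates the idea: the frozen weight `W` is left untouched, and a scaled low-rank product `B @ A` is added on top. The helper names and the tiny matrices are assumptions for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x).

    W is d_out x d_in and stays frozen. A is r x d_in and B is d_out x r;
    only A and B would be trained.
    """
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def trainable_fraction(d_out, d_in, rank):
    """Adapter parameters as a fraction of the frozen matrix's parameters."""
    return rank * (d_in + d_out) / (d_out * d_in)


W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weights
A = [[1.0, 1.0]]              # rank-1 adapter, down-projection
B_zero = [[0.0], [0.0]]       # B starts at zero, so the adapter is a no-op
x = [1.0, 1.0]

print(lora_forward(W, A, B_zero, x, alpha=2.0))            # same as W x
print(lora_forward(W, A, [[1.0], [0.0]], x, alpha=2.0))    # adapter shifts output
print(trainable_fraction(4096, 4096, 8))
```

For a 4096 x 4096 matrix at rank 8, the adapter holds under half a percent of the matrix's parameters. That is why the reduction for an adapted layer is often much larger than 90%.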
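A back-of-envelope calculation shows why 4-bit quantization matters. This sketch estimates only the memory needed to hold the base model weights. It ignores activations, optimizer state, quantization constants, and the adapter itself, so real usage is higher. The function name is an assumption for illustration.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params_7b = 7_000_000_000
print(weight_memory_gb(params_7b, 16))  # 16-bit weights
print(weight_memory_gb(params_7b, 4))   # 4-bit quantized weights
```

Under these assumptions, a 7B model drops from about 14 GB of weights at 16-bit to about 3.5 GB at 4-bit, which is the difference between not fitting and fitting on many consumer GPUs.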

Training orchestration is where the developer experience has improved most. Axolotl is a config-driven tool available under the Apache 2.0 license. It currently supports LoRA, QLoRA, full fine-tuning, reward modeling, and, as of March 2025, multimodal training for vision and audio models.

Unsloth achieves its speedups through custom Triton kernels. In March 2026 it introduced Unsloth Studio, a no-code web interface that generated enough buzz among developers to reach the front page of Hacker News within a day of release.

LLaMA-Factory has over 68,000 GitHub stars, supports the widest range of models, and ships its own web interface. TRL is Hugging Face’s official library for supervised fine-tuning, DPO, and RLHF. torchtune is a PyTorch-native library that keeps abstractions minimal and targets consumer-grade GPUs with 24 GB of VRAM.

Scaling becomes a factor when your team outgrows a single GPU. DeepSpeed and PyTorch’s FSDP both support distributed training across multiple GPUs; Axolotl integrates with both through its configuration layer.

Evaluation is arguably the least mature layer of this stack. lm-eval-harness provides standard benchmarks, but for most teams the primary evaluation method will be a custom test set drawn from real production examples. If you do not create an evaluation set before fine-tuning, you’ll have no way to measure whether it worked.

Deployment completes the loop. vLLM, llama.cpp, and Ollama can all serve a base model with LoRA adapters attached, so teams can deploy without merging weights or maintaining a duplicate of the base model.
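A custom evaluation does not need to be elaborate. The sketch below scores predictions against a held-out set and counts which label pairs get confused most often. The function name and the sample labels are hypothetical; in practice `predicted` would come from running your model over the held-out examples.

```python
from collections import Counter


def evaluate(gold, predicted):
    """Return overall accuracy and a count of (gold, predicted) mistakes."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    pairs = list(zip(gold, predicted))
    correct = sum(1 for g, p in pairs if g == p)
    confusions = Counter((g, p) for g, p in pairs if g != p)
    return correct / len(pairs), confusions


gold_labels = ["billing", "bug", "integration", "integration"]
model_labels = ["billing", "bug", "bug", "integration"]

accuracy, confusions = evaluate(gold_labels, model_labels)
print(accuracy)
print(confusions.most_common(3))
```

Running the same function on the base model before fine-tuning and on the tuned model afterward gives a like-for-like comparison, which is the point of building the evaluation set first.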

When to use fine-tuning vs RAG vs prompting

Fine-tuning works well when you need your model to perform a particular task over and over again. Examples include categorizing customer service requests, writing messages in a corporate voice, extracting structured fields from unstructured content, following a narrow industry-specific process, generating code that matches a project’s style, or producing consistent results without relying on long, fragile prompts.

If that describes your situation, use fine-tuning. Otherwise, other methods may fit better: use RAG for constantly changing information and better prompts for simpler tasks, and reconsider fine-tuning altogether if your data is low quality or you lack the resources to build a validation set. As Nova AI Ops puts it, prompting, RAG, and fine-tuning “get more powerful and more expensive in that order. Most teams should try them in that order too.”

The emerging consensus in 2026 is that the best production systems combine these approaches: put volatile knowledge in retrieval, put stable behavior in fine-tuning, and stop trying to force one tool to do both jobs.

Which method you choose really does matter. Not every AI problem is a fine-tuning problem, so teams have to determine whether fine-tuning is actually the best option.

Fine-tuning in practice: Support ticket categorization with LoRA and QLoRA

This is not a hypothetical scenario. Real-world case studies show that fine-tuned small models can reach higher accuracy than generic frontier APIs on internal classification tasks while being roughly 50x cheaper to run in production. One team trained on 3,500 clean examples and classified 70,000 tickets for a total cost of $500. Another used QLoRA on a 7B model with 5,000 hand-labeled examples and achieved 98% accuracy at under $100 in training costs.

Consider a large company that receives thousands of support issues per month. Each issue falls into one of six categories: billing, onboarding, bug reports, account access, integration issues, or enterprise escalations. A general-purpose model can categorize these issues, but it does so inconsistently. It frequently labels integration issues as bug reports, and just as often it misroutes enterprise escalations.

Step-by-step solution: Collect a representative sample of each type of issue. Label each example as one of the six categories listed above. Clean and normalize the labels. Split the labeled examples into two separate files, one for training and one held out for testing. Choose a small base model and fine-tune it using LoRA or QLoRA with Axolotl or Unsloth. Test the model’s accuracy against the holdout set. Once satisfied, deploy the adapter. Monitor errors in production and retrain when the error distribution changes.

The result is a model that is less expensive to operate than calling a remote API for every issue the help desk receives. The model categorizes issues consistently, and new categories can be added by retraining the adapter as they become necessary.
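The label cleanup and file preparation steps can be sketched in a few lines. Everything specific here is an assumption for illustration: the canonical category strings, the alias table, the prompt wording, and the `prompt`/`completion` field names. Check the dataset format your chosen trainer expects before reusing it.

```python
import json

# Canonical names for the six categories (spelling chosen for this sketch).
CATEGORIES = {
    "billing", "onboarding", "bug report",
    "account access", "integration", "enterprise escalation",
}

# Hypothetical variants that might appear in hand-labeled data.
ALIASES = {
    "bug": "bug report",
    "bugs": "bug report",
    "integrations": "integration",
    "escalation": "enterprise escalation",
}


def normalize_label(raw):
    """Map a messy label to a canonical category or raise ValueError."""
    key = raw.strip().lower().replace("_", " ").replace("-", " ")
    key = " ".join(key.split())  # collapse repeated whitespace
    key = ALIASES.get(key, key)
    if key not in CATEGORIES:
        raise ValueError(f"unknown label: {raw!r}")
    return key


def to_jsonl_lines(tickets):
    """Turn (text, raw_label) pairs into JSON Lines training records."""
    lines = []
    for text, raw_label in tickets:
        record = {
            "prompt": f"Classify this support ticket:\n{text.strip()}\nCategory:",
            "completion": " " + normalize_label(raw_label),
        }
        lines.append(json.dumps(record))
    return lines


sample = [("I was charged twice this month.", "Billing"),
          ("The Slack webhook stopped firing.", " Integrations ")]
for line in to_jsonl_lines(sample):
    print(line)
```

Raising on unknown labels, rather than silently passing them through, surfaces labeling mistakes before they reach training.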

The adapter file is typically 2-5 MB, smaller than most .zip files, though the exact size depends on the adapter rank and on which layers are adapted.
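Adapter size follows from simple arithmetic. The sketch below is a rough estimate under stated assumptions: square projection matrices of size `hidden_size`, two adapted modules per layer, and 16-bit storage. The model dimensions are illustrative and not taken from any specific model.

```python
def lora_adapter_size_mib(hidden_size, num_layers, rank,
                          modules_per_layer=2, bytes_per_param=2):
    """Estimate LoRA adapter size in MiB for square hidden x hidden modules."""
    # Each adapted module adds A (rank x hidden) and B (hidden x rank).
    params_per_module = rank * (hidden_size + hidden_size)
    total_params = params_per_module * modules_per_layer * num_layers
    return total_params * bytes_per_param / (1024 ** 2)


print(lora_adapter_size_mib(hidden_size=2048, num_layers=24, rank=8))
print(lora_adapter_size_mib(hidden_size=4096, num_layers=32, rank=8))
```

With these numbers the smaller configuration comes to about 3 MiB and the larger one to about 8 MiB. Raising the rank or adapting more modules per layer grows the file proportionally.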

What open source fine-tuning means for developers today

Developers already had access to the weights of open source models. Now they also have open source fine-tuning tools for turning those weights into solutions that are truly relevant to their own industries. As Hugging Face’s PEFT documentation notes, parameter-efficient methods make it “more accessible to train and store large language models on consumer hardware,” and the ecosystem around these tools continues to grow rapidly.

Every part of the development cycle, from preparing a dataset for fine-tuning through training, testing, and finally deploying, can be done entirely with open source tools. According to a 2026 comparison of fine-tuning frameworks, LLaMA-Factory, Axolotl, Unsloth, and torchtune each serve different hardware profiles and use cases, giving teams real options regardless of their GPU budget.

There is clear potential here for developers, and the fine-tuning workflow still has rough edges. Opportunities to contribute include better tools for building fine-tuning datasets, better frameworks for evaluating fine-tuned models, and easier-to-use tools for managing adapters.

In my previous article, I made the case that small, open source models can handle the majority of production AI workloads. Fine-tuning is how you close the remaining gap. The base model gives you general intelligence. What you train it on gives you domain expertise.

About the Author

Nihal Kaul

Nihal Kaul is a Lead Software Engineer at Revscale AI who builds scalable, cloud-native systems for fast-moving startups. He works across distributed systems, infrastructure, and reliability, and has recently focused on AI systems design and the memory layer, building persistent, context-aware automation that makes products more adaptive and useful.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

The opinions expressed on this website are those of each author, not of the author's employer or All Things Open/We Love Open Source.
