{"id":42087,"date":"2025-07-14T06:00:00","date_gmt":"2025-07-14T00:30:00","guid":{"rendered":"https:\/\/debuggercafe.com\/?p=42087"},"modified":"2025-07-14T06:07:24","modified_gmt":"2025-07-14T00:37:24","slug":"litgpt-getting-started","status":"publish","type":"post","link":"https:\/\/debuggercafe.com\/litgpt-getting-started\/","title":{"rendered":"LitGPT &#8211; Getting Started"},"content":{"rendered":"\n<p>We have seen a flood of LLMs for the past 3 years. With this shift, organizations are also releasing new libraries to use these LLMs. Among these, <strong>LitGPT<\/strong> is one of the more prominent and user-friendly ones. With close to <em>40 LLMs<\/em> (at the time of writing this), it has something for every use case. From mobile-friendly to cloud-based LLMs. In this article, we are going to cover all the <strong><em>features of LitGPT<\/em><\/strong> along with examples.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-horizontal is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-499968f5 wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--1\"><a class=\"wp-block-button__link has-black-color has-luminous-vivid-orange-background-color has-text-color has-background wp-element-button\" href=\"#download-code\"><strong>Jump to Download Code<\/strong><\/a><\/div>\n<\/div>\n\n\n\n<p><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt-tasks.png\" target=\"_blank\" rel=\" noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"800\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt-tasks.png\" alt=\"Tasks supported by LitGPT - Chat, fine-tuning, pretraining, and evaluation of LLMs.\" class=\"wp-image-42166\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt-tasks.png 800w, 
https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt-tasks-300x300.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt-tasks-150x150.png 150w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt-tasks-768x768.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 1. Tasks supported by LitGPT &#8211; Chat, fine-tuning, pretraining, and evaluation of LLMs.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>With LitGPT, we get access to high-performance LLMs. The ease of pretraining, finetuning, evaluating, and deploying these LLMs at scale is what makes LitGPT stand out.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><em>What will we cover in this article?<\/em><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What are the features provided by LitGPT?<\/li>\n\n\n\n<li>How to use a pretrained LLM with LitGPT?<\/li>\n\n\n\n<li>How do we fine-tune an LLM with a supported dataset?<\/li>\n\n\n\n<li>And how do we fine-tune a LitGPT model using a custom dataset?<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why LitGPT?<\/h2>\n\n\n\n<p>Although there are several options for running LLMs, LitGPT makes the end-to-end workflow extremely easy. 
It supports:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Easy loading of pretrained LLMs for inference.<\/li>\n\n\n\n<li>Optimized fine-tuning on pre-defined and custom datasets.<\/li>\n\n\n\n<li>Simple evaluation workflows on several benchmark datasets.<\/li>\n\n\n\n<li>And <strong><a href=\"https:\/\/debuggercafe.com\/serving-llms-using-litserve\/\" target=\"_blank\" rel=\"noreferrer noopener\">serving LLMs using LitAPI<\/a><\/strong>.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>With its large catalog of models, we can choose from several of the latest LLM families, such as Qwen, Llama 3.1, or even Phi-4.<\/p>\n\n\n\n<p>In this article, after experimenting with pretrained models, we will fine-tune a small language model for German-to-English translation. This will give us a better idea of how LitGPT works on all fronts.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Installing LitGPT<\/h2>\n\n\n\n<p>Installing LitGPT is quite straightforward:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pip install 'litgpt[extra]'<\/pre>\n\n\n\n<p>The above command also installs all the necessary dependencies, such as the required Hugging Face libraries.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Directory Structure<\/h2>\n\n\n\n<p>Let&#8217;s take a look at the entire directory structure and all the notebooks that we will be dealing with:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\u251c\u2500\u2500 checkpoints\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 HuggingFaceTB\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 meta-llama\n\u251c\u2500\u2500 data\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 
alpacagpt4\n\u251c\u2500\u2500 finetuning_data\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 train.json\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 val.json\n\u251c\u2500\u2500 smollm2_custom_finetune\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 logs\n\u251c\u2500\u2500 smollm2_finetune\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 logs\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 step-001000\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 step-002000\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 step-003000\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 step-004000\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 step-005000\n\u251c\u2500\u2500 smollm2_wmt_eval\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 config.json\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 generation_config.json\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 model_config.yaml\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 pytorch_model.bin\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 results.json\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 tokenizer_config.json\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 tokenizer.json\n\u251c\u2500\u2500 evaluate.ipynb\n\u251c\u2500\u2500 finetuning_custom_data.ipynb\n\u251c\u2500\u2500 finetuning.ipynb\n\u251c\u2500\u2500 inference_pretrained.ipynb\n\u2514\u2500\u2500 prepare_custom_dataset.ipynb<\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">checkpoints<\/code> directory contains the pretrained models that get downloaded from LitGPT.<\/li>\n\n\n\n<li>The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">data<\/code> and <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">finetuning_data<\/code> contain the predefined LitGPT dataset and the custom dataset, respectively.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">smollm2_custom_finetune<\/code> contains the model fine-tuned on the custom dataset, and <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">smollm2_finetune<\/code> contains the model fine-tuned on 
one of the predefined LitGPT datasets.<\/li>\n\n\n\n<li>There are five Jupyter Notebooks directly inside the project directory. We will cover the necessary ones individually.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"has-background\" style=\"background-color:#ffb76a\"><strong><em>All the Jupyter Notebooks, custom dataset, and custom fine-tuned models are available via the download section.<\/em><\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading has-text-align-center\" id=\"download-code\">Download Code<\/h3>\n\n\n\n<iframe loading=\"lazy\" src=\"https:\/\/docs.google.com\/forms\/d\/e\/1FAIpQLSf4CCRwmPmAqa5Bg2KyTsibxaDP9DmHhNyokFutgxTKflY6gA\/viewform?embedded=true\" width=\"640\" height=\"768\" frameborder=\"0\" marginheight=\"0\" marginwidth=\"0\">Loading\u2026<\/iframe>\n\n\n\n<h2 class=\"wp-block-heading\">Inference Using Pretrained Model with LitGPT<\/h2>\n\n\n\n<p>We will start with a simple inference experiment using one of the pretrained models.<\/p>\n\n\n\n<p>The code for this is present in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">inference_pretrained.ipynb<\/code> notebook.<\/p>\n\n\n\n<p>Before running inference, let&#8217;s check all the models that are available for downloading.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"inference_pretrained.ipynb\" data-enlighter-group=\"inference_pretrained_1\"># List all models available to download.\n!litgpt download list<\/pre>\n\n\n\n<p>This lists all the pretrained models available in the library. 
Here is the truncated output.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Please specify --repo_id &lt;repo_id>. Available values:\nallenai\/OLMo-1B-hf\nallenai\/OLMo-7B-hf\nallenai\/OLMo-7B-Instruct-hf\nBSC-LT\/salamandra-2b\nBSC-LT\/salamandra-2b-instruct\nBSC-LT\/salamandra-7b\nBSC-LT\/salamandra-7b-instruct\ncodellama\/CodeLlama-13b-hf\ncodellama\/CodeLlama-13b-Instruct-hf\ncodellama\/CodeLlama-13b-Python-hf\ncodellama\/CodeLlama-34b-hf\n.\n.\n.\ntogethercomputer\/LLaMA-2-7B-32K\nTrelis\/Llama-2-7b-chat-hf-function-calling-v2\nunsloth\/Mistral-7B-v0.2<\/pre>\n\n\n\n<p>To run inference, we need just one import: the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">LLM<\/code> class.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"inference_pretrained.ipynb\" data-enlighter-group=\"inference_pretrained_2\">from litgpt import LLM\n\nmodel = LLM.load('meta-llama\/Llama-3.2-1B-Instruct')\n\ntext = model.generate(\n    'Who are you and what can you do?', \n    max_new_tokens=1024\n)\n\nprint(text)<\/pre>\n\n\n\n<p>Here, we load the Llama-3.2 1B Instruct model and call the model&#8217;s <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">generate<\/code> method for inference. We provide the prompt and the maximum number of new tokens to generate.<\/p>\n\n\n\n<p>The following is a sample output.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Nice to meet you! 
I'm a conversational AI, which means I'm a computer program designed to simulate conversations and answer questions to the best of my ability. My primary function is to assist and communicate effectively with users like you, providing helpful and relevant information, answering questions, and engaging in discussions.\n\nHere are some things I can do:\n\n1. **Answer questions**: I can process natural language queries and provide accurate and informative responses...<\/pre>\n\n\n\n<p>We can also run the generation in a streaming manner and print the text as it is generated.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"inference_pretrained.ipynb\" data-enlighter-group=\"inference_pretrained_3\">text = model.generate(\n    'Can we talk about animated videos?', \n    stream=True, \n    max_new_tokens=1024\n)\nfor resulting_text in text:\n    print(resulting_text, end='', flush=True)<\/pre>\n\n\n\n<p>Here, we provide an additional <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">stream=True<\/code> argument and keep printing the text in a streaming manner. Following is a small example of what this looks like.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt-pretrained-streaming-inference.gif\" target=\"_blank\" rel=\" noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"359\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt-pretrained-streaming-inference.gif\" alt=\"Text streaming inference with LitGPT using a pretrained model.\" class=\"wp-image-42176\"\/><\/a><figcaption class=\"wp-element-caption\">Figure 2. 
Text streaming inference with LitGPT using a pretrained model.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>You can choose any of the models from the list and start experimenting.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Fine-Tuning using LitGPT Predefined Dataset<\/h2>\n\n\n\n<p>Now, we will move to fine-tuning a small language model on one of the datasets that comes packaged with the LitGPT library. We will fine-tune the SmolLM2-135M Instruct model.<\/p>\n\n\n\n<p>The code for this resides in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">finetuning.ipynb<\/code> Jupyter Notebook.<\/p>\n\n\n\n<p>The notebook covers inference on a simple question before we start the fine-tuning process. This will help us understand whether the model improved after fine-tuning.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"finetuning.ipynb\" data-enlighter-group=\"finetuning_1\">from litgpt import LLM\n\nmodel = LLM.load('HuggingFaceTB\/SmolLM2-135M-Instruct')\n\ntext = model.generate(\n    'Can we talk about animated videos?', \n    stream=True, \n    max_new_tokens=1024\n)\nfor resulting_text in text:\n    print(resulting_text, end='', flush=True)<\/pre>\n\n\n\n<p>We are asking the model a simple question about animated videos here. The model gives the following response.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Absolutely! I'd be happy to tailor my answer for you. 
Let's talk about animated \nvideos.\n\nAnimated videos often involve animations and animations, which are can be \ncreated using different techniques and art styles while adhering to existing \ntemplates and algorithms.\n\nScalable animated videos, also known as screen-shot videos or video one thousand \nhours (VOH), are those created from AI tools. They are created using a variety \nof techniques believed to mimic the natural motion of an element created during \nfilming. These are known as \"manipulation time-lapses.\"\n\nFor a scalable animation tool to produce an animated video, these tools are \ntypically used after all compositing is done. Animations are generated from the \nnecessary physics and other physics equations during that time. Then, these are \nfed into a machine learning algorithm that creates the static animations we \noften see on screen.\n\nSummary: While it's true that animated videos can be created using different \ntechniques and algorithms, the dispute is over how they are created. With the \npurpose of creating a scalable animatronic, it is usually generated from a \nscripted AI tool. Dave's Cloud supplies, on behalf of Hugging Face, gets all this.<\/pre>\n\n\n\n<p>Because we are using a small language model, the answer, although it reads fluently, is largely incoherent and ends with an unnecessary summary. Let&#8217;s try to improve that by fine-tuning it on GPT-4-style responses.<\/p>\n\n\n\n<p>We will use the Alpaca-GPT4 dataset, which contains instruction-following samples whose responses were generated by GPT-4. This is a good starting point for aligning our model towards better responses.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Fine-Tuning SmolLM2-135M Instruct on Alpaca-GPT4 using LitGPT<\/h3>\n\n\n\n<p>Fine-tuning using LitGPT is just a single command that can also be run via the terminal. 
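Before launching it, it helps to know the shape of the data involved. Each Alpaca-GPT4 record follows the Alpaca instruction format; the snippet below builds one representative sample (hypothetical, not taken from the dataset) to show the schema that LitGPT's data module handles for us:

```python
# A representative (hypothetical) record in the Alpaca instruction format.
# The actual Alpaca-GPT4 data is downloaded and templated by LitGPT itself;
# this sketch only illustrates the schema of a single sample.
sample = {
    "instruction": "Summarize why the sky appears blue.",
    "input": "",  # optional extra context; empty for most samples
    "output": (
        "Shorter (blue) wavelengths of sunlight are scattered more strongly "
        "by air molecules, so the scattered blue light dominates the sky."
    ),
}

# Every record carries exactly these three keys.
assert set(sample) == {"instruction", "input", "output"}
```

During training, the instruction (plus the optional input) is rendered into a prompt and the model learns to produce the output field.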
Here, we are executing it in the Jupyter Notebook.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"finetuning.ipynb\" data-enlighter-group=\"finetuning_2\"># Fine-tune SmolLM2 on Alpaca-GPT4.\n!litgpt finetune_full HuggingFaceTB\/SmolLM2-135M-Instruct \\\n    --data AlpacaGPT4 \\\n    --out_dir smollm2_finetune \\\n    --precision \"bf16-true\" \\\n    --train.save_interval 1000 \\\n    --train.log_interval 500 \\\n    --train.micro_batch_size 4 \\\n    --train.epochs 1 \\\n    --train.max_seq_length 1024<\/pre>\n\n\n\n<p>Here, we are using the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">finetune_full<\/code> script that fine-tunes the entire model. LitGPT also supports LoRA and adapter training, which you can <strong><a href=\"https:\/\/github.com\/Lightning-AI\/litgpt\/blob\/main\/tutorials\/finetune.md\" target=\"_blank\" rel=\"noreferrer noopener\">find here<\/a><\/strong>.<\/p>\n\n\n\n<p><strong>Arguments used:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The very first argument is the model. We use one of the models that we listed earlier via the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">litgpt download list<\/code> command.<\/li>\n\n\n\n<li>Next comes the dataset. 
As we are using a predefined dataset from the library, we just pass the dataset name to the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--data<\/code> argument.<\/li>\n\n\n\n<li>The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--out_dir<\/code> argument defines the directory where the resulting model will be saved.<\/li>\n\n\n\n<li>As the training was done on an RTX GPU that supports BF16, we pass <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--precision<\/code> as <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">\"bf16-true\"<\/code>. You can omit this argument if you are not sure whether your GPU supports BF16.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--train.save_interval<\/code> defines after how many optimizer steps a checkpoint is saved. For us, it is 1000.<\/li>\n\n\n\n<li>We are logging the train and validation loss after every 500 steps using <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--train.log_interval<\/code>.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--train.micro_batch_size<\/code> defines the per-device batch size. For us, that is 4. By default, the global batch size is 16. So, there are a total of 4 gradient accumulation steps, and the optimizer updates the weights after every 4 micro-batches. 
<\/li>\n\n\n\n<li>We are training for 1 epoch and have set the maximum sequence length to 1024.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>Let&#8217;s take a look at the outputs.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Seed set to 1337\nNumber of trainable parameters: 162,826,560\nThe longest sequence length in the train data is 769, the model's maximum sequence length is 769 and context length is 8192\nVerifying settings ...\nEpoch 1 | iter 500 step 125 | loss train: 1.455, val: n\/a | iter time: 123.73 ms (step)\nEpoch 1 | iter 1000 step 250 | loss train: 1.523, val: n\/a | iter time: 85.16 ms (step)\nEpoch 1 | iter 1500 step 375 | loss train: 1.657, val: n\/a | iter time: 95.80 ms (step)\nEpoch 1 | iter 2000 step 500 | loss train: 1.728, val: n\/a | iter time: 90.64 ms (step)\nValidating ...\nCome up with 3 interesting facts about honeybees.\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nCome up with 3 interesting facts about honeybees.\n\n### Response:\n1. Honey bees are known for their ability to learn the behavior of various food sources and their ability to recognize and distinguish between different varieties.\n\n2. Honey bees can fly up to 12 kilometers (7 miles) in a matter of minutes even while traveling from one flower or plant to another.\n\n3. Honey bees are able to obtain up to 90% of their energy from nectar, which they use to build and forage for themselves. 
They are also known for their ability to\n\niter 2400: val loss 1.5341, val time: 5309.79 ms\nEpoch 1 | iter 2500 step 625 | loss train: 1.427, val: 1.534 | iter time: 102.72 ms (step)\nEpoch 1 | iter 3000 step 750 | loss train: 1.414, val: 1.534 | iter time: 103.74 ms (step)\nEpoch 1 | iter 3500 step 875 | loss train: 1.442, val: 1.534 | iter time: 94.22 ms (step)\nEpoch 1 | iter 4000 step 1000 | loss train: 1.511, val: 1.534 | iter time: 83.28 ms (step)\nSaving checkpoint to 'smollm2_finetune\/step-001000'\nEpoch 1 | iter 4500 step 1125 | loss train: 1.516, val: 1.534 | iter time: 96.33 ms (step)\n.\n.\n.\niter 9600: val loss 1.3445, val time: 5518.19 ms\nEpoch 1 | iter 10000 step 2500 | loss train: 1.251, val: 1.344 | iter time: 93.27 ms (step)\nEpoch 1 | iter 10500 step 2625 | loss train: 1.460, val: 1.344 | iter time: 87.83 ms (step)\nEpoch 1 | iter 11000 step 2750 | loss train: 1.318, val: 1.344 | iter time: 98.34 ms (step)\nEpoch 1 | iter 11500 step 2875 | loss train: 1.525, val: 1.344 | iter time: 105.72 ms (step)\nEpoch 1 | iter 12000 step 3000 | loss train: 1.192, val: 1.344 | iter time: 90.05 ms (step)\nValidating ...\nCome up with 3 interesting facts about honeybees.\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nCome up with 3 interesting facts about honeybees.\n\n### Response:\n1. Honeybees are born as tiny cone-shaped larvae and grow up, to become swarms of maids or drones. While most honeybufs die shortly after laying eggs, their young also survive, growing into bees who can often serve as caretakers, pollinators, and pollinators, and actually making honey honey.\n\n2. Honeybees are one of the most intelligent organisms in the animal kingdom, exhibiting behaviors such as playing, foraging, and foraging for nectar. 
Honeybee colonies \u2013 colonies\n\niter 12000: val loss 1.3416, val time: 5511.72 ms\nSaving checkpoint to 'smollm2_finetune\/step-003000'\nEpoch 1 | iter 12500 step 3125 | loss train: 1.430, val: 1.342 | iter time: 127.17 ms (step)\n\n| ------------------------------------------------------\n| Token Counts\n| - Input Tokens              :  7840613\n| - Tokens w\/ Prompt          :  9921786\n| - Total Tokens (w\/ Padding) :  17134448\n| -----------------------------------------------------\n| Performance\n| - Training Time             :  1156.97 s\n| - Tok\/sec                   :  14809.72 tok\/s\n| -----------------------------------------------------\n| Memory Usage                                                                 \n| - Memory Used               :  5.97 GB                                        \n-------------------------------------------------------\n\nValidating ...\nFinal evaluation | val loss: 1.322 | val ppl: 3.749<\/pre>\n\n\n\n<p>At the end, we have a validation loss of 1.322.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Inference After Fine-Tuning<\/h3>\n\n\n\n<p>Let&#8217;s run inference using the final saved model. For this, we will use the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">litgpt chat<\/code> command and execute it in the terminal. This is necessary because the fine-tuned model adheres to a certain prompt format (Alpaca style) that gets correctly loaded via this command. 
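The prompt format in question can be read off the validation snippets printed in the training log above. A minimal sketch of that wrapping, reconstructed from those logged prompts (`litgpt chat` applies the checkpoint's stored prompt style automatically, so this is for illustration only):

```python
# Alpaca-style prompt template, reconstructed from the validation prompts
# printed during training. `litgpt chat` builds this wrapper for us; a bare
# model.generate() call would skip it, which is why the output degrades.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

prompt = ALPACA_TEMPLATE.format(instruction="Can we talk about animated videos?")
print(prompt)
```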
Running inference directly with <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">model.generate<\/code> causes the model to produce incorrect output.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">litgpt chat smollm2_finetune\/final\/ --max_new_tokens 1024<\/pre>\n\n\n\n<p>We tell the script to generate a maximum of 1024 new tokens.<\/p>\n\n\n\n<p>We give exactly the same prompt as before fine-tuning. Here is a small chat session.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt_predef_data_ft_infer.gif\" target=\"_blank\" rel=\" noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"601\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt_predef_data_ft_infer.gif\" alt=\"Chat inference using the model fine-tuned on AlpacaGPT4 dataset. \" class=\"wp-image-42178\"\/><\/a><figcaption class=\"wp-element-caption\">Figure 3. Chat inference using the model fine-tuned on the AlpacaGPT4 dataset.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>This time, the answer seems much better.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Fine-Tuning a LitGPT Model on a Custom Dataset<\/h2>\n\n\n\n<p>Now, we will move on to fine-tuning a model on a custom dataset. 
All fine-tuning using LitGPT happens using the Alpaca dataset format as shown below.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">[\n    {\n        \"instruction\": \"Write a limerick about a pelican.\",\n        \"input\": \"\",\n        \"output\": \"There once was a pelican so fine,\\nHis beak was as colorful as sunshine,\\nHe would fish all day,\\nIn a very unique way,\\nThis pelican was truly divine!\\n\\n\\n\"\n    },\n    {\n        \"instruction\": \"Identify the odd one out from the group.\",\n        \"input\": \"Carrot, Apple, Banana, Grape\",\n        \"output\": \"Carrot\\n\\n\"\n    }\n]<\/pre>\n\n\n\n<p>Now, we will be fine-tuning the SmolLM2-135M Instruct model for German-to-English translation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Preparing Custom Dataset For LitGPT Fine-Tuning<\/h3>\n\n\n\n<p>The first step for us is to prepare the custom dataset in the Alpaca instruction format.<\/p>\n\n\n\n<p>We will use the German to English translation subset of the <strong><a href=\"https:\/\/huggingface.co\/datasets\/wmt\/wmt16\/viewer\/de-en\" target=\"_blank\" rel=\"noreferrer noopener\">WMT16 dataset from Hugging Face<\/a><\/strong>. It contains 4.55 million training samples, along with around 2,200 validation and 3,000 test samples. However, we will only use 50,000 samples for training.<\/p>\n\n\n\n<p>The dataset preparation code is in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">prepare_custom_dataset.ipynb<\/code> Jupyter Notebook. 
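Whatever the source data, the invariant to maintain is that every record we write carries the three Alpaca keys, each mapped to a string. A small, hypothetical helper (not part of LitGPT) can sanity-check a prepared list of records before training:

```python
import json

# Hypothetical validator (not part of LitGPT): checks that every record in an
# Alpaca-style list has the required keys, each mapped to a string value.
REQUIRED_KEYS = {"instruction", "input", "output"}

def validate_alpaca_records(records):
    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
        for key in REQUIRED_KEYS:
            if not isinstance(rec[key], str):
                raise TypeError(f"record {i}: {key!r} must be a string")

# Example on an in-memory list; for a real file, pass json.load(open(path)).
records = json.loads(
    '[{"instruction": "Translate from German to English: Hallo Welt", '
    '"input": "", "output": "Hello world"}]'
)
validate_alpaca_records(records)  # silent on success, raises on malformed data
```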
Let&#8217;s go through that.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"prepare_custom_dataset.ipynb\" data-enlighter-group=\"prepare_custom_dataset_1\">from datasets import load_dataset\nfrom tqdm.auto import tqdm\n\nimport json\nimport os<\/pre>\n\n\n\n<p>We load the dataset from the Hugging Face <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">datasets<\/code> library.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"prepare_custom_dataset.ipynb\" data-enlighter-group=\"prepare_custom_dataset_2\">raw_dataset = load_dataset(\n    'wmt\/wmt16',\n    'de-en'\n)<\/pre>\n\n\n\n<p>Next, isolate the training and validation samples.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"prepare_custom_dataset.ipynb\" data-enlighter-group=\"prepare_custom_dataset_3\">train_dataset = raw_dataset['train']\nvalid_dataset = raw_dataset['validation']<\/pre>\n\n\n\n<p>Create a helper function to generate the custom dataset format.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"prepare_custom_dataset.ipynb\" data-enlighter-group=\"prepare_custom_dataset_4\">def convert_data(orig_data, num_samples=None):\n    json_list = []\n    \n    for i, data in tqdm(enumerate(orig_data), total=len(orig_data)):\n        if num_samples and i == num_samples:\n            break\n        de = 
data['translation']['de']\n        en = data['translation']['en']\n    \n        sample = {\n            'instruction': f\"Translate from German to English: {de}\",\n            'input': '',\n            'output': en\n        }\n    \n        json_list.append(sample)\n    return json_list<\/pre>\n\n\n\n<p>Finally, create the JSON data and save to the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">finetuning_data<\/code> directory.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"prepare_custom_dataset.ipynb\" data-enlighter-group=\"prepare_custom_dataset_5\">train_json_data = convert_data(train_dataset, num_samples=50000)\nvalid_json_data = convert_data(valid_dataset)\n\nos.makedirs('finetuning_data', exist_ok=True)\n\nwith open('finetuning_data\/train.json', 'w') as f:\n    json.dump(train_json_data, f)\n\nwith open('finetuning_data\/val.json', 'w') as f:\n    json.dump(valid_json_data, f)<\/pre>\n\n\n\n<p>This completes the dataset preparation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Fine-Tuning SmolLM2 on Custom Data<\/h3>\n\n\n\n<p>The code for fine-tuning the SmolLM2-135M Instruct model is present in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">finetuning_custom_data.ipynb<\/code> Jupyter Notebook. 
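To recap what the preparation stage produced: each entry in train.json is one Alpaca-style record built from a single translation pair. In miniature, with a made-up sample in the WMT layout (the nested translation dict mirrors how the Hugging Face dataset yields each row):

```python
# A made-up sample in the WMT16 layout (a nested "translation" dict).
fake_sample = {"translation": {
    "de": "Guten Morgen, wie geht es Ihnen?",
    "en": "Good morning, how are you?",
}}

# The same mapping that convert_data applies to every record.
de = fake_sample["translation"]["de"]
en = fake_sample["translation"]["en"]
record = {
    "instruction": f"Translate from German to English: {de}",
    "input": "",
    "output": en,
}
print(record)
```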
Let&#8217;s go through the code.<\/p>\n\n\n\n<p>Before fine-tuning, let&#8217;s check what kind of translation the pretrained model can carry out.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"finetuning_custom_data.ipynb\" data-enlighter-group=\"finetuning_custom_data_1\"># Check translation quality before fine-tuning.\n# From German to English.\nfrom litgpt import LLM\n\nmodel = LLM.load('HuggingFaceTB\/SmolLM2-135M-Instruct')\n\n# The English translation is:\n# What are animated videos? Let's talk about them.\n\ntext = model.generate(\n    'Translate from German to English: Was sind animierte Videos? Lassen Sie uns dar\u00fcber sprechen.', \n    stream=True, \n    max_new_tokens=1024\n)\nfor resulting_text in text:\n    print(resulting_text, end='', flush=True)<\/pre>\n\n\n\n<p>The following block shows the result.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">German: Were sich animierter Videos? 
Unterlagen wohin Sie darauf sprechen k\u00f6nnen.<\/pre>\n\n\n\n<p>It is clearly not capable of translating the text at the moment.<\/p>\n\n\n\n<p>We will now fine-tune the model.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"finetuning_custom_data.ipynb\" data-enlighter-group=\"finetuning_custom_data_2\"># Fine-tune SmolLM2 on the custom German-to-English translation data.\n!litgpt finetune_full HuggingFaceTB\/SmolLM2-135M-Instruct \\\n    --data JSON \\\n    --data.json_path finetuning_data \\\n    --out_dir smollm2_custom_finetune \\\n    --precision \"bf16-true\" \\\n    --train.save_interval 1000 \\\n    --train.log_interval 500 \\\n    --train.global_batch_size 16 \\\n    --train.micro_batch_size 4 \\\n    --train.epochs 3 \\\n    --train.max_seq_length 1024 \\\n    --eval.interval 500 \\\n    --eval.evaluate_example \"first\"<\/pre>\n\n\n\n<p>We use mostly the same arguments as in our previous training experiments, with a few changes. <\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--data JSON<\/code> tells the training script that we are using a dataset in JSON format.<\/li>\n\n\n\n<li>The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--data.json_path<\/code> argument accepts either a single JSON file or a directory containing <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">train.json<\/code> and <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">val.json<\/code>. For us, it is the latter. If we provide a path to a single JSON file, then we have to provide a validation split ratio, or a default one will be used. 
However, we already have a validation set.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--eval.evaluate_example \"first\"<\/code> tells the training script to use the first sample from the validation set to evaluate the model at regular intervals.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>The following is the truncated output from training.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Seed set to 1337\nNumber of trainable parameters: 162,826,560\nThe longest sequence length in the train data is 1024, the model's maximum sequence length is 1024 and context length is 8192\nVerifying settings ...\nEpoch 1 | iter 500 step 125 | loss train: 1.908, val: n\/a | iter time: 82.11 ms (step)\nEpoch 1 | iter 1000 step 250 | loss train: 1.887, val: n\/a | iter time: 83.52 ms (step)\nEpoch 1 | iter 1500 step 375 | loss train: 1.786, val: n\/a | iter time: 83.71 ms (step)\nEpoch 1 | iter 2000 step 500 | loss train: 1.773, val: n\/a | iter time: 81.81 ms (step)\nValidating ...\nTranslate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.\nBelow is an instruction that describes a task. 
Write a response that appropriately completes the request.\n\n### Instruction:\nTranslate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.\n\n### Response:\nLadies and gentlemen, the Prime Minister, Mr Indiens-Vertredem, and Mr Japens, the Prime Minister, Mr Vertredem, were in Tao at the moment.\n\niter 2000: val loss 2.3086, val time: 3744.57 ms\nEpoch 1 | iter 2500 step 625 | loss train: 1.512, val: 2.309 | iter time: 81.61 ms (step)\nEpoch 1 | iter 3000 step 750 | loss train: 1.562, val: 2.309 | iter time: 83.08 ms (step)\nEpoch 1 | iter 3500 step 875 | loss train: 1.488, val: 2.309 | iter time: 84.55 ms (step)\nEpoch 1 | iter 4000 step 1000 | loss train: 1.780, val: 2.309 | iter time: 84.65 ms (step)\nValidating ...\nTranslate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nTranslate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.\n\n### Response:\nYoung Indieners and Japan are in Tokio.\n\niter 4000: val loss 2.2462, val time: 2996.63 ms\nSaving checkpoint to 'smollm2_custom_finetune\/step-001000'\nEpoch 1 | iter 4500 step 1125 | loss train: 1.434, val: 2.246 | iter time: 82.46 ms (step)\nEpoch 1 | iter 5000 step 1250 | loss train: 1.533, val: 2.246 | iter time: 83.38 ms (step)\nEpoch 1 | iter 5500 step 1375 | loss train: 1.478, val: 2.246 | iter time: 84.04 ms (step)\nEpoch 1 | iter 6000 step 1500 | loss train: 1.545, val: 2.246 | iter time: 84.01 ms (step)\n.\n.\n.\niter 34000: val loss 2.2546, val time: 3041.75 ms\nEpoch 3 | iter 34500 step 8625 | loss train: 0.908, val: 2.255 | iter time: 88.69 ms (step)\nEpoch 3 | iter 35000 step 8750 | loss train: 0.932, val: 2.255 | iter time: 86.94 ms (step)\nEpoch 3 | iter 35500 step 8875 | loss train: 0.893, val: 2.255 | iter time: 83.31 ms (step)\nEpoch 3 | iter 36000 step 9000 | 
loss train: 0.962, val: 2.255 | iter time: 85.10 ms (step)\nValidating ...\nTranslate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nTranslate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.\n\n### Response:\nThe Prime Ministers of India and Japan are presiding over the meeting in Tokyo.\n\niter 36000: val loss 2.2539, val time: 3058.45 ms\nSaving checkpoint to 'smollm2_custom_finetune\/step-009000'\nEpoch 3 | iter 36500 step 9125 | loss train: 0.906, val: 2.254 | iter time: 83.75 ms (step)\nEpoch 3 | iter 37000 step 9250 | loss train: 0.921, val: 2.254 | iter time: 82.49 ms (step)\nEpoch 3 | iter 37500 step 9375 | loss train: 0.934, val: 2.254 | iter time: 83.13 ms (step)\n\n| ------------------------------------------------------\n| Token Counts\n| - Input Tokens              :  15158385\n| - Tokens w\/ Prompt          :  19657911\n| - Total Tokens (w\/ Padding) :  28767596\n| -----------------------------------------------------\n| Performance\n| - Training Time             :  2914.75 s\n| - Tok\/sec                   :  9869.65 tok\/s\n| -----------------------------------------------------\n| Memory Usage                                                                 \n| - Memory Used               :  7.44 GB                                        \n-------------------------------------------------------\n\nValidating ...\nFinal evaluation | val loss: 2.230 | val ppl: 9.296<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Running Inference using the Custom Dataset Fine-Tuned Model<\/h3>\n\n\n\n<p>We will use the final saved model for inference using the terminal chat command.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" 
data-enlighter-lineoffset=\"\" data-enlighter-title=\"finetuning_custom_data.ipynb\" data-enlighter-group=\"\">litgpt chat smollm2_custom_finetune\/final\/ --max_new_tokens 1024<\/pre>\n\n\n\n<p>Here is the chat session.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt_custom_data_ft_infer.gif\" target=\"_blank\" rel=\" noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"601\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/litgpt_custom_data_ft_infer.gif\" alt=\"Chat inference for machine translation using the fine-tuned SmolLM2 model.\" class=\"wp-image-42179\"\/><\/a><figcaption class=\"wp-element-caption\">Figure 4. Chat inference using LitGPT for machine translation using the fine-tuned SmolLM2 model.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>The translation is only partially correct; our model needs much more training before it can reliably translate from German to English. We will explore more advanced applications for training and inference in future posts.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary and Conclusion<\/h2>\n\n\n\n<p>We covered the basics of LitGPT in this article. From inference with a pretrained model, to fine-tuning on a predefined dataset, to fine-tuning with a custom dataset, we covered a lot of ground. In future articles, we will cover better fine-tuning strategies.<\/p>\n\n\n\n<p>If you have any questions, thoughts, or suggestions, please leave them in the comment section. I will surely address them.<\/p>\n\n\n\n<p>You can contact me using the <strong><a aria-label=\"Contact (opens in a new tab)\" href=\"https:\/\/debuggercafe.com\/contact-us\/\" target=\"_blank\" rel=\"noreferrer noopener\">Contact<\/a><\/strong> section. 
You can also find me on <strong><a aria-label=\"LinkedIn (opens in a new tab)\" href=\"https:\/\/www.linkedin.com\/in\/sovit-rath\/\" target=\"_blank\" rel=\"noreferrer noopener\">LinkedIn<\/a><\/strong>, and <strong><a href=\"https:\/\/x.com\/SovitRath5\" target=\"_blank\" rel=\"noreferrer noopener\">X<\/a><\/strong>.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we explore LitGPT. We cover chatting with pretrained models, fine-tuning on custom dataset, and evaluation of model after fine-tuning.<\/p>\n","protected":false},"author":1,"featured_media":42174,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1154,960,819],"tags":[1318,1314,1320,1321,1316,1315,1317],"class_list":["post-42087","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-generative-ai","category-instruction-tuning-language-models","category-llms","tag-inference-using-litgpt","tag-litgpt-chat","tag-litgpt-chat-inference","tag-litgpt-custom-dataset-fine-tuning","tag-litgpt-evalation","tag-litgpt-fine-tuning","tag-using-litgpt-pretrained-moedls"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>LitGPT - Getting Started<\/title>\n<meta name=\"description\" content=\"LitGPT is a unified library for pretraining, fine-tuning, evaluating, and deploying Large Language Models.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/debuggercafe.com\/litgpt-getting-started\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"LitGPT - Getting Started\" \/>\n<meta property=\"og:description\" content=\"LitGPT is a unified library for pretraining, 
fine-tuning, evaluating, and deploying Large Language Models.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/debuggercafe.com\/litgpt-getting-started\/\" \/>\n<meta property=\"og:site_name\" content=\"DebuggerCafe\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/profile.php?id=100013731104496\" \/>\n<meta property=\"article:published_time\" content=\"2025-07-14T00:30:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-07-14T00:37:24+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/LitGPT_Getting_Started_FeaturedImage_1000x563-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"640\" \/>\n\t<meta property=\"og:image:height\" content=\"360\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Sovit Ranjan Rath\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:site\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sovit Ranjan Rath\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"16 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/\"},\"author\":{\"name\":\"Sovit Ranjan Rath\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"headline\":\"LitGPT &#8211; Getting Started\",\"datePublished\":\"2025-07-14T00:30:00+00:00\",\"dateModified\":\"2025-07-14T00:37:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/\"},\"wordCount\":1507,\"commentCount\":0,\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/LitGPT_Getting_Started_FeaturedImage_1000x563-1.png\",\"keywords\":[\"Inference using LitGPT\",\"LitGPT Chat\",\"LitGPT Chat Inference\",\"LitGPT Custom Dataset Fine-Tuning\",\"LitGPT Evalation\",\"LitGPT Fine-Tuning\",\"Using LitGPT Pretrained Moedls\"],\"articleSection\":[\"Generative AI\",\"Instruction Tuning Language Models\",\"LLMs\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/debuggercafe.com\/litgpt-getting-started\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/\",\"url\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/\",\"name\":\"LitGPT - Getting 
Started\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/LitGPT_Getting_Started_FeaturedImage_1000x563-1.png\",\"datePublished\":\"2025-07-14T00:30:00+00:00\",\"dateModified\":\"2025-07-14T00:37:24+00:00\",\"author\":{\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"description\":\"LitGPT is a unified library for pretraining, fine-tuning, evaluating, and deploying Large Language Models.\",\"breadcrumb\":{\"@id\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/debuggercafe.com\/litgpt-getting-started\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/#primaryimage\",\"url\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/LitGPT_Getting_Started_FeaturedImage_1000x563-1.png\",\"contentUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/LitGPT_Getting_Started_FeaturedImage_1000x563-1.png\",\"width\":1920,\"height\":1080,\"caption\":\"LitGPT \u2013 Getting Started\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/debuggercafe.com\/litgpt-getting-started\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/debuggercafe.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"LitGPT &#8211; Getting Started\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/debuggercafe.com\/#website\",\"url\":\"https:\/\/debuggercafe.com\/\",\"name\":\"DebuggerCafe\",\"description\":\"Machine Learning and Deep 
Learning\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/debuggercafe.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\",\"name\":\"Sovit Ranjan Rath\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"caption\":\"Sovit Ranjan Rath\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"LitGPT - Getting Started","description":"LitGPT is a unified library for pretraining, fine-tuning, evaluating, and deploying Large Language Models.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/debuggercafe.com\/litgpt-getting-started\/","og_locale":"en_US","og_type":"article","og_title":"LitGPT - Getting Started","og_description":"LitGPT is a unified library for pretraining, fine-tuning, evaluating, and deploying Large Language 
Models.","og_url":"https:\/\/debuggercafe.com\/litgpt-getting-started\/","og_site_name":"DebuggerCafe","article_publisher":"https:\/\/www.facebook.com\/profile.php?id=100013731104496","article_published_time":"2025-07-14T00:30:00+00:00","article_modified_time":"2025-07-14T00:37:24+00:00","og_image":[{"url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/LitGPT_Getting_Started_FeaturedImage_1000x563-1.png","width":640,"height":360,"type":"image\/png"}],"author":"Sovit Ranjan Rath","twitter_card":"summary_large_image","twitter_creator":"@SovitRath5","twitter_site":"@SovitRath5","twitter_misc":{"Written by":"Sovit Ranjan Rath","Est. reading time":"16 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/debuggercafe.com\/litgpt-getting-started\/#article","isPartOf":{"@id":"https:\/\/debuggercafe.com\/litgpt-getting-started\/"},"author":{"name":"Sovit Ranjan Rath","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"headline":"LitGPT &#8211; Getting Started","datePublished":"2025-07-14T00:30:00+00:00","dateModified":"2025-07-14T00:37:24+00:00","mainEntityOfPage":{"@id":"https:\/\/debuggercafe.com\/litgpt-getting-started\/"},"wordCount":1507,"commentCount":0,"image":{"@id":"https:\/\/debuggercafe.com\/litgpt-getting-started\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/LitGPT_Getting_Started_FeaturedImage_1000x563-1.png","keywords":["Inference using LitGPT","LitGPT Chat","LitGPT Chat Inference","LitGPT Custom Dataset Fine-Tuning","LitGPT Evalation","LitGPT Fine-Tuning","Using LitGPT Pretrained Moedls"],"articleSection":["Generative AI","Instruction Tuning Language 
Models","LLMs"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/debuggercafe.com\/litgpt-getting-started\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/debuggercafe.com\/litgpt-getting-started\/","url":"https:\/\/debuggercafe.com\/litgpt-getting-started\/","name":"LitGPT - Getting Started","isPartOf":{"@id":"https:\/\/debuggercafe.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/debuggercafe.com\/litgpt-getting-started\/#primaryimage"},"image":{"@id":"https:\/\/debuggercafe.com\/litgpt-getting-started\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/LitGPT_Getting_Started_FeaturedImage_1000x563-1.png","datePublished":"2025-07-14T00:30:00+00:00","dateModified":"2025-07-14T00:37:24+00:00","author":{"@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"description":"LitGPT is a unified library for pretraining, fine-tuning, evaluating, and deploying Large Language Models.","breadcrumb":{"@id":"https:\/\/debuggercafe.com\/litgpt-getting-started\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/debuggercafe.com\/litgpt-getting-started\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/litgpt-getting-started\/#primaryimage","url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/LitGPT_Getting_Started_FeaturedImage_1000x563-1.png","contentUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2025\/05\/LitGPT_Getting_Started_FeaturedImage_1000x563-1.png","width":1920,"height":1080,"caption":"LitGPT \u2013 Getting Started"},{"@type":"BreadcrumbList","@id":"https:\/\/debuggercafe.com\/litgpt-getting-started\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/debuggercafe.com\/"},{"@type":"ListItem","position":2,"name":"LitGPT &#8211; Getting 
Started"}]},{"@type":"WebSite","@id":"https:\/\/debuggercafe.com\/#website","url":"https:\/\/debuggercafe.com\/","name":"DebuggerCafe","description":"Machine Learning and Deep Learning","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/debuggercafe.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752","name":"Sovit Ranjan Rath","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","caption":"Sovit Ranjan Rath"}}]}},"_links":{"self":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/42087","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/comments?post=42087"}],"version-history":[{"count":107,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/42087\/revisions"}],"predecessor-version":[{"id":42867,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/42087\/revisions\/42867"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media\/42174"}],"wp:attachment":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media?parent=42087"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/debuggercafe.com\/
wp-json\/wp\/v2\/categories?post=42087"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/tags?post=42087"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}