{"id":32830,"date":"2023-11-06T06:00:00","date_gmt":"2023-11-06T00:30:00","guid":{"rendered":"https:\/\/debuggercafe.com\/?p=32830"},"modified":"2024-09-15T21:15:37","modified_gmt":"2024-09-15T15:45:37","slug":"text-classification-using-transformer-encoder-in-pytorch","status":"publish","type":"post","link":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/","title":{"rendered":"Text Classification using Transformer Encoder in PyTorch"},"content":{"rendered":"\n<p>In this blog post, we will use the <strong>Transformer encoder model for text classification<\/strong>.<\/p>\n\n\n\n<p>The original transformer model by Vaswani et al. was an encoder-decoder model. Such models are excellent for language translation tasks. However, in some cases, only the encoder or the decoder part of the transformer works better. One such task is text classification. For text classification, although we can use the entire transformer model, an encoder-only network with a classifier head works much better. <\/p>\n\n\n\n<div class=\"wp-block-buttons is-horizontal is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-499968f5 wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--1\"><a class=\"wp-block-button__link has-black-color has-luminous-vivid-orange-background-color has-text-color has-background wp-element-button\" href=\"#download-code\"><strong>Jump to Download Code<\/strong><\/a><\/div>\n<\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>That is what we will accomplish in this blog post. We will build an encoder-only network that is true to the original transformer language model and use it for text classification. We will use PyTorch <strong>nn.TransformeEncoderLayer<\/strong> and <strong>nn.TransformerEncoder<\/strong> to build the neural network.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-workflow.gif\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"800\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-workflow.gif\" alt=\"Workflow of text classification using Transformer Encoder model.\" class=\"wp-image-32889\"\/><\/a><figcaption class=\"wp-element-caption\">Figure 1. Workflow of text classification using Transformer Encoder model.<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\"><em>We will cover the following points in this blog post<\/em><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>To carry out the text classification using the transformer encoder, we will use the IMDb movie review dataset. So, we will start with a discussion of the dataset.<\/em><\/li>\n\n\n\n<li><em>Next, we will move on to the Jupyter Notebook that contains the code. Here, we will mostly focus on the <strong>encoder-only transformer model<\/strong> preparation part. <\/em><\/li>\n\n\n\n<li><em>After training the model, we will also run testing and inference. The testing will happen on a held-out set of the IMDb dataset. For inference, we will pick out some real movie reviews from the internet.<\/em><\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The IMDb Movie Review Dataset<\/h2>\n\n\n\n<p>The IMDb movie review dataset contains several thousand real-life movie reviews. Each movie review may be positive or negative.<\/p>\n\n\n\n<p>You can find the <strong><a href=\"https:\/\/www.kaggle.com\/datasets\/sovitrath\/imdb-movie-review-classification-full-and-mini\" target=\"_blank\" rel=\"noreferrer noopener\">IMDb dataset<\/a><\/strong> that we will use on Kaggle. You can go ahead, download, and extract the dataset if you wish running the training on your own. It will extract two directories, <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">aclImdb<\/code> and <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">imdb_mini<\/code>. The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">imdb_mini<\/code> is a subset of the dataset. However, we will use the full dataset present in <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">aclImdb<\/code>.<\/p>\n\n\n\n<p>Here is the directory structure of <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">aclImdb<\/code>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">aclImdb\/\n\u2514\u2500\u2500 aclImdb\n    \u251c\u2500\u2500 imdbEr.txt\n    \u251c\u2500\u2500 imdb.vocab\n    \u251c\u2500\u2500 README\n    \u251c\u2500\u2500 test\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 labeledBow.feat\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 neg\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 pos\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 urls_neg.txt\n    \u2502\u00a0\u00a0 \u2514\u2500\u2500 urls_pos.txt\n    \u2514\u2500\u2500 train\n        \u251c\u2500\u2500 labeledBow.feat\n        \u251c\u2500\u2500 neg\n        \u251c\u2500\u2500 pos\n        \u251c\u2500\u2500 unsup\n        \u251c\u2500\u2500 unsupBow.feat\n        \u251c\u2500\u2500 urls_neg.txt\n        \u251c\u2500\u2500 urls_pos.txt\n        \u2514\u2500\u2500 urls_unsup.txt<\/pre>\n\n\n\n<p>We are mostly interested in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">train<\/code> and <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">test<\/code> directories. In the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">train<\/code> directory, there is a <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">neg<\/code> and <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">pos<\/code> subdirectory. Each of these contain <strong>12500 negative and positive<\/strong> movie review samples respectively in separate text files. While preparaing the dataset, we will use a small subset from the training set for validation.<\/p>\n\n\n\n<p>The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">test<\/code> directory is entirey reserved for testing after training the model. Here also, there are <strong>12500 negative and 12500 positive<\/strong> movie review samples.<\/p>\n\n\n\n<p>For now, we can ignore the other files and directories.<\/p>\n\n\n\n<p>Let&#8217;s take a look at one positive and one negative movie review.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/positive-imdb-review.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"230\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/positive-imdb-review.png\" alt=\"Positive movie review sample from the IMDb dataset.\" class=\"wp-image-32893\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/positive-imdb-review.png 600w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/positive-imdb-review-300x115.png 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 2. Positive movie review sample from the IMDb dataset.<\/figcaption><\/figure>\n<\/div>\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/negative-imdb-review.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"599\" height=\"272\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/negative-imdb-review.png\" alt=\"Negative movie review sample from the IMDb dataset.\" class=\"wp-image-32894\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/negative-imdb-review.png 599w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/negative-imdb-review-300x136.png 300w\" sizes=\"auto, (max-width: 599px) 100vw, 599px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 3. Negative movie review sample from the IMDb dataset.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>It&#8217;s clear that the reviews can be quite long and comprehensive. We must choose the hyperparameters accordingly while training to get the best possible results.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Project Directory Structure<\/h2>\n\n\n\n<p>The following block contains the entire project directory structure.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\u251c\u2500\u2500 aclImdb\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 aclImdb\n\u2502\u00a0\u00a0     \u251c\u2500\u2500 test\n\u2502\u00a0\u00a0     \u251c\u2500\u2500 train\n\u2502\u00a0\u00a0     \u251c\u2500\u2500 imdbEr.txt\n\u2502\u00a0\u00a0     \u251c\u2500\u2500 imdb.vocab\n\u2502\u00a0\u00a0     \u2514\u2500\u2500 README\n\u251c\u2500\u2500 inference_data\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 sample_1.txt\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 sample_2.txt\n\u251c\u2500\u2500 outputs\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 accuracy.png\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 loss.png\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 model.pth\n\u2514\u2500\u2500 transformer_encoder_classification.ipynb<\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">aclImdb<\/code> directory contains the movie review dataset that we analyzed in the previous section.<\/li>\n\n\n\n<li>We have a few unseen movie review samples in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">inference_data<\/code> directory. Each text file contains one movie review.<\/li>\n\n\n\n<li>The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">outputs<\/code> directory will contain all the training outputs. These include the trained model, the accuracy, and the loss plots.<\/li>\n\n\n\n<li>Finally, we have the Jupyter Notebook that contains the code directly inside the project directory.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">PyTorch Version<\/h3>\n\n\n\n<p>The codebase for this blog post has been developed using <strong>PyTorch 2.0.1<\/strong>. You may go ahead and install PyTorch in your environment if you wish to run the notebook for transformer encoder text classification.<\/p>\n\n\n\n<p class=\"has-background\" style=\"background-color:#ffb76a\"><strong><em>The trained weights and the Jupyter Notebook will be available through the download section of this blog post. You can easily run just testing and inference using the trained model.<\/em><\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Text Classification using Transformer Encoder<\/h2>\n\n\n\n<p>Let&#8217;s start with the coding part of the blog post. While going though the Jupyter Notebook, we will not go through the dataset preparation code explanation in detail. If you wish to get a detailed explanation of the dataset preparation, please go through the <strong><a href=\"https:\/\/debuggercafe.com\/text-classification-using-pytorch\/\" target=\"_blank\" rel=\"noreferrer noopener\">Getting Started with Text Classification<\/a><\/strong> post. In that post, we go through each and every component of IMDb dataset preparation in detail.<\/p>\n\n\n\n<h3 class=\"wp-block-heading has-text-align-center\" id=\"download-code\">Download Code<\/h3>\n\n\n\n<div class=\"wp-block-button is-style-outline center\"><a data-sumome-listbuilder-id=\"3189724c-3062-47f1-8bbd-7db2df60dcc0\" class=\"wp-block-button__link has-black-color has-luminous-vivid-orange-background-color has-text-color has-background\"><b>Download the Source Code for this Tutorial<\/b><\/a><\/div>\n\n\n\n<p>In this blog post, we will mostly focus on how to prepare a text classification model using the transformer encoder.<\/p>\n\n\n\n<p>Starting with the import statements, setting seeds, and defining the necessary directory paths.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import torch\nimport os\nimport pathlib\nimport numpy as np\nimport glob\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport matplotlib.pyplot as plt\nimport re\nimport string\nimport math\n\nfrom tqdm.auto import tqdm\nfrom collections import Counter\nfrom torch.utils.data import DataLoader, Subset, Dataset\nfrom torch import nn\n\nplt.style.use('ggplot')<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Set seed.\nseed = 42\nnp.random.seed(seed)\ntorch.manual_seed(seed)\ntorch.cuda.manual_seed(seed)\ntorch.backends.cudnn.deterministic = True\ntorch.backends.cudnn.benchmark = True<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">OUTPUTS_DIR = 'outputs'\nos.makedirs(OUTPUTS_DIR, exist_ok=True)\n\ndata_dir = os.path.join('aclImdb\/aclImdb\/')\ndataset_dir = os.path.join(data_dir)\ntrain_dir = os.path.join(dataset_dir, 'train')\nprint(os.listdir(dataset_dir))\nprint(os.listdir(train_dir))<\/pre>\n\n\n\n<p>In the above code block, we define an output directory where the trained model and the plots will be stored. Along with that, the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">data_dir<\/code> variable holds the path to the dataset root directory.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Defining the Training and Dataset Parameters <\/h3>\n\n\n\n<p>Let&#8217;s define the training and dataset parameters now.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">MAX_LEN = 1024\n# Use these many top words from the dataset. If -1, use all words.\nNUM_WORDS = 32000 # Vocabulary size.\n# Batch size.\nBATCH_SIZE = 32\nVALID_SPLIT = 0.20\nEPOCHS = 30\nLR = 0.00001<\/pre>\n\n\n\n<p>Going over each of the variables that we have defined above:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">MAX_LEN<\/code>: This is going to be the maximum length of each review to consider while preparing the dataset. If the review is less than 1024 words, we will pad it with 0s and if it is more than 1024 words, then we will truncate it.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">NUM_WORDS<\/code>: This is the number of unique words to consider from the entire dataset. The IMDb dataset contains more than 600000 unique words. However, it is not feasible to train a model with these many unique words. So, we will consider 32000 unique words from the entire dataset.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">BATCH_SIZE<\/code>: The batch size for the data loaders.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">VALID_SPLIT<\/code>: As we will split the training samples into a training and a validation set, we need a define a split. 20% of the samples will be used for validation and the rest for training.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">EPOCHS<\/code>: The number of epochs to train for.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">LR<\/code>: The learning rate for the optimizer.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Dataset Preparation<\/h3>\n\n\n\n<p>Let&#8217;s get on with the dataset preparation for training our transformer encoder for text classification. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Finding the Longest and Average Review Length from the IMDb Dataset<\/h4>\n\n\n\n<p>The following is a function that finds the longest review among all the training files that we have.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def find_longest_length(text_file_paths):\n    \"\"\"\n    Find the longest review length in the entire training set. \n\n    :param text_file_paths: List, containing all the text file paths.\n\n    Returns:\n        max_len: Longest review length.\n    \"\"\"\n    max_length = 0\n    for path in text_file_paths:\n        with open(path, 'r') as f:\n            text = f.read()\n            # Remove &lt;br> tags.\n            text = re.sub('&lt;[^>]+>+', '', text)\n            corpus = [\n                word for word in text.split()\n            ]\n        if len(corpus) > max_length:\n            max_length = len(corpus)\n    return max_length\n\nfile_paths = []\nfile_paths.extend(glob.glob(os.path.join(\n    dataset_dir, 'train', 'pos', '*.txt'\n)))\nfile_paths.extend(glob.glob(os.path.join(\n    dataset_dir, 'train', 'neg', '*.txt'\n)))\nlongest_sentence_length = find_longest_length(file_paths)\nprint(f\"Longest review length: {longest_sentence_length} words\") <\/pre>\n\n\n\n<p>This gives the output as <strong>2450 words<\/strong>. It seems like the longest review is quite comprehensive.<\/p>\n\n\n\n<p>But not all reviews will be that long. The following function finds the average review length from the training set.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def find_avg_sentence_length(text_file_paths):\n    \"\"\"\n    Find the average sentence in the entire training set. \n\n    :param text_file_paths: List, containing all the text file paths.\n\n    Returns:\n        avg_len: Average length.\n    \"\"\"\n    sentence_lengths = []\n    for path in text_file_paths:\n        with open(path, 'r') as f:\n            text = f.read()\n            # Remove &lt;br> tags.\n            text = re.sub('&lt;[^>]+>+', '', text)\n            corpus = [\n                word for word in text.split()\n            ]\n        sentence_lengths.append(len(corpus))\n    return sum(sentence_lengths)\/len(sentence_lengths)\n\nfile_paths = []\nfile_paths.extend(glob.glob(os.path.join(\n    dataset_dir, 'train', 'pos', '*.txt'\n)))\nfile_paths.extend(glob.glob(os.path.join(\n    data_dir, 'train', 'neg', '*.txt'\n)))\naverage_length = find_avg_sentence_length(file_paths)\nprint(f\"Average review length: {average_length} words\")<\/pre>\n\n\n\n<p>Here, the output is <strong>229.70464 words<\/strong>. As it seems, most of the reviews are short.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Finding Word Frequency in Text Documents<\/h4>\n\n\n\n<p>We need to find the frequency of all the words from the entire training set so that we can choose the top 32000 words.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def find_word_frequency(text_file_paths, most_common=None):\n    \"\"\"\n    Create a list of tuples of the following format,\n    [('ho', 2), ('hello', 1), (\"let's\", 1), ('go', 1)]\n    where the number represents the frequency of occurance of \n    the word in the entire dataset.\n\n    :param text_file_paths: List, containing all the text file paths.\n    :param most_common: Return these many top words from the dataset.\n        If `most_common` is None, return all. If `most_common` is 3,\n        returns the top 3 tuple pairs in the list.\n\n    Returns:\n        sorted_words: A list of tuple containing each word and it's\n        frequency of the format ('ho', 2), ('hello', 1), ...]\n    \"\"\"\n    # Add all the words in the entire dataset to `corpus` list.\n    corpus = []\n    for path in text_file_paths:\n        with open(path, 'r') as f:\n            text = f.read()\n            # Remove &lt;br> tags.\n            text = re.sub('&lt;[^>]+>+', '', text)\n            corpus.extend([\n                word for word in text.split()\n            ])\n    count_words = Counter(corpus)\n    # Create a dictionary with the most common word in the corpus \n    # at the beginning.\n    # `word_frequency` will be like \n    word_frequency = count_words.most_common(n=most_common) # Returns all if n is `None`.\n    return word_frequency<\/pre>\n\n\n\n<p>In case you are wondering, here is a sample input sentence and its sample out.<\/p>\n\n\n\n<p><strong>Input:<\/strong><\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">'A SAMPLE SENTENCE...\\n'\n'This notebook trains a Transformer encoder model ...\\n'\n'for text classification using PyTorch Transformer encoder ...\\n'\n'module'<\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">[('Transformer', 2), ('encoder', 2), ('...', 2), ('A', 1), ('SAMPLE', 1), \n('SENTENCE...', 1), ('This', 1), ('notebook', 1), ('trains', 1), ('a', 1), \n('model', 1), ('for', 1), ('text', 1), ('classification', 1), ('using', 1), \n('PyTorch', 1), ('module', 1)]\n<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Assigning Integer Indices to Words<\/h4>\n\n\n\n<p>We will follow a very naive tokenization technique. We will just assign an integer to each of the top 32000 words.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def word2int(input_words, num_words):\n    \"\"\"\n    Create a dictionary of word to integer mapping for each unique word.\n\n    :param input_words: A list of tuples containing the words and \n        theiry frequency. Should be of the following format,\n        [('ho', 2), ('hello', 1), (\"let's\", 1), ('go', 1)]\n    :param num_words: Number of words to use from the `input_words` list \n        to create the mapping. If -1, use all words in the dataset.\n\n    Returns:\n        int_mapping: A dictionary of word and a integer mapping as \n            key-value pair. Example, {'Hello,': 1, 'the': 2, 'let': 3}\n    \"\"\"\n\n    if num_words > -1:\n        int_mapping = {\n            w:i+1 for i, (w, c) in enumerate(input_words) \\\n                if i &lt;= num_words - 1 # -1 to avoid getting (num_words + 1) integer mapping.\n        }\n    else:\n        int_mapping = {w:i+1 for i, (w, c) in enumerate(input_words)}\n    return int_mapping<\/pre>\n\n\n\n<p>For the above example, it will return the following output.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">{'Transformer': 1, 'encoder': 2, '...': 3, 'A': 4, 'SAMPLE': 5, 'SENTENCE...': 6, '\nThis': 7, 'notebook': 8, 'trains': 9, 'a': 10, 'model': 11, 'for': 12, 'text': 13, \n'classification': 14, 'using': 15, 'PyTorch': 16, 'module': 17}<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">The Custom Dataset Class for Training the Transformer Encoder Model<\/h4>\n\n\n\n<p>The following code block defines an entire dataset class.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">class NLPClassificationDataset(Dataset):\n    def __init__(self, file_paths, word_frequency, int_mapping, max_len):\n        self.word_frequency = word_frequency\n        self.int_mapping = int_mapping\n        self.file_paths = file_paths\n        self.max_len = max_len\n\n    def standardize_text(self, input_text):\n        # Convert everything to lower case.\n        text = input_text.lower()\n        # If the text contains HTML tags, remove them.\n        text = re.sub('&lt;[^>]+>+', '', text)\n        # Remove punctuation marks using `string` module.\n        # According to `string`, the following will be removed,\n        # '!\"#$%&amp;\\'()*+,-.\/:;&lt;=>?@[\\\\]^_`{|}~'\n        text = ''.join([\n            character for character in text \\\n                if character not in string.punctuation\n        ])\n        return text\n\n    def return_int_vector(self, int_mapping, text_file_path):\n        \"\"\"\n        Assign an integer to each word and return the integers in a list.\n        \"\"\"\n        with open(text_file_path, 'r') as f:\n            text = f.read()\n            text = self.standardize_text(text)\n            corpus = [\n                word for word in text.split()\n            ] \n        # Each word is replaced by a specific integer.\n        int_vector = [\n            int_mapping[word] for word in text.split() \\\n            if word in int_mapping\n        ]\n        return int_vector\n    \n    def pad_features(self, int_vector, max_len):\n        \"\"\"\n        Return features of `int_vector`, where each vector is padded \n        with 0's or truncated to the input seq_length. Return as Numpy \n        array.\n        \"\"\"\n        features = np.zeros((1, max_len), dtype = int)\n        if len(int_vector) &lt;= max_len:\n            zeros = list(np.zeros(max_len - len(int_vector)))\n            new = zeros + int_vector\n        else:\n            new = int_vector[: max_len]\n        features = np.array(new)\n        return features\n\n    def encode_labels(self, text_file_path):\n        file_path = pathlib.Path(text_file_path)\n        class_label = str(file_path).split(os.path.sep)[-2]\n        if class_label == 'pos':\n            int_label = 1\n        else:\n            int_label = 0\n        return int_label\n\n    def __len__(self):\n        return len(self.file_paths)\n\n    def __getitem__(self, idx):\n        file_path = self.file_paths[idx]\n        int_vector = self.return_int_vector(self.int_mapping, file_path)\n        padded_features = self.pad_features(int_vector, self.max_len)\n        label = self.encode_labels(file_path)\n        return {\n            'text': torch.tensor(padded_features, dtype=torch.int32),\n            'label': torch.tensor(label, dtype=torch.long)\n        }\n<\/pre>\n\n\n\n<p>The above class will clean the text of any HTML tags and return a review and its corresponding label pair. If a review is positive, its label is 1, else it is 0.<\/p>\n\n\n\n<p>If you want to know how to prepare a custom dataset when dealing with CSV files, then take a look at the <strong><a href=\"https:\/\/debuggercafe.com\/disaster-tweet-classification-using-pytorch\/\" target=\"_blank\" rel=\"noreferrer noopener\">Disaster Tweet Classification<\/a><\/strong> article. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Preparing the Datasets and Data Loaders<\/h4>\n\n\n\n<p>The next step is preparing the datasets and the data loaders. For this, first, we need to assemble all the review text files.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># List of all file paths.\nfile_paths = []\nfile_paths.extend(glob.glob(os.path.join(\n    dataset_dir, 'train', 'pos', '*.txt'\n)))\nfile_paths.extend(glob.glob(os.path.join(\n    dataset_dir, 'train', 'neg', '*.txt'\n)))\n\n\ntest_file_paths = []\ntest_file_paths.extend(glob.glob(os.path.join(\n    dataset_dir, 'test', 'pos', '*.txt'\n)))\ntest_file_paths.extend(glob.glob(os.path.join(\n    dataset_dir, 'test', 'neg', '*.txt'\n)))<\/pre>\n\n\n\n<p>Next, we need to find the word frequency for each word and create an integer mapping for the top 32000 words.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Get the frequency of all unqiue words in the dataset.\nword_frequency = find_word_frequency(file_paths)\n# Assign a specific intenger to each word.\nint_mapping = word2int(word_frequency, num_words=NUM_WORDS)<\/pre>\n\n\n\n<p>Next, create the training, validation, and test sets.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">dataset = NLPClassificationDataset(\n    file_paths, word_frequency, int_mapping, MAX_LEN\n)\n\ndataset_size = len(dataset)\n# Calculate the validation dataset size.\nvalid_size = int(VALID_SPLIT*dataset_size)\n# Radomize the data indices.\nindices = torch.randperm(len(dataset)).tolist()\n# Training and validation sets.\ndataset_train = Subset(dataset, indices[:-valid_size])\ndataset_valid = Subset(dataset, indices[-valid_size:])\n\ndataset_test = NLPClassificationDataset(\n    test_file_paths, word_frequency, int_mapping, MAX_LEN\n)\n\n# dataset_valid = NLPClassificationDataset()\nprint(f\"Number of training samples: {len(dataset_train)}\")\nprint(f\"Number of validation samples: {len(dataset_valid)}\")\nprint(f\"Number of test samples: {len(dataset_test)}\")<\/pre>\n\n\n\n<p>As of now, we have <strong>20000 training<\/strong>, <strong>5000 validation<\/strong>, and <strong>25000 test<\/strong> samples respectively.<\/p>\n\n\n\n<p>And finally, we can create the data loaders.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">train_loader = DataLoader(\n    dataset_train, \n    batch_size=BATCH_SIZE,\n    shuffle=True, \n    num_workers=4\n)\n\nvalid_loader = DataLoader(\n    dataset_valid, \n    batch_size=BATCH_SIZE,\n    shuffle=False, \n    num_workers=4\n)\n\ntest_loader = DataLoader(\n    dataset_test, \n    batch_size=BATCH_SIZE,\n    shuffle=False,\n    num_workers=4\n)<\/pre>\n\n\n\n<p>You can increase or decrease the batch size depending on the GPU memory that you have available.<\/p>\n\n\n\n<p>In case you are interested in a more advanced post, take a look at <strong><a href=\"https:\/\/debuggercafe.com\/language-translation-using-pytorch-transformer\/\" target=\"_blank\" rel=\"noreferrer noopener\">Language Translation using PyTorch Transformer<\/a><\/strong>. This blog post takes you through an entire process of translating English sentences to French from scratch.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Counting Correct and Incorrect Predictions During Training<\/h3>\n\n\n\n<p>Now, we need to define a custom function to calculate the total number of correct and incorrect predictions during training.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def count_correct_incorrect(labels, outputs, train_running_correct):\n    # As the outputs are currently logits.\n    outputs = torch.sigmoid(outputs)\n    running_correct = 0\n    for i, label in enumerate(labels):\n        if label &lt; 0.5 and outputs[i] &lt; 0.5:\n            running_correct += 1\n        elif label >= 0.5 and outputs[i] >= 0.5:\n            running_correct += 1\n    return running_correct<\/pre>\n\n\n\n<p>It returns the total number of correct predictions from an entire batch.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Training and Validation Function<\/h3>\n\n\n\n<p>Next are the training and validation functions.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Training function.\ndef train(model, trainloader, optimizer, criterion, device):\n    model.train()\n    print('Training')\n    train_running_loss = 0.0\n    train_running_correct = 0\n    counter = 0\n    for i, data in tqdm(enumerate(trainloader), total=len(trainloader)):\n        counter += 1\n        inputs, labels = data['text'], data['label']\n        inputs = inputs.to(device)\n        labels = torch.tensor(labels, dtype=torch.float32).to(device)\n        optimizer.zero_grad()\n        # Forward pass.\n        outputs = model(inputs)\n        outputs = torch.squeeze(outputs, -1)\n        # Calculate the loss.\n        loss = criterion(outputs, labels)\n        train_running_loss += loss.item()\n        running_correct = count_correct_incorrect(\n            labels, outputs, train_running_correct\n        )\n        train_running_correct += running_correct\n        # Backpropagation.\n        loss.backward()\n        # Update the optimizer parameters.\n        optimizer.step()\n    \n    # Loss and accuracy for the complete epoch.\n    epoch_loss = train_running_loss \/ counter\n    epoch_acc = 100. * (train_running_correct \/ len(trainloader.dataset))\n    return epoch_loss, epoch_acc\n\n# Validation function.\ndef validate(model, testloader, criterion, device):\n    model.eval()\n    print('Validation')\n    valid_running_loss = 0.0\n    valid_running_correct = 0\n    counter = 0\n    \n    with torch.no_grad():\n        for i, data in tqdm(enumerate(testloader), total=len(testloader)):\n            counter += 1\n            inputs, labels = data['text'], data['label']\n            inputs = inputs.to(device)\n            labels = torch.tensor(labels, dtype=torch.float32).to(device)\n            # Forward pass.\n            outputs = model(inputs)\n            outputs = torch.squeeze(outputs, -1)\n            # Calculate the loss.\n            loss = criterion(outputs, labels)\n            valid_running_loss += loss.item()\n            running_correct = count_correct_incorrect(\n                labels, outputs, valid_running_correct\n            )\n            valid_running_correct += running_correct\n        \n    # Loss and accuracy for the complete epoch.\n    epoch_loss = valid_running_loss \/ counter\n    epoch_acc = 100. * (valid_running_correct \/ len(testloader.dataset))\n    return epoch_loss, epoch_acc<\/pre>\n\n\n\n<p>Each of the above functions returns the loss and accuracy for every epoch.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Transformer Encoder Model for Text Classification<\/h3>\n\n\n\n<p>Here comes the most important part of the blog post. We need to define the Transformer Encoder model for text classification. We need to ensure that we connect every component properly to get the best results.<\/p>\n\n\n\n<p>Before preparing the model, we need to define some model parameters.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Model parameters.\nEMBED_DIM = 256\nNUM_ENCODER_LAYERS = 3\nNUM_HEADS = 4<\/pre>\n\n\n\n<p>Our model will consist of <strong>256-dimensional embedding<\/strong>, <strong>3 transformer encoder layers<\/strong>, and <strong>4 heads for multi-head attention<\/strong>.<\/p>\n\n\n\n<p>If you need a summary of the original transformer paper, then take a look at the <strong><a href=\"https:\/\/debuggercafe.com\/transformer-neural-network\/\" target=\"_blank\" rel=\"noreferrer noopener\">Transformer Neural Network<\/a><\/strong> article. It covers all the essential components of the Attention is All You Need paper in a short and concise manner.<\/p>\n\n\n\n<p>Next is the Transformer Encoder Model class.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">class EncoderClassifier(nn.Module):\n    def __init__(self, vocab_size, embed_dim, num_layers, num_heads):\n        super(EncoderClassifier, self).__init__()\n        self.emb = nn.Embedding(vocab_size, embed_dim)\n        self.encoder_layer = nn.TransformerEncoderLayer(\n            d_model=embed_dim, \n            nhead=num_heads, \n            batch_first=True\n        )\n        self.encoder = nn.TransformerEncoder(\n            encoder_layer=self.encoder_layer,\n            num_layers=num_layers,\n        )\n        self.linear = nn.Linear(embed_dim, 1)\n        self.dropout = nn.Dropout(0.2)\n        \n    def forward(self, x):\n        x = self.emb(x)\n        x = self.encoder(x)\n        x = self.dropout(x)\n        x = x.max(dim=1)[0]\n        out = self.linear(x)\n        return out   <\/pre>\n\n\n\n<p>Now, let&#8217;s break it down a bit.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>First, we have an embedding layer that creates 256-dimensional embeddings for each word\/token.<\/li>\n\n\n\n<li>Second, we define a <strong>nn.TransformerEncoderLayer<\/strong>. This accepts the number of heads that should go into each of the Transformer Encoder layers.<\/li>\n\n\n\n<li>Third, the <strong>nn.TransformerEncoder<\/strong>, accepts an object of the Transformer Encoder Layer and how many such layers to include in the entire model.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>Here is the code to initialize the model, print its architecture, and the number of parameters.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n\nmodel = EncoderClassifier(\n    len(int_mapping)+1, \n    embed_dim=EMBED_DIM,\n    num_layers=NUM_ENCODER_LAYERS,\n    num_heads=NUM_HEADS\n).to(device)\nprint(model)\n# Total parameters and trainable parameters.\ntotal_params = sum(p.numel() for p in model.parameters())\nprint(f\"{total_params:,} total parameters.\")\ntotal_trainable_params = sum(\n    p.numel() for p in model.parameters() if p.requires_grad)\nprint(f\"{total_trainable_params:,} training parameters.\\n\")<\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-model.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"779\" height=\"598\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-model.png\" alt=\"The Transformer Encoder text classification model architecture.\" class=\"wp-image-32898\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-model.png 779w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-model-300x230.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-model-768x590.png 768w\" sizes=\"auto, (max-width: 779px) 100vw, 779px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 4. The Transformer Encoder text classification model architecture.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>All in all, we just use the encoder part of the Transformer model for text encoding, attach a linear layer for text classification, and entirely ditch the transformer decoder block.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-decoder-discarded-for-text-classification.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"600\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-decoder-discarded-for-text-classification.png\" alt=\"Transformer Decoder is discarded for text classification.\" class=\"wp-image-32899\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-decoder-discarded-for-text-classification.png 600w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-decoder-discarded-for-text-classification-300x300.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-decoder-discarded-for-text-classification-150x150.png 150w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 5. Transformer Decoder is discarded for text classification.<\/figcaption><\/figure>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\">Defining the Loss Function, Optimizer, and Starting the Training<\/h3>\n\n\n\n<p>The next code block defines the <strong>BCE loss<\/strong> function and the <strong>Adam Optimizer<\/strong>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">criterion = nn.BCEWithLogitsLoss()\noptimizer = optim.Adam(\n    model.parameters(), \n    lr=LR,\n)<\/pre>\n\n\n\n<p>Next, we start the training for the specified number of epochs.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Lists to keep track of losses and accuracies.\ntrain_loss, valid_loss = [], []\ntrain_acc, valid_acc = [], []\nleast_loss = float('inf')\n# Start the training.\nfor epoch in range(EPOCHS):\n    print(f\"[INFO]: Epoch {epoch+1} of {EPOCHS}\")\n    train_epoch_loss, train_epoch_acc = train(model, train_loader, \n                                            optimizer, criterion, device)\n    valid_epoch_loss, valid_epoch_acc = validate(model, valid_loader,  \n                                                criterion, device)\n    train_loss.append(train_epoch_loss)\n    valid_loss.append(valid_epoch_loss)\n    train_acc.append(train_epoch_acc)\n    valid_acc.append(valid_epoch_acc)\n    print(f\"Training loss: {train_epoch_loss}, training acc: {train_epoch_acc}\")\n    print(f\"Validation loss: {valid_epoch_loss}, validation acc: {valid_epoch_acc}\")\n\n    # Save model.\n    if valid_epoch_loss &lt; least_loss:\n        least_loss = valid_epoch_loss\n        print(f\"Saving best model till now... LEAST LOSS {valid_epoch_loss:.3f}\")\n        torch.save(\n            model, os.path.join(OUTPUTS_DIR, 'model.pth')\n        )\n    print('-'*50)<\/pre>\n\n\n\n<p>It saves the model whenever the model reaches a new least loss. So, in the end, we will have the best possible model with us.<\/p>\n\n\n\n<p>The model reached the best results on epoch 29. It achieved a <strong>validation accuracy of 88.039%<\/strong> and a <strong>validation loss of 0.289<\/strong>. These results are very good considering that we do not have a pretrained model. The entire model is trained from scratch.<\/p>\n\n\n\n<p>Here are the accuracy and loss plots from training the Transformer Encoder model.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-accuracy.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"700\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-accuracy.png\" alt=\"Accuracy after training the Transformer Encoder model on the IMDb dataset.\" class=\"wp-image-32901\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-accuracy.png 1000w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-accuracy-300x210.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-accuracy-768x538.png 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 6. Accuracy after training the Transformer Encoder model on the IMDb dataset.<\/figcaption><\/figure>\n<\/div>\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-loss.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"700\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-loss.png\" alt=\"Loss after training the Transformer Encoder model on the IMDb dataset.\" class=\"wp-image-32902\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-loss.png 1000w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-loss-300x210.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-loss-768x538.png 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 7. Loss after training the Transformer Encoder model on the IMDb dataset.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>It looks like if we apply a learning rate scheduler, we can train the model for a bit longer.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Test and Inference<\/h2>\n\n\n\n<p>Now, let&#8217;s load the best model that we have and run it on the test dataset that we prepared earlier.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">trained_model = torch.load(\n    os.path.join(OUTPUTS_DIR, 'model.pth')\n)\n\ntest_loss, test_acc = validate(\n    trained_model, \n    test_loader,  \n    criterion, \n    device\n)\nprint(f\"Test loss: {test_loss}, test acc: {test_acc}\")<\/pre>\n\n\n\n<p>The <strong>test accuracy is 87.288%<\/strong> and the <strong>test loss is 0.297<\/strong>. Both of them are very close to the best validation accuracy and loss respectively.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Inference<\/h3>\n\n\n\n<p>For inference, there are a few text file samples in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">inference_data<\/code> directory. You can add more if you wish. As of now, one is a positive review and one is a negative review.<\/p>\n\n\n\n<p>Let&#8217;s read the files and store the reviews in a list.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># A few real-life reviews taken from the internet.\nsentences = []\ninfer_dir = 'inference_data\/'\ninfer_data = os.listdir(infer_dir)\nfor data in infer_data:\n    f = open(f\"{infer_dir}\/{data}\", 'r')\n    sentences.append(f.read())<\/pre>\n\n\n\n<p>There is a need for two helper functions here.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>One is to convert the text to integer vectors.<\/li>\n\n\n\n<li>Another is for padding the short vectors to the desired length.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def return_int_vector(int_mapping, text):\n    \"\"\"\n    Assign an integer to each word and return the integers in a list.\n    \"\"\"\n    corpus = [\n        word for word in text.split()\n    ] \n    # Each word is replaced by a specific integer.\n    int_vector = [\n        int_mapping[word] for word in text.split() \\\n        if word in int_mapping\n    ]\n    return int_vector\n\ndef pad_features(int_vector, max_len):\n    \"\"\"\n    Return features of `int_vector`, where each vector is padded \n    with 0's or truncated to the input seq_length. Return as Numpy \n    array.\n    \"\"\"\n    features = np.zeros((1, max_len), dtype = int)\n    if len(int_vector) &lt;= max_len:\n        zeros = list(np.zeros(max_len - len(int_vector)))\n        new = zeros + int_vector\n    else:\n        new = int_vector[: max_len]\n    features = np.array(new)\n    return features<\/pre>\n\n\n\n<p>Finally, we loop over the sentence and carry out the inference.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">for sentence in sentences:\n    int_vector = return_int_vector(int_mapping, sentence)\n    padded_features = pad_features(int_vector, int(longest_sentence_length))\n    input_tensor = torch.tensor(padded_features, dtype=torch.int32)\n    input_tensor = input_tensor.unsqueeze(0)\n    with torch.no_grad():\n        output = trained_model(input_tensor.to(device))\n    preds = torch.sigmoid(output)\n    print(sentence)\n    print(f\"Score: {preds.cpu().numpy()[0][0]}\")\n    if preds > 0.5:\n        print('Prediction: POSITIVE')\n    else:\n        print('Prediction: NEGATIVE')\n    print('\\n')<\/pre>\n\n\n\n<p>Following are the results.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-inference-results.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1162\" height=\"180\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-inference-results.png\" alt=\"Inference results using the trained Transformer Encoder model.\" class=\"wp-image-32904\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-inference-results.png 1162w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-inference-results-300x46.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/transformer-encoder-text-classification-inference-results-768x119.png 768w\" sizes=\"auto, (max-width: 1162px) 100vw, 1162px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 8. Inference results using the trained Transformer Encoder model.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>It seems that the trained Transformer Encoder model is able to classify both reviews correctly.<\/p>\n\n\n\n<p>For further experiments, you can play with the parameters while creating the model and see how it performs.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary and Conclusion<\/h2>\n\n\n\n<p>In this blog post, we trained a Transformer Encoder model for text classification. We went through the entire process of dataset and model preparation. This gave us an idea of how to use the encoder part of the Transformer architecture creating a high accuracy classifier. I hope that this blog post was worth your time.<\/p>\n\n\n\n<p>If you have any doubts, thoughts, or suggestions, please leave them in the comment section. I will surely address them.<\/p>\n\n\n\n<p>You can contact me using the <strong><a aria-label=\"Contact (opens in a new tab)\" href=\"https:\/\/debuggercafe.com\/contact-us\/\" target=\"_blank\" rel=\"noreferrer noopener\">Contact<\/a><\/strong> section. You can also find me on <strong><a aria-label=\"LinkedIn (opens in a new tab)\" href=\"https:\/\/www.linkedin.com\/in\/sovit-rath\/\" target=\"_blank\" rel=\"noreferrer noopener\">LinkedIn<\/a><\/strong>, and <strong><a href=\"https:\/\/x.com\/SovitRath5\" target=\"_blank\" rel=\"noreferrer noopener\">X<\/a><\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we build a text classification model using the PyTorch Transformer Encoder and train it on the IMDb movie review dataset.<\/p>\n","protected":false},"author":1,"featured_media":32908,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[530,410,434],"tags":[615,611,612,610,614,613,595,609],"class_list":["post-32830","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-neural-attention","category-text-classification","category-transformer","tag-imdb-classification-using-transformer-encoder","tag-nn-transformerencoder","tag-nn-transformerencoderlayer","tag-pytorch-transformer-encoder","tag-pytorch-transformer-encoder-text-classification","tag-text-classification-using-transformer-encoder","tag-transformer-encoder","tag-transformer-encoder-for-text-classification"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Text Classification using Transformer Encoder in PyTorch<\/title>\n<meta name=\"description\" content=\"Text classification using Transformer Encoder on the IMDb movie review dataset using the PyTorch deep learning framework.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Text Classification using Transformer Encoder in PyTorch\" \/>\n<meta property=\"og:description\" content=\"Text classification using Transformer Encoder on the IMDb movie review dataset using the PyTorch deep learning framework.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/\" \/>\n<meta property=\"og:site_name\" content=\"DebuggerCafe\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/profile.php?id=100013731104496\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-06T00:30:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-09-15T15:45:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/Text-Classification-using-Transformer-Encoder-in-PyTorch-1-e1695173137995.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"563\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Sovit Ranjan Rath\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:site\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sovit Ranjan Rath\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/\"},\"author\":{\"name\":\"Sovit Ranjan Rath\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"headline\":\"Text Classification using Transformer Encoder in PyTorch\",\"datePublished\":\"2023-11-06T00:30:00+00:00\",\"dateModified\":\"2024-09-15T15:45:37+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/\"},\"wordCount\":2017,\"commentCount\":0,\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/Text-Classification-using-Transformer-Encoder-in-PyTorch-1-e1695173137995.png\",\"keywords\":[\"IMDb Classification using Transformer Encoder\",\"nn.TransformerEncoder\",\"nn.TransformerEncoderLayer\",\"PyTorch Transformer Encoder\",\"PyTorch Transformer Encoder Text Classification\",\"Text Classification using Transformer Encoder\",\"Transformer Encoder\",\"Transformer Encoder for Text Classification\"],\"articleSection\":[\"Neural Attention\",\"Text Classification\",\"Transformer\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/\",\"url\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/\",\"name\":\"Text Classification using Transformer Encoder in PyTorch\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/Text-Classification-using-Transformer-Encoder-in-PyTorch-1-e1695173137995.png\",\"datePublished\":\"2023-11-06T00:30:00+00:00\",\"dateModified\":\"2024-09-15T15:45:37+00:00\",\"author\":{\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"description\":\"Text classification using Transformer Encoder on the IMDb movie review dataset using the PyTorch deep learning framework.\",\"breadcrumb\":{\"@id\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#primaryimage\",\"url\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/Text-Classification-using-Transformer-Encoder-in-PyTorch-1-e1695173137995.png\",\"contentUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/Text-Classification-using-Transformer-Encoder-in-PyTorch-1-e1695173137995.png\",\"width\":1000,\"height\":563,\"caption\":\"Text Classification using Transformer Encoder in PyTorch\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/debuggercafe.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Text Classification using Transformer Encoder in PyTorch\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/debuggercafe.com\/#website\",\"url\":\"https:\/\/debuggercafe.com\/\",\"name\":\"DebuggerCafe\",\"description\":\"Machine Learning and Deep Learning\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/debuggercafe.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\",\"name\":\"Sovit Ranjan Rath\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"caption\":\"Sovit Ranjan Rath\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Text Classification using Transformer Encoder in PyTorch","description":"Text classification using Transformer Encoder on the IMDb movie review dataset using the PyTorch deep learning framework.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/","og_locale":"en_US","og_type":"article","og_title":"Text Classification using Transformer Encoder in PyTorch","og_description":"Text classification using Transformer Encoder on the IMDb movie review dataset using the PyTorch deep learning framework.","og_url":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/","og_site_name":"DebuggerCafe","article_publisher":"https:\/\/www.facebook.com\/profile.php?id=100013731104496","article_published_time":"2023-11-06T00:30:00+00:00","article_modified_time":"2024-09-15T15:45:37+00:00","og_image":[{"width":1000,"height":563,"url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/Text-Classification-using-Transformer-Encoder-in-PyTorch-1-e1695173137995.png","type":"image\/png"}],"author":"Sovit Ranjan Rath","twitter_card":"summary_large_image","twitter_creator":"@SovitRath5","twitter_site":"@SovitRath5","twitter_misc":{"Written by":"Sovit Ranjan Rath","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#article","isPartOf":{"@id":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/"},"author":{"name":"Sovit Ranjan Rath","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"headline":"Text Classification using Transformer Encoder in PyTorch","datePublished":"2023-11-06T00:30:00+00:00","dateModified":"2024-09-15T15:45:37+00:00","mainEntityOfPage":{"@id":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/"},"wordCount":2017,"commentCount":0,"image":{"@id":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/Text-Classification-using-Transformer-Encoder-in-PyTorch-1-e1695173137995.png","keywords":["IMDb Classification using Transformer Encoder","nn.TransformerEncoder","nn.TransformerEncoderLayer","PyTorch Transformer Encoder","PyTorch Transformer Encoder Text Classification","Text Classification using Transformer Encoder","Transformer Encoder","Transformer Encoder for Text Classification"],"articleSection":["Neural Attention","Text Classification","Transformer"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/","url":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/","name":"Text Classification using Transformer Encoder in PyTorch","isPartOf":{"@id":"https:\/\/debuggercafe.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#primaryimage"},"image":{"@id":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/Text-Classification-using-Transformer-Encoder-in-PyTorch-1-e1695173137995.png","datePublished":"2023-11-06T00:30:00+00:00","dateModified":"2024-09-15T15:45:37+00:00","author":{"@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"description":"Text classification using Transformer Encoder on the IMDb movie review dataset using the PyTorch deep learning framework.","breadcrumb":{"@id":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#primaryimage","url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/Text-Classification-using-Transformer-Encoder-in-PyTorch-1-e1695173137995.png","contentUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/09\/Text-Classification-using-Transformer-Encoder-in-PyTorch-1-e1695173137995.png","width":1000,"height":563,"caption":"Text Classification using Transformer Encoder in PyTorch"},{"@type":"BreadcrumbList","@id":"https:\/\/debuggercafe.com\/text-classification-using-transformer-encoder-in-pytorch\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/debuggercafe.com\/"},{"@type":"ListItem","position":2,"name":"Text Classification using Transformer Encoder in PyTorch"}]},{"@type":"WebSite","@id":"https:\/\/debuggercafe.com\/#website","url":"https:\/\/debuggercafe.com\/","name":"DebuggerCafe","description":"Machine Learning and Deep Learning","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/debuggercafe.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752","name":"Sovit Ranjan Rath","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","caption":"Sovit Ranjan Rath"}}]}},"_links":{"self":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/32830","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/comments?post=32830"}],"version-history":[{"count":80,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/32830\/revisions"}],"predecessor-version":[{"id":38124,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/32830\/revisions\/38124"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media\/32908"}],"wp:attachment":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media?parent=32830"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/categories?post=32830"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/tags?post=32830"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}