{"id":31647,"date":"2023-08-21T06:00:00","date_gmt":"2023-08-21T00:30:00","guid":{"rendered":"https:\/\/debuggercafe.com\/?p=31647"},"modified":"2024-09-15T21:10:49","modified_gmt":"2024-09-15T15:40:49","slug":"traffic-sign-detection-using-detr","status":"publish","type":"post","link":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/","title":{"rendered":"Traffic Sign Detection using DETR"},"content":{"rendered":"\n<p>In this article, we will create a small proof of concept for traffic sign detection. We will use the <strong>DETR object detection model<\/strong> in particular for <strong>traffic sign detection<\/strong>. We will use a very small dataset. Also, we will entirely focus on the practical steps that we take to get the best results. <\/p>\n\n\n\n<div class=\"wp-block-buttons is-horizontal is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-499968f5 wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--1\"><a class=\"wp-block-button__link has-black-color has-luminous-vivid-orange-background-color has-text-color has-background wp-element-button\" href=\"#download-code\"><strong>Jump to Download Code<\/strong><\/a><\/div>\n<\/div>\n\n\n\n<p><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-detr-example-output.gif\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"540\" height=\"368\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-detr-example-output.gif\" alt=\"An example of traffic sign detection using DETR.\" class=\"wp-image-31678\"\/><\/a><figcaption class=\"wp-element-caption\">Figure 1. 
An example of traffic sign detection using DETR.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Traffic sign detection is an important problem in both autonomous driving and traffic surveillance. Object detection methods help in automating the process to a good extent. But solving it on a large scale requires a huge amount of data, engineering, and proper pipelines. In this article, however, we will train a few DETR object detection models on a small-scale traffic sign detection dataset. This will help us uncover how easy or difficult the process is.<\/p>\n\n\n\n<p><strong><em>We will cover the following points in this article:<\/em><\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>We will start with a discussion of the Tiny LISA Traffic Sign detection dataset.<\/em><\/li>\n\n\n\n<li><em>Then we will discuss the DETR (Detection Transformer) models that we will train on the traffic sign detection dataset.<\/em><\/li>\n\n\n\n<li><em>While discussing the technical part, we will particularly focus on dataset preparation and data augmentation techniques.<\/em><\/li>\n\n\n\n<li><em>After training, we will analyze the results of each training experiment.<\/em><\/li>\n\n\n\n<li><em>Next, we will run inference on a video to check the real-time performance of the trained DETR model.<\/em><\/li>\n\n\n\n<li><em>Finally, we will discuss a few points that will improve the project further.<\/em><\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Tiny LISA Traffic Sign Detection Dataset<\/h2>\n\n\n\n<p>We will use the <strong><a href=\"https:\/\/www.kaggle.com\/datasets\/mmontiel\/tiny-lisa-traffic-sign-detection-dataset\" target=\"_blank\" rel=\"noreferrer noopener\">Tiny LISA Traffic Sign Detection Dataset<\/a><\/strong> that is available on Kaggle. This is a subset of the larger LISA traffic sign detection dataset.<\/p>\n\n\n\n<p>There are only 900 images in this version of the dataset with a CSV annotation file. 
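<\/p>

<p>To get a quick feel for annotations stored this way, here is a minimal sketch that parses a few CSV rows with the standard library. The sample rows and the header names are purely illustrative; the real <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">annotations.csv<\/code> may name its columns differently.<\/p>

```python
import csv
import io

# Illustrative sample in a <filename, xmin, ymin, xmax, ymax, class> layout;
# the header and values below are made up for demonstration.
SAMPLE = """filename,xmin,ymin,xmax,ymax,class
sample_001.png,302,104,356,158,stop
sample_002.png,425,197,438,213,pedestrianCrossing
sample_003.png,721,88,760,127,stop
"""

def read_annotations(csv_text):
    """Parse annotation rows into (filename, box, label) tuples."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        box = tuple(int(row[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
        rows.append((row["filename"], box, row["class"]))
    return rows

annots = read_annotations(SAMPLE)
print(len(annots))  # 3
```

<p>Loading the file like this also makes it easy to count labels per class and spot imbalance before training.<\/p>

<p>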
As the training codebase that we will use requires the annotations in XML format, we will do some preprocessing in the next section.<\/p>\n\n\n\n<p>This tiny version of the dataset contains the following 9 object classes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>keepRight<\/li>\n\n\n\n<li>merge<\/li>\n\n\n\n<li>pedestrianCrossing<\/li>\n\n\n\n<li>signalAhead<\/li>\n\n\n\n<li>speedLimit25<\/li>\n\n\n\n<li>speedLimit35<\/li>\n\n\n\n<li>stop<\/li>\n\n\n\n<li>yield<\/li>\n\n\n\n<li>yieldAhead<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>For now, you may go ahead and download the dataset. After extracting it, you will find the following structure of the original dataset:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">db_lisa_tiny\/\n\u251c\u2500\u2500 annotations.csv\n\u251c\u2500\u2500 sample_001.png\n\u251c\u2500\u2500 sample_002.png\n\u251c\u2500\u2500 sample_003.png\n\u251c\u2500\u2500 sample_004.png\n\u251c\u2500\u2500 sample_005.png\n\u251c\u2500\u2500 sample_006.png\n...\n\u251c\u2500\u2500 sample_899.png\n\u2514\u2500\u2500 sample_900.png<\/pre>\n\n\n\n<p>All 900 images reside in a flat directory alongside the single CSV annotation file. <\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-detr-tiny-lisa-annotation.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"276\" height=\"117\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-detr-tiny-lisa-annotation.png\" alt=\"CSV file annotations format of the traffic sign dataset.\" class=\"wp-image-31681\"\/><\/a><figcaption class=\"wp-element-caption\">Figure 2. 
CSV file annotations format of the traffic sign dataset.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>As we can see, there are six columns. The first is the filename of the image in the directory. The next four are the bounding box coordinates in <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">&lt;xmin, ymin, xmax, ymax&gt;<\/code> format. The final one is the class name. However, the dataset has one notable limitation: there is only one annotation per image. This can hurt training, as the model may learn to detect only a single object in an image.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Analyzing the Tiny LISA Traffic Sign Detection Images with Annotations<\/h3>\n\n\n\n<p>Let&#8217;s take a look at a few images and annotations from the dataset.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-speedlimit25.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"467\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-speedlimit25.png\" alt=\"Traffic sign dataset &quot;speedLimit25&quot; ground truth images.\" class=\"wp-image-31683\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-speedlimit25.png 1200w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-speedlimit25-300x117.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-speedlimit25-768x299.png 768w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 3. 
Traffic sign dataset &#8220;speedLimit25&#8221; ground truth images.<\/figcaption><\/figure>\n<\/div>\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-yield-and-merge.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"428\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-yield-and-merge.png\" alt=\"Traffic sign dataset with &quot;yieldAhead&quot; and &quot;merge&quot; ground truth images.\" class=\"wp-image-31684\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-yield-and-merge.png 1200w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-yield-and-merge-300x107.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-yield-and-merge-768x274.png 768w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 4. Traffic sign dataset with &#8220;yieldAhead&#8221; and &#8220;merge&#8221; ground truth images.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>The images look quite challenging. As all the videos were captured from a dashcam, the frames are blurry, pixelated, and have lighting that varies with the time of day. 
It will be a good challenge for the DETR model.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Project Directory Structure<\/h2>\n\n\n\n<p>Now, let&#8217;s take a look at the entire project directory structure.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\u251c\u2500\u2500 input\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 db_lisa_tiny\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 inference_data\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 train_annots\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 train_images\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 valid_annots\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 valid_images\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 train.csv\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 valid.csv\n\u251c\u2500\u2500 notebooks\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 visualizations.ipynb\n\u251c\u2500\u2500 vision_transformers\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 data\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 examples\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 example_test_data\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 readme_images\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 runs\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 tools\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 vision_transformers\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 LICENSE\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 requirements.txt\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 setup.py\n\u251c\u2500\u2500 csv_to_xml.py\n\u2514\u2500\u2500 split_data.py<\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The input directory contains all the data that we need. The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">db_lisa_tiny<\/code> is the original dataset directory that we saw above with 900 images and one CSV file. However, it does not contain any splits. 
So, after creating the splits and the XML annotation files, we will get <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">train_images<\/code>, <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">train_annots<\/code>, <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">valid_images<\/code>, and <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">valid_annots<\/code> directories. Before we create these directories, we will also create the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">train.csv<\/code> and <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">valid.csv<\/code> files. <\/li>\n\n\n\n<li>The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">notebooks<\/code> directory contains a Jupyter Notebook that we can use for visualizing the images and annotations.<\/li>\n\n\n\n<li>Then we have the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">vision_transformers<\/code> directory. This is a GitHub repository that we will later clone. This contains a lot of vision transformer models for classification and detection. I have been maintaining this project for a few months now and will expand it in the future.<\/li>\n\n\n\n<li>Directly inside the parent project directory, we have the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">csv_to_xml.py<\/code> and <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">split_data.py<\/code> Python files. The former converts the CSV data to XML format and the latter creates the training and validation CSV files along with copying the split images into their new directories. We will use these later to obtain the complete dataset.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"has-background\" style=\"background-color:#ffb76a\"><strong><em>The trained weights and the data conversion files will be available via the download section of this article. 
If you wish to just run inference, please download the trained zip file which contains the trained weights. If you plan on training the model as well, please download the dataset and we will create the additional files and folders in the following sections.<\/em><\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading has-text-align-center\" id=\"download-code\">Download Code<\/h3>\n\n\n\n<div class=\"wp-block-button is-style-outline center\"><a data-sumome-listbuilder-id=\"926d7bdd-d0a8-43b8-9886-1ff1ad9896dc\" class=\"wp-block-button__link has-black-color has-luminous-vivid-orange-background-color has-text-color has-background\"><b>Download the Source Code for this Tutorial<\/b><\/a><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">The Vision Transformers Repository<\/h2>\n\n\n\n<p>Please go ahead and download the content for this article. After extracting it, enter the directory and clone the <strong><a href=\"https:\/\/github.com\/sovit-123\/vision_transformers\" target=\"_blank\" rel=\"noreferrer noopener\">Vision Transformers repository<\/a><\/strong>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">git clone https:\/\/github.com\/sovit-123\/vision_transformers.git<\/pre>\n\n\n\n<p>After cloning the repository, make it the current directory and install the library.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">cd vision_transformers<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" 
data-enlighter-group=\"\">pip install .<\/pre>\n\n\n\n<p>The above will install the repository as a library and also install PyTorch as a dependency.<\/p>\n\n\n\n<p>In case you face PyTorch and CUDA issues, please install PyTorch 2.0.1 (the latest at the time of writing this) using the following command.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Installing Dependencies<\/h3>\n\n\n\n<p>Next, we need to install a few mandatory libraries. These are <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">torchinfo<\/code> and <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">pycocotools<\/code>. The requirements file inside the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">vision_transformers<\/code> directory will handle it.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pip install -r requirements.txt<\/pre>\n\n\n\n<p>The above are the most important installation steps. In case you are missing any other libraries along the way, please install them.<\/p>\n\n\n\n<p><strong><em>Note: If you plan on running inference only, after downloading and extracting the content for this post, please copy the content of <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">trained_weights<\/code> directory into <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">vision_transformers\/runs\/training<\/code> directory. 
You may need to create the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">runs\/training<\/code> directory, as it only gets created after the first training experiment.<\/em><\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Small Scale Traffic Sign Detection using DETR<\/h2>\n\n\n\n<p>From here on, we will start the coding part. <\/p>\n\n\n\n<p>The first thing that we need to do is create the training and validation CSV files, the data splits, and the XML files.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Preparing the Dataset<\/h3>\n\n\n\n<p><strong><em>Note: The commands in this section (dataset preparation) should be executed within the parent project directory and not within the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">vision_transformers<\/code> directory.<\/em><\/strong><\/p>\n\n\n\n<p>The very first step is creating the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">train_images<\/code> &amp; <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">valid_images<\/code> directories, and the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">train.csv<\/code> &amp; <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">valid.csv<\/code> files in the input directory.<\/p>\n\n\n\n<p>For this, we need to execute the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">split_data.py<\/code> script.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python split_data.py<\/pre>\n\n\n\n<p>After executing the command, please take a look in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">input<\/code> directory to ensure the new CSV files and directories are present.<\/p>\n\n\n\n<p>The next step is creating the XML files. 
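<\/p>

<p>To make the conversion concrete, here is a rough, standard-library sketch of the Pascal VOC-style XML that such a conversion produces. The helper name and the example values are hypothetical, and the exact fields that <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">csv_to_xml.py<\/code> writes may differ.<\/p>

```python
import xml.etree.ElementTree as ET

def voc_xml(filename, width, height, boxes):
    """Build a minimal Pascal VOC-style annotation string.

    `boxes` is a list of (label, xmin, ymin, xmax, ymax) tuples.
    This is an illustrative helper, not the repository's converter.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bnd = ET.SubElement(obj, "bndbox")
        for tag, val in zip(("xmin", "ymin", "xmax", "ymax"),
                            (xmin, ymin, xmax, ymax)):
            ET.SubElement(bnd, tag).text = str(val)
    return ET.tostring(root, encoding="unicode")

# Hypothetical annotation for one image with a single box.
print(voc_xml("sample_001.png", 704, 480, [("stop", 302, 104, 356, 158)]))
```

<p>One XML file of this shape is written per image, which is the layout the training codebase expects.<\/p>

<p>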
For this, we need to execute the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">csv_to_xml.py<\/code> script twice. Once for the training data and once for the validation data.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python csv_to_xml.py --input input\/train.csv --output input\/train_annots<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python csv_to_xml.py --input input\/valid.csv --output input\/valid_annots<\/pre>\n\n\n\n<p>In the above commands, we use the following command line arguments:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--input<\/code>: Path to the CSV file.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--output<\/code>: Path to the directory where we want to store the XML files.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>After this, we have all the files and folders in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">input<\/code> directory as we observed in the directory tree structure. We can now move to the deep learning part of the project.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Preparing the Dataset YAML File<\/h3>\n\n\n\n<p>The YAML file inside <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">vision_transformers\/data<\/code> directory contains all the information related to the dataset. 
This includes the paths to the image and label directories, the class names, and the number of classes.<\/p>\n\n\n\n<p>For training the DETR model on the traffic sign detection dataset, <strong>we have the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">lisa_traffic.yaml<\/code> file<\/strong>. Here are its contents:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"yaml\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"lisa_traffic.yaml\" data-enlighter-group=\"lisa_traffic_1\"># Images and labels directory should be relative to train.py\nTRAIN_DIR_IMAGES: '..\/input\/train_images'\nTRAIN_DIR_LABELS: '..\/input\/train_annots'\nVALID_DIR_IMAGES: '..\/input\/valid_images'\nVALID_DIR_LABELS: '..\/input\/valid_annots'\n\n# Class names.\nCLASSES: [\n    '__background__',\n    'keepRight',\n    'merge',\n    'pedestrianCrossing',\n    'signalAhead',\n    'speedLimit25',\n    'speedLimit35',\n    'stop',\n    'yield',\n    'yieldAhead'\n]\n\n# Number of classes (object classes + 1 for background).\nNC: 10\n\n# Whether to save the predictions of the validation set while training.\nSAVE_VALID_PREDICTION_IMAGES: True<\/pre>\n\n\n\n<p>As we will execute the training script within the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">vision_transformers<\/code> directory, all the paths are relative to that directory.<\/p>\n\n\n\n<p>After the paths, we have the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">CLASSES<\/code> attribute which includes all the object classes along with the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">__background__<\/code> class. 
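<\/p>

<p>A quick sanity check worth running on any config of this kind: <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">NC<\/code> must equal the length of the class list, otherwise the detection head and the label indices go out of sync. A tiny sketch, mirroring the YAML above as a plain Python dict for illustration:<\/p>

```python
# The YAML above, mirrored as a Python dict purely for this check.
config = {
    "CLASSES": [
        "__background__",
        "keepRight", "merge", "pedestrianCrossing", "signalAhead",
        "speedLimit25", "speedLimit35", "stop", "yield", "yieldAhead",
    ],
    "NC": 10,
}

# NC must match len(CLASSES): 9 object classes + 1 background class.
assert config["NC"] == len(config["CLASSES"]) == 10
print("class configuration is consistent")
```

<p>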
<code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">NC<\/code> is the total number of classes, including the background class.<\/p>\n\n\n\n<p>You will find this file within the downloaded zip file as well. You just need to copy it into the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">vision_transformers\/data<\/code> directory.<\/p>\n\n\n\n<p>Let&#8217;s get into the training part now.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Training the DETR ResNet50 with Augmentation<\/h3>\n\n\n\n<p>We will start with training the <strong><a href=\"https:\/\/debuggercafe.com\/detr\/\" target=\"_blank\" rel=\"noreferrer noopener\">DETR ResNet50<\/a><\/strong> model, which is the smallest of the four available models. We will train it with a lot of augmentations to avoid overfitting.<\/p>\n\n\n\n<p>Most of the heavy lifting for <strong>dataset preparation is done by the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">datasets.py<\/code><\/strong> file inside the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">vision_transformers\/tools\/utils<\/code> directory. <\/p>\n\n\n\n<p>We will pass a command line argument to <strong><a href=\"https:\/\/debuggercafe.com\/applying-different-augmentations-to-bounding-boxes-in-object-detection-using-albumentations\/\" target=\"_blank\" rel=\"noreferrer noopener\">augment the images<\/a><\/strong> while training. 
The script uses the following augmentations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Any one of Blur, MotionBlur, or MedianBlur<\/li>\n\n\n\n<li>ToGray<\/li>\n\n\n\n<li>RandomBrightnessContrast<\/li>\n\n\n\n<li>ColorJitter<\/li>\n\n\n\n<li>RandomGamma<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>Let&#8217;s take a look at a few images to observe how they look after applying augmentations.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-augmented-images.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"600\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-augmented-images.png\" alt=\"Traffic sign dataset with augmented images that are used for training.\" class=\"wp-image-31687\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-augmented-images.png 1200w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-augmented-images-300x150.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-augmented-images-768x384.png 768w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 5. Traffic sign dataset with augmented images that are used for training.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>We can see that the augmentations add quite a lot of variability to the images. 
This should help prevent overfitting.<\/p>\n\n\n\n<p>Now, let&#8217;s execute the training command within the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">vision_transformers<\/code> directory.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python tools\/train_detector.py --data data\/lisa_traffic.yaml --epochs 75 --name detr_resnet50_75e --use-train-aug<\/pre>\n\n\n\n<p>The following are all the command line arguments that we use:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--data<\/code>: The path to the dataset YAML file.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--epochs<\/code>: The number of epochs to train for.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--name<\/code>: The directory name where all the results will be saved. For this training, it will be <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">runs\/training\/detr_resnet50_75e<\/code>.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--use-train-aug<\/code>: A boolean flag that enables the training augmentations.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>By default, the model is DETR ResNet50. So, we do not need to pass any model name here.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Analyzing the DETR ResNet50 Training Results<\/h3>\n\n\n\n<p>The following block shows the validation results from the best training epoch, taken from the terminal output. 
We track the mAP metric for object detection.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Test:  [ 0\/23]  eta: 0:00:05  class_error: 0.00  loss: 0.4392 (0.4392)  loss_ce: 0.0236 (0.0236)  loss_bbox: 0.0504 (0.0504)  loss_giou: 0.3651 (0.3651)  loss_ce_unscaled: 0.0236 (0.0236)  class_error_unscaled: 0.0000 (0.0000)  loss_bbox_unscaled: 0.0101 (0.0101)  loss_giou_unscaled: 0.1825 (0.1825)  cardinality_error_unscaled: 0.0000 (0.0000)  time: 0.2340  data: 0.1748  max mem: 2204\nTest:  [22\/23]  eta: 0:00:00  class_error: 50.00  loss: 0.4169 (0.4326)  loss_ce: 0.0266 (0.0480)  loss_bbox: 0.0479 (0.0546)  loss_giou: 0.2933 (0.3300)  loss_ce_unscaled: 0.0266 (0.0480)  class_error_unscaled: 0.0000 (11.9565)  loss_bbox_unscaled: 0.0096 (0.0109)  loss_giou_unscaled: 0.1466 (0.1650)  cardinality_error_unscaled: 0.5000 (0.4130)  time: 0.0579  data: 0.0057  max mem: 2204\nTest: Total time: 0:00:01 (0.0677 s \/ it)\nAveraged stats: class_error: 50.00  loss: 0.4169 (0.4326)  loss_ce: 0.0266 (0.0480)  loss_bbox: 0.0479 (0.0546)  loss_giou: 0.2933 (0.3300)  loss_ce_unscaled: 0.0266 (0.0480)  class_error_unscaled: 0.0000 (11.9565)  loss_bbox_unscaled: 0.0096 (0.0109)  loss_giou_unscaled: 0.1466 (0.1650)  cardinality_error_unscaled: 0.5000 (0.4130)\nAccumulating evaluation results...\nDONE (t=0.07s).\nIoU metric: bbox\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.547\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.815\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.633\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.443\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.565\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | 
maxDets=100 ] = 0.714\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.643\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.716\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.725\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.733\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.892\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.716\n\nBEST VALIDATION mAP: 0.5465645512672217\n\nSAVING BEST MODEL FOR EPOCH: 71<\/pre>\n\n\n\n<p>The best model was saved after epoch number 72 (epoch index starts from 0). We have the <strong>best validation mAP of 54.7%<\/strong> using the DETR ResNet50 model.<\/p>\n\n\n\n<p>Let&#8217;s take a look at the mAP graph.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-light-detection-detr-resnet50-map.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"700\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-light-detection-detr-resnet50-map.png\" alt=\"mAP graph after training the DETR ResNet50 on the traffic sign detection dataset.\" class=\"wp-image-31689\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-light-detection-detr-resnet50-map.png 1000w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-light-detection-detr-resnet50-map-300x210.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-light-detection-detr-resnet50-map-768x538.png 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 6. 
mAP graph after training the DETR ResNet50 on the traffic sign detection dataset.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>It looks like we could have trained for a few more epochs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Training the DETR ResNet50 DC5 with Augmentation on the Traffic Sign Dataset<\/h3>\n\n\n\n<p>Next, we will train the DETR ResNet50 DC5 model, which performs slightly better than the previous one. It uses dilated convolution in the last backbone stage, so its feature maps are double the size of those from the DETR ResNet50 model.<\/p>\n\n\n\n<p>We just need one additional command line argument in this case.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python tools\/train_detector.py --model detr_resnet50_dc5 --data data\/lisa_traffic.yaml --epochs 75 --name detr_resnet50_dc5_75e --use-train-aug<\/pre>\n\n\n\n<p>This time we are passing a <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--model<\/code> argument whose value is <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">detr_resnet50_dc5<\/code>. 
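<\/p>

<p>To see why the DC5 trick doubles the feature map, recall the standard convolution output-size formula: replacing a stride-2 convolution with stride 1 plus dilation 2 (and matching padding) keeps the receptive field growing while halving the downsampling. A small arithmetic sketch with a hypothetical feature size:<\/p>

```python
import math

def conv_out(size, kernel, stride, padding, dilation=1):
    """Standard convolution output-size formula."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

c4 = 40  # hypothetical stage-4 feature width (e.g., a 640px-wide input at stride 16)

# Vanilla ResNet50 stage 5: a stride-2 3x3 convolution halves the map.
print(conv_out(c4, kernel=3, stride=2, padding=1))              # 20

# DC5 variant: stride 1 with dilation 2 (padding grows to match),
# so the spatial size is preserved, i.e., twice the vanilla resolution.
print(conv_out(c4, kernel=3, stride=1, padding=2, dilation=2))  # 40
```

<p>Higher-resolution features help with small objects such as distant traffic signs, at the cost of more memory and compute in the transformer.<\/p>

<p>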
Also, the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--name<\/code> argument, which sets the results directory, is different.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Analyzing the DETR ResNet50 DC5 Training Results<\/h3>\n\n\n\n<p>The DETR ResNet50 DC5 model reaches the best mAP on epoch number 62.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Test:  [ 0\/23]  eta: 0:00:06  class_error: 0.00  loss: 0.4462 (0.4462)  loss_ce: 0.0096 (0.0096)  loss_bbox: 0.0479 (0.0479)  loss_giou: 0.3887 (0.3887)  loss_ce_unscaled: 0.0096 (0.0096)  class_error_unscaled: 0.0000 (0.0000)  loss_bbox_unscaled: 0.0096 (0.0096)  loss_giou_unscaled: 0.1943 (0.1943)  cardinality_error_unscaled: 0.2500 (0.2500)  time: 0.2933  data: 0.1880  max mem: 7368\nTest:  [22\/23]  eta: 0:00:00  class_error: 0.00  loss: 0.3595 (0.3595)  loss_ce: 0.0090 (0.0200)  loss_bbox: 0.0370 (0.0427)  loss_giou: 0.2867 (0.2968)  loss_ce_unscaled: 0.0090 (0.0200)  class_error_unscaled: 0.0000 (5.4348)  loss_bbox_unscaled: 0.0074 (0.0085)  loss_giou_unscaled: 0.1434 (0.1484)  cardinality_error_unscaled: 0.2500 (0.3913)  time: 0.1007  data: 0.0061  max mem: 7368\nTest: Total time: 0:00:02 (0.1111 s \/ it)\nAveraged stats: class_error: 0.00  loss: 0.3595 (0.3595)  loss_ce: 0.0090 (0.0200)  loss_bbox: 0.0370 (0.0427)  loss_giou: 0.2867 (0.2968)  loss_ce_unscaled: 0.0090 (0.0200)  class_error_unscaled: 0.0000 (5.4348)  loss_bbox_unscaled: 0.0074 (0.0085)  loss_giou_unscaled: 0.1434 (0.1484)  cardinality_error_unscaled: 0.2500 (0.3913)\nAccumulating evaluation results...\nDONE (t=0.08s).\nIoU metric: bbox\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.629\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.881\n Average Precision  (AP) @[ IoU=0.75      | area=   all | 
maxDets=100 ] = 0.708\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.617\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.711\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.761\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.700\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.760\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.768\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.800\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.875\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.762\n\nBEST VALIDATION mAP: 0.6292527183321788\n\nSAVING BEST MODEL FOR EPOCH: 61<\/pre>\n\n\n\n<p>This time, we have a <strong>much higher mAP of 62.9%<\/strong> on the traffic sign detection dataset.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-light-detection-detr-resnet50-dc5-map.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"700\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-light-detection-detr-resnet50-dc5-map.png\" alt=\"mAP graph after training the DETR ResNet50 DC5 on the traffic sign detection dataset.\" class=\"wp-image-31690\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-light-detection-detr-resnet50-dc5-map.png 1000w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-light-detection-detr-resnet50-dc5-map-300x210.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-light-detection-detr-resnet50-dc5-map-768x538.png 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 7. 
mAP graph after training the DETR ResNet50 DC5 on the traffic sign detection dataset.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>It is clearly visible that the model has already started overfitting. So, training for any longer would not be beneficial.<\/p>\n\n\n\n<p>As the DETR ResNet50 DC5 model has a better <strong><a href=\"https:\/\/debuggercafe.com\/evaluation-metrics-for-object-detection\/\" target=\"_blank\" rel=\"noreferrer noopener\">object detection metric<\/a><\/strong>, we will use this model for inference next.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Traffic Sign Detection Video Inference<\/h2>\n\n\n\n<p>As we have the best weights for the DETR ResNet50 DC5 model, we will use those for video inference. <\/p>\n\n\n\n<p>The video that we will use has been generated from the original full set of LISA Traffic Sign Detection data. The frames were combined at 15 FPS to create a small video. You will find it within the downloadable zip file that comes with this article.<\/p>\n\n\n\n<p>We will use the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">inference_video_detect.py<\/code> script inside the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">tools<\/code> directory to run the inference. 
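<\/p>\n\n\n\n<p>Under the hood, any DETR inference script has to turn the model&#8217;s raw outputs, normalized <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">(cx, cy, w, h)<\/code> boxes and per-query class logits where the last class means &#8220;no object&#8221;, into pixel-space detections. The following is only a rough sketch of that post-processing step, with hypothetical names such as <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">detr_boxes_to_xyxy<\/code> rather than the script&#8217;s actual API:<\/p>

```python
import math

def detr_boxes_to_xyxy(boxes, logits, width, height, threshold=0.5):
    # boxes: per-query normalized (cx, cy, w, h); logits: per-query class
    # scores where the LAST entry is the no-object class.
    detections = []
    for (cx, cy, w, h), ls in zip(boxes, logits):
        exps = [math.exp(v) for v in ls]
        probs = [e / sum(exps) for e in exps]
        score = max(probs[:-1])            # best real class; drop no-object
        label = probs.index(score)
        if score < threshold:
            continue                       # this query predicts background
        detections.append((label, score,
                           ((cx - w / 2) * width,  (cy - h / 2) * height,
                            (cx + w / 2) * width,  (cy + h / 2) * height)))
    return detections

# One query: a confident class-0 box centered in a 640x640 frame.
dets = detr_boxes_to_xyxy([(0.5, 0.5, 0.25, 0.25)], [(4.0, 0.0, 0.0)], 640, 640)
print(dets)
```

<p>In practice this is done with batched tensor operations, but the logic is the same: softmax the logits, drop the no-object class, threshold by score, and rescale the boxes to the frame size.<\/p>\n\n\n\n<p>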
We will execute the following command within the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">vision_transformers<\/code> directory.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python tools\/inference_video_detect.py --input ..\/input\/inference_data\/video_1.mp4 --weights runs\/training\/detr_resnet50_dc5_75e\/best_model.pth --show --imgsz 640<\/pre>\n\n\n\n<p>The following are the command line arguments that we use:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--input<\/code>: The path to the input video.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--weights<\/code>: Path to the best trained weights.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--show<\/code>: This is a boolean argument indicating that we want to visualize the inference on screen.<\/li>\n\n\n\n<li><code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">--imgsz<\/code>: This takes in an integer and the images will be resized to the square size of whatever value we pass. 
So, in this case, the images will be resized to 640&#215;640 resolution.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>The results will be saved in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">runs\/inference<\/code> directory.<\/p>\n\n\n\n<p><strong><em>Note: There are some repeated frames in the original dataset.<\/em><\/strong><\/p>\n\n\n\n<figure class=\"wp-block-video aligncenter\"><video height=\"480\" style=\"aspect-ratio: 704 \/ 480;\" width=\"704\" autoplay controls loop muted src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/traffic-sign-detection-using-detr-video-inference.mp4\"><\/video><figcaption class=\"wp-element-caption\">Clip 1. Traffic sign detection in video using DETR.<\/figcaption><\/figure>\n\n\n\n<p>As we can see, the model is able to detect most of the objects correctly. However, there are a few wrong detections. For example, wherever a speed limit sign other than 25 or 35 appears, the model detects it as either <em>speedLimit25<\/em> or <em>speedLimit35<\/em>. This is not entirely the model&#8217;s fault, as it never saw those classes or their annotations during training.<\/p>\n\n\n\n<p>Other than that, the model is able to detect multiple objects in a scene even though no training image contained more than one annotation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Further Improvements<\/h2>\n\n\n\n<p>To make the training process even more robust, we can train on the entire LISA Traffic Sign Detection dataset, which contains more than 6000 images and 49 classes.<\/p>\n\n\n\n<p>We will try to tackle this in a future post.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary and Conclusion<\/h2>\n\n\n\n<p>In this article, we trained DETR object detection models to detect traffic signs. 
We trained the models on the Tiny LISA Traffic Sign Detection dataset and compared their performances. After that, we also ran video inference and saw how the best model is performing. Finally, we discussed some improvement points. I hope that this article was worth your time.<\/p>\n\n\n\n<p>If you have any doubts, thoughts, or suggestions, please leave them in the comment section. I will surely address them.<\/p>\n\n\n\n<p>You can contact me using the <strong><a aria-label=\"Contact (opens in a new tab)\" href=\"https:\/\/debuggercafe.com\/contact-us\/\" target=\"_blank\" rel=\"noreferrer noopener\">Contact<\/a><\/strong> section. You can also find me on <strong><a aria-label=\"LinkedIn (opens in a new tab)\" href=\"https:\/\/www.linkedin.com\/in\/sovit-rath\/\" target=\"_blank\" rel=\"noreferrer noopener\">LinkedIn<\/a><\/strong>, and <strong><a href=\"https:\/\/x.com\/SovitRath5\" target=\"_blank\" rel=\"noreferrer noopener\">X<\/a><\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we train DETR ResNet50 and DETR ResNet50 DC models for traffic sign detection using PyTorch.<\/p>\n","protected":false},"author":1,"featured_media":31697,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[59,120,90],"tags":[437,435,522,521,517,523,518,519,520],"class_list":["post-31647","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-deep-learning","category-object-detection","category-pytorch","tag-detection-transformer","tag-detr","tag-detr-resnet50-dc-training","tag-detr-resnet50-training","tag-detr-traffic-sign-detection","tag-fine-tune-detr-for-traffic-sign-detection","tag-traffic-sign-detection","tag-traffic-sign-detection-pytorch","tag-traffic-sign-detection-using-detection-transformer"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ 
-->\n<title>Traffic Sign Detection using DETR<\/title>\n<meta name=\"description\" content=\"Traffic sign detection using DETR ResNet50 and DETR ResNet50 DC5 using PyTorch deep learning library. Run inference in real-time using DETR.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Traffic Sign Detection using DETR\" \/>\n<meta property=\"og:description\" content=\"Traffic sign detection using DETR ResNet50 and DETR ResNet50 DC5 using PyTorch deep learning library. Run inference in real-time using DETR.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/\" \/>\n<meta property=\"og:site_name\" content=\"DebuggerCafe\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/profile.php?id=100013731104496\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-21T00:30:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-09-15T15:40:49+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/Traffic-Sign-Detection-using-DETR-e1689039149152.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"563\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Sovit Ranjan Rath\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:site\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sovit Ranjan Rath\" \/>\n\t<meta 
name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"23 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/\"},\"author\":{\"name\":\"Sovit Ranjan Rath\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"headline\":\"Traffic Sign Detection using DETR\",\"datePublished\":\"2023-08-21T00:30:00+00:00\",\"dateModified\":\"2024-09-15T15:40:49+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/\"},\"wordCount\":2161,\"commentCount\":2,\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/Traffic-Sign-Detection-using-DETR-e1689039149152.png\",\"keywords\":[\"Detection Transformer\",\"DETR\",\"DETR ResNet50 DC Training\",\"DETR ResNet50 Training\",\"DETR Traffic Sign Detection\",\"Fine Tune DETR for Traffic Sign Detection\",\"Traffic Sign Detection\",\"Traffic Sign Detection PyTorch\",\"Traffic Sign Detection using Detection Transformer\"],\"articleSection\":[\"Deep Learning\",\"Object Detection\",\"PyTorch\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/\",\"url\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/\",\"name\":\"Traffic Sign Detection using 
DETR\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/Traffic-Sign-Detection-using-DETR-e1689039149152.png\",\"datePublished\":\"2023-08-21T00:30:00+00:00\",\"dateModified\":\"2024-09-15T15:40:49+00:00\",\"author\":{\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"description\":\"Traffic sign detection using DETR ResNet50 and DETR ResNet50 DC5 using PyTorch deep learning library. Run inference in real-time using DETR.\",\"breadcrumb\":{\"@id\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#primaryimage\",\"url\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/Traffic-Sign-Detection-using-DETR-e1689039149152.png\",\"contentUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/Traffic-Sign-Detection-using-DETR-e1689039149152.png\",\"width\":1000,\"height\":563,\"caption\":\"Traffic Sign Detection using DETR\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/debuggercafe.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Traffic Sign Detection using 
DETR\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/debuggercafe.com\/#website\",\"url\":\"https:\/\/debuggercafe.com\/\",\"name\":\"DebuggerCafe\",\"description\":\"Machine Learning and Deep Learning\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/debuggercafe.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\",\"name\":\"Sovit Ranjan Rath\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"caption\":\"Sovit Ranjan Rath\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Traffic Sign Detection using DETR","description":"Traffic sign detection using DETR ResNet50 and DETR ResNet50 DC5 using PyTorch deep learning library. Run inference in real-time using DETR.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/","og_locale":"en_US","og_type":"article","og_title":"Traffic Sign Detection using DETR","og_description":"Traffic sign detection using DETR ResNet50 and DETR ResNet50 DC5 using PyTorch deep learning library. 
Run inference in real-time using DETR.","og_url":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/","og_site_name":"DebuggerCafe","article_publisher":"https:\/\/www.facebook.com\/profile.php?id=100013731104496","article_published_time":"2023-08-21T00:30:00+00:00","article_modified_time":"2024-09-15T15:40:49+00:00","og_image":[{"width":1000,"height":563,"url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/Traffic-Sign-Detection-using-DETR-e1689039149152.png","type":"image\/png"}],"author":"Sovit Ranjan Rath","twitter_card":"summary_large_image","twitter_creator":"@SovitRath5","twitter_site":"@SovitRath5","twitter_misc":{"Written by":"Sovit Ranjan Rath","Est. reading time":"23 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#article","isPartOf":{"@id":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/"},"author":{"name":"Sovit Ranjan Rath","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"headline":"Traffic Sign Detection using DETR","datePublished":"2023-08-21T00:30:00+00:00","dateModified":"2024-09-15T15:40:49+00:00","mainEntityOfPage":{"@id":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/"},"wordCount":2161,"commentCount":2,"image":{"@id":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/Traffic-Sign-Detection-using-DETR-e1689039149152.png","keywords":["Detection Transformer","DETR","DETR ResNet50 DC Training","DETR ResNet50 Training","DETR Traffic Sign Detection","Fine Tune DETR for Traffic Sign Detection","Traffic Sign Detection","Traffic Sign Detection PyTorch","Traffic Sign Detection using Detection Transformer"],"articleSection":["Deep Learning","Object 
Detection","PyTorch"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/","url":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/","name":"Traffic Sign Detection using DETR","isPartOf":{"@id":"https:\/\/debuggercafe.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#primaryimage"},"image":{"@id":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/Traffic-Sign-Detection-using-DETR-e1689039149152.png","datePublished":"2023-08-21T00:30:00+00:00","dateModified":"2024-09-15T15:40:49+00:00","author":{"@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"description":"Traffic sign detection using DETR ResNet50 and DETR ResNet50 DC5 using PyTorch deep learning library. 
Run inference in real-time using DETR.","breadcrumb":{"@id":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#primaryimage","url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/Traffic-Sign-Detection-using-DETR-e1689039149152.png","contentUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2023\/07\/Traffic-Sign-Detection-using-DETR-e1689039149152.png","width":1000,"height":563,"caption":"Traffic Sign Detection using DETR"},{"@type":"BreadcrumbList","@id":"https:\/\/debuggercafe.com\/traffic-sign-detection-using-detr\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/debuggercafe.com\/"},{"@type":"ListItem","position":2,"name":"Traffic Sign Detection using DETR"}]},{"@type":"WebSite","@id":"https:\/\/debuggercafe.com\/#website","url":"https:\/\/debuggercafe.com\/","name":"DebuggerCafe","description":"Machine Learning and Deep Learning","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/debuggercafe.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752","name":"Sovit Ranjan Rath","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","caption":"Sovit 
Ranjan Rath"}}]}},"_links":{"self":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/31647","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/comments?post=31647"}],"version-history":[{"count":61,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/31647\/revisions"}],"predecessor-version":[{"id":38114,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/31647\/revisions\/38114"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media\/31697"}],"wp:attachment":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media?parent=31647"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/categories?post=31647"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/tags?post=31647"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}