{"id":34994,"date":"2024-03-25T06:00:00","date_gmt":"2024-03-25T00:30:00","guid":{"rendered":"https:\/\/debuggercafe.com\/?p=34994"},"modified":"2025-06-16T06:58:28","modified_gmt":"2025-06-16T01:28:28","slug":"improving-face-keypoint-detection","status":"publish","type":"post","link":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/","title":{"rendered":"Improving Face Keypoint Detection"},"content":{"rendered":"\n<p>In this article, we will make additional improvements to our previous custom <strong><a href=\"https:\/\/debuggercafe.com\/robust-facial-keypoint-detection-model\/\" target=\"_blank\" rel=\"noreferrer noopener\">robust keypoint detection model<\/a><\/strong>. In addition to a faster face detection model, we will optimize the face keypoint regressor model, as well as the inference pipeline. In short, this article is about <strong><em>improving the face keypoint detection model<\/em><\/strong> and pipeline. 
<\/p>\n\n\n\n<div class=\"wp-block-buttons is-horizontal is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-499968f5 wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--1\"><a class=\"wp-block-button__link has-black-color has-luminous-vivid-orange-background-color has-text-color has-background wp-element-button\" href=\"#download-code\"><strong>Jump to Download Code<\/strong><\/a><\/div>\n<\/div>\n\n\n\n<p><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving_face_keypoint_detection_video_3-output.com-optimize.gif\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"540\" height=\"285\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving_face_keypoint_detection_video_3-output.com-optimize.gif\" alt=\"Improved face keypoint detection pipeline output.\" class=\"wp-image-35036\"\/><\/a><figcaption class=\"wp-element-caption\">Figure 1. Improved face keypoint detection pipeline output.<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\"><em>We will cover the following topics in this article<\/em><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>We will start with a discussion of the issues with our previous approach to face keypoint detection.<\/em><\/li>\n\n\n\n<li><em>Then we will move to the discussion of the changes that we will make to make the face keypoint detection pipeline faster.<\/em><\/li>\n\n\n\n<li><em>Next, we will cover the code changes that we are making for improving the face keypoint detection. Overall, we aim<strong> to make the face keypoint detection pipeline around 2x faster<\/strong> while keeping the accuracy intact. <\/em><\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>We will cover only the essential changes to the Python code files. 
You may go through the <strong><a href=\"https:\/\/debuggercafe.com\/robust-facial-keypoint-detection-model\/\" target=\"_blank\" rel=\"noreferrer noopener\">previous article<\/a><\/strong> for a detailed explanation of the code. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Issues with the Previous Keypoint Detection Approach and Steps to Improve It<\/h2>\n\n\n\n<p>In our previous article, we used the pretrained MTCNN model for face detection and a custom ResNet50 model for facial keypoint detection. Primarily, there were two issues with the approach.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The pretrained MTCNN model was too slow.<\/li>\n\n\n\n<li>The MTCNN model predicted too many false positives, leading to false positives in face keypoint detection as well. <\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>There are better and faster custom face detection models that will speed up the entire pipeline considerably. <\/p>\n\n\n\n<p>In this article, we will specifically focus on the speed improvements. This requires changing two components.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We will use a YOLOv8 model that has been custom-trained on the WIDER Face dataset.<\/li>\n\n\n\n<li>Along with that, we will train a smaller and faster face keypoint detection model. We will switch the model from ResNet50 to ShuffleNet.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Getting the Dataset<\/h2>\n\n\n\n<p>We will use the preprocessed dataset that we discussed in the first article. 
You can find the cropped dataset <strong><a href=\"https:\/\/www.kaggle.com\/datasets\/sovitrath\/cropped-face-keypoint-dataset-68-landmarks\" target=\"_blank\" rel=\"noreferrer noopener\">here on Kaggle<\/a><\/strong>.<\/p>\n\n\n\n<p>You may also go through the <strong>previous article<\/strong> to learn more about the dataset preparation steps.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/facial_keypoint_dataset_ground_truth_uncropped.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"600\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/facial_keypoint_dataset_ground_truth_uncropped.png\" alt=\"Facial keypoint detection ground truth dataset.\" class=\"wp-image-35038\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/facial_keypoint_dataset_ground_truth_uncropped.png 600w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/facial_keypoint_dataset_ground_truth_uncropped-300x300.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/facial_keypoint_dataset_ground_truth_uncropped-150x150.png 150w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 2. 
Facial keypoint detection ground truth dataset.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>For brevity, here, we will only talk about the major changes and additions that we make to the dataset preparation steps.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Project Directory Structure<\/h2>\n\n\n\n<p>Let&#8217;s take a look at the directory structure before covering the code for improving the face keypoint detection.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\u251c\u2500\u2500 input\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 inference_images\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 image_1.jpg\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 image_2.jpg\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 image_3.jpg\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 image_4.jpg\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 inference_videos\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u251c\u2500\u2500 video_1.mp4\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 video_2.mp4\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 new_data\n\u2502\u00a0\u00a0     \u251c\u2500\u2500 test [766 entries exceeds filelimit, not opening dir]\n\u2502\u00a0\u00a0     \u251c\u2500\u2500 training [3438 entries exceeds filelimit, not opening dir]\n\u2502\u00a0\u00a0     \u251c\u2500\u2500 test.csv\n\u2502\u00a0\u00a0     \u2514\u2500\u2500 training.csv\n\u251c\u2500\u2500 outputs [203 entries exceeds filelimit, not opening dir]\n\u251c\u2500\u2500 weights\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 yolov8n_100e.onnx\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 yolov8n_100e.pt\n\u251c\u2500\u2500 config.py\n\u251c\u2500\u2500 datasets.py\n\u251c\u2500\u2500 export.py\n\u251c\u2500\u2500 inference_image.py\n\u251c\u2500\u2500 inference_video.py\n\u251c\u2500\u2500 
model.py\n\u251c\u2500\u2500 train.py\n\u2514\u2500\u2500 utils.py<\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">input<\/code> directory contains the training and inference dataset.<\/li>\n\n\n\n<li>The <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">outputs<\/code> directory contains the face keypoint detection model weights and the inference outputs.<\/li>\n\n\n\n<li>In the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">weights<\/code> folder, we have the YOLOv8 face detection model weights that we will discuss further in the article.<\/li>\n\n\n\n<li>Finally, we have all the Python files in the parent project directory.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"has-background\" style=\"background-color:#ffb76a\"><strong><em>All the code files and trained weights are available via the download section.<\/em><\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Library Dependencies<\/h2>\n\n\n\n<p>There are some major library dependencies for this article. 
You may install them as you see necessary.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ultralytics for YOLOv8<\/strong><\/li>\n<\/ul>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pip install ultralytics<\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>ONNX, ONNX Simplifier, and ONNX Runtime<\/strong><\/li>\n<\/ul>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pip install onnx==1.15.0\npip install onnx-simplifier==0.4.35\npip install onnxruntime-gpu==1.17.0<\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/debuggercafe.com\/image-augmentation-using-pytorch-and-albumentations\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Albumentations<\/strong> <strong>for augmentation<\/strong><\/a><\/li>\n<\/ul>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pip install -U albumentations --no-binary qudida,albumentations<\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>Along with that, we also need the PyTorch framework. 
You can install it according to your configuration from the <strong><a href=\"https:\/\/pytorch.org\/get-started\/locally\/\" target=\"_blank\" rel=\"noreferrer noopener\">official website<\/a><\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Improving Face Keypoint Detection<\/h2>\n\n\n\n<p>Let&#8217;s jump into the technical aspects and go through the code changes that we have to make for improving the speed and accuracy of our face keypoint detection pipeline.<\/p>\n\n\n\n<h3 class=\"wp-block-heading has-text-align-center\" id=\"download-code\">Download Code<\/h3>\n\n\n\n<div class=\"wp-block-button is-style-outline center\"><a data-sumome-listbuilder-id=\"126951c5-1fd5-41e9-93cd-5149c4a729bc\" class=\"wp-block-button__link has-black-color has-luminous-vivid-orange-background-color has-text-color has-background\"><b>Download the Source Code for this Tutorial<\/b><\/a><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Dataset Pipeline<\/h3>\n\n\n\n<p>In the previous article&#8217;s dataset preparation process, we did not apply any augmentations to the images. This time, we will apply augmentations to make the learning process of the model more robust. 
As we need to deal with keypoints as well, we will use the Albumentations library for augmentations.<\/p>\n\n\n\n<p>Here are the additional training and validation augmentations &amp; transforms that we define in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">datasets.py<\/code> file.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">train_transform = A.Compose([\n    A.Resize(width=224, height=224, always_apply=True),\n    A.RandomBrightnessContrast(p=0.5),\n    A.Rotate(limit=35, p=0.5),\n    A.RandomGamma(p=0.3),\n    A.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225)\n    )\n], keypoint_params=A.KeypointParams(format='xy', remove_invisible=False))\n\nvalid_transform = A.Compose([\n    A.Resize(width=224, height=224, always_apply=True),\n    A.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225)\n    )\n], keypoint_params=A.KeypointParams(format='xy', remove_invisible=False))<\/pre>\n\n\n\n<p>We apply <strong>random brightness &amp; contrast<\/strong>, <strong>rotation<\/strong>, and <strong>random gamma<\/strong> augmentations to the training samples. We resize the samples to 224&#215;224 resolution and apply the ImageNet normalization values as we will <strong><a href=\"https:\/\/debuggercafe.com\/transfer-learning-using-pytorch-shufflenetv2\/\" target=\"_blank\" rel=\"noreferrer noopener\">fine-tune a pretrained ShuffleNet<\/a><\/strong> model for the keypoint regression.<\/p>\n\n\n\n<p>We also need to handle the keypoints. When we rotate an image, the keypoints need to be transformed accordingly. For this, we use the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">keypoint_params<\/code> argument of Albumentations. 
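To build intuition for what a transform does to keypoints in the `'xy'` format, here is a minimal NumPy sketch of the same scaling that Albumentations performs internally during a resize (the helper name and sample values below are ours, for illustration only):

```python
import numpy as np

def resize_keypoints(keypoints, orig_size, new_size):
    """Scale (x, y) keypoints the same way a resize scales the image.

    keypoints: array-like of shape (N, 2) in 'xy' format.
    orig_size: (width, height) before resizing.
    new_size:  (width, height) after resizing.
    """
    kps = np.asarray(keypoints, dtype=np.float32)
    scale = np.array(
        [new_size[0] / orig_size[0], new_size[1] / orig_size[1]],
        dtype=np.float32,
    )
    return kps * scale

# A 448x448 face crop resized to 224x224 halves every coordinate.
kps = resize_keypoints([[100.0, 200.0], [448.0, 0.0]], (448, 448), (224, 224))
print(kps.tolist())  # [[50.0, 100.0], [224.0, 0.0]]
```

Note that nothing here clips keypoints to the image borders, which is exactly why `remove_invisible=False` matters when a transformed keypoint lands outside the frame.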
We provide the keypoint format first. Ours will be the X and Y coordinates in a list for each image and therefore, we provide <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">format='xy'<\/code>. Furthermore, sometimes, during dataset preparation or augmentation, some keypoints may fall outside the image borders. To avoid value errors in Albumentations, we can pass <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">remove_invisible=False<\/code>.<\/p>\n\n\n\n<p>Minor changes also happen in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">FaceKeypointDataset<\/code> class as we need not handle the resize and normalization manually.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">class FaceKeypointDataset(Dataset):\n    def __init__(self, samples, path, is_train=False):\n        self.data = samples\n        self.path = path\n        self.is_train = is_train\n\n    def __len__(self):\n        return len(self.data)\n    \n    def __getitem__(self, index):\n        image = cv2.imread(f\"{self.path}\/{self.data.iloc[index][0]}\")\n        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n\n        # Get the keypoints.\n        keypoints = self.data.iloc[index][1:]\n        keypoints = np.array(keypoints, dtype='int')\n        # Reshape the keypoints.\n        keypoints = keypoints.reshape(-1, 2)\n\n        if self.is_train:\n            transformed_data = train_transform(image=image, keypoints=keypoints)\n        else:\n            transformed_data = valid_transform(image=image, keypoints=keypoints)\n\n        image = transformed_data['image']\n        keypoints = transformed_data['keypoints']\n\n        # Transpose for getting the channel size to index 0.\n        image = np.transpose(image, (2, 0, 1))\n\n        return 
{\n            'image': torch.tensor(image, dtype=torch.float),\n            'keypoints': torch.tensor(keypoints, dtype=torch.float),\n        }<\/pre>\n\n\n\n<p>The above are all the changes that we need for the dataset preparation step.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The ShuffleNet Model for Improving Keypoint Detection Speed<\/h3>\n\n\n\n<p>Previously, we used the <strong><a href=\"https:\/\/debuggercafe.com\/building-resnets-from-scratch-using-pytorch\/\" target=\"_blank\" rel=\"noreferrer noopener\">ResNet50 model<\/a><\/strong> for the keypoint regression training and detection. However, it contains more than 25 million parameters. With proper augmentations and hyperparameters, we can train a much smaller model. This time, we will train one of the smallest <strong>ShuffleNetV2 models from PyTorch<\/strong>.<\/p>\n\n\n\n<p>This is the entire model class that goes into the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">model.py<\/code> file.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">class FaceKeypointModel(nn.Module):\n    def __init__(self, pretrained=False, requires_grad=True):\n        super(FaceKeypointModel, self).__init__()\n        if pretrained:\n            self.model = shufflenet_v2_x0_5(weights='DEFAULT')\n        else:\n            self.model = shufflenet_v2_x0_5(weights=None)\n\n        if requires_grad:\n            for param in self.model.parameters():\n                param.requires_grad = True\n            print('Training intermediate layer parameters...')\n        else:\n            for param in self.model.parameters():\n                param.requires_grad = False\n            print('Freezing intermediate layer parameters...')\n\n        # Change the final layer to regress 68 keypoints (136 values).\n        self.model.fc = 
nn.Linear(in_features=1024, out_features=136)\n\n    def forward(self, x):\n        out = self.model(x)\n        return out<\/pre>\n\n\n\n<p>The final model for our use contains just <strong>481,192 parameters<\/strong>. We can use a much higher learning rate and a higher batch size.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Training the ShuffleNetV2 Model for Improving Face Keypoint Detection<\/h3>\n\n\n\n<p>Following are the hyperparameters for this training phase:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A batch size of 64<\/li>\n\n\n\n<li>A learning rate of 0.001<\/li>\n\n\n\n<li>And training for 100 epochs<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>You can find the details in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">config.py<\/code> file.<\/p>\n\n\n\n<p>All the training and inference experiments were carried out on a system with <strong>10 GB RTX 3080 GPU<\/strong>, <strong>10th generation i7 CPU<\/strong>, and <strong>32 GB RAM<\/strong>.<\/p>\n\n\n\n<p>We can execute the following command to start the training.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python train.py <\/pre>\n\n\n\n<p>The training script saves the best model to disk whenever the validation loss reaches a new lower value. So, we get the best model in the end. 
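The save-on-best logic can be sketched in a few lines of plain Python (a simplified stand-in for what `train.py` does; the class name is ours, and the real script also writes the model state to disk):

```python
class BestModelSaver:
    """Track the lowest validation loss seen so far.

    A minimal sketch of save-on-best checkpointing; the actual training
    script also serializes the model and optimizer state when saving.
    """
    def __init__(self):
        self.best_loss = float('inf')

    def should_save(self, val_loss):
        # True whenever this epoch's validation loss is a new low.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            return True
        return False

saver = BestModelSaver()
val_losses = [4.1, 3.8, 3.9, 3.2896]  # illustrative values
print([saver.should_save(loss) for loss in val_losses])  # [True, True, False, True]
```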
Following are the logs from the best epoch.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Epoch 98 of 100\nTraining\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 54\/54 [00:06&lt;00:00,  7.89it\/s]\nValidating\n  0%|                                                                                                                                                                                       | 0\/12 [00:00&lt;?, ?it\/s]Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 12\/12 [00:01&lt;00:00,  9.97it\/s]\nTrain Loss: 3.5155\nVal Loss: 3.2896\nSAVING BEST MODEL FOR LEAST LOSS TILL NOW...<\/pre>\n\n\n\n<p>The best model was obtained on epoch 98 with a <strong>validation loss of 3.2896<\/strong>.<\/p>\n\n\n\n<p>Here is the loss graph after training.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving-facial-keypoint-detection-loss.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"700\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving-facial-keypoint-detection-loss.png\" alt=\"The loss graph while improving the face keypoint detection pipeline.\" class=\"wp-image-35040\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving-facial-keypoint-detection-loss.png 1000w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving-facial-keypoint-detection-loss-300x210.png 300w, 
https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving-facial-keypoint-detection-loss-768x538.png 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 3. The loss graph while improving the face keypoint detection pipeline. It looks like with a bit of learning rate scheduling, we can train the model for even longer.<\/figcaption><\/figure>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\">The YOLO Object Detection Model<\/h3>\n\n\n\n<p>We are using a YOLOv8 model which has been pretrained on the WIDER Face dataset for <strong><a href=\"https:\/\/debuggercafe.com\/automatic-face-and-facial-landmark-detection-with-facenet-pytorch\/\" target=\"_blank\" rel=\"noreferrer noopener\">face detection<\/a><\/strong>. You can find the <strong><a href=\"https:\/\/github.com\/Yusepp\/YOLOv8-Face\" target=\"_blank\" rel=\"noreferrer noopener\">repository here<\/a><\/strong>. We are using the YOLOv8 Nano model from this repository and have kept it in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">weights<\/code> directory.<\/p>\n\n\n\n<p>To optimize the inference process even further, we have exported the model to ONNX format using the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">export.py<\/code> script to run on 320&#215;320 resolution images.<\/p>\n\n\n\n<p>Thanks to the Ultralytics library, the code for exporting it is pretty straightforward.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\"\"\"\nScript to export YOLO face detection model.\n\"\"\"\n\nfrom ultralytics import YOLO\n\nmodel = YOLO('weights\/yolov8n_100e.pt')\n\nmodel.export(format='onnx', imgsz=320)<\/pre>\n\n\n\n<p>You will get access to the original <code data-enlighter-language=\"generic\" 
class=\"EnlighterJSRAW\">.pt<\/code> file and also the ONNX format when downloading the zip file for this article.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Inference on Images<\/h2>\n\n\n\n<p>Let&#8217;s check the code to run inference on images. In this section, we will combine the detection outputs from the YOLOv8 model and our own keypoint detection model. A lot of the code will also remain similar to the previous post in the series. For a detailed explanation, I would recommend going through that post. The code for this resides in the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">inference_image.py<\/code> script.<\/p>\n\n\n\n<p>First, we have all the import statements.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"image_inference.py\" data-enlighter-group=\"image_inference_1\">import torch\nimport numpy as np\nimport cv2\nimport glob\nimport os\nimport albumentations as A\n\nfrom model import FaceKeypointModel\nfrom ultralytics import YOLO\nfrom tqdm import tqdm<\/pre>\n\n\n\n<p>Next, we have the helper function to crop and pad the detected image.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"11\" data-enlighter-title=\"inference_image.py\" data-enlighter-group=\"inference_image_2\"># Crop the detected face with padding and return it.\ndef crop_image(box, image, pre_padding=7, post_padding=7):\n    x1, y1, x2, y2 = box\n    x1 = x1 - pre_padding\n    y1 = y1 - pre_padding\n    x2 = x2 + post_padding\n    y2 = y2 + post_padding\n    cropped_image = image[int(y1):int(y2), int(x1):int(x2)]\n    return cropped_image, x1, y1\n\nvalid_transform = A.Compose([\n    A.Resize(width=224, height=224, always_apply=True),\n    
A.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225)\n    )\n])<\/pre>\n\n\n\n<p>Then we need to define the computation device, load both models, and define all the image paths.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"28\" data-enlighter-title=\"inference_image.py\" data-enlighter-group=\"inference_image_3\">device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n\nout_dir = os.path.join('outputs', 'image_inference')\nos.makedirs(out_dir, exist_ok=True)\n\nmodel = FaceKeypointModel(pretrained=False, requires_grad=False).to(device)\n# Load the keypoint model checkpoint.\ncheckpoint = torch.load('outputs\/model.pth', map_location=device)\nmodel.load_state_dict(checkpoint['model_state_dict'])\nmodel.eval()\n\n# Load the YOLO model.\nyolo = YOLO('weights\/yolov8n_100e.onnx')\n\ninput_path = 'input\/inference_images'\n\nall_image_paths = glob.glob(os.path.join(input_path, '*'))<\/pre>\n\n\n\n<p>Finally, loop over the image paths and carry out the inference.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"45\" data-enlighter-title=\"inference_image.py\" data-enlighter-group=\"inference_image_4\">for image_path in tqdm(all_image_paths, total=len(all_image_paths)):\n    image_name = image_path.split(os.path.sep)[-1]\n\n    orig_image = cv2.imread(image_path)\n\n    with torch.no_grad():\n        results = yolo(orig_image, imgsz=320)\n\n    # Iterate over the YOLO `results` object.\n    for r in results:\n        # Detect keypoints if face is detected.\n        if len(r.boxes.xyxy) > 0:\n            for box in r.boxes.xyxy:\n                cropped_image, x1, y1 = crop_image(box, orig_image)\n\n                image = cropped_image.copy()\n\n                if image.shape[0] > 1 and image.shape[1] > 1:\n                    image = cv2.resize(image, (224, 224))\n                \n                    image = image \/ 255.0\n                \n                    image = np.transpose(image, (2, 0, 1))\n                    image = torch.tensor(image, dtype=torch.float)\n                    image = image.unsqueeze(0).to(device)\n                \n                    with torch.no_grad():\n                        outputs = model(image)\n                \n                    outputs = outputs.cpu().detach().numpy()\n                \n                    outputs = outputs.reshape(-1, 2)\n                    keypoints = outputs\n\n                    # Draw keypoints on face.\n                    for i, p in enumerate(keypoints):\n                        p[0] = p[0] \/ 224 * cropped_image.shape[1]\n                        p[1] = p[1] \/ 224 * cropped_image.shape[0]\n                \n                        p[0] += x1\n                        p[1] += y1\n                        \n                        cv2.circle(\n                            orig_image, \n                            (int(p[0]), int(p[1])),\n                            2, \n                            (0, 0, 255), \n                            -1, \n                            cv2.LINE_AA\n                        )\n                \n    cv2.imwrite(os.path.join(out_dir, image_name), orig_image)<\/pre>\n\n\n\n<p>On <strong>lines 50 and 51<\/strong>, we pass the image through the YOLOv8 model to detect the faces. Then, for each <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">result<\/code> object, we move forward with keypoint detection only if faces are detected. If the YOLOv8 model detects faces, we crop each face area along with padding on each side. Next, we pass this cropped face through the improved keypoint detection model. 
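The coordinate mapping at the end of the loop, scaling from the 224&#215;224 model space back to the crop and then shifting by the crop origin, can be isolated into a small helper (a sketch with a name of our choosing, not part of the original script):

```python
import numpy as np

def map_keypoints_to_frame(keypoints, crop_w, crop_h, x1, y1, model_size=224):
    """Map keypoints from model-input space back to full-frame coordinates.

    keypoints: (N, 2) array predicted on the resized (model_size x model_size) crop.
    crop_w, crop_h: size of the padded face crop.
    x1, y1: top-left corner of the crop in the full frame.
    """
    kps = np.asarray(keypoints, dtype=np.float32).copy()
    kps[:, 0] = kps[:, 0] / model_size * crop_w + x1  # undo the resize, add x offset
    kps[:, 1] = kps[:, 1] / model_size * crop_h + y1  # undo the resize, add y offset
    return kps

# A keypoint at the center of the model input lands at the center of the crop.
pts = map_keypoints_to_frame([[112.0, 112.0]], crop_w=100, crop_h=80, x1=30, y1=40)
print(pts.tolist())  # [[80.0, 80.0]]
```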
After each inference step, we annotate the image with the keypoints and save it to disk.<\/p>\n\n\n\n<p>Let&#8217;s execute the script and check the results.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python inference_image.py<\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving-face-keypoint-detection-image-inference.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"600\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving-face-keypoint-detection-image-inference.png\" alt=\"Image inference outputs after improving the face keypoint detection pipeline.\" class=\"wp-image-35042\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving-face-keypoint-detection-image-inference.png 600w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving-face-keypoint-detection-image-inference-300x300.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving-face-keypoint-detection-image-inference-150x150.png 150w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 4. Image inference outputs after improving the face keypoint detection pipeline.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>The results look good. However, we can see that if the faces are slightly tilted, then the keypoints are also a bit out of place. 
This can be easily solved with more data and variations in the training set.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Inference on Videos<\/h2>\n\n\n\n<p>Our main aim here is to improve on the speed and accuracy of our <strong><a href=\"https:\/\/debuggercafe.com\/robust-facial-keypoint-detection-model\/\" target=\"_blank\" rel=\"noreferrer noopener\">previous method<\/a><\/strong>. Let&#8217;s run inference on one of the same videos and check whether we succeed.<\/p>\n\n\n\n<p>We run inference on videos by executing the <code data-enlighter-language=\"generic\" class=\"EnlighterJSRAW\">inference_video.py<\/code> script.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python inference_video.py --input input\/inference_videos\/video_2.mp4<\/pre>\n\n\n\n<figure class=\"wp-block-video aligncenter\"><video height=\"720\" style=\"aspect-ratio: 1280 \/ 720;\" width=\"1280\" autoplay controls loop muted src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving_face_keypoint_detection_video_2.mp4\"><\/video><figcaption class=\"wp-element-caption\">Clip 1. Real-time face keypoint detection using our improved pipeline. 
With the same video, using the YOLOv8 ONNX model and a lightweight keypoint regressor, we reach more than 40 FPS, well above our previous method.<\/figcaption><\/figure>\n\n\n\n<p>There are two major improvements that we can see right away:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>There are far fewer false positives this time.<\/li>\n\n\n\n<li>A boost in speed, with an <strong>average of 50 FPS<\/strong> on the same hardware.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>However, whenever there are too many faces in a frame, the FPS still takes a hit.<\/p>\n\n\n\n<p>Let&#8217;s try a video where there are multiple faces in almost all the frames.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python inference_video.py --input input\/inference_videos\/video_3.mp4<\/pre>\n\n\n\n<figure class=\"wp-block-video aligncenter\"><video height=\"338\" style=\"aspect-ratio: 640 \/ 338;\" width=\"640\" autoplay controls loop muted src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/improving_face_keypoint_detection_video_3.mp4\"><\/video><figcaption class=\"wp-element-caption\">Clip 2. The detection looks quite good here. However, when there are many faces in a single frame, the FPS takes a hit.<\/figcaption><\/figure>\n\n\n\n<p>This time the FPS dips to an average of 7 FPS. This shows that our pipeline is not yet completely optimized.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Takeaways<\/h2>\n\n\n\n<p>We tried to improve our two-stage face keypoint detection as much as we could. There are a few more options, such as converting the keypoint detection model to ONNX format as well, but that will not help much with speed. 
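<\/p>\n\n\n\n<p>As a rough picture of where the time goes, the two-stage loop can be sketched as below. This is only a minimal skeleton: the <code>detect<\/code> and <code>regress<\/code> callables are dummy stand-ins for the YOLOv8 ONNX face detector and the ShuffleNetV2 keypoint regressor, so the number it prints says nothing about the real models.<\/p>\n\n\n\n

```python
import time

import numpy as np

def measure_fps(detect, regress, frames):
    # Run the two-stage pipeline over all frames and return the average FPS.
    start = time.perf_counter()
    for frame in frames:
        # Stage 1: face detection returns (x1, y1, x2, y2) boxes.
        for (x1, y1, x2, y2) in detect(frame):
            # Stage 2: regress keypoints on the cropped face.
            regress(frame[y1:y2, x1:x2])
    return len(frames) / (time.perf_counter() - start)

# Dummy stand-ins so the skeleton runs without any model files.
frames = [np.zeros((240, 320, 3), dtype=np.uint8) for _ in range(30)]
fps = measure_fps(
    lambda f: [(10, 10, 110, 110)],   # pretend one detected face per frame
    lambda crop: np.zeros((68, 2)),   # pretend 68 keypoints per face
    frames,
)
print(round(fps, 1))
```

\n\n\n\n<p>Because stage 2 runs once per detected face, the per-frame cost grows linearly with the number of faces, which is exactly why crowded frames drag the average FPS down.<\/p>\n\n\n\n<p>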
Although a bit slower with multiple faces, our face keypoint regression model can carry out detections from several angles and in different lighting conditions. <\/p>\n\n\n\n<p>While integrated single-stage models may be faster, with a two-stage pipeline our detections are far superior. We can tune the face detection and keypoint detection models independently on hundreds of thousands of images.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Further Reading<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><a href=\"https:\/\/debuggercafe.com\/getting-started-with-facial-keypoint-detection-using-pytorch\/\" target=\"_blank\" rel=\"noreferrer noopener\">Getting Started with Facial Keypoint Detection using Deep Learning and PyTorch<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/debuggercafe.com\/advanced-facial-keypoint-detection-with-pytorch\/\" target=\"_blank\" rel=\"noreferrer noopener\">Advanced Facial Keypoint Detection with PyTorch<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/debuggercafe.com\/simple-facial-keypoint-detection-using-tensorflow-and-keras\/\" target=\"_blank\" rel=\"noreferrer noopener\">Simple Facial Keypoint Detection using TensorFlow and Keras<\/a><\/strong><\/li>\n\n\n\n<li><strong><a href=\"https:\/\/debuggercafe.com\/face-landmark-detection-using-dlib\/\" target=\"_blank\" rel=\"noreferrer noopener\">Face Landmark Detection using Dlib<\/a><\/strong><\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary and Conclusion<\/h2>\n\n\n\n<p>In this article, we tried improving our two-stage keypoint detection pipeline in terms of speed. Along with using a lightweight face detection model and converting it to ONNX, we also trained a much smaller keypoint regressor model. Although we succeeded to an extent, we also found potential drawbacks. We also discussed why single-stage models may suit such use cases better when speed is the priority. 
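<\/p>\n\n\n\n<p>As a recap, the annotation step that each inference script performs after prediction can be sketched without any drawing library. This is a hedged, numpy-only stand-in: the actual scripts would use OpenCV&#8217;s drawing calls, and the 3x3 dot here simply marks each predicted (x, y) location.<\/p>\n\n\n\n

```python
import numpy as np

def annotate(image, keypoints, color=(0, 255, 0)):
    # Mark each (x, y) keypoint on an HxWx3 uint8 image with a 3x3 dot.
    out = image.copy()
    h, w = out.shape[:2]
    for x, y in keypoints:
        x, y = int(round(x)), int(round(y))
        # Clip the dot to the image bounds before writing the color.
        out[max(y - 1, 0):min(y + 2, h), max(x - 1, 0):min(x + 2, w)] = color
    return out

img = np.zeros((100, 100, 3), dtype=np.uint8)
marked = annotate(img, [(50.0, 40.0), (2.0, 97.0)])
print(marked[40, 50])  # the dot centered at (50, 40) is green
```

\n\n\n\n<p>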
I hope that this article was worth your time.<\/p>\n\n\n\n<p>If you have any doubts, thoughts, or suggestions, please leave them in the comment section. I will surely address them.<\/p>\n\n\n\n<p>You can contact me using the <strong><a aria-label=\"Contact (opens in a new tab)\" href=\"https:\/\/debuggercafe.com\/contact-us\/\" target=\"_blank\" rel=\"noreferrer noopener\">Contact<\/a><\/strong> section. You can also find me on <strong><a aria-label=\"LinkedIn (opens in a new tab)\" href=\"https:\/\/www.linkedin.com\/in\/sovit-rath\/\" target=\"_blank\" rel=\"noreferrer noopener\">LinkedIn<\/a><\/strong> and <strong><a href=\"https:\/\/x.com\/SovitRath5\" target=\"_blank\" rel=\"noreferrer noopener\">X<\/a><\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we try improving our face keypoint detection pipeline by using the YOLOv8 Nano model as the face detector and training a custom face keypoint regressor.<\/p>\n","protected":false},"author":1,"featured_media":35051,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[76,184,766,90,295],"tags":[783,1387,782,785,784,786,1388],"class_list":["post-34994","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-vision","category-face-applications","category-keypoint-detection","category-pytorch","category-yolo","tag-face-keypoint-detection-with-yolov8","tag-face-keypoint-with-yolo","tag-improving-face-keypoint-detection","tag-real-time-face-keypoint-detection","tag-shufflenet-for-face-keypoint-detection","tag-two-stage-face-keypoint-detection","tag-yolo-and-facial-keypoint-detection"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Improving Face Keypoint Detection<\/title>\n<meta name=\"description\" content=\"Improving face keypoint detection using the YOLOv8 Nano ONNX 
model for face detection and ShuffleNetV2 model for keypoint regression.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Improving Face Keypoint Detection\" \/>\n<meta property=\"og:description\" content=\"Improving face keypoint detection using the YOLOv8 Nano ONNX model for face detection and ShuffleNetV2 model for keypoint regression.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/\" \/>\n<meta property=\"og:site_name\" content=\"DebuggerCafe\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/profile.php?id=100013731104496\" \/>\n<meta property=\"article:published_time\" content=\"2024-03-25T00:30:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-16T01:28:28+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/Improving-Face-Keypoint-Detection-e1707268332867.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"563\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Sovit Ranjan Rath\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:site\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sovit Ranjan Rath\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"18 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/\"},\"author\":{\"name\":\"Sovit Ranjan Rath\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"headline\":\"Improving Face Keypoint Detection\",\"datePublished\":\"2024-03-25T00:30:00+00:00\",\"dateModified\":\"2025-06-16T01:28:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/\"},\"wordCount\":1805,\"commentCount\":0,\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/Improving-Face-Keypoint-Detection-e1707268332867.png\",\"keywords\":[\"Face Keypoint Detection with YOLOv8\",\"Face Keypoint with YOLO\",\"Improving Face Keypoint Detection\",\"Real-Time Face Keypoint Detection\",\"ShuffleNet for Face Keypoint Detection\",\"Two-Stage Face Keypoint Detection\",\"YOLO and Facial Keypoint Detection\"],\"articleSection\":[\"Computer Vision\",\"Face Applications\",\"Keypoint Detection\",\"PyTorch\",\"YOLO\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/\",\"url\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/\",\"name\":\"Improving Face Keypoint 
Detection\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/Improving-Face-Keypoint-Detection-e1707268332867.png\",\"datePublished\":\"2024-03-25T00:30:00+00:00\",\"dateModified\":\"2025-06-16T01:28:28+00:00\",\"author\":{\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"description\":\"Improving face keypoint detection using the YOLOv8 Nano ONNX model for face detection and ShuffleNetV2 model for keypoint regression.\",\"breadcrumb\":{\"@id\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#primaryimage\",\"url\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/Improving-Face-Keypoint-Detection-e1707268332867.png\",\"contentUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/Improving-Face-Keypoint-Detection-e1707268332867.png\",\"width\":1000,\"height\":563,\"caption\":\"Improving Face Keypoint Detection\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/debuggercafe.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Improving Face Keypoint 
Detection\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/debuggercafe.com\/#website\",\"url\":\"https:\/\/debuggercafe.com\/\",\"name\":\"DebuggerCafe\",\"description\":\"Machine Learning and Deep Learning\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/debuggercafe.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\",\"name\":\"Sovit Ranjan Rath\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"caption\":\"Sovit Ranjan Rath\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Improving Face Keypoint Detection","description":"Improving face keypoint detection using the YOLOv8 Nano ONNX model for face detection and ShuffleNetV2 model for keypoint regression.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/","og_locale":"en_US","og_type":"article","og_title":"Improving Face Keypoint Detection","og_description":"Improving face keypoint detection using the YOLOv8 Nano ONNX model for face detection and ShuffleNetV2 model for keypoint regression.","og_url":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/","og_site_name":"DebuggerCafe","article_publisher":"https:\/\/www.facebook.com\/profile.php?id=100013731104496","article_published_time":"2024-03-25T00:30:00+00:00","article_modified_time":"2025-06-16T01:28:28+00:00","og_image":[{"width":1000,"height":563,"url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/Improving-Face-Keypoint-Detection-e1707268332867.png","type":"image\/png"}],"author":"Sovit Ranjan Rath","twitter_card":"summary_large_image","twitter_creator":"@SovitRath5","twitter_site":"@SovitRath5","twitter_misc":{"Written by":"Sovit Ranjan Rath","Est. 
reading time":"18 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#article","isPartOf":{"@id":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/"},"author":{"name":"Sovit Ranjan Rath","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"headline":"Improving Face Keypoint Detection","datePublished":"2024-03-25T00:30:00+00:00","dateModified":"2025-06-16T01:28:28+00:00","mainEntityOfPage":{"@id":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/"},"wordCount":1805,"commentCount":0,"image":{"@id":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/Improving-Face-Keypoint-Detection-e1707268332867.png","keywords":["Face Keypoint Detection with YOLOv8","Face Keypoint with YOLO","Improving Face Keypoint Detection","Real-Time Face Keypoint Detection","ShuffleNet for Face Keypoint Detection","Two-Stage Face Keypoint Detection","YOLO and Facial Keypoint Detection"],"articleSection":["Computer Vision","Face Applications","Keypoint Detection","PyTorch","YOLO"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/","url":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/","name":"Improving Face Keypoint 
Detection","isPartOf":{"@id":"https:\/\/debuggercafe.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#primaryimage"},"image":{"@id":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/Improving-Face-Keypoint-Detection-e1707268332867.png","datePublished":"2024-03-25T00:30:00+00:00","dateModified":"2025-06-16T01:28:28+00:00","author":{"@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"description":"Improving face keypoint detection using the YOLOv8 Nano ONNX model for face detection and ShuffleNetV2 model for keypoint regression.","breadcrumb":{"@id":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#primaryimage","url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/Improving-Face-Keypoint-Detection-e1707268332867.png","contentUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/02\/Improving-Face-Keypoint-Detection-e1707268332867.png","width":1000,"height":563,"caption":"Improving Face Keypoint Detection"},{"@type":"BreadcrumbList","@id":"https:\/\/debuggercafe.com\/improving-face-keypoint-detection\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/debuggercafe.com\/"},{"@type":"ListItem","position":2,"name":"Improving Face Keypoint Detection"}]},{"@type":"WebSite","@id":"https:\/\/debuggercafe.com\/#website","url":"https:\/\/debuggercafe.com\/","name":"DebuggerCafe","description":"Machine Learning and Deep 
Learning","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/debuggercafe.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752","name":"Sovit Ranjan Rath","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","caption":"Sovit Ranjan Rath"}}]}},"_links":{"self":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/34994","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/comments?post=34994"}],"version-history":[{"count":81,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/34994\/revisions"}],"predecessor-version":[{"id":38143,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/34994\/revisions\/38143"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media\/35051"}],"wp:attachment":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media?parent=34994"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/categories?post=34994"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/tags?post=34994"}],"curies":[{"name":"wp",
"href":"https:\/\/api.w.org\/{rel}","templated":true}]}}