{"id":38932,"date":"2024-12-16T06:00:00","date_gmt":"2024-12-16T00:30:00","guid":{"rendered":"https:\/\/debuggercafe.com\/?p=38932"},"modified":"2024-10-31T06:28:18","modified_gmt":"2024-10-31T00:58:18","slug":"exploring-hq-sam","status":"publish","type":"post","link":"https:\/\/debuggercafe.com\/exploring-hq-sam\/","title":{"rendered":"Exploring HQ-SAM"},"content":{"rendered":"\n<p>In this article, we will explore <strong>HQ-SAM<\/strong> (<strong>High Quality Segment Anything Model<\/strong>), one of the derivative works of SAM.<\/p>\n\n\n\n<p>The Segment Anything (SAM) model by Meta revolutionized the way we think about image segmentation. It moved training from hundreds of thousands of mask labels to more than a billion, and from class-specific to class-agnostic segmentation, paving the way for new possibilities. However, the very first version of SAM had its limitations, which in turn inspired innovative derivative works like <strong><em>HQ-SAM<\/em><\/strong>. 
HQ-SAM will be our primary focus in this article, and we will absorb as much detail as possible from the released paper.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-sample-results.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"800\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-sample-results.png\" alt=\"HQ-SAM - some sample results and COCO AP comparison with the original SAM model.\" class=\"wp-image-38970\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-sample-results.png 800w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-sample-results-300x300.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-sample-results-150x150.png 150w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-sample-results-768x768.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 1. 
HQ-SAM &#8211; some sample results and COCO AP comparison with the original SAM model.<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\"><em>What are we going to cover in HQ-SAM?<\/em><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>What is HQ-SAM and why do we need it?<\/em><\/li>\n\n\n\n<li><em>What are the architectural changes made to the original SAM to create HQ-SAM?<\/em><\/li>\n\n\n\n<li><em>How was the dataset curated to train HQ-SAM?<\/em><\/li>\n\n\n\n<li><em>What was the training strategy?<\/em><\/li>\n\n\n\n<li><em>How does HQ-SAM stack up against the original SAM?<\/em><\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is HQ-SAM and Why Do We Need It?<\/h2>\n\n\n\n<p>HQ-SAM was introduced in the paper titled <em>Segment Anything in High Quality<\/em> by <em>Ke et al.<\/em> It is a modification of the original Segment Anything Model for obtaining high quality masks when segmenting small, detailed, and intricate objects.<\/p>\n\n\n\n<p>The original SAM model changed the way we think about object segmentation. We got a promptable segmentation model that could take user input via points, bounding boxes, and coarse masks. This flexibility, paired with training on 1.1 billion masks, made SAM an indispensable tool in many computer vision tasks, especially annotation.<\/p>\n\n\n\n<p>However, SAM was not good at segmenting smaller objects or those with intricate designs. 
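<\/p>\n\n\n\n<p>To make the notion of mask quality concrete, here is a minimal NumPy sketch (my own illustration of the metrics, not code from SAM or the paper) of mask IoU versus boundary IoU, where the latter scores only a thin band around the object contour. A coarse mask can look acceptable under mask IoU while failing badly at the boundary, which is exactly the failure mode discussed here.<\/p>\n\n\n\n

```python
import numpy as np

def mask_iou(a, b):
    # Standard IoU over all mask pixels.
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

def boundary(mask, width=1):
    # Boundary band = mask minus its erosion (4-connected, via padded shifts).
    m = mask.astype(bool)
    er = m.copy()
    for _ in range(width):
        p = np.pad(er, 1, constant_values=False)
        er = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m & ~er

def boundary_iou(a, b, width=1):
    # IoU restricted to the thin band around each mask's contour.
    return mask_iou(boundary(a, width), boundary(b, width))

# Toy example: a 10x10 ground-truth square vs. a prediction bloated by 1 pixel.
gt = np.zeros((20, 20), dtype=bool)
gt[5:15, 5:15] = True
pred = np.zeros((20, 20), dtype=bool)
pred[4:16, 4:16] = True

print(round(mask_iou(gt, pred), 3))      # 0.694 -- looks acceptable
print(round(boundary_iou(gt, pred), 3))  # 0.0   -- the sloppy contour is exposed
```

\n\n\n\n<p>The 1-pixel boundary band is an assumption for this toy example; real evaluation protocols typically scale the band width with the image size.<\/p>\n\n\n\n<p>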
<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/sam-failure-intricate-objects.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1512\" height=\"287\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/sam-failure-intricate-objects.png\" alt=\"Failure cases of SAM when trying to segment intricate objects.\" class=\"wp-image-38972\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/sam-failure-intricate-objects.png 1512w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/sam-failure-intricate-objects-300x57.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/sam-failure-intricate-objects-768x146.png 768w\" sizes=\"auto, (max-width: 1512px) 100vw, 1512px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 2. Failure cases of SAM when trying to segment intricate objects.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Sometimes, it missed the entire context of a prompt. 
For example, when the user provided a bounding box for segmenting a chair, there was a high chance that it would segment the surrounding areas as well.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/sam-segmenting-surrounding-area.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"501\" height=\"491\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/sam-segmenting-surrounding-area.png\" alt=\"SAM segmenting the surrounding area when prompted to segment the chair using a bounding box.\" class=\"wp-image-38974\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/sam-segmenting-surrounding-area.png 501w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/sam-segmenting-surrounding-area-300x294.png 300w\" sizes=\"auto, (max-width: 501px) 100vw, 501px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 3. SAM segmenting the surrounding area when prompted to segment the chair using a bounding box.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>These limitations can slow down image annotation processes.<\/p>\n\n\n\n<p>The solution came with HQ-SAM, which kept the promptable architecture of SAM intact while mitigating the above issues. HQ-SAM can predict high quality and accurate segmentation masks even for challenging images and scenes. 
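<\/p>\n\n\n\n<p>The architecture section below explains how this is achieved; as a preview, the core idea is to fuse features from different depths of the frozen SAM. The following shape-level NumPy sketch is my own stand-in (illustrative channel counts and resolutions, and nearest-neighbour upsampling in place of HQ-SAM&#8217;s learned convolutions) showing how an early, detail-rich feature map and a global feature map can be combined at the decoder&#8217;s resolution.<\/p>\n\n\n\n

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the three feature maps HQ-SAM fuses. The shapes here are
# illustrative, not the exact values from the paper.
early_vit = rng.standard_normal((32, 64, 64))     # early encoder block: local edge detail
final_vit = rng.standard_normal((32, 64, 64))     # final encoder block: global semantics
decoder_ft = rng.standard_normal((32, 256, 256))  # SAM mask decoder feature

def upsample4(x):
    # Nearest-neighbour 4x upsampling; HQ-SAM uses learned (transposed) convolutions.
    return x.repeat(4, axis=1).repeat(4, axis=2)

# Global-local fusion: merge encoder features, bring them to the decoder's
# spatial resolution, and combine with the decoder's own mask feature.
hq_feature = upsample4(early_vit + final_vit) + decoder_ft
print(hq_feature.shape)  # (32, 256, 256)
```

\n\n\n\n<p>In the actual model, the learnable High-Quality Output Token then uses such a fused feature map to predict the refined mask.<\/p>\n\n\n\n<p>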
<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-when-segmenting-intricate-objects.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1324\" height=\"606\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-when-segmenting-intricate-objects.png\" alt=\"HQ-SAM vs SAM when segmenting intricate objects.\" class=\"wp-image-38976\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-when-segmenting-intricate-objects.png 1324w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-when-segmenting-intricate-objects-300x137.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-when-segmenting-intricate-objects-768x352.png 768w\" sizes=\"auto, (max-width: 1324px) 100vw, 1324px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 4. HQ-SAM vs SAM when segmenting intricate objects.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>The above figure shows a few comparisons between SAM and HQ-SAM highlighting the latter&#8217;s capabilities. 
The adaptation to SAM for building HQ-SAM adds less than 0.5% additional parameters, avoiding extra computational requirements.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What are the Architectural Changes Made to the Original SAM to Create HQ-SAM?<\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-architecture.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1340\" height=\"549\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-architecture.png\" alt=\"HQ-SAM architecture.\" class=\"wp-image-38980\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-architecture.png 1340w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-architecture-300x123.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-architecture-768x315.png 768w\" sizes=\"auto, (max-width: 1340px) 100vw, 1340px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 5. HQ-SAM architecture.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>We can summarize the architectural changes in three short points:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>High-quality output token<\/strong>: Instead of directly modifying or training the SAM decoder, the authors use the concept of an output token (similar to the object queries in DETR). They add a learnable High-Quality Output Token to the SAM mask decoder. This token is designed to improve mask prediction quality by leveraging both SAM&#8217;s original features and new fused features.<\/li>\n\n\n\n<li><strong>Global-local feature fusion<\/strong>: It is difficult to predict high quality segmentation masks without knowledge of object boundaries. However, as the image passes through several layers in a model, this boundary information is often lost. 
HQ-SAM employs information from three layers for this:\n<ul class=\"wp-block-list\">\n<li>Features from the early SAM ViT encoder layers, which preserve the local and boundary details of the image.<\/li>\n\n\n\n<li>The feature map from the final SAM ViT encoder layer, which preserves the global semantic context.<\/li>\n\n\n\n<li>The mask feature from SAM&#8217;s mask decoder.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li> <strong>Minimal parameter addition:<\/strong> All of this adds only about 0.5% of the original SAM model&#8217;s parameters, which keeps the new architecture from becoming computationally expensive.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>I highly recommend going through sections 3.2.1 and 3.2.2 of the paper for the numerical details of the feature fusion step. The authors lay out the process in a clear manner and mention the layers that are used for fusion.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How was the Dataset Curated to Train HQ-SAM?<\/h2>\n\n\n\n<p>As SAM was already pretrained on the SA-1B dataset, further fine-tuning on the same dataset does not yield better results. 
Instead, the authors curate <strong>HQSeg-44K<\/strong>, a dataset containing more than 44,000 images with extremely accurate segmentation masks.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-seg-44k-samples.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1305\" height=\"1093\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-seg-44k-samples.png\" alt=\"Samples from the HQSeg-44K dataset.\" class=\"wp-image-38982\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-seg-44k-samples.png 1305w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-seg-44k-samples-300x251.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-seg-44k-samples-768x643.png 768w\" sizes=\"auto, (max-width: 1305px) 100vw, 1305px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 6. Samples from the HQSeg-44K dataset.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>HQSeg-44K is a combination of six existing datasets:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>DIS<\/li>\n\n\n\n<li>ThinObject-5K<\/li>\n\n\n\n<li>FSS-1000<\/li>\n\n\n\n<li>ECSSD<\/li>\n\n\n\n<li>MSRA-10K<\/li>\n\n\n\n<li>DUT-OMRON<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>All of the above contain extremely accurate mask annotations, making them perfect for training a model like HQ-SAM. The final dataset contains more than 1,000 diverse object classes.<\/p>\n\n\n\n<p>This dataset overcomes one of the major limitations of SA-1B, the lack of fine-grained segmentation masks for training. Furthermore, the new dataset&#8217;s small size allows for rapid training experimentation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What was the Strategy and Setup for Training HQ-SAM?<\/h2>\n\n\n\n<p>During training, only the new HQ-SAM layers are learnable while the original SAM layers remain frozen. 
This makes the output token, the three MLP layers, and three convolution operations learnable in the proposed HQ-SAM architecture.<\/p>\n\n\n\n<p>The following are the hyperparameters for the training process:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Learning rate: 0.001<\/li>\n\n\n\n<li>Number of epochs: 12 with the learning rate dropping after 10 epochs<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>As the dataset contains only around 44000 images, training HQ-SAM took only 4 hours on 8 RTX-3090 GPUs with a batch size of 32.<\/p>\n\n\n\n<p>The authors also use large-scale jittering as an augmentation step to make the model learn more fine-grained features in objects of varying scales.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Comparison Against SAM<\/h2>\n\n\n\n<p>In this section, we will cover the various comparisons the authors make between SAM and HQ-SAM. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Training and Inference Compute<\/h3>\n\n\n\n<p>Let&#8217;s start with the simplest comparison, the resources required for training and inference on each model.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-training-inference-compute.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1355\" height=\"321\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-training-inference-compute.png\" alt=\"Training and inference compute comparison between HQ-SAM and SAM.\" class=\"wp-image-38984\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-training-inference-compute.png 1355w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-training-inference-compute-300x71.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-training-inference-compute-768x182.png 768w\" sizes=\"auto, (max-width: 1355px) 100vw, 
1355px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 7. Training and inference compute comparison between HQ-SAM and SAM.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>The above table shows a stark contrast between the resource requirements of both models. On the one hand, the original SAM model requires 128 A100 GPUs while running at 5 FPS during inference. On the other hand, HQ-SAM while keeping the inference requirements the same, reduces the training compute drastically by having only 5 million training parameters with a batch size of 32. All this while taking only 4 hours to train.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">COCO Segmentation Results<\/h3>\n\n\n\n<p>Following from here, we will cover the general experimental results as mentioned by the authors.<\/p>\n\n\n\n<p>The next table shows the benchmarks between SAM and HQ-SAM on the COCO dataset.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-coco-comparison.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1344\" height=\"504\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-coco-comparison.png\" alt=\"Mean mask IoU and mean border IoU comparison between HQ-SAM and SAM on the COCO dataset.\" class=\"wp-image-38987\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-coco-comparison.png 1344w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-coco-comparison-300x113.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-coco-comparison-768x288.png 768w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 8. 
Mean mask IoU and mean boundary IoU comparison between HQ-SAM and SAM on the COCO dataset.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>In the above figure, mIoU is the mean mask IoU and mBIoU is the mean boundary IoU. It is clear that the new model is ahead of SAM on both metrics. In fact, the authors go on to show that retraining the entire mask decoder of SAM does not yield the same results as their new HQ-SAM architecture. For the above experiments, a SOTA FocalNet-DINO detector was used as the bounding box prompt generator.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Visual Results<\/h3>\n\n\n\n<p>Following are some of the visual results provided in the paper.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-dis-dataset.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1333\" height=\"660\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-dis-dataset.png\" alt=\"Visual comparison between the two models on the DIS dataset.\" class=\"wp-image-38989\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-dis-dataset.png 1333w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-dis-dataset-300x149.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-vs-sam-dis-dataset-768x380.png 768w\" sizes=\"auto, (max-width: 1333px) 100vw, 1333px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 9. 
Visual comparison between the two models on the DIS dataset.<\/figcaption><\/figure>\n<\/div>\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-bbox-mistake-robustness.png\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1033\" height=\"275\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-bbox-mistake-robustness.png\" alt=\"Samples showing how HQ-SAM is robust to small errors in the bounding box prompts.\" class=\"wp-image-38992\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-bbox-mistake-robustness.png 1033w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-bbox-mistake-robustness-300x80.png 300w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/hq-sam-bbox-mistake-robustness-768x204.png 768w\" sizes=\"auto, (max-width: 1033px) 100vw, 1033px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 10. Samples showing how HQ-SAM is robust to small errors in the bounding box prompts.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>In Figure 10, we can see that the HQ-SAM model is robust to small mistakes in the bounding box prompts when the box does not cover the entire object.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary and Conclusion<\/h2>\n\n\n\n<p>In this article, we covered the HQ-SAM architecture in brief. We started with the need for HQ-SAM, then covered its architecture, training strategy, and experimental results. In conclusion, the final HQ-SAM model produces superior segmentation masks while keeping the inference requirements the same as SAM. I hope that this article was worth your time.<\/p>\n\n\n\n<p>If you have any doubts, thoughts, or suggestions, please leave them in the comment section. 
I will surely address them.<\/p>\n\n\n\n<p>You can contact me using the <strong><a aria-label=\"Contact (opens in a new tab)\" href=\"https:\/\/debuggercafe.com\/contact-us\/\" target=\"_blank\" rel=\"noreferrer noopener\">Contact<\/a><\/strong> section. You can also find me on <strong><a aria-label=\"LinkedIn (opens in a new tab)\" href=\"https:\/\/www.linkedin.com\/in\/sovit-rath\/\" target=\"_blank\" rel=\"noreferrer noopener\">LinkedIn<\/a><\/strong>, and <strong><a href=\"https:\/\/x.com\/SovitRath5\" target=\"_blank\" rel=\"noreferrer noopener\">X<\/a><\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><a href=\"https:\/\/arxiv.org\/abs\/2306.01567\" target=\"_blank\" rel=\"noreferrer noopener\">Segment Anything in High Quality<\/a><\/strong><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we explore HQ-SAM, a modified version of SAM that overcomes some of the limitations when trying to segment small and intricate objects.<\/p>\n","protected":false},"author":1,"featured_media":38994,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[76,129,500],"tags":[1091,1087,1089,1088,1090],"class_list":["post-38932","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-vision","category-image-segmentation","category-instance-segmentation","tag-hq-sam-for-high-quality-segmentation","tag-hq-sam-paper-walkthrough","tag-hq-sam-results-and-benchmarks","tag-hq-sam-vs-sam","tag-promptable-hq-sam"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Exploring HQ-SAM<\/title>\n<meta name=\"description\" content=\"HQ-SAM for segmenting small and intricate objects in high quality by adding a high quality output token to the original SAM architecture.\" \/>\n<meta 
name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/debuggercafe.com\/exploring-hq-sam\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Exploring HQ-SAM\" \/>\n<meta property=\"og:description\" content=\"HQ-SAM for segmenting small and intricate objects in high quality by adding a high quality output token to the original SAM architecture.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/debuggercafe.com\/exploring-hq-sam\/\" \/>\n<meta property=\"og:site_name\" content=\"DebuggerCafe\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/profile.php?id=100013731104496\" \/>\n<meta property=\"article:published_time\" content=\"2024-12-16T00:30:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/Exploring-HQ-SAM-e1730250660701.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"563\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Sovit Ranjan Rath\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:site\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sovit Ranjan Rath\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/\"},\"author\":{\"name\":\"Sovit Ranjan Rath\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"headline\":\"Exploring HQ-SAM\",\"datePublished\":\"2024-12-16T00:30:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/\"},\"wordCount\":1282,\"commentCount\":0,\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/Exploring-HQ-SAM-e1730250660701.png\",\"keywords\":[\"HQ-SAM for High Quality Segmentation\",\"HQ-SAM Paper Walkthrough\",\"HQ-SAM Results and Benchmarks\",\"HQ-SAM vs SAM\",\"Promptable HQ-SAM\"],\"articleSection\":[\"Computer Vision\",\"Image Segmentation\",\"Instance Segmentation\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/debuggercafe.com\/exploring-hq-sam\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/\",\"url\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/\",\"name\":\"Exploring 
HQ-SAM\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/Exploring-HQ-SAM-e1730250660701.png\",\"datePublished\":\"2024-12-16T00:30:00+00:00\",\"author\":{\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"description\":\"HQ-SAM for segmenting small and intricate objects in high quality by adding a high quality output token to the original SAM architecture.\",\"breadcrumb\":{\"@id\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/debuggercafe.com\/exploring-hq-sam\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/#primaryimage\",\"url\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/Exploring-HQ-SAM-e1730250660701.png\",\"contentUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/Exploring-HQ-SAM-e1730250660701.png\",\"width\":1000,\"height\":563,\"caption\":\"Exploring HQ-SAM\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/debuggercafe.com\/exploring-hq-sam\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/debuggercafe.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Exploring HQ-SAM\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/debuggercafe.com\/#website\",\"url\":\"https:\/\/debuggercafe.com\/\",\"name\":\"DebuggerCafe\",\"description\":\"Machine Learning and Deep 
Learning\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/debuggercafe.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\",\"name\":\"Sovit Ranjan Rath\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"caption\":\"Sovit Ranjan Rath\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Exploring HQ-SAM","description":"HQ-SAM for segmenting small and intricate objects in high quality by adding a high quality output token to the original SAM architecture.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/debuggercafe.com\/exploring-hq-sam\/","og_locale":"en_US","og_type":"article","og_title":"Exploring HQ-SAM","og_description":"HQ-SAM for segmenting small and intricate objects in high quality by adding a high quality output token to the original SAM architecture.","og_url":"https:\/\/debuggercafe.com\/exploring-hq-sam\/","og_site_name":"DebuggerCafe","article_publisher":"https:\/\/www.facebook.com\/profile.php?id=100013731104496","article_published_time":"2024-12-16T00:30:00+00:00","og_image":[{"width":1000,"height":563,"url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/Exploring-HQ-SAM-e1730250660701.png","type":"image\/png"}],"author":"Sovit Ranjan 
Rath","twitter_card":"summary_large_image","twitter_creator":"@SovitRath5","twitter_site":"@SovitRath5","twitter_misc":{"Written by":"Sovit Ranjan Rath","Est. reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/debuggercafe.com\/exploring-hq-sam\/#article","isPartOf":{"@id":"https:\/\/debuggercafe.com\/exploring-hq-sam\/"},"author":{"name":"Sovit Ranjan Rath","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"headline":"Exploring HQ-SAM","datePublished":"2024-12-16T00:30:00+00:00","mainEntityOfPage":{"@id":"https:\/\/debuggercafe.com\/exploring-hq-sam\/"},"wordCount":1282,"commentCount":0,"image":{"@id":"https:\/\/debuggercafe.com\/exploring-hq-sam\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/Exploring-HQ-SAM-e1730250660701.png","keywords":["HQ-SAM for High Quality Segmentation","HQ-SAM Paper Walkthrough","HQ-SAM Results and Benchmarks","HQ-SAM vs SAM","Promptable HQ-SAM"],"articleSection":["Computer Vision","Image Segmentation","Instance Segmentation"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/debuggercafe.com\/exploring-hq-sam\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/debuggercafe.com\/exploring-hq-sam\/","url":"https:\/\/debuggercafe.com\/exploring-hq-sam\/","name":"Exploring HQ-SAM","isPartOf":{"@id":"https:\/\/debuggercafe.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/debuggercafe.com\/exploring-hq-sam\/#primaryimage"},"image":{"@id":"https:\/\/debuggercafe.com\/exploring-hq-sam\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/Exploring-HQ-SAM-e1730250660701.png","datePublished":"2024-12-16T00:30:00+00:00","author":{"@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"description":"HQ-SAM for segmenting small and intricate objects in high quality by adding 
a high quality output token to the original SAM architecture.","breadcrumb":{"@id":"https:\/\/debuggercafe.com\/exploring-hq-sam\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/debuggercafe.com\/exploring-hq-sam\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/exploring-hq-sam\/#primaryimage","url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/Exploring-HQ-SAM-e1730250660701.png","contentUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2024\/10\/Exploring-HQ-SAM-e1730250660701.png","width":1000,"height":563,"caption":"Exploring HQ-SAM"},{"@type":"BreadcrumbList","@id":"https:\/\/debuggercafe.com\/exploring-hq-sam\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/debuggercafe.com\/"},{"@type":"ListItem","position":2,"name":"Exploring HQ-SAM"}]},{"@type":"WebSite","@id":"https:\/\/debuggercafe.com\/#website","url":"https:\/\/debuggercafe.com\/","name":"DebuggerCafe","description":"Machine Learning and Deep Learning","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/debuggercafe.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752","name":"Sovit Ranjan Rath","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","caption":"Sovit Ranjan 
Rath"}}]}},"_links":{"self":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/38932","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/comments?post=38932"}],"version-history":[{"count":63,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/38932\/revisions"}],"predecessor-version":[{"id":39008,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/38932\/revisions\/39008"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media\/38994"}],"wp:attachment":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media?parent=38932"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/categories?post=38932"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/tags?post=38932"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}