Halide Compression

Halide Compression Hi, we're Halide Compression. Zola 2026-02-04T00:00:00+00:00 https://halide.cx/atom.xml Same Image, Different Score? 2026-02-04T00:00:00+00:00 2026-02-04T00:00:00+00:00 Halide Team https://halide.cx/blog/chroma-handling/ <div class="image-container"> <picture> <img src="https://halide.cx/img/rocks-hdr.avif" width="1536" height="864" alt="Rocks" /> </picture> </div> <p>In developing our proprietary encoder <a href="/iris/">Iris</a> for WebP, our aim with public and private testing is to properly demonstrate the value of the encoder compared to alternatives. Cheating benchmarks, overfitting for metrics, or unfairly testing other encoders does not help sell our product, which is meant to provide quality-of-experience improvements for human users above all else.</p> <p>Investigating WebP's decoding performance led us to begin evaluating different means of presenting the decoded images to metrics. Even within the <code>dwebp</code> reference decoder, there are a number of different options that affect how images are decoded.</p> <p>Beyond WebP, we test competing open-source encoders as well. These should be tested in such a way that they represent their best performance in real-world production scenarios, where client-side decoder options are still highly relevant.</p> <p>Our findings here are not final, but this blog post aims to get the ball rolling for evaluating decoder differences and means for handling chroma in post-processing.</p> <h2 id="chroma-subsampling">Chroma Subsampling</h2> <p>Chroma subsampling is a useful compression technique to improve compression efficiency at low to medium-high fidelity by taking advantage of the human visual system's higher sensitivity to luma-only detail. 
The YCbCr color space utilizes principles of <a rel="external" href="https://en.wikipedia.org/wiki/Opponent_process">opponent color theory</a> and separates luma from chroma, so many encoder implementations still find value in using this colorspace to halve the resolution of the chroma planes (Cb, Cr) and concentrate bits into the luma plane (Y). This is the theoretical basis behind 4:2:0 chroma subsampling: 4 luma pixels for every 1 chroma pixel in each plane. At higher fidelity, maintaining full-resolution chroma planes is often more valuable, and thus 4:4:4 chroma subsampling (all planes at full-resolution) can be much better.</p> <p>Arguably, one of WebP's most difficult limitations is mandatory 4:2:0 chroma subsampling. This limitation isn't present in JPEG or AVIF, which both support YCbCr in 4:4:4 alongside 4:2:0. There has been some <a rel="external" href="https://skal65535.github.io/yuv/">very cool work</a> by <a rel="external" href="https://skal65535.github.io">Pascal Massimino</a> on this limitation in libwebp, but properly handling chroma subsampling is a codec-agnostic issue due to Web quality ranges often favoring 4:2:0. The better utilized the lower-resolution chroma planes are, the better the output images will be – this is true for both encoding and decoding.</p> <h2 id="encoding">Encoding</h2> <p>The aforementioned work by Pascal focuses on taking a full-color input and optimally downsampling the chroma planes. This doesn't involve the WebP codec whatsoever; it is true that <code>dwebp</code>'s "fancy" chroma upsampling at decode time may pair well with "Sharp YUV" encodes, but "Sharp YUV" is not a WebP encoder feature – it is a preprocessing feature that libwebp supports, and may be used by any encoder in theory. This is a matter of opinion, but we're not partial to considering gains through "Sharp YUV" preprocessing as <em>encoder</em> efficiency, as they may apply to any encoder for any format supporting 4:2:0 chroma subsampling. 
Measuring pure encoder efficiency should be done by controlling the color conversion process between encoders, as is done via <a rel="external" href="https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/master/test/benchmarking/README.md">SVT-AV1's open-source benchmarking tools</a>.</p> <p>There's lots to discuss with regards to encoding here, but for the sake of this post we're going to focus primarily on decoding.</p> <h2 id="decoding">Decoding</h2> <p>Compression researcher <a rel="external" href="http://sneyers.info">Jon Sneyers</a> once said that "The video codec philosophy has always been 'we just compress matrices of numbers, how to interpret them is not our problem'," which can be interpreted as a call for image compression researchers to do better. While it is true that pre- and post-processing are not coupled to encoder or decoder efficiency, they are still relevant to overall <em>compression efficiency</em>. With YCbCr 4:2:0 inputs, we must decide how to represent our one chroma sample per four luma samples via post-processing after decoding. So, what should we do with the decoder's matrices of numbers?</p> <p>For this blog post, we're going to test a couple of different implementations with some open-source codecs. We'll invariably end up investigating some level of decoder performance here as well, particularly with JPEG.</p> <h2 id="methodology">Methodology</h2> <p>For encoders, we are testing Google's <a rel="external" href="https://github.com/google/jpegli">jpegli</a> for JPEG, <a rel="external" href="https://libwebp.com/">libwebp</a> and Iris-WebP for WebP, and <a rel="external" href="https://gitlab.com/AOMediaCodec/SVT-AV1/">SVT-AV1</a> for AVIF. For jpegli, color conversion is done internally. For libwebp, we test default color conversion with FFmpeg, internal color conversion, and internal "Sharp YUV" color conversion. 
Color conversion is done with FFmpeg for SVT-AV1 and Iris, as the differences demonstrated by libwebp's results should get the point across. We have our own input chroma processing algorithm for Iris, but we don't think this is the right place to discuss its impact.</p> <p>To convert the encoded outputs back to pixels, we are looking at FFmpeg, jpegli's decoder, ImageMagick, <code>dwebp</code> from libwebp, and <a rel="external" href="https://www.videolan.org/projects/dav1d.html">dav1d</a>. Note that WebP and AVIF are decoded by libwebp and dav1d, respectively, with every option here; only the wrapper around each decoder differs. Alongside default FFmpeg, we're testing a custom filter for chroma: <code>"scale=flags=lanczos+accurate_rnd+full_chroma_int:param0=5,format=rgb24"</code>. This string specifies we're using a sharp Lanczos scaling algorithm (with its window parameter set to 5 via <code>param0</code>), accurate rounding, and full chroma interpolation, which may result in higher fidelity outputs post-decode.</p> <p>For metrics, we are looking at <a rel="external" href="https://github.com/gianni-rosato/fssimu2">fssimu2</a>, <a rel="external" href="https://github.com/libjxl/libjxl/blob/main/tools/butteraugli_main.cc">Butteraugli from libjxl</a> at 3-pnorm and an intensity target of 203 nits, and our own <a rel="external" href="https://github.com/halidecx/fcvvdp">fcvvdp</a> with the default "fhd" display. These are all perceptual metrics aimed at producing results relevant to the human visual system.</p> <p>Testing is done on the <a rel="external" href="https://github.com/gianni-rosato/gb82-image-set">gb82 image set</a>, a diverse photographic image dataset of 25 images all at 576x576. 
The script used for testing can be found in the open source <a rel="external" href="https://github.com/gianni-rosato/decbench">decbench repo</a>.</p> <h2 id="results">Results</h2> <p>It is VERY important to note that <em>this is not an encoder efficiency test</em>, and that size & quality results between encoders are not controlled in any way. The only relevant results are from the decoder & post-processor implementations, which all come from the same inputs and therefore represent some kind of efficiency improvement if the scores are higher.</p> <p>Please note that the harmonic mean is not super useful for Butteraugli; there isn't much utility in biasing toward lower Butteraugli scores, as they are better. <code>dwebp_nofancy</code> disables the libwebp decoder's internal "fancy" chroma upsampling.</p> <h3 id="jpegli">jpegli</h3> <p><code>./dec.py jpegli ~/Pictures/gb82-image-set/png/*.png</code></p> <p><img src="/img/chroma_handling/jpegli_fssimu2.svg" alt="jpegli_fssimu2" /></p> <p><img src="/img/chroma_handling/jpegli_butter.svg" alt="jpegli_butter" /></p> <p><img src="/img/chroma_handling/jpegli_fcvvdp.svg" alt="jpegli_fcvvdp" /></p> <h3 id="iris-webp-ffmpeg-color-conversion">Iris-WebP (FFmpeg color conversion)</h3> <p><img src="/img/chroma_handling/iris_webp_fssimu2.svg" alt="iris_webp_fssimu2" /></p> <p><img src="/img/chroma_handling/iris_webp_butteraugli.svg" alt="iris_webp_butter" /></p> <p><img src="/img/chroma_handling/iris_webp_fcvvdp.svg" alt="iris_webp_fcvvdp" /></p> <h3 id="libwebp-ffmpeg-color-conversion">libwebp (FFmpeg color conversion)</h3> <p><code>./dec.py libwebp ~/Pictures/gb82-image-set/png/*.png</code></p> <p><img src="/img/chroma_handling/libwebp_fssimu2.svg" alt="libwebp_fssimu2" /></p> <p><img src="/img/chroma_handling/libwebp_butter.svg" alt="libwebp_butter" /></p> <p><img src="/img/chroma_handling/libwebp_fcvvdp.svg" alt="libwebp_fcvvdp" /></p> <h3 id="libwebp-sharp-yuv-color-conversion">libwebp (Sharp YUV color conversion)</h3> 
<p><code>./dec.py libwebp_sharpyuv ~/Pictures/gb82-image-set/png/*.png</code></p> <p><img src="/img/chroma_handling/libwebp_sharpyuv_fssimu2.svg" alt="libwebp_sharpyuv_fssimu2" /></p> <p><img src="/img/chroma_handling/libwebp_sharpyuv_butter.svg" alt="libwebp_sharpyuv_butter" /></p> <p><img src="/img/chroma_handling/libwebp_sharpyuv_fcvvdp.svg" alt="libwebp_sharpyuv_fcvvdp" /></p> <h3 id="libwebp-internal-color-conversion">libwebp (internal color conversion)</h3> <p><code>./dec.py libwebp_default ~/Pictures/gb82-image-set/png/*.png</code></p> <p><img src="/img/chroma_handling/libwebp_internal_fssimu2.svg" alt="libwebp_internal_fssimu2" /></p> <p><img src="/img/chroma_handling/libwebp_internal_butter.svg" alt="libwebp_internal_butter" /></p> <p><img src="/img/chroma_handling/libwebp_internal_fcvvdp.svg" alt="libwebp_internal_fcvvdp" /></p> <h3 id="svt-av1">SVT-AV1</h3> <p><code>./dec.py svtav1 ~/Pictures/gb82-image-set/png/*.png</code></p> <p><img src="/img/chroma_handling/svtav1_fssimu2.svg" alt="svtav1_fssimu2" /></p> <p><em>avifdec's PNG outputs crashed the <code>butteraugli_main</code> tool</em></p> <p><img src="/img/chroma_handling/svtav1_butter.svg" alt="svtav1_butter" /></p> <p><img src="/img/chroma_handling/svtav1_fcvvdp.svg" alt="svtav1_fcvvdp" /></p> <h2 id="perceptual-results">Perceptual Results</h2> <p>Click the buttons to switch between decoding/post-processing options on this challenging image.</p> <section class="image-switcher"> <div id="image-switcher-chroma-decoder" class="image-container"></div> </section> <script src="/js/image_switcher.js"></script> <script> document.addEventListener("DOMContentLoaded", function () { const container = document.getElementById("image-switcher-chroma-decoder"); if (!container) return; // Clear any fallback content container.innerHTML = ""; const images = 
["/img/chroma_handling/cmp/original.png","/img/chroma_handling/cmp/jpegli.jpg","/img/chroma_handling/cmp/ffmpeg_filtered.png","/img/chroma_handling/cmp/djpegli.png","/img/chroma_handling/cmp/magick.png","/img/chroma_handling/cmp/ffmpeg.png"]; const subtitles = ["Source\nImage","cjpegli --chroma_subsampling 420 -d 1.0 original.png jpegli.jpg","ffmpeg -y -i jpegli.jpg -vf\nscale=flags=lanczos+accurate_rnd+full_chroma_int:param0=5,format=rgb24 -f image2\n-update 1 -frames:v 1 ffmpeg_filtered.png","djpegli jpegli.jpg djpegli.png","magick jpegli.jpg magick.png","ffmpeg -y -i jpegli.jpg -pix_fmt rgb24 -f\nimage2 -update 1 -frames:v 1 ffmpeg.png"]; const labels = ["Source","Your Browser","FFmpeg (filtered)","djpegli","magick","FFmpeg"]; try { // Ensure the first image has a reasonable alt; ImageSwitcher currently hardcodes alt, // but we set it here as a data attribute in case the JS is updated to use it later. container.setAttribute("data-alt", "Decoder comparison"); new ImageSwitcher("image-switcher-chroma-decoder", images, subtitles, labels); } catch (error) { console.error("Failed to initialize image switcher:", error); container.innerHTML = "<p>Failed to load image comparison tool.</p>"; } }); </script><h2 id="conclusion">Conclusion</h2> <p>The Butteraugli results are shocking, and likely merit further investigation. Aside from that, the fact that a >2% fssimu2 efficiency improvement is achievable compared to the baseline in almost every test is valuable; compression researchers fight very hard for 2%, and we get it for free here.</p> <p><code>ffmpeg_filtered</code> shows very good results across the board. 
There is potentially room for further investigation here through using other scaling algorithms.</p> <p>The noteworthy outliers are <code>djpegli</code> winning according to fssimu2, and <code>dwebp</code> winning when "Sharp YUV" color conversion was used with libwebp.</p> <p>JPEG decoding is a complex topic we are not going to explore in detail here, but it boils down to the fact that the JPEG spec contains ambiguity regarding the way images are encoded and decoded.</p> <p>For libwebp, Pascal's page says: "We utilise the upsampling used at decoding time (dubbed 'fancy upsampling' in libjpeg e.g.) to our advantage," with regard to Sharp YUV. It may be the case that, with minor tweaks, Sharp YUV could be made to work better with other chroma scaling methods like we see in <code>ffmpeg_filtered</code>. It also isn't conclusive that "fancy upsampling" as it is implemented in libwebp's decoder is actually a net positive; with fancy upsampling disabled, <code>dwebp_nofancy</code> ekes out some wins over <code>dwebp</code> when Sharp YUV isn't used. Sharp YUV is also disabled by default in libwebp due to its computational complexity (Pascal: "Sharp-YUV locally optimizes the conversion loss, so is more expensive. That's why <code>-sharp_yuv</code> is not the default option in cwebp!"), so should <code>dwebp</code> be best prepared for the most popular encode use cases, or those that achieve the best performance? Sharp YUV isn't universally perceptually beneficial either, so the problem becomes harder to solve with that in mind.</p> <p>For our research direction stated at the beginning of this post, we see promising results that tell us using the default tooling for other codecs might be holding them back. <code>ffmpeg_filtered</code> wins in many cases, so at least for SVT-AV1 and jpegli, it seems like a valuable option to consider. 
A future direction may be to explore the computational complexity of different decoders and decode options, or to do more subjective testing and go beyond metrics.</p> <p>There's always more to explore with multimedia compression, and we've only scratched the surface of pre- and post-processing for 4:2:0 YCbCr here. Halide Compression is built on frontier compression expertise, so if you believe we could be valuable to you, we offer consulting services. Feel free to contact us at our email below if you have any questions about decoder optimization for your pipeline, deploying WebP at scale, or using Iris-WebP to maximize the efficiency of your image delivery solution. Thanks for reading!</p> <div class="call-to-action"> <a href="mailto:mail@halide.cx" class="cta-button" > Email Us </a> </div> Introducing fcvvdp 2025-12-28T00:00:00+00:00 2025-12-28T00:00:00+00:00 Halide Team https://halide.cx/blog/fcvvdp/ <div class="image-container"> <picture> <img src="https://halide.cx/img/slate.avif" width="1536" height="864" alt="Slate" /> </picture> </div> <h2 id="why">Why?</h2> <p>The aphorism "all models are wrong, but some are useful" is commonly attributed to George E. P. Box, a British statistician. The concept is especially relevant in multimedia compression where we have lots of models to choose from for evaluating lossy image and video compression.</p> <p>Lots of metrics exist and are easily accessible; we are intimately familiar with a wide breadth of metrics and their various pros and cons for benchmarking image compression algorithms, but there will always be blind spots regardless of how many we test. 
When we found ColorVideoVDP (CVVDP), we discovered it was able to catch some edge cases that other powerful perceptual metrics (like SSIMULACRA2) weren't able to; despite the fact that it has its own edge cases, it immediately became interesting to us because of this.</p> <p>The only issue we faced was that the <a rel="external" href="https://github.com/gfxdisp/colorvideovdp">reference Python implementation</a> was not fast enough for our use case, increasing our Iris benchmark script's runtime dramatically. This wasn't an acceptable trade-off for our productivity, so we decided to build <a rel="external" href="https://github.com/halidecx/fcvvdp">fcvvdp</a> as an open-source C implementation of CVVDP for the benefit of everyone who may have faced the same issues we did.</p> <h2 id="metrics">Metrics</h2> <p>The strongest full-reference perceptual fidelity metrics we have access to are SSIMULACRA2, Butteraugli, and (to some degree) MS-SSIM. PSNR-HVS provides some level of perceptual utility as well. SSIM and eSSIM are occasionally useful for investigating a certain class of finer artifacts; the same can be said about PSNR to some degree. VMAF isn't particularly useful for images in our experience. We don't outright shun or ignore any metrics, but our preference is to build technology that is valuable for the end-user experience (so, the human eye). We've established CVVDP is relevant to the last point, so what additional criteria must we meet to use an implementation?</p> <p>The Python implementation of CVVDP is compelling research-grade software, and a <a rel="external" href="https://github.com/Line-fr/Vship">fully GPU-accelerated implementation</a> exists for video. 
While performant GPU acceleration is compelling for benchmarking videos, images have different needs:</p> <ul> <li>GPU initialization time causes slowdowns</li> <li>Threading isn't important, because each encode/metric worker gets its own thread in the benchmark script</li> <li>Batch processing on the GPU fixes the first issue, but requires re-architecting parts of our benchmark script for one metric</li> </ul> <p>So, fcvvdp should be able to slot into existing workflows as easily as SSIMULACRA2 or Butteraugli might relative to a legacy image benchmarking suite.</p> <h2 id="implementation">Implementation</h2> <p>fcvvdp is based on the GPU-accelerated implementation mentioned earlier, and is written in C. It is, predictably, strongest when it comes to images. The reference implementation takes (on average) 1.69 seconds and 928 MB of RAM to score one 576x576 pairwise image comparison. fcvvdp takes (on average) 85.5 ms and uses 61.5 MB of RAM, which is roughly 20x faster with about 15x less memory. Scores are within a reasonable margin of perceptual error.</p> <p>On a 360p video, fcvvdp is ~18% faster in terms of wall clock time. The benefits described above generalize in terms of user time and RAM usage, but wall clock time isn't much better on videos because fcvvdp doesn't feature any sort of threading. This is the implementation's biggest limitation; while it is still faster than the reference implementation (which does feature threading) by a bit, threading would allow the relative improvement we see with images to generalize to video.</p> <p>If you're interested in learning about how fcvvdp works, see our <a rel="external" href="https://github.com/halidecx/fcvvdp/blob/main/doc/cvvdp.md">implementation docs</a>.</p> <h2 id="conclusion">Conclusion</h2> <p>Our code is public under the <a rel="external" href="https://github.com/halidecx/fcvvdp?tab=Apache-2.0-1-ov-file#readme">Apache 2.0 license</a>. We are always proud of our capability to give back to the FOSS ecosystem when we can. 
While Iris is a closed source product, we hope to use Iris's impact and utility as a means of subsidizing work on open source when it helps support our mission. In this case, fcvvdp was the perfect excuse to do something great for Halide Compression while giving something valuable back to the field. We hope you enjoy fcvvdp!</p> <div class="call-to-action"> <a href="mailto:mail@halide.cx" class="cta-button" > Email Us </a> </div> Measuring Image Encoder Consistency 2025-09-14T00:00:00+00:00 2025-09-14T00:00:00+00:00 Halide Team https://halide.cx/blog/consistency/ <div class="image-container"> <picture> <img src="https://halide.cx/img/streak.avif" width="1536" height="864" alt="Light Streak" /> </picture> </div> <h2 id="what-is-consistency">What Is Consistency?</h2> <p>Consistency could mean a number of things in the context of image compression, but the specific definition of consistency used in this blog post measures how closely an image encoder's user-configurable quality index matches a perceptual quality index.</p> <p>Here's an example: your encoder has a quality slider from 1 to 100. Ideally, if you pass a quality value of 80, this should target some internal definition of what "quality 80" means with every image it encodes. At quality 80, if some images look incredible and some look clearly awful, there is a consistency issue. If images all end up around the same quality visually, that is the mark of a consistent encoder.</p> <h2 id="why-is-consistency-important">Why Is Consistency Important?</h2> <p>It is very common for image compression workflows to include a target quality loop of some kind, where a metric is utilized alongside an image encoder to provide feedback about how good the image looks. If it doesn't look good enough, re-encode; similarly, if it is too high-quality and bits could be saved by aiming lower, re-encode. 
Considering image encoders and powerful metrics are quite fast, these workflows are easy to configure and often run quickly enough.</p> <p>In speed- or resource-constrained scenarios, it may not be wise to use a target quality loop. If you do, you may be limited to faster but far less meaningful metrics; for example, targeting <a rel="external" href="https://wiki.x266.mov/docs/metrics/PSNR">PSNR</a> is not useful for delivering images at a consistent quality baseline because our eyes don't agree with PSNR's definition of quality very often. Two separate encodes of different sources that have the same PSNR score often look very different in terms of visual quality, which brings us back to where we started. In these scenarios, our definition of consistency becomes relevant; an encoder's ability to reliably encode images close to a given quality becomes a make-or-break consideration for this kind of workflow. Applications that process vast quantities of user-generated content can be subject to these constraints.</p> <p>A consistent encoder additionally provides a boon to user experience. Encoders like <a rel="external" href="https://github.com/libjxl/libjxl">libjxl</a>'s encoder (cjxl for JPEG XL images) and the <a rel="external" href="https://github.com/google/jpegli">jpegli</a> JPEG encoder have two user-accessible quality indexes; they provide a Q scale from 0 (or 1) through 100 like most image encoders, but they also provide a "distance" scale. The benefit of this is that quality scales measured in Q are internally defined and often arbitrary – it isn't clear how good "quality 80" will actually be externally, and the visual correlation for most encoder quality scales is usually sparsely documented. On the other hand, "distance" is not arbitrary.</p> <p>A "distance" parameter allows users to directly target a tangible <em>visual distance</em> value; roughly speaking, this indicates how far away a user needs to be from their screen to see artifacts. 
A value of 1.0 is usually considered visually lossless, and JPEG XL and jpegli are inspired by the Butteraugli metric in how this is defined. The benefits to a user are clear; you can set-and-forget your encoder to a distance of 1.0, and your images will always be the smallest possible size to achieve visually lossless fidelity given your encoder is perfectly consistent.</p> <p>Our encoder is called <a rel="external" href="https://halide.cx/iris/">Iris-WebP</a>, and features similar functionality to libjxl and libjpegli through its own "distance" parameter for the reasons stated above. But everything we just described is useless if the distance value isn't consistently achievable; so, how do we measure consistency?</p> <h2 id="measuring-consistency">Measuring Consistency</h2> <p>This blog post's title promises that we will measure this, so let's take a look at some methodology.</p> <p>At a high level, here is how we measure encoder consistency holistically:</p> <ul> <li>We sweep an encoder’s user-facing quality index Q across a chosen range</li> <li>For each image and each Q, we encode once, then compute one or more perceptual metrics against the original</li> <li>For each Q, we aggregate the metric values across all images and write a CSV with the mean and standard deviation</li> </ul> <p>Here, the per-Q standard deviation is the important value. 
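</p> <p>As a rough illustration of the aggregation step in the list above, here is a minimal Python sketch. It is not the script we use internally: the function names <code>aggregate_by_q</code> and <code>write_csv</code>, the row layout, and the example scores are all hypothetical, and it assumes the population standard deviation since the choice of variant is not specified here.</p>

```python
import csv
import statistics
import sys
from collections import defaultdict


def aggregate_by_q(rows):
    """Group (q, image, score) rows by Q and return {q: (mean, stdev)}.

    Assumption: population standard deviation (statistics.pstdev); swap in
    statistics.stdev if the sample variant is preferred.
    """
    scores_by_q = defaultdict(list)
    for q, _image, score in rows:
        scores_by_q[q].append(score)
    return {
        q: (statistics.mean(scores), statistics.pstdev(scores))
        for q, scores in sorted(scores_by_q.items())
    }


def write_csv(stats, out):
    """Write one CSV row per Q level: q, mean, stddev."""
    writer = csv.writer(out)
    writer.writerow(["q", "mean", "stddev"])
    for q, (mean, stddev) in stats.items():
        writer.writerow([q, f"{mean:.3f}", f"{stddev:.3f}"])


if __name__ == "__main__":
    # Hypothetical metric scores, invented purely to exercise the code.
    example_rows = [
        (50, "a.png", 40.0),
        (50, "b.png", 60.0),
        (80, "a.png", 78.0),
        (80, "b.png", 82.0),
    ]
    write_csv(aggregate_by_q(example_rows), sys.stdout)
```

<p>In this sketch, each row is one encode scored once against its original, so the per-Q spread comes only from differences between images.</p> <p>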
Lower standard deviations per Q mean the encoder achieves more uniform visual quality across diverse inputs at that Q.</p> <p>Internally, this testing is done with a number of different metrics; for the purposes of this blog post, we'll report all of our numbers with <a rel="external" href="https://github.com/cloudinary/ssimulacra2">SSIMULACRA2</a> because it is the most perceptually correlated open-source metric at the time of writing.</p> <p>The <a rel="external" href="https://aomedia.googlesource.com/aom/">libaom</a> AV1 encoder is configured at speed 7, using the improved tune IQ mode introduced in v3.13.0. (If you'd like to learn more about some of the ways AVIF has gotten better in the past year, read <a rel="external" href="https://halide.cx/blog/improving-avif-in-open-source">our blog post on open source AVIF developments</a>.) We also tested libjpeg-turbo, libjxl, libjpegli, <a rel="external" href="https://chromium.googlesource.com/webm/libwebp/">libwebp</a>, and Iris-WebP. We configured libaom to encode 10-bit 4:4:4 images and libwebp to run at its slowest encoding preset (method 6); every other setting, for these and the remaining encoders, was left at its default. 
The image dataset we're testing on is <a rel="external" href="https://github.com/WyohKnott/image-formats-comparison/tree/gh-pages/comparisonfiles/subset1/Original">Daala's subset1</a>, which should give us a good baseline for medium-resolution photographic content.</p> <p>Our results will focus on:</p> <ul> <li>The average of the standard deviations for Q levels between SSIMULACRA2 30 and 80</li> <li>The movement of the standard deviation per Q level across the range between SSIMULACRA2 30 and 80</li> </ul> <p>The 30 to 80 range was chosen due to its relevance for general multimedia delivery use cases.</p> <h2 id="results">Results</h2> <p><img src="/img/avg_stddev_ssimu2.svg" alt="Average standard deviation across Q levels" /></p> <p>The above graph shows us consistency numbers averaged across Q levels that resulted in average qualities between 30 and 80 SSIMULACRA2 on the subset1 dataset we mentioned earlier. And our winner is libjpeg-turbo! On the quality front, libjpeg-turbo is not remotely competitive with these encoders, but it scores well for consistency – we'll think more about this in the next section.</p> <p>Next, we have standard deviation over our range:</p> <p><img src="/img/stddev_graphed_ssimu2.svg" alt="Standard deviation graphed" /></p> <p>This paints an interesting picture; we see that libaom is actually the best at SSIMULACRA2 80, but performance drops off rapidly below SSIMULACRA2 ~70. Iris is a well-rounded strong performer, with concessions to libjpeg-turbo below SSIMULACRA2 ~47 (low fidelity). 
Curiously, while libjpegli does well, libjxl is not all that consistent overall.</p> <h2 id="conclusions">Conclusions</h2> <p>Iris-WebP's strong consistency performance coupled with its known speed and efficiency make it a strong performer, but consistency wins alone are not worth celebrating; they can only support an already fast and efficient encoder.</p> <p>In a target quality loop with an inefficient encoder, bits are wasted by default even if a particular target is readily hit; even though you are sacrificing predictability, a less consistent encoder that is more efficient is a more desirable choice because you can just have your target quality workflow shift potential inconsistency into overshooting. Overshot results might be larger than necessary, but they may still be smaller than worse looking outputs from a less efficient encoder that is still on target.</p> <p>Similarly, a consistent encoder that isn't competitively fast is not worthwhile either. If at the same speed target, another encoder is more efficient, that encoder is considered faster and you're leaving compression efficiency on the table.</p> <p>At Halide Compression, we believe image encoders that value efficiency, speed, and consistency are both desirable and possible. 
While it is true that highly efficient encoders may suffer consistency issues due to their spiky but still generally incredible performance, we believe Iris has been able to successfully mitigate potential consistency issues without sacrificing efficiency or speed.</p> <div class="call-to-action"> <a href="mailto:mail@halide.cx" class="cta-button" > Email Us </a> </div> An Interview With Julio Barba 2025-08-29T00:00:00+00:00 2025-08-29T00:00:00+00:00 Halide Team https://halide.cx/blog/julio-barba-interview/ <div class="image-container"> <picture> <img src="https://halide.cx/img/ocean.avif" width="1536" height="864" alt="Ocean" /> </picture> </div> <h2 id="who-are-you">Who are you?</h2> <p>I'm Julio Barba, a developer who works on video and image compression technology, focusing on the AV1 format and its successor, AV2. I started in backend development but pivoted to multimedia compression in 2023 by contributing to popular open-source AV1 projects like <a rel="external" href="https://aomedia.googlesource.com/aom/">libaom</a> and <a rel="external" href="https://gitlab.com/AOMediaCodec/SVT-AV1/">SVT-AV1</a>. I'm now also a contributor to AV2, the next-generation video standard from the <a rel="external" href="https://aomedia.org">Alliance for Open Media (AOMedia)</a>.</p> <h2 id="how-did-you-get-involved-in-multimedia-compression">How did you get involved in multimedia compression?</h2> <p>At 10 years old, I discovered MP3s and Winamp and was amazed that you could shrink CD music by 10x with very little quality loss. That sparked my curiosity in compression.</p> <p>Soon after, I learned about the royalty-free Ogg Vorbis audio format, which was even better than MP3. That led me down a rabbit hole of royalty-free video formats like Theora, VP9, and eventually AV1. 
In 2023, I started contributing to AV1 myself, focusing on improving its video and image quality.</p> <p>In 2024, I teamed up with Gianni Rosato and two friends to create <a rel="external" href="https://svt-av1-psy.com">SVT-AV1-PSY</a>, a version of the SVT-AV1 encoder focused on making videos look as good as possible to the human eye. We've since contributed many of our improvements back to the main SVT-AV1 project, making it more flexible, higher quality, and easier to use.</p> <h2 id="what-is-your-role-at-google">What is your role at Google?</h2> <p>I work with Google's image compression team on a feature called tune IQ, a brand-new mode in the libaom encoder designed for still images. It improves quality and consistency by intelligently directing more data to the parts of an image our eyes notice most, which means you get smaller files for the same visual quality. Tune IQ also includes a new detector that dramatically improves compression for content like screenshots, simple graphics, and animations.</p> <p>Today, tune IQ is already being used by customers like <em><a rel="external" href="https://www.theguardian.com/us">The Guardian</a></em>, and we've received great feedback! We're now working to make it the default setting for creating AVIF images and help it become widely adopted.</p> <h2 id="how-did-you-become-part-of-the-av2-development-effort">How did you become part of the AV2 development effort?</h2> <p>My work with Google's image team was a natural entry point to contributing to AV2's image compression capabilities. Since Google is a founding member of AOMedia, it was easy to get involved. That said, the project is open source, so anyone can contribute, not just members!</p> <h2 id="we-have-webp-from-vp8-heic-from-hevc-and-avif-from-av1-will-there-be-an-image-format-based-on-av2">We have WebP (from VP8), HEIC (from HEVC), and AVIF (from AV1). 
Will there be an image format based on AV2?</h2> <p>Given AV2's compression gains over AV1, I strongly believe the industry will want an image format based on it. There's already work being done to add support for AVM (AV2's reference software) into libavif, which is a popular library for handling AVIF images.</p> <h2 id="what-av2-features-are-you-most-excited-about-for-still-image-compression">What AV2 features are you most excited about for still image compression?</h2> <p>I'm very excited about features like user-defined Quantization Matrices (QMs). This unlocks some powerful applications, most notably the ability to convert JPEG images into the AV2 format without the additional quality loss that normally happens when you transcode between formats. On top of that, you can apply deblocking filters to these converted images to smooth out artifacts and improve their perceived quality even more.</p> <p>AV2 also uses higher precision math for standard 8-bit content. This helps prevent "banding" — those ugly, visible steps in what should be a smooth color gradient — which can be caused by rounding errors during compression.</p> <h2 id="h-264-helped-enable-hd-video-on-the-web-while-formats-like-av1-drove-4k-and-hdr-what-new-experiences-might-av2-unlock">H.264 helped enable HD video on the web, while formats like AV1 drove 4K and HDR. What new experiences might AV2 unlock?</h2> <p>That's a great question! There's a growing demand for high-quality Virtual Reality (VR) and Augmented Reality (AR) experiences, driven by products like the Apple Vision Pro and Meta Quest. These applications require streaming video at very high resolutions (4K or higher), with a wide field of view (up to 360 degrees), and often with multiple views (e.g., one for each eye). 
AV2 is being designed with new compression tools specifically to handle this kind of demanding video more efficiently.</p> <h2 id="what-adoption-challenges-do-you-foresee-for-av2-and-how-can-they-be-solved">What adoption challenges do you foresee for AV2, and how can they be solved?</h2> <p>The main challenges will be the same ones every new codec faces: ensuring cheap, widespread hardware support and developing fast, efficient software for encoding and decoding. For AV2 to succeed, the entire ecosystem -- from chip manufacturers to codec developers and streaming services -- needs to work together.</p> <p>There will be growing pains, but if we learn from the AV1 rollout, we can speed things up. Developing a very fast software decoder early on (like dav1d was for AV1) and optimizing the software encoders will be key to driving adoption.</p> <h2 id="can-you-speculate-on-a-timeline-for-widespread-av2-deployment">Can you speculate on a timeline for widespread AV2 deployment?</h2> <p>It's hard to say for sure since the AV2 standard is still under development. However, seeing the close collaboration between all the AOMedia partners, I think the rollout could be even faster than AV1's. I wouldn't be surprised to see the first devices with AV2 hardware support by 2027. An optimistic guess for widespread deployment would be around 2030.</p> <h2 id="how-do-you-see-video-and-image-compression-evolving-in-the-next-5-to-10-years">How do you see video and image compression evolving in the next 5 to 10 years?</h2> <p>I'm betting we'll see a lot more machine learning (ML) and neural networks (NN) used in codec design. This could mean using AI to clean up and enhance the final decoded image, or it could mean building ML-based techniques directly into the compression process to improve quality from the start. 
I know of several research efforts already underway, and I hope to see them become part of real products in the future.</p> <h2 id="what-are-your-thoughts-on-machine-learning-in-future-compression-standards">What are your thoughts on machine learning in future compression standards?</h2> <p>As I said, I believe ML will become essential. I expect it to be adopted gradually -- first by using ML to create smarter filters that clean up compression artifacts, and then expanding to other parts of the codec as device performance allows.</p> <p>The ultimate "holy grail" would be a codec that uses machine learning extensively in every step of the process. We might even see codecs that are essentially a single, large neural network. Companies like <a rel="external" href="https://deeprender.ai">Deep Render</a> have shown this is possible; we just need to make them fast enough to run in real time on affordable, everyday hardware.</p> <h2 id="if-you-could-instantly-solve-one-problem-in-compression-what-would-it-be">If you could instantly solve one problem in compression, what would it be?</h2> <p>My dream is to perfect the way we handle film grain in videos. I'd want to create a fully automated system that can intelligently preserve or synthesize film grain to match the director's creative intent, without needing manual tweaking for every single movie. To do that, we'd also need to develop a new quality metric that can actually understand and measure the visual appeal of film grain.</p> <p><em>The world of multimedia compression is <a rel="external" href="https://giannirosato.com/blog/post/the-multimedia-renaissance/">moving more quickly than ever</a>, and Julio is at the forefront of it all. I'm consistently impressed with his work, and if you want to learn more about him, I've linked his website below. 
Thanks for your time in this interview, Julio!</em></p> <p><em>– Gianni</em></p> <div class="call-to-action"> <a href="https://juliobbv.com" class="cta-button" > Julio's Website </a> </div> Improving AVIF in Open Source 2025-07-13T00:00:00+00:00 2025-07-13T00:00:00+00:00 Halide Team https://halide.cx/blog/improving-avif-in-open-source/ <div class="image-container"> <picture> <img src="https://halide.cx/img/fall_leaves.avif" width="1536" height="864" alt="Red Autumn Leaves" /> </picture> </div> <h2 id="introduction">Introduction</h2> <p><a rel="external" href="https://wiki.x266.mov/docs/images/AVIF">AVIF (AV1 Image File Format)</a> is growing in popularity for web images, thanks to its impressive compression and quality. However, open-source AVIF encoders struggled with consistency, usability, and overall compression efficiency for a long time due to their development cycles and (inherently) the way video encoders are designed.</p> <p>My name is Gianni Rosato, the founder of Halide Compression. My compression background has a foundation in working on the SVT-AV1 project with Meta as well as working with Two Orioles, the main authors behind the <a rel="external" href="https://wiki.x266.mov/docs/utilities/dav1d">dav1d software AV1 decoder</a>. My journey began with founding the <a rel="external" href="https://svt-av1-psy.com">SVT-AV1-PSY</a> project, aimed at providing a community-developed enhanced SVT-AV1 encoder for perceptual quality. One of the things I worked on while involved with SVT-AV1-PSY was considerably improving the state of the art for AVIF.</p> <h2 id="why-avif">Why AVIF?</h2> <p>AVIF wasn't on our radar as video encoder developers, but a community member suggested we try it out and we saw promising results instantly with our existing featureset. 
This prompted us to begin escalating our focus on still images; as a community-built open source project, we were not beholden to the interests of companies that only derived value from our video work, so we were able to shift focus without much trouble.</p> <p>This is something I want to highlight up front in this blog post: modern image codecs on the Web tend to be derivations of video standards (e.g. WebP images being VP8 keyframes, same with HEIC/HEVC as well as AVIF/AV1) with reference and production encoders designed for video. Because of this, image encoding is a poorly considered externality (with the exception of WebP, which has an image-first reference library separate from <a rel="external" href="https://wiki.x266.mov/docs/encoders/vpxenc">libvpx</a> in the form of <a rel="external" href="https://chromium.googlesource.com/webm/libwebp/">libwebp</a>).</p> <p>This is where the Web ecosystem is headed; build powerful video encoders with associated image formats, and hope that being good at video means images will benefit. This is usually effective, but to truly unlock value in these formats, boutique image-first design considerations are necessary. This became more clearly true as I continued to work on AVIF in SVT-AV1-PSY.</p> <h2 id="design-overview">Design Overview</h2> <p>Improving still picture AVIF encoding (ignoring animations, which are essentially videos after all) means improving <em>all-intra coding</em>. 
In video terminology, intra-coded frames are frames which do not reference data from other frames (they are standalone pictures).</p> <p>"Tune Still Picture" (also called "Tune 4") delineates SVT-AV1-PSY's intra-optimized compression mode, differentiating it from the other tuning options in the encoder.</p> <p>Tune Still Picture is comprised primarily of the following techniques under the hood:</p> <ol> <li>A quantization matrix scaling curve</li> <li>Deblocking loop filter sharpness adjustment</li> <li>More sensitive variance-adaptive quantization</li> <li>Photography-tuned variance-adaptive quantization scaling</li> <li>A custom screen-content detection algorithm</li> <li>Modifications to lambda weight modulation</li> </ol> <p>These techniques were the primary contributors to Tune 4's strength in metrics as well as perceptual quality. I'll explain what each option does in more detail below.</p> <h3 id="1-quantization-matrix-scaling">1. Quantization Matrix Scaling</h3> <p>After a frame is transformed from the spatial domain to the frequency domain (a process that separates a group of pixels into different frequency components), a quantization matrix (QM) is applied. This matrix contains different scaling factors for various frequencies. By using a non-uniform quantization matrix, an encoder can specify different levels of quantization to different frequency components (e.g. low versus high-frequency), which may allow for more graceful degradation according to the human eye as data is discarded.</p> <p>The AV1 specification includes a set of 15 predefined QMs. Encoders can select one of these for luma (light) and chroma (color) in each frame. AV1's predefined QMs are designed to be reasonably effective for a wide range of content. 
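</p> <p>To make the idea of non-uniform quantization concrete, here is a minimal Python sketch. It is not AV1's actual quantizer: the 4x4 coefficient block, the weight matrix, the step size, and the names <code>quantize_block</code> and <code>count_nonzero</code> are all made up for illustration. It only shows how larger weights at high-frequency positions discard more of that data than a flat matrix does, while the lowest-frequency coefficient is left alone.</p>

```python
def quantize_block(coeffs, weights, step):
    """Quantize each coefficient with its own effective step (step * weight)."""
    return [
        [round(c / (step * w)) for c, w in zip(coeff_row, weight_row)]
        for coeff_row, weight_row in zip(coeffs, weights)
    ]


def count_nonzero(block):
    """Number of coefficients that survive quantization."""
    return sum(v != 0 for row in block for v in row)


# Hypothetical 4x4 block of frequency coefficients (top-left = lowest frequency).
coeffs = [
    [240, 64, 20, 9],
    [60, 30, 12, 6],
    [18, 12, 8, 4],
    [9, 6, 4, 3],
]

# Flat matrix: every frequency is quantized with the same step.
flat = [[1.0] * 4 for _ in range(4)]
# Hypothetical non-uniform matrix: weights grow with frequency (assumption).
sloped = [[1.0 + 0.5 * (r + c) for c in range(4)] for r in range(4)]

flat_q = quantize_block(coeffs, flat, step=4)
sloped_q = quantize_block(coeffs, sloped, step=4)

print("flat:  ", flat_q, "non-zero:", count_nonzero(flat_q))
print("sloped:", sloped_q, "non-zero:", count_nonzero(sloped_q))
```

<p>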
SVT-AV1-PSY enables QMs by default for better visual quality, and specifies a QM range that the encoder can use when encoding a video.</p> <p>For still images, we care less about QMs over time and more about carefully choosing QMs during the encoding process for a single intra-coded frame (our image). In order to identify the best QMs for our use case, we used an industry-standard image dataset (the <a rel="external" href="https://cloudinary.com/labs/cid22">CID22</a> Validation Set) and measured a <em>convex hull</em> (how quality changes relative to size) according to the <a rel="external" href="https://github.com/cloudinary/ssimulacra2">SSIMULACRA2</a> image quality metric for each QM.</p> <p>We found that for different quality levels, on average, different QMs performed better. We selected the best QMs for each range in order to achieve the best overall convex hull.</p> <h3 id="2-deblocking-loop-filter-sharpness">2. Deblocking Loop Filter Sharpness</h3> <p>This was a simpler change, despite being potentially the most effective.</p> <p>SVT-AV1-PSY features user-facing controls to modify the encoder's internal deblocking loop filter sharpness. AV1 divides video frames into blocks in order to compress different regions of a frame differently. The deblocking loop filter in an encoder controls how the boundaries between blocks in each frame are smoothed into one another, and can be modified to be smoother or sharper depending on internal controls.</p> <p>We measured a convex hull for each sharpness level (as we did with QMs) and landed on the best overall level to set as the default for Tune Still Picture. This particular case illustrates the difference between an image encoder and a video encoder. While smoother deblocking might help a video encoder by potentially improving inter-frame consistency and leading to better compression, working with a single frame tells a different story. 
Thus, an image encoder ends up making drastically different decisions than a video encoder, even with the same set of tools.</p> <h3 id="3-variance-adaptive-quantization-sensitivity">3. Variance-Adaptive Quantization Sensitivity</h3> <p>Variance Adaptive Quantization (VAQ) is a feature that comes from the x264 days, helping to drastically improve visual quality while also improving metrics due to the nature of quantization in the face of low-variance image data (this <a rel="external" href="https://github.com/psy-ex/svt-av1-psy/blob/master/Docs/Appendix-Variance-Boost.md">explainer by Julio Barba</a>, the author of VAQ in SVT-AV1(-PSY), is a very good guide on how it works).</p> <p>VAQ only makes an encoder better when it is used properly. In the case of still images, increasing the strength of VAQ helped improve our convex hull, but the changes to VAQ didn't stop there.</p> <h3 id="4-variance-adaptive-quantization-scaling">4. Variance-Adaptive Quantization Scaling</h3> <p>The scaling algorithm for the default VAQ implementation in SVT-AV1 follows this equation:</p> <p>q = pow(1.018, strengths[strength] * (-10 * log2((double)variance) + 80))</p> <p>If we take strength as a configurable variable instead of a look-up table for the sake of demonstration, we can plot a curve that looks like this:</p> <p><img src="/img/varboost_0.webp" alt="Variance Boost Video Curve" /></p> <p>The shape of this curve should generally illustrate how variance adaptive quantization works, if we think about the x-axis as our input variance value and our y-axis as our returned quantization scaling value. 
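</p> <p>As a small illustration of the default curve, the Python sketch below evaluates the equation above for a few variance values. As in the plot, <code>strength</code> is treated as a plain number rather than an index into the encoder's look-up table; the function name <code>default_vaq_scale</code> and the sample values are ours and not part of SVT-AV1.</p>

```python
import math


def default_vaq_scale(variance: float, strength: float) -> float:
    """Evaluate q = pow(1.018, strength * (-10 * log2(variance) + 80)).

    `strength` is a plain multiplier here (an assumption for demonstration),
    not an index into the encoder's strengths[] look-up table.
    """
    return math.pow(1.018, strength * (-10.0 * math.log2(variance) + 80.0))


# Sample variances chosen for illustration only; variance must be > 0.
for variance in (1, 16, 256, 4096):
    print(variance, round(default_vaq_scale(variance, strength=1.0), 3))
```

<p>With this form, the exponent is zero at a variance of 256, so the scale is exactly 1 there regardless of strength; lower variances give values above 1 and higher variances give values below 1.</p> <p>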
Less variance means we "boost" the amount of bits sent to an area to improve its quality.</p> <p>Tuning for photographic content meant using a modified curve, defined by the following equation:</p> <p>q = 0.15 * strength * (-log2((double)variance) + 10) + 1;</p> <p>Here is the associated visual, with the black line representing the Still Picture curve:</p> <p><img src="/img/varboost_1.webp" alt="Variance Boost Still Picture Curve" /></p> <p>Finding this curve required considering the type of data present in photographs, the sensitivity of quality to quantization in intra-coded frames, and how our convex hull responded. One interesting thing about this curve is that while low-variance data isn't boosted as eagerly, higher variance data is tapered back much more slowly.</p> <h3 id="5-screen-content-detection">5. Screen Content Detection</h3> <p>AV1 happens to have some special tools (namely Intra Block Copy/IBC & palette mode) that help immensely with non-photographic "screen content" (e.g. text screenshots, lineart, digital drawings) when compared to photographs.</p> <p>Making screen content tools useful was accompanied by the goal of generally better internal tuning when facing screen content. However, in order to improve efficiency on screen content, you need to know when you're encoding it. The default screen content detection algorithm in SVT-AV1 wasn't effective for our use case, so we worked on engineering a new one.</p> <p>Julio & I both came up with separate implementations, and Julio's ended up being our choice of implementation in the end. <a rel="external" href="https://github.com/gianni-rosato/photodetect2">Reference Zig code</a> is provided if you want more technical details, but the algorithm is able to detect screen content effectively as well as differentiate between different kinds of screen content. There is a basic classification, as well as high-variance, medium confidence, and high confidence. 
This implementation allowed us to strengthen an already strong use case for AVIF, where older codecs (namely JPEG) fell short.</p> <h3 id="6-lambda">6. Lambda</h3> <p>The lambda is a parameter used in rate-distortion optimization (RDO). RDO is the process by which an encoder decides the best way to encode a block of pixels by evaluating a cost function that balances two competing goals. These goals are minimal distortion (how much the encoded block differs from the original) and minimal rate (how much data is required to encode a block). Lower rate means a smaller file. The RDO cost function is typically expressed via the equation below.</p> <p><em>Cost = Distortion + λ * Rate</em></p> <p>Due to the nature of this very simple equation, you can see that a high lambda prioritizes rate reduction while a lower lambda will favor reducing distortion.</p> <p>In simple terms, what Tune Still Picture does is modulate the lambda depending on the amount of quantization we desire. At higher and lower quantization (the lowest & highest ends of the quality spectrum respectively), we ramp down the lambda. In the middle, we ramp it up. This improved our convex hull.</p> <h2 id="aftermath">Aftermath</h2> <p>The result of Tune Still Picture was up to 15% better compression for AVIF, as well as significantly better consistency and greater flexibility for SVT-AV1 as our features are merged (this is still an ongoing effort). See for yourself on the <a rel="external" href="https://svt-av1-psy.com/avif/">SVT-AV1-PSY AVIF page</a>. The effort for better still image performance with SVT-AV1 also involved reducing the minimum size supported by the encoder to below 64x64 as well as implementing support for odd dimensions.</p> <p>Eventually, the bulk of our Tune Still Picture changes were merged into libaom's aomenc, the reference AV1 encoder developed by Google. 
They live on as aomenc's tune iq (for "image quality") and our gains are still visible there.</p> <p><img src="/img/libaom_tune_iq.svg" alt="libaom's tune iq performance" /></p> <p>The results above were achieved on the Kodak True Color image dataset on libaom v3.12.1 via libavif.</p> <h2 id="what-now">What Now?</h2> <p>Now you know the gist of our still image improvements for AVIF! Researching & building open-source image encoding improvements was fun, but the future may look different for image codecs going forward.</p> <p>I am hopeful that AV2 will be an exciting development for the still image world, but the modern Web image compression ecosystem still has some glaring issues. In libaom, tune iq still suffers from consistency issues due to strange encoder decisions that are byproducts of images being second-class to video. Additionally, the fastest libaom preset often requires almost 80% more encoding time than the fastest libwebp preset with a much higher memory footprint.</p> <p>Potentially the biggest issue of all is that working full-time on community-supported encoders is impossible to justify without compensation, especially when you don't have a clientele that needs strong still image performance.</p> <p>At Halide Compression, my goal is to fundamentally change these incentives. For many companies, images are highly expensive, and a highly efficient licensable encoder alongside an expert consulting team is a valuable thing. <a href="/iris/">Iris-WebP</a> is already changing the narrative for WebP by providing unprecedented efficiency gains over a reference implementation that is already designed with images in mind. An image-first ecosystem, supported by a dedicated team, becomes necessary to make modern image formats usable.</p> <p>I hope you enjoyed the read and learned something. If you'd like to talk to me or Halide about my open-source work, Iris, or anything else, shoot us an email! 
Thanks for reading!</p> <div class="call-to-action"> <a href="mailto:mail@halide.cx" class="cta-button" > Email Us </a> </div> Introducing Iris for WebP 2025-06-04T00:00:00+00:00 2025-06-04T00:00:00+00:00 Halide Team https://halide.cx/blog/introducing-iris/ <div class="image-container"> <picture> <img src="https://halide.cx/img/sky.avif" width="1536" height="864" alt="Sky" /> </picture> </div> <h2 id="why-webp">Why WebP?</h2> <p>WebP was introduced in 2010 with the goal of providing better compression for Web images. While it claimed to offer significant efficiency advantages over JPEG, in practice this wasn't always true. Its adoption was also slow due to an initial lack of widespread browser support and further lackluster support outside of the Web ecosystem. This led to WebP being perceived as a confusing addition to the Web.</p> <p>Despite its reputation and unclear benefits, WebP has gained significant traction on the Web. It is available in over 95% of Web browsers, and large digital asset management companies serve billions of WebP images every day.</p> <p>Iris-WebP provides a fast, efficient WebP encoder designed for the human eye. Images encoded with Iris-WebP look significantly better than those encoded with the reference WebP encoder, and Iris-WebP performance outclasses encoders for slower, newer Web-first formats like AVIF.</p> <h2 id="our-encoder">Our Encoder</h2> <p>Our primary goals in building Iris-WebP are speed, compression efficiency, and consistency. We want to consistently output high-quality results from our encoder quickly, and in doing so provide an implementation that delivers on WebP's initial quality promises without compromise.</p> <p>In order to meet our goals, we've developed robust tooling to measure visual fidelity with SSIMULACRA2 and Butteraugli. Visual performance is paramount, and we work hard to ensure Iris-WebP isn't just overfit for metrics. 
Our featureset includes novel image compression tech designed through meticulous psychovisual research, allowing us to provide unrivaled performance.</p> <p>To learn more about Iris-WebP and how it may benefit your workflow, visit the <a href="/iris/">Iris project page</a>. At the time of writing, we don't have metrics to share, but they will be coming soon to the Iris project page. We're excited to see how Iris can help make the web faster, lighter, and more beautiful!</p> <div class="call-to-action"> <a href="/iris" class="cta-button" > Learn More About Iris </a> </div>