Link to Dataset: http://r0k.us/graphics/kodak/ or https://github.com/MohamedBakrAli/Kodak-Lossless-True-Color-Image-Suite
Fixing Blocking Artifacts from JPEG Images
Who: Jesse Gallant (jgallan1), Oren Kohavi (okohavi), Michael Li (mli103)

Introduction: Image compression is a key part of the modern internet: without compressing images, you'd never be able to view your favorite photo albums, memes, etc. online -- they would be prohibitively large to store and transmit. However, image compression comes at a cost. Specifically, JPEG compression operates on 8x8 blocks of pixels, so a compressed image often exhibits 'blocking' artifacts that did not exist in the original. The paper we are re-implementing, BlockCNN, describes a convolutional neural network designed to remove blocking artifacts from JPEG images, achieving higher visual quality at the same storage size as standard JPEG compression.

Related Work: We drew inspiration from "BlockCNN: A Deep Network for Artifact Removal and Image Compression", which demonstrates a method for artifact removal and image compression that minimizes image size while retaining as much image information as possible. The paper can be found here: https://bit.ly/3rht21L. Removing compression artifacts is a common goal among researchers; other papers attempting to remove compression artifacts from images include "Compression Artifacts Removal Using Convolutional Neural Networks" (https://arxiv.org/abs/1605.00366) and "Compression Artifact Removal with Stacked Multi-context Channel-wise Attention Network" (http://www.cs.umanitoba.ca/~ywang/papers/icip19.pdf).
We found an existing implementation of BlockCNN in PyTorch (https://github.com/DaniMlk/BlockCNN), so we will instead be writing our implementation in TensorFlow.
Data: We are using the Kodak Lossless Image dataset linked in the project handout. There are 24 images, each either 768x512 or 512x768. Each image contains 3 channels, representing the RGB value of corresponding pixels.
With our preprocessing methodology (detailed below), this should result in roughly 150k input/label pairs: 512 (height) × 768 (width) × 24 (images in dataset) / 64 (pixels per block) = 147,456.
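As a sanity check, the pair count above can be computed directly (assuming all 24 Kodak images are 768×512 or 512×768, so every image contributes the same number of 8×8 blocks):

```python
# Each 8x8 output block corresponds to one input/label pair.
height, width = 512, 768           # every Kodak image is 768x512 or 512x768
num_images = 24
pixels_per_block = 8 * 8

pairs = height * width * num_images // pixels_per_block
print(pairs)  # 147456 -- roughly 150k pairs
```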
Methodology: The architecture of our model is detailed in the paper: it consists of a series of convolution layers and residual blocks, each residual block consisting primarily of batch normalization with some 1x1 convolution mixed in.
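A minimal TensorFlow/Keras sketch of this kind of architecture is shown below. The layer counts, filter sizes, and the strided-convolution downsampling from 24×24 to 8×8 are illustrative choices on our part, not the paper's exact configuration; `residual_block` and `build_blockcnn` are our own hypothetical names.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # Sketch of a residual block: batch normalization plus 1x1 and 3x3
    # convolutions (layer counts/filters are illustrative, not the paper's).
    shortcut = x
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 1, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Add()([shortcut, y])

def build_blockcnn(num_res_blocks=2):
    # Input: 24x24x3 patch (the target 8x8 block plus its 8 neighbors);
    # output: the deblocked 8x8x3 center block.
    inp = layers.Input(shape=(24, 24, 3))
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    for _ in range(num_res_blocks):
        x = residual_block(x, 64)
    # A stride-3 convolution reduces 24x24 -> 8x8 (one possible choice).
    x = layers.Conv2D(64, 3, strides=3, padding="same", activation="relu")(x)
    out = layers.Conv2D(3, 3, padding="same")(x)
    return tf.keras.Model(inp, out)
```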
The model will be trained using inputs of size 24x24x3 and outputs of size 8x8x3, which will be generated by a preprocessing script by splitting up larger images. The large input images to this preprocessing script will be lossless images from our dataset.
“Similar to JPEG, we partition an image into 8×8 blocks and process each block separately. We use a convolutional neural network that inputs a block together with its adjacent blocks (a 24 × 24 image), and outputs the processed block in the center.” - BlockCNN
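The quoted scheme can be sketched as a simple NumPy preprocessing step. This is a minimal sketch under simplifying assumptions: labels are cut from the same (lossless) array for illustration, whereas in training the input patch would come from the JPEG-compressed copy; border blocks without a full ring of neighbors are skipped (the paper may handle borders differently); and `make_pairs` is our own hypothetical helper name.

```python
import numpy as np

def make_pairs(image):
    """Split an (H, W, 3) image into (24x24 input, 8x8 label) pairs.

    Each input is an 8x8 block together with its 8 adjacent blocks; the
    label is the center block. Blocks on the image border, which lack a
    full ring of neighbors, are skipped for simplicity.
    """
    h, w, _ = image.shape
    inputs, labels = [], []
    for y in range(8, h - 8, 8):       # skip the outermost ring of blocks
        for x in range(8, w - 8, 8):
            inputs.append(image[y - 8:y + 16, x - 8:x + 16])
            labels.append(image[y:y + 8, x:x + 8])
    return np.stack(inputs), np.stack(labels)
```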
Anticipated challenges include the large number of layers, which might make training slow, and the complex structure of the residual block, which will be non-trivial to implement in Tensorflow.
Metrics: Base: We are planning on calculating the difference between two images using P-SNR and SSIM (two methods cited in the BlockCNN paper). Our goal is to have an output image that has higher similarity to the original image than the compressed JPEG image does. Stretch: Beyond the base goal, the main avenue for improvement is pushing the output ever closer to the original image. Ideally, we will be able to show a side-by-side comparison of JPEG images before and after they have been processed by our network, with a visible difference between the two.
Ethics: Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm? Image compression is utilized in all realms of our society, from big companies to everyday individuals. Large companies compress images when putting them on their websites, uploading video, etc. Everyday users compress images when they upload to Instagram, send an image in a text message to a friend, etc. A mistake in our algorithm, although it may not drastically change the look of the image, could certainly alter small details. It is important not to overpromise what our algorithm can do and to make sure that we try our best to preserve as much information as possible.
How are you planning to quantify or measure error or success? What implications does your quantification have?
We plan on using peak signal-to-noise ratio (P-SNR) and structural similarity index measurement (SSIM) as metrics for our success. P-SNR measures the ratio between the original image (signal) and the noise introduced by compression (noise). SSIM measures the similarity between two images, which will indicate how faithfully our model can reproduce the original, uncompressed image. For both metrics, a higher value indicates a better result.
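Both metrics have built-in implementations in TensorFlow, which we expect to use. The sketch below compares a synthetic "original" against a noisy copy standing in for a JPEG-compressed version; the random data is illustrative only.

```python
import tensorflow as tf

# Illustrative data: a random "original" and a noisy stand-in for its
# compressed/reconstructed version (batch of 1, values in [0, 1]).
original = tf.random.uniform([1, 512, 768, 3], maxval=1.0)
noisy = tf.clip_by_value(
    original + tf.random.normal(tf.shape(original), stddev=0.05), 0.0, 1.0)

# For both metrics, higher means closer to the original.
psnr = tf.image.psnr(original, noisy, max_val=1.0)   # in decibels
ssim = tf.image.ssim(original, noisy, max_val=1.0)   # in [-1, 1]
print(float(psnr[0]), float(ssim[0]))
```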
The implications of these quantifications are that the quality of the result is determined strictly by measurements between the original and output images, so that the outputs shouldn’t be biased towards/against any specific groups, and there should be no difficulty in determining a “good” output from a “bad” one.
Division of labor:
Preprocessing - Oren
CNN - Everyone
Poster - Michael, Jesse
Final Writeup - Everyone
Final Reflection: https://docs.google.com/document/d/1ZA3JvugaDOyRsVObIKfGqAwsoGehs2-OnsrE1lVp0bE/edit?usp=sharing
Built With
- python
- tensorflow