YUV encoding/decoding usually works well enough in fixed-point arithmetic, and it is faster than floating point. Therefore, it may be possible to replace the current JPEG encoding pipeline's color conversion with Q16 fixed-point multiplication. Notably, libjpeg-turbo also does this in fixed-point arithmetic, so there would be no significant deviation from the near-reference implementation.
However, switching to fixed-point is unlikely to yield a dramatic improvement: end-to-end encoding only gains a few percent.
Here is an experimental implementation to try out this approach: https://github.com/awxkee/image/tree/test_yuv_jpeg_fixed_point

The benchmarks can be run with `cargo bench --bench fixed_point`.
Tests show that the fixed-point conversion itself is about 20% faster, but the end-to-end encoding gain varies from 2% to 7%.
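For illustration, here is a minimal sketch of what a Q16 RGB → YCbCr conversion could look like. The function name, the exact coefficient rounding, and the `+ HALF` round-to-nearest step are my assumptions, not necessarily what the experimental branch does; the constants are the JFIF coefficients scaled by 2^16.

```rust
/// Sketch of an RGB -> YCbCr conversion in Q16 fixed point (hypothetical
/// helper, assuming 8-bit subpixels). Coefficients are the JFIF ones
/// multiplied by 65536 and rounded; adding HALF before the shift rounds
/// to nearest instead of truncating.
fn rgb_to_ycbcr_q16(r: u8, g: u8, b: u8) -> (u8, u8, u8) {
    const HALF: i32 = 1 << 15; // 0.5 in Q16

    let (r, g, b) = (r as i32, g as i32, b as i32);

    // round(c * 65536) for the JFIF coefficients 0.299, 0.587, 0.114, etc.
    let y = (19595 * r + 38470 * g + 7471 * b + HALF) >> 16;
    let cb = (-11059 * r - 21710 * g + 32768 * b + (128 << 16) + HALF) >> 16;
    let cr = (32768 * r - 27439 * g - 5329 * b + (128 << 16) + HALF) >> 16;

    // Cr for pure red is 255.5 and would overflow u8, hence the clamp.
    (
        y.clamp(0, 255) as u8,
        cb.clamp(0, 255) as u8,
        cr.clamp(0, 255) as u8,
    )
}

fn main() {
    println!("{:?}", rgb_to_ycbcr_q16(0, 0, 0)); // black
    println!("{:?}", rgb_to_ycbcr_q16(255, 255, 255)); // white
    println!("{:?}", rgb_to_ycbcr_q16(255, 0, 0)); // pure red
}
```

All intermediate products fit comfortably in `i32` for 8-bit input, so no widening to `i64` is needed.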
P.S.
Anyway, it seems `.round()` is missing here, so luma and chroma values are truncated on each pass. Over time, this truncation leads to #2311:
```rust
fn rgb_to_ycbcr<P: Pixel>(pixel: P) -> (u8, u8, u8) {
    use crate::traits::Primitive;
    use num_traits::cast::ToPrimitive;

    let [r, g, b] = pixel.to_rgb().0;
    let max: f32 = P::Subpixel::DEFAULT_MAX_VALUE.to_f32().unwrap();
    let r: f32 = r.to_f32().unwrap();
    let g: f32 = g.to_f32().unwrap();
    let b: f32 = b.to_f32().unwrap();

    // Coefficients from JPEG File Interchange Format (Version 1.02), multiplied for 255 maximum.
    let y = 76.245 / max * r + 149.685 / max * g + 29.07 / max * b;
    let cb = -43.0185 / max * r - 84.4815 / max * g + 127.5 / max * b + 128.;
    let cr = 127.5 / max * r - 106.7685 / max * g - 20.7315 / max * b + 128.;

    (y as u8, cb as u8, cr as u8)
}
```
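As a sketch of the fix, here is a standalone f32 version with rounding applied before the cast. The function name is hypothetical and it assumes subpixels already scaled to 0..=255 (i.e. `max == 255`), so it is not a drop-in replacement for the generic function above; the point is only the `.round()` call.

```rust
/// Sketch of the fix: round to nearest before the integer cast, so that
/// repeated encode passes do not drift from truncation. Hypothetical
/// helper assuming 8-bit-range f32 inputs.
fn rgb_to_ycbcr_rounded(r: f32, g: f32, b: f32) -> (u8, u8, u8) {
    // Same JFIF coefficients, in unit form.
    let y = 0.299 * r + 0.587 * g + 0.114 * b;
    let cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0;
    let cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0;

    // `.round()` is the key change: a bare `as u8` truncates toward zero.
    // The `as u8` cast itself saturates out-of-range values in Rust
    // (e.g. Cr = 255.5 for pure red rounds to 256.0 and clamps to 255).
    (y.round() as u8, cb.round() as u8, cr.round() as u8)
}

fn main() {
    println!("{:?}", rgb_to_ycbcr_rounded(0.0, 0.0, 0.0)); // black
    println!("{:?}", rgb_to_ycbcr_rounded(255.0, 255.0, 255.0)); // white
}
```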
(The quoted snippet is image/src/codecs/jpeg/encoder.rs, lines 773 to 789 at 5080752.)