Skip to content

Decrease embedded img size for inlineImages#836

Merged
Yuyz0112 merged 4 commits intorrweb-io:masterfrom
zytedata:master
Feb 25, 2022
Merged

Decrease embedded img size for inlineImages#836
Yuyz0112 merged 4 commits intorrweb-io:masterfrom
zytedata:master

Conversation

@croqaz
Copy link
Contributor

@croqaz croqaz commented Feb 16, 2022

Hi.

This is a small PR for decreasing the final snapshot size, for the majority of pages.

For example a page with lots of images like: https://www.1samsautos.com/2005-GMC-Safari-Passenger-Denver-CO-Fort-Collins-Colorado-Springs-Kansas-City-DENVER/used_car/Wrqim%5bzGkRg%3d ; currently the snapshot size jumps between 96MB and 120MB, depending on how much I'm waiting for the page to load, or if I'm scrolling, because the images are loaded lazily on scroll event.

If we merge this PR, the snapshot size will be only 12MB, so 10x smaller.

However there may be some issues with this hard-coded convert-all-to-JPEG approach:

  • what if the original image is a GIF, or PNG with transparency? the snapshot will make the transparency white, which can make some pages look broken or ugly
  • what if the canvas.toDataURL() function would return a smaller string if we used PNG? there are large, mostly monochrome images that can be smaller as PNG.

There doesn't seem to be a built-in way to check the HTMLImageElement Content-type, or MIME-type. If the image was GIF or PNG we could just convert it to PNG, otherwise would be JPEG. We could use the img src extension, but that's not reliable.

Ping @Mark-Fenng again, because you know this feature very well already. I hope you don't mind 😄

@croqaz
Copy link
Contributor Author

croqaz commented Feb 16, 2022

Alternatively, there's also WEBP.

According to https://developers.google.com/speed/webp/docs/webp_lossless_alpha_study it supports transparency and should be smaller.
According to https://caniuse.com/?search=webp it is widely available.

For reference, I tried the same page, with canvasService!.toDataURL('image/webp', 0.7) and the snapshot is also 11MB.

@YunFeng0817
Copy link
Member

Thank you for pinging me! I think the work is great. About the compression switch, compression ratio, and final format, personally, I suggest making them configurable. How do you think about it @Yuyz0112

@Yuyz0112
Copy link
Member

Same here, extending inline images config to an object may help.

@croqaz
Copy link
Contributor Author

croqaz commented Feb 17, 2022

Guys, how about creating an optional param object dataURLOptions or something like that.
The default would be { type: null, quality: null }.

Then if you provide something like { type: "image/png", quality: 0.7 }, it would be applied to both record canvas and inline images? Because both tags use canvas.toDataURL(...).
My reasoning is that you can save for both tags and if the browser supports webp, it makes sense to use it everywhere.
I can't think of any reason to use separate image types for canvas, versus img.

What do you guys thing?

@YunFeng0817
Copy link
Member

Good point! I think it could work. If changing canvas recording, I think @Juice10 may take a look as well.

@Yuyz0112
Copy link
Member

Nice idea +1. The only concern is the API design since we need to keep compatibility.

@croqaz
Copy link
Contributor Author

croqaz commented Feb 21, 2022

So I checked a few pages in both Firefox and Chrome and it seems that providing nullable params to toDataURL() function makes it use the same params as default.

Eg:

document.getElementById('mainCanvas').toDataURL(null, 0) === document.getElementById('mainCanvas').toDataURL()
document.getElementById('mainCanvas').toDataURL(undefined, undefined) === document.getElementById('mainCanvas').toDataURL()

I implemented dataURLOptions like we agreed.
It is optional, and backwards compatible. I tested locally with a few pages and it works as expected.

@Juice10
Copy link
Member

Juice10 commented Feb 21, 2022

Good point! I think it could work. If changing canvas recording, I think @Juice10 may take a look as well.
Thanks for looping me in @Mark-Fenng

I can't think of any reason to use separate image types for canvas, versus img.

What do you guys thing?

Unifying the config for canvas and images makes sense to me

@Juice10
Copy link
Member

Juice10 commented Feb 21, 2022

If you'd like to decrease the embedded img size for inline images, why not try to use the existing image?
I know it's hard to try and figure out what format an image is in once its loaded into the browser in an image element, but we could request the image a second time by using fetch. This will most likely come straight from the browser cache and will be really fast to load. With the added bonus that this might already be the very smallest file possible if the website owner is serious about performance.

Using fetch comes with the benefit that we won't have to use canvas.toDataURL.
I've been doing some experimenting with recording canvas snapshots periodically and I've been bumping into a lot of issues regarding performance of toDataURL. Depending on what format you choose its either slow or very slow and larger images tend to make browsers drop frames. Alternatively you can use toBlob instead of toDataURL it's async and about 2-3x faster as it doesn't have to do any base64 encoding. But you'll still need to do the base64 encoding so toBlob does just delays the inevitable, but maybe we can do that part in a web worker?

Big downside to both canvas.toBlob and fetch(img.src) is that they are async. So it means a different approach when saving these too events. This might be a good starting point for a discussion around async events @Mark-Fenng @Yuyz0112 @eoghanmurray (in a different issue probably)

@croqaz
Copy link
Contributor Author

croqaz commented Feb 22, 2022

I have actually tried the image fetch option, when I started working on the offline image feature. It's a bit of work to make all the functions in the snapshot.ts file async, but it makes it easier to wait for some callbacks and events.
It's a breaking feature tho, and I shouldn't probably implement it in this PR.
Because the code is sync, currently I don't see any other way of of decreasing the size of the images. I already implemented the code and I'm already using it on my end.

@Juice10
Copy link
Member

Juice10 commented Feb 22, 2022

I have actually tried the image fetch option, when I started working on the offline image feature. It's a bit of work to make all the functions in the snapshot.ts file async, but it makes it easier to wait for some callbacks and events.

It's a breaking feature tho, and I shouldn't probably implement it in this PR.

Because the code is sync, currently I don't see any other way of of decreasing the size of the images. I already implemented the code and I'm already using it on my end.

I totally agree that this PR is the next logical step to what we already have, and makes sense to do any image fetching or async work in a separate PR. Especially if we want to make the snapshot async.

What we could potentially do to keep everything sync, is to create a new type of event that contains the url of the image, and the contents. And then in the replayer we can loop through all of these new types of events to know what urls have been saved, and when the replayer creates an image it can check to see if we have an offline copy of that url and replace the contents of it with the saved version.
We are doing something similar for the canvas replay. If you search for imageMap throughout the source code you'll see it in action.

private imageMap: Map<eventWithTime | string, HTMLImageElement> = new Map();

Credits go to @Yuyz0112 for coming up with this concept

@Yuyz0112 Yuyz0112 merged commit e104300 into rrweb-io:master Feb 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants