Description
In commit bacf502, I refactored how the hostnames specified in the domain= option of a network static filter are implemented.
Based on the results of the set-vs-regexp.html benchmark, I decided to use a regexp to quickly look up whether a hostname is part of the set of hostnames specified in a domain= option.
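To make the two approaches concrete, here is a minimal sketch of a regexp-based and a Set-based hostname lookup. The helper names (`makeDomainRegex`, `makeDomainSet`, `setMatches`) are hypothetical and not taken from uBO's code; the sketch assumes the domain= value contains no negated entries and that a listed hostname should also match its subdomains.

```javascript
// Illustrative only: hypothetical helpers, not uBO's actual implementation.
const domainOpt = 'candyreader.com|likesblog.com|projectfreetv.at';

// Escape regexp metacharacters so that '.' in a hostname matches literally.
const escapeRe = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Approach 1: one RegExp matching a listed hostname or any subdomain of it.
const makeDomainRegex = opt =>
    new RegExp('(?:^|\\.)(?:' + opt.split('|').map(escapeRe).join('|') + ')$');

// Approach 2: a Set of hostnames, probed with the hostname and then with
// each of its parent domains in turn.
const makeDomainSet = opt => new Set(opt.split('|'));
const setMatches = (set, hostname) => {
    let hn = hostname;
    for (;;) {
        if (set.has(hn)) { return true; }
        const pos = hn.indexOf('.');
        if (pos === -1) { return false; }
        hn = hn.slice(pos + 1);
    }
};

const re = makeDomainRegex(domainOpt);
const set = makeDomainSet(domainOpt);
console.log(re.test('www.likesblog.com'), setMatches(set, 'www.likesblog.com'));
console.log(re.test('notlikesblog.com'), setMatches(set, 'notlikesblog.com'));
```

The regexp answers the question in a single `test()` call, while the Set needs one probe per label of the hostname being tested; the trade-off is whatever memory the engine allocates for the compiled regexp.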
However, as revealed by the "Take heap snapshot" memory tool in Chromium, the amount of memory used by RegExp instances on Chromium-based browsers is quite surprising. RegExp instances are lazily allocated internally in Chromium, meaning that memory is allocated only when the method exec() is called on a RegExp instance.
However, as shown in the following screenshot, a lot of filters with the domain= option end up having their regexp executed earlier than expected. The heap snapshot was taken after launching uBO and visiting only the links on the front page of https://news.ycombinator.com/news:
The top RegExp by memory use comes from the filter $script,third-party,domain=123videos.tv|171gifs.com|1proxy.de|... in EasyList. Such a filter will always end up being executed because it applies to any network request of type script. The number of distinct hostnames in the domain= option of that specific filter is 732.
As seen in the screenshot, even with a minimalist browsing session, all these RegExp instances add up to a good amount of memory. Pretty much all these memory-expensive RegExps are related to the domain= option in network static filtering.
Even a small EasyList filter such as |https://$script,third-party,xmlhttprequest,domain=candyreader.com|likesblog.com|projectfreetv.at|projectfreetv.sc|projectfreetv.us|projectwatchseries.com|shupebrothers.com|watchseriesonline.info -- which also always ends up executing -- will have a memory footprint of 6,880 bytes to represent just the eight distinct hostnames specified in its 144-character-long domain= option.
As shown in the benchmark, a RegExp is reportedly considerably faster than a Set when it comes to looking up whether a specific hostname is part of the set.
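To put that figure in perspective, the following sketch works out the per-hostname and per-character cost from the numbers quoted above. The 6,880-byte value is the one reported by the heap snapshot; everything else is derived from the domain= string itself.

```javascript
// Arithmetic check of the figures quoted above.
const domainOpt =
    'candyreader.com|likesblog.com|projectfreetv.at|projectfreetv.sc|' +
    'projectfreetv.us|projectwatchseries.com|shupebrothers.com|' +
    'watchseriesonline.info';
const reportedBytes = 6880; // RegExp footprint as reported by the heap snapshot

const hostnames = domainOpt.split('|');
const bytesPerHostname = reportedBytes / hostnames.length;
const bytesPerChar = reportedBytes / domainOpt.length;

console.log(hostnames.length);        // 8
console.log(domainOpt.length);        // 144
console.log(bytesPerHostname);        // 860
console.log(bytesPerChar.toFixed(1)); // 47.8
```

In other words, the compiled RegExp costs roughly 860 bytes per hostname, or close to 48 bytes for each character of the original domain= string.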
This issue is to document and address this domain=-related RegExp memory issue.
