[Performance] RegExp uses undue amount of memory on Chromium-based browsers #3193

@gorhill

In commit bacf502, I refactored how the hostnames specified in the domain= option of a network static filter are handled.

Based on the results of the set-vs-regexp.html benchmark, I decided to use a RegExp to quickly look up whether a hostname is part of the set of hostnames specified in a domain= option.
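
To make the two approaches concrete, here is a minimal sketch of a Set-based and a RegExp-based lookup built from a domain= value. This is not the code from the commit or from the benchmark: the helper names (`escapeRegex`, `regexFromDomainOpt`, `setFromDomainOpt`, `setMatches`) are hypothetical, and the sketch assumes that a hostname matches when it equals a listed hostname or is a subdomain of one.

```javascript
// Hypothetical helpers for illustration only; not uBO's actual code.

// Escape characters which have a special meaning in a regexp.
const escapeRegex = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// RegExp approach: a single alternation covering every listed hostname,
// anchored so that it matches the hostname itself or any subdomain of it.
const regexFromDomainOpt = domainOpt =>
    new RegExp(
        '(?:^|\\.)(?:' +
        domainOpt.split('|').map(escapeRegex).join('|') +
        ')$'
    );

// Set approach: store the hostnames, then walk up the labels of the
// hostname being tested until a match is found or no label is left.
const setFromDomainOpt = domainOpt => new Set(domainOpt.split('|'));

const setMatches = (set, hostname) => {
    let hn = hostname;
    for (;;) {
        if (set.has(hn)) { return true; }
        const pos = hn.indexOf('.');
        if (pos === -1) { return false; }
        hn = hn.slice(pos + 1);
    }
};

// Usage with a shortened, made-up domain= value.
const sampleOpt = 'candyreader.com|likesblog.com';
const sampleRe = regexFromDomainOpt(sampleOpt);
const sampleSet = setFromDomainOpt(sampleOpt);

console.log(sampleRe.test('www.likesblog.com'));         // true
console.log(setMatches(sampleSet, 'www.likesblog.com')); // true
console.log(sampleRe.test('notlikesblog.com'));          // false
console.log(setMatches(sampleSet, 'notlikesblog.com'));  // false
```

The RegExp version needs a single test() call per lookup, while the Set version may need one has() call per label in the hostname.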

However, as revealed by the "Take heap snapshot" memory tool in Chromium, the amount of memory used by RegExp instances on Chromium-based browsers is surprisingly large. RegExp instances are lazily allocated in Chromium: the internal memory is allocated only when the exec() method is called on a RegExp instance.

However, as shown in the following screenshot, a lot of filters with the domain= option end up having their regexp executed earlier than expected. The heap snapshot was taken after launching uBO and visiting only the links on the front page of https://news.ycombinator.com/news:

[Screenshot of the heap snapshot; image not available]

The top RegExp by memory use comes from the filter $script,third-party,domain=123videos.tv|171gifs.com|1proxy.de|... in EasyList. Such a filter will always end up being executed because it applies to any network request of type script. The number of distinct hostnames in the domain= option of that specific filter is 732.

As seen in the screenshot, even with a minimalist browsing session, all these RegExp instances add up to a good amount of memory. Pretty much all these memory-expensive RegExps are related to the domain= option in network static filtering.

Even a small EasyList filter such as |https://$script,third-party,xmlhttprequest,domain=candyreader.com|likesblog.com|projectfreetv.at|projectfreetv.sc|projectfreetv.us|projectwatchseries.com|shupebrothers.com|watchseriesonline.info -- which also always ends up being executed -- will have a memory footprint of 6,880 bytes to represent just the eight distinct hostnames specified in its 144-character-long domain= option.
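
To put the figures for this small filter in perspective, the following sketch works out the per-hostname and per-character cost. The 6,880-byte value is the one reported above; the variable names are only illustrative.

```javascript
// The domain= value of the small EasyList filter quoted above.
const domainOpt =
    'candyreader.com|likesblog.com|projectfreetv.at|projectfreetv.sc|' +
    'projectfreetv.us|projectwatchseries.com|shupebrothers.com|' +
    'watchseriesonline.info';

const hostnames = domainOpt.split('|');

// Memory footprint of the RegExp as reported above, in bytes.
const regexBytes = 6880;

const bytesPerHostname = regexBytes / hostnames.length;
const bytesPerChar = regexBytes / domainOpt.length;

console.log(hostnames.length);        // 8
console.log(domainOpt.length);        // 144
console.log(bytesPerHostname);        // 860
console.log(bytesPerChar.toFixed(1)); // 47.8
```

In other words, the compiled RegExp costs roughly 48 bytes for each character of the source domain= string.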

As shown in the benchmark, a RegExp is considerably faster than a Set when it comes to looking up whether a specific hostname is part of the set.

This issue is to document and address this domain=-related RegExp memory issue.
