Introduce Hyperscan lib to implement regexp functions#10279

Closed
philo-he wants to merge 1 commit intofacebookincubator:mainfrom
philo-he:hyperscan

@philo-he philo-he commented Jun 21, 2024

Intel Hyperscan (https://github.com/intel/hyperscan) is a high-performance library for matching multiple regexes at once. We propose introducing it into Velox. Related discussion: #9823.

The benchmark results below show Hyperscan taking about 28% less time per iteration than RE2 for regex matching.

============================================================================
[...]benchmarks/Re2FunctionsBenchmarks.cpp     relative  time/iter   iters/s
============================================================================
regexMatch(bs1k)                                           94.13ns    10.62M
regexMatch(bs10k)                                          89.96ns    11.12M
regexMatch(bs100k)                                         88.40ns    11.31M
regexMatch(bs1k_hyperscan)                                 68.52ns    14.60M
regexMatch(bs10k_hyperscan)                                63.54ns    15.74M
regexMatch(bs100k_hyperscan)                               63.79ns    15.68M

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 21, 2024

netlify Bot commented Jun 21, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit f0470e9
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6710b619b61cc80008f57c5d

@liuyongvs

@philo-he what is going on? we need it

@philo-he
Contributor Author

> @philo-he what is going on? we need it

@liuyongvs, it looks like the CI environment lacks some dependencies needed to build Hyperscan. I will update this PR soon.

@philo-he philo-he force-pushed the hyperscan branch 13 times, most recently from 649fe66 to 252d9d2 July 16, 2024 15:18
@philo-he philo-he marked this pull request as ready for review July 16, 2024 15:31
@philo-he philo-he changed the title [WIP] Introduce Hyperscan lib to implement regexp functions Introduce Hyperscan lib to implement regexp functions Jul 16, 2024
@philo-he
Contributor Author

@mbasmanova, do you have any comment?

@mbasmanova
Contributor

@philo-he This is a major change, but it lacks a description. Last time we discussed this in #9823, we found that the Hyperscan library is no longer maintained. We also found that its main advantage would be in matching a string against multiple regexes at once, but neither Presto nor Spark has functions that allow that. Finally, there are probably semantic differences between Hyperscan and RE2, and these need to be understood and documented. I haven't looked closely at the benchmark, but benchmarking regex engines in general is very hard, as it is difficult to come up with a representative set of use cases. That said, a 28% improvement doesn't appear significant enough to justify the complexity of introducing a new dependency and working through all the issues mentioned. Let me know your thoughts.
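
To make the "multiple regexes at once" point concrete, here is a minimal sketch of that use case. It is an illustration only: it uses `std::regex` from the C++ standard library rather than Hyperscan or RE2, and the function name `matchingPatternIds` is hypothetical. It emulates multi-pattern matching with a loop, scanning the input once per pattern, whereas a multi-pattern engine is designed to report the matching pattern ids from a single scan of the input.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the indices of all patterns that match somewhere in `input`.
// This loop scans `input` once per pattern; it only illustrates the
// interface of multi-pattern matching, not how a dedicated engine
// implements it.
std::vector<std::size_t> matchingPatternIds(
    const std::vector<std::regex>& patterns,
    const std::string& input) {
  std::vector<std::size_t> ids;
  for (std::size_t i = 0; i < patterns.size(); ++i) {
    if (std::regex_search(input, patterns[i])) {
      ids.push_back(i);
    }
  }
  return ids;
}
```

A SQL regex function that takes one string and one pattern per call corresponds to a single iteration of this loop, which is why the multi-pattern advantage would not be exercised by such functions.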

stale Bot commented Oct 17, 2024

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

stale Bot commented Jan 15, 2025

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. stale

4 participants