Skip to content

perf: use simdutf8 to validate UTF-8 when reading files#2040

Merged
IWANABETHATGUY merged 1 commit intorolldown:mainfrom
overlookmotel:readfile-utf8-simd
Aug 26, 2024
Merged

perf: use simdutf8 to validate UTF-8 when reading files#2040
IWANABETHATGUY merged 1 commit intorolldown:mainfrom
overlookmotel:readfile-utf8-simd

Conversation

@overlookmotel
Copy link
Copy Markdown
Collaborator

Description

I'm not familiar with Rolldown's codebase, so I'm not sure whether this function is the main one used to read source text from files - so this may be in wrong place/not enough places.

std::fs::read_to_string internally calls std::str::from_utf8 to check that the file is valid UTF-8. str::from_utf8 is not so efficient as it doesn't use SIMD. simduft8 should be faster.

I assume that this difference won't show up on benchmarks as they likely won't include IO. But I'd hope it'd have an effect in the real world. How would we go about testing that assumption? Are there any benchmarks which do exercise this function?

@netlify
Copy link
Copy Markdown

netlify bot commented Aug 22, 2024

Deploy Preview for rolldown-rs canceled.

Name Link
🔨 Latest commit 4dbf02f
🔍 Latest deploy log https://app.netlify.com/sites/rolldown-rs/deploys/66c73048b6271f000812f275

@Boshen
Copy link
Copy Markdown
Member

Boshen commented Aug 22, 2024

I assume this would break our wasm and wasi builds without and feature target gates.

@overlookmotel
Copy link
Copy Markdown
Collaborator Author

overlookmotel commented Aug 22, 2024

simduft8 supports WASM, although it's not clear to me if it's enabled by default: https://github.com/rusticstuff/simdutf8#wasm32

But it shouldn't break WASM anyway.

@Boshen
Copy link
Copy Markdown
Member

Boshen commented Aug 22, 2024

We can always test the effect of such functionalities in oxc first, oxlint in this case.

@overlookmotel
Copy link
Copy Markdown
Collaborator Author

Do oxlint's benchmarks include file IO?

@Boshen
Copy link
Copy Markdown
Member

Boshen commented Aug 22, 2024

Do oxlint's benchmarks include file IO?

I need to profile locally.

@Boshen Boshen self-assigned this Aug 23, 2024
@Boshen
Copy link
Copy Markdown
Member

Boshen commented Aug 25, 2024

In monitor-oxc, validating 84190 files (before and after):

image

Performance is greatly improved, but maybe M3 is too powerful to make a dent on the overall performance improvement.

@IWANABETHATGUY
Copy link
Copy Markdown
Member

@Boshen since you assign to yourself, this pull request needs your approval to merge.

@Boshen Boshen removed their assignment Aug 26, 2024
@IWANABETHATGUY IWANABETHATGUY added this pull request to the merge queue Aug 26, 2024
Merged via the queue into rolldown:main with commit 94cb434 Aug 26, 2024
@overlookmotel
Copy link
Copy Markdown
Collaborator Author

I wasn't sure if this change also needs making elsewhere in Rolldown, if other parts of the code also read from file system. @IWANABETHATGUY can you advise?

@overlookmotel overlookmotel deleted the readfile-utf8-simd branch August 26, 2024 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants