-
-
Notifications
You must be signed in to change notification settings - Fork 932
Description
Why the lexer is slower than it could be
Chars iterator is really slow. Lexer should iterate byte-by-byte rather than char-by-char.
In almost all cases, we're only matching against ASCII characters anyway (e.g. self.next_eq('.')), so calculating the Unicode char is completely pointless, as we only care if it's an ASCII . anyway. Surprisingly, the compiler seems not to be able to see this, and doesn't optimize out the char calculation itself. It generates a lot of pointless and slow code.
How to make it faster
Source was introduced with intention to ease the transition to byte-by-byte iteration through source code.
#2352 got some heavy perf improvements from using it (along with other tricks), but I have not been able to find the time to complete the work.
The following APIs should be converted to take/return u8 and use Source::peek_byte instead of Source::peek_char:
-
peek -
peek2 -
next_eq
Usages of these APIs should be refactored unless they're directly preceded by a peek:
-
next_char -
consume_char
Suggested implementation
Source::next_byte is an unsafe function, as it can break the invariant that Source must always be positioned on a UTF8 character boundary (i.e. not in the middle of a multi-byte Unicode char). It's preferable not to use it, to avoid littering the lexer with unsafe code.
However, Source::next_char is written to be much more transparent to the compiler than Chars::next is. So the compiler is able to optimize safe code like:
const dot_was_consumed = match lexer.source.peek_byte() {
b'.' => {
lexer.consume_char().unwrap();
true
}
_ => false,
}down to equivalent of:
const dot_was_consumed = match lexer.source.peek_byte() {
b'.' => {
unsafe {
// This is a single assembler operation
lexer.source.set_position( lexer.source.position().add(1) );
}
true
}
_ => false,
}(at least that's what I remember from a few months ago when I checked it with Godbolt).
(originally mentioned in #3250 (comment))
Metadata
Metadata
Assignees
Labels
Type
Fields
Give feedbackPriority