Remove type parameter from parse_* methods #9466
Conversation
CodSpeed Performance Report: merging #9466 will degrade performance by 6.76%.
charliermarsh
left a comment
This seems reasonable to me given the explanation (especially around the perceived regression) in the summary.
BurntSushi
left a comment
I buy what you're selling.
```diff
 pub fn parse_tokens(tokens: Vec<LexResult>, source: &str, mode: Mode) -> Result<Mod, ParseError> {
     let marker_token = (Tok::start_marker(mode), TextRange::default());
-    let lexer = std::iter::once(Ok(marker_token)).chain(lxr);
+    let lexer = std::iter::once(Ok(marker_token)).chain(TokenSource::new(tokens));
```
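To make the new shape concrete, here is a toy model of the signature change. The token, `LexResult`, and error types below are simplified stand-ins for Ruff's real ones, not the actual API:

```rust
// Simplified stand-ins (hypothetical, for illustration only):
// a LexResult pairs a token with its source offset, or carries a lex error.
type Tok = u32;
type LexResult = Result<(Tok, usize), String>;

// Toy "lexer": yields one numeric token per whitespace-separated word.
fn lex(source: &str) -> impl Iterator<Item = LexResult> + '_ {
    source.split_whitespace().enumerate().map(|(offset, word)| {
        word.parse::<Tok>()
            .map(|tok| (tok, offset))
            .map_err(|err| err.to_string())
    })
}

// After the change, parse_tokens takes an eagerly collected Vec
// instead of being generic over the token iterator.
fn parse_tokens(tokens: Vec<LexResult>) -> Result<Tok, String> {
    let mut sum = 0;
    for result in tokens {
        let (tok, _offset) = result?;
        sum += tok;
    }
    Ok(sum)
}
```

Callers collect the lexer output first, e.g. `parse_tokens(lex("1 2 3").collect())`.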
I was trying to figure out why this is a dedicated type instead of a `tokens.into_iter().filter(...)`, but couldn't come up with a reason. (Usually it's because you want to name the type somewhere, but maybe I'm missing that.)
It's mainly to have a named type in #9152, and a place where we can implement lookahead without using `MultiPeek`.
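A minimal sketch of what such a named type with cheap lookahead could look like — the token type and method names here are hypothetical, not Ruff's actual implementation:

```rust
// Hypothetical sketch: a named token source over eagerly lexed tokens.
// Lookahead is a plain index into the Vec, so no MultiPeek-style
// buffering is needed.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Name(&'static str),
    Int(i64),
}

struct TokenSource {
    tokens: Vec<Tok>,
    pos: usize,
}

impl TokenSource {
    fn new(tokens: Vec<Tok>) -> Self {
        Self { tokens, pos: 0 }
    }

    /// Look `n` tokens ahead without consuming anything.
    fn peek_nth(&self, n: usize) -> Option<&Tok> {
        self.tokens.get(self.pos + n)
    }

    /// Consume and return the next token.
    fn bump(&mut self) -> Option<Tok> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }
}
```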
Ah, gotcha, don't mind me then. :)
|
LGTM. I was actually going to do something like this later, glad you saved me the work 😊
This PR removes the `I` (tokens iterator) type parameter from all `parse_*` methods. This is done in preparation for #9152, to avoid accidentally monomorphizing the parser more than once.

The `Parser` struct defined in #9152 is parameterized by the underlying tokens iterator. This is dangerous because Rust will monomorphize the entire parser for each distinct `I` type parameter.

This PR removes the `I` type parameters from all `parse_*` methods and introduces a new `TokenSource` type in the parser that:

- implements lookahead without `MultiPeek` (which has an overhead when reading tokens)
- is used by the `Parser` to read tokens.

The downside is that all `parse_*` methods now take a `Vec<LexResult>`, which requires collecting the lexer results into a `Vec` before parsing. Our micro-benchmarks show that this is slower than streaming the tokens one by one. However, it turns out that both the linter and the formatter collect the tokens before parsing them into an AST. That means the path tested by the micro-benchmark isn't used in the actual product, and the introduced regression doesn't affect users.
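The monomorphization hazard can be sketched in miniature: each distinct iterator type used with a generic parser forces the compiler to emit a separate copy of all its methods (toy types, for illustration only):

```rust
// A parser generic over its token iterator: every concrete `I` it is
// used with produces a fully separate compiled copy of the impl.
struct Parser<I: Iterator<Item = u32>> {
    tokens: I,
}

impl<I: Iterator<Item = u32>> Parser<I> {
    fn sum(&mut self) -> u32 {
        self.tokens.by_ref().sum()
    }
}
```

Instantiating `Parser { tokens: vec![1, 2, 3].into_iter() }` and `Parser { tokens: 1u32..4 }` compiles `sum` twice, once per iterator type; a parser that owns a concrete `Vec`-backed token source is compiled exactly once.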
You may wonder why this is only a problem now, and wasn't a problem with lalrpop. The problem existed with lalrpop as well, but to a much smaller extent, because lalrpop separates the parser into two parts:

- a state machine, generic over `I`, that consumes the tokens and calls into the parser methods (only passing the tokens)
- the parser methods themselves

Only the state machine gets monomorphized, which is fine because it is limited in size. Another way to think about it is that lalrpop pushes the tokens into the parser (and, therefore, the parser is decoupled from `I`). In contrast, the handwritten parser pulls the tokens (and, therefore, is generic over `I`).
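The push/pull distinction can be sketched like this (toy types; the real lalrpop machinery is far more involved):

```rust
// Push style (lalrpop-like): the parser is a concrete type that tokens
// are fed into, so its methods are compiled exactly once.
struct PushParser {
    count: usize,
}

impl PushParser {
    fn push_token(&mut self, _tok: u32) {
        self.count += 1;
    }
}

// Only this small driver is generic over `I`; it is the only code that
// gets duplicated per iterator type.
fn drive<I: Iterator<Item = u32>>(tokens: I, parser: &mut PushParser) {
    for tok in tokens {
        parser.push_token(tok);
    }
}

// Pull style (the handwritten parser before this PR): the whole parser
// is generic over `I` and drains the tokens itself, so all of it gets
// monomorphized per iterator type.
struct PullParser<I: Iterator<Item = u32>> {
    tokens: I,
    count: usize,
}

impl<I: Iterator<Item = u32>> PullParser<I> {
    fn run(&mut self) {
        while self.tokens.next().is_some() {
            self.count += 1;
        }
    }
}
```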
Alternatives
Alternatives that may be less affected by the performance regression are:

- A `TokenSource` enum with two variants: one storing the lexed tokens, and the other storing the lexer to consume the tokens lazily.
- Using a `Box<impl Iterator>` in the handwritten parser.

I went with the above approach because we don't need the flexibility of either lexing lazily or eagerly in Ruff outside the parser benchmark. Thus, using a `Vec<LexResult>` seemed the easiest solution.

CC: @LaBatata101
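For reference, the two-variant enum alternative could look roughly like this (hypothetical, simplified token and lexer types; not what the PR implements):

```rust
// Sketch of the enum alternative: one variant holds already-lexed
// tokens, the other a live lexer to pull from lazily.
enum TokenSource<L: Iterator<Item = u32>> {
    Eager { tokens: Vec<u32>, pos: usize },
    Lazy(L),
}

impl<L: Iterator<Item = u32>> Iterator for TokenSource<L> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        match self {
            TokenSource::Eager { tokens, pos } => {
                let tok = tokens.get(*pos).copied();
                *pos += 1;
                tok
            }
            TokenSource::Lazy(lexer) => lexer.next(),
        }
    }
}
```

This keeps lazy lexing possible, but (at least in this sketch) the enum is still generic over the lexer type and pays a branch per token, which hints at why the plain `Vec<LexResult>` was the easier choice.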