Conversation
This accomplishes the goals of pest-parser#880 (in a slightly different way). I haven't added any unit tests or documentation yet; this is a preview.
WalkthroughThe changes add a new literal push functionality across the codebase. A Changes
Sequence Diagram(s)sequenceDiagram
participant Src as Source Code
participant P as Parser
participant A as AST
participant O as Optimizer
participant G as Generator
participant VM as Virtual Machine
participant PS as ParserState
Src->>P: Parse input (_push_literal rule)
P->>A: Create PushLiteral expression
A->>O: Convert AST node to optimized variant
O->>G: Generate code for PushLiteral
G->>PS: Call stack_push_literal with literal string
PS->>VM: Execute literal push operation
Suggested reviewers
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
meta/src/parser.rs (1)
1-1797: Consider adding unit tests for PUSH_LITERAL functionality.While the implementation is solid, there are no specific unit tests for the new PUSH_LITERAL functionality. Adding tests would ensure the feature works as expected and prevent regression issues.
Consider adding a test case similar to the existing
pushtest (lines 952-969) to verify that PUSH_LITERAL parses correctly and another test to verify that it generates the expected AST.#[test] fn push_literal() { parses_to! { parser: PestParser, input: "PUSH_LITERAL ( \"a\" )", rule: Rule::_push_literal, tokens: [ _push_literal(0, 19, [ opening_paren(13, 14), string(15, 18, [ quote(15, 16), inner_str(16, 17), quote(17, 18) ]), closing_paren(18, 19) ]) ] }; } #[test] fn ast_push_literal() { let input = "rule = _{ PUSH_LITERAL(\"a\") }"; let pairs = PestParser::parse(Rule::grammar_rules, input).unwrap(); let ast = consume_rules_with_spans(pairs).unwrap(); let ast: Vec<_> = ast.into_iter().map(convert_rule).collect(); assert_eq!( ast, vec![AstRule { name: "rule".to_owned(), ty: RuleType::Silent, expr: Expr::PushLiteral("a".to_owned()), }] ); }pest/src/parser_state.rs (1)
1133-1139:stack_push_literalmethod is coherent.
Pushing a literal without consuming input is a key feature. Consider adding dedicated unit tests for this method to ensure correctness in edge cases (e.g., empty strings, large strings).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
generator/src/generator.rs(2 hunks)meta/src/ast.rs(2 hunks)meta/src/grammar.pest(2 hunks)meta/src/optimizer/mod.rs(3 hunks)meta/src/parser.rs(6 hunks)meta/src/validator.rs(2 hunks)pest/src/parser_state.rs(9 hunks)vm/src/lib.rs(1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
pest/src/parser_state.rs (5)
pest/src/position.rs (2) (2)
new(45-47)span(98-107)pest/src/stack.rs (1) (1)
new(51-57)pest/src/span.rs (4) (4)
new(47-53)as_str(197-200)start(100-102)end(118-120)pest/src/iterators/pair.rs (2) (2)
new(48-60)as_str(114-120)pest/src/iterators/pairs.rs (3) (3)
new(45-87)as_str(116-125)as_str(570-574)
🔇 Additional comments (28)
meta/src/validator.rs (2)
403-403: Correctly handles PushLiteral as non-progressingThe addition of the
PushLiteral(_)case in theis_non_progressingfunction ensures that pushing literal strings to the stack is properly recognized as an operation that doesn't consume input. This is consistent with the behavior of similar operations likePush.
498-498: Correctly handles PushLiteral as non-failingThe addition of the
PushLiteral(_)case in theis_non_failingfunction correctly identifies that this operation is guaranteed to succeed regardless of input. This matches the expected semantics ofPUSH_LITERALwhich should never fail during parsing.meta/src/ast.rs (2)
97-98: New expression variant for pushing literal stringsThe new
PushLiteral(String)variant appropriately extends the AST to support pushing literal strings onto the stack without matching input, which is the core functionality described in the PR objectives. The doc comment clearly explains its purpose.
324-324: Display formatting for the new PushLiteral variantThe implementation for formatting
PushLiteralin theDisplaytrait is consistent with the format used for other expression types. Using{:?}for the string ensures proper escaping of special characters.generator/src/generator.rs (2)
573-577: Generator implementation for PushLiteralThe code generation for
PushLiteralcorrectly callsstate.stack_push_literal(#string), which will push the literal string onto the parser state stack. This implementation is consistent with the other stack operations and fulfills the requirements for the new rule.
763-767: Atomic generator implementation for PushLiteralThis implementation mirrors the non-atomic version, ensuring consistent behavior when
PushLiteralis used in atomic contexts. The duplicate implementation is necessary because atomic and non-atomic parsing have different code generation patterns in pest.vm/src/lib.rs (1)
239-239: VM implementation for PushLiteralThe implementation in the VM correctly maps the
OptimizedExpr::PushLiteralvariant tostate.stack_push_literal(string.to_owned()). Theto_owned()call is necessary because the VM needs to take ownership of the string when pushing it onto the stack.meta/src/grammar.pest (2)
84-84: Added new terminal rule for PUSH_LITERAL operation.The terminal rule has been correctly updated to include the new
_push_literaloption, expanding the available operations in the grammar.
148-149: Well-structured rule definition for PUSH_LITERAL.The new
_push_literalrule is properly defined with appropriate documentation and follows the pattern established by other similar operations like_push. This rule correctly matches the string "PUSH_LITERAL" followed by an argument enclosed in parentheses, specifically a literal string.meta/src/optimizer/mod.rs (3)
72-72: Added variant to handle PushLiteral conversion in optimizer.The
to_optimizedfunction has been correctly updated to handle the newExpr::PushLiteral(string)variant, converting it toOptimizedExpr::PushLiteral(string).
155-156: Clear documentation for the new PushLiteral variant.The new
PushLiteral(String)variant has been added to theOptimizedExprenum with appropriate documentation that clearly explains its purpose: "Pushes a literal string to the stack, e.g.push_literal("")".
331-331: Implemented Display trait for the new variant.The
Displayimplementation has been extended to handle thePushLiteralvariant, correctly formatting it asPUSH_LITERAL({:?})which will properly display the enclosed string with escaped characters.meta/src/parser.rs (5)
169-170: Added well-documented PushLiteral variant to ParserExpr.The new
PushLiteral(String)variant has been added to theParserExprenum with clear documentation explaining its purpose and usage pattern.
209-209: Updated convert_node to handle the new expression variant.The
convert_nodefunction now correctly handles the newParserExpr::PushLiteral(string)variant, creating the correspondingExpr::PushLiteral(string)AST node.
233-233: Added PUSH_LITERAL to meta rule renaming function.The
rename_meta_rulefunction has been updated to handle the new_push_literalrule, returning the user-friendly name "PUSH_LITERAL".
400-409: Implemented PUSH_LITERAL parser logic.The implementation for handling the
_push_literalrule in theconsume_exprfunction is well-structured:
- It correctly extracts the string content from the rule
- It unescapes the string literals properly
- It creates a ParserNode with the PushLiteral expression
- It uses the appropriate span for error reporting
1333-1333: Updated test error message to include the new rule.The test case error message has been properly updated to include the new
_push_literalrule in the list of expected rules when a term is missing.pest/src/parser_state.rs (11)
13-13: No concerns about the new import.
The addition ofToOwnedis consistent with the usage of owned/borrowed strings.
217-221: IntroducingBorrowedOrRcis a clear improvement.
This enum nicely handles both borrowed & owned strings without complicating the parser’s memory management.
223-227:From<&'static str>impl looks good.
The conversion logic is straightforward and aligns well withBorrowedOrRc<'_>.
229-233:From<String>impl is concise.
Wrapping theStringinsideRcfits the parser VM’s needs for reference-counted string literals.
235-245: Adequate docs forBorrowedOrRc.
Documenting the typical usage (&'static strvsRc<String>) clarifies any confusion about the enum’s purpose.
246-260:SpanOrLiteralenum effectively unifies two data representations.
This design simplifies pushing either spans or literal strings onto the stack.
1498-1498: Storing spans asSpanOrLiteral::Spanis correct.
No functional issues identified here.
1530-1531: Passing the borrowed or owned string tomatch_stringis consistent.
Looks correct and should work seamlessly.
1559-1560: Reusingas_borrowed_or_rchere is fine.
No issues with matching strings from the popped value.
1606-1606: Inline closure usage for matching is correct.
No logical problems detected.
1667-1667: Applyingmatch_stringtoas_borrowed_or_rc().as_str()is straightforward.
No issues found.
tomtau
left a comment
There was a problem hiding this comment.
- This would need to be feature-guarded
#[cfg(feature = "grammar-extras")], because otherwise the different mixing of pest dependencies that cargo may resolve can fail to compile. - there was also this issue #541 -- not sure if making possible to push insensitive literals can be helpful here?
Happy to do that!
I don't think it's possible to combine this feature with that one, though I'm open if you have suggestions. Case-insensitivity feels pretty different, and I don't see any reasonably simple abstraction that combines it with matching delims. But it sounds like, feature guarding aside (and I'd need to get the CIs passing, docs, etc.), this is something you'd be interested in accepting? |
Actually, the |
I generally don't feel comfortable with all the ad-hoc stack operations in pest for the extra complexity, but I understand why they were introduced... In terms of extending them / adding new ones, I'm open to it as long as the extension solves several compelling unaddressed use cases (vs one niche use case), so that the increased complexity is somewhat justified. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
pest/src/parser_state.rs (2)
217-221: Enum definition is well-structured.
SpanOrLiteralcorrectly unifiesSpan<'i>andCow<'i, str>. Consider derivingPartialEqif comparisons ofSpanOrLiteralmight be needed in the future.
223-231: Consider renaming for clarity and avoiding unnecessary clones.
as_borrowed_or_rc()works but may cause extra cloned allocations. Optionally, you might rename it (e.g.,as_cow()) and explore ways to avoid cloning when theCowis already owned.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
meta/src/grammar.pest(2 hunks)meta/src/optimizer/mod.rs(3 hunks)pest/src/parser_state.rs(10 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- meta/src/grammar.pest
- meta/src/optimizer/mod.rs
🧰 Additional context used
🧬 Code Definitions (1)
pest/src/parser_state.rs (6)
pest/src/iterators/tokens.rs (3) (3)
pairs(144-144)pairs(146-146)new(33-58)pest/src/position.rs (2) (2)
new(45-47)span(98-107)pest/src/stack.rs (1) (1)
new(51-57)pest/src/span.rs (3) (3)
new(47-53)start(100-102)end(118-120)pest/src/iterators/pairs.rs (1) (1)
new(45-87)pest/src/iterators/flat_pairs.rs (1) (1)
new(32-46)
🔇 Additional comments (9)
pest/src/parser_state.rs (9)
13-20: Imports look consistent.
These new imports align well with the newly added types and functionalities, and no overshadowing or redundancy is apparent.
31-31: Use ofDerefis appropriate.
The import ofstd::ops::Derefis correctly utilized when callingderef()onCowinstances.
442-442: Stack type update is coherent.
Switching the parser stack toStack<SpanOrLiteral<'i>>makes sense for pushing string literals and spans uniformly.
1083-1112: Newstack_push_literalmethod looks good.
It offers a clear, always-successful push of a literal. Adequate doc comments and inline tests illustrate usage.
1471-1471: Updated push usage aligns withSpanOrLiteral.
Switching from directly pushingSpanto wrapping it inSpanOrLiteral::Spanis consistent and maintains existing functionality.
1503-1504:stack_peekadaptation is solid.
Usingas_borrowed_or_rc()followed byderef()correctly handles both borrowed and owned cases.
1532-1533:stack_popadaptation is consistent.
Mirroring thestack_peeklogic ensures the popped value works seamlessly as a literal or span.
1579-1580: Slice matching updated correctly.
Iterating overSpanOrLiteraland matching viaas_borrowed_or_rc()is straightforward and preserves functionality for partial stack matches.
1641-1641:stack_match_popbehavior is consistent.
Replacing directSpanusage withSpanOrLiteralintegration remains functionally valid.
|
Bumping this PR to Draft status until it's ready for a re-review. |
This reverts commit fbe2950.
|
Should be ready now! I added tests and the grammar-extra cfg. |
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
pest/src/parser_state.rs (1)
90-1161: 💡 Verification agent🧩 Analysis chain
Verify that this new feature is guarded with
#[cfg(feature = "grammar-extras")].According to the PR comments, this new functionality should be feature-guarded with
#[cfg(feature = "grammar-extras")]to prevent potential compilation issues due to mixed pest dependencies. However, this feature guard is missing from the implementation.Please verify if the entire new
PUSH_LITERALfunctionality, including theLiteralenum,BorrowedOrRcenum, andstack_push_literalmethod, should be conditionally compiled with this feature flag.
🏁 Script executed:
#!/bin/bash # Check if other similar functionality uses feature guards grep -r "#\[cfg(feature = \"grammar-extras\")]" --include="*.rs" .Length of output: 3960
Feature Guard Missing: Please wrap new literal functionality with
#[cfg(feature = "grammar-extras")].After verifying similar functionality via the grep search, we see that comparable features (in modules like
generator,meta, andvm) are conditionally compiled using#[cfg(feature = "grammar-extras")]. The newPUSH_LITERALfunctionality—including theLiteralenum, theBorrowedOrRcenum, and thestack_push_literalmethod—is currently missing this guard. To ensure compatibility with projects that do not enable the"grammar-extras"feature and to maintain consistency with the rest of the codebase, the new code should be conditionally compiled using this feature flag.
- Action Required:
• Wrap the definition ofLiteraland itsFromimplementations with#[cfg(feature = "grammar-extras")].
• Similarly, guard theBorrowedOrRcenum and its related conversion code.
• Apply the feature guard to thestack_push_literalmethod inParserState.**
🧹 Nitpick comments (4)
pest/src/parser_state.rs (4)
24-24: Add missing documentation comment forDerefimport.The
std::ops::Derefimport is used for theBorrowedOrRcimplementation but lacks a documentation comment to explain its purpose, unlike other imports in this file.
238-242: Consider adding a reason for usingRcinstead ofCowin documentation.The
BorrowedOrRcenum is well-designed but the reasoning behind usingRc<String>instead ofCow<'i, str>is not clear from the documentation. Based on previous comments in the PR, this choice was made to avoid copying when cloning strings, but this context would be valuable in the code comments.-enum BorrowedOrRc<'i> { +/// A helper enum that provides efficient string handling without unnecessary copying. +/// We use `Rc<String>` instead of `Cow<'i, str>` to avoid copying of owned strings when cloning, +/// as `Rc::clone` only increments a reference count. +enum BorrowedOrRc<'i> {
1152-1152: Consider removing or documenting#[allow(private_bounds)]attribute.The
#[allow(private_bounds)]attribute is used but its purpose is not clear. Consider adding a brief comment explaining why this attribute is needed or remove it if it's not necessary.
90-1161: Add comprehensive test coverage for the newPUSH_LITERALfunctionality.While the PR indicates local testing has been done, it would be beneficial to add comprehensive unit tests for the new
PUSH_LITERALfunctionality, covering both static and owned string literals. This would ensure the feature works as expected and remains stable as the codebase evolves.Consider adding tests that:
- Verify
stack_push_literalworks with both static and owned strings- Test interaction between
stack_push_literaland other stack operations- Verify that sequences combining normal character matching with literal pushing work correctly
- Test edge cases like empty strings and Unicode characters
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
generator/src/generator.rs(2 hunks)grammars/Cargo.toml(1 hunks)grammars/src/grammars/surround.pest(1 hunks)grammars/src/lib.rs(1 hunks)grammars/tests/surround.rs(1 hunks)meta/src/ast.rs(3 hunks)meta/src/optimizer/mod.rs(4 hunks)meta/src/parser.rs(7 hunks)meta/src/validator.rs(4 hunks)pest/src/parser_state.rs(10 hunks)vm/src/lib.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- generator/src/generator.rs
- meta/src/validator.rs
🧰 Additional context used
🧬 Code Definitions (3)
meta/src/optimizer/mod.rs (2)
meta/src/ast.rs (2)
string(451-453)push_literal(691-696)meta/src/parser.rs (1)
string(1065-1078)
meta/src/ast.rs (1)
meta/src/optimizer/mod.rs (1)
push_literal(1098-1103)
pest/src/parser_state.rs (6)
pest/src/error.rs (3)
from(83-85)from(89-92)start(498-503)pest/src/span.rs (4)
as_str(197-200)new(47-53)start(100-102)end(118-120)pest/src/iterators/pair.rs (2)
as_str(114-120)new(48-60)pest/src/iterators/pairs.rs (3)
as_str(116-125)as_str(570-574)new(45-87)pest/src/stack.rs (1)
new(51-57)pest/src/position.rs (2)
new(45-47)span(98-107)
🔇 Additional comments (33)
grammars/Cargo.toml (1)
18-18: Correctly enables the "grammar-extras" feature.The addition of the
features = ["grammar-extras"]attribute to thepest_derivedependency is correctly implemented. This follows the convention mentioned in the PR comments about feature-guarding the newPUSH_LITERALrule to prevent potential compilation issues with mixed pest dependencies.grammars/src/lib.rs (1)
67-85: Well-documented implementation of the surround module.The newly added
surroundmodule provides a clear example of how to use the newPUSH_LITERALfunctionality with different delimiter pairs. The documentation is comprehensive, explaining the purpose of the parser and its limitations. This module directly demonstrates the PR objective of supporting quoted strings with different start and end delimiters.grammars/tests/surround.rs (3)
17-27: Test cases verify parentheses functionality.The test correctly verifies that the surround parser handles strings enclosed in parentheses, validating a key use case for the new
PUSH_LITERALfunctionality.
29-39: Test cases verify angle brackets functionality.The test correctly verifies that the surround parser handles strings enclosed in angle brackets, providing good coverage of the different delimiter types supported.
41-51: Good edge case testing for delimiter mismatch.This test properly verifies that the parser rejects strings with mismatched delimiters (starting with one type and ending with another), which is an important validation of the grammar's correctness.
vm/src/lib.rs (1)
239-241: Correctly implements the PUSH_LITERAL functionality.The implementation of the
PushLiteralvariant is:
- Properly feature-guarded with
#[cfg(feature = "grammar-extras")]- Integrated correctly into the expression parser
- Implemented to take ownership of the string that needs to be pushed onto the stack
This directly implements the core functionality described in the PR objectives, allowing for pushing literal values onto the stack without consuming input.
grammars/src/grammars/surround.pest (1)
1-9: Well-designed grammar example for the new PUSH_LITERAL feature.This grammar effectively demonstrates the use case for the new PUSH_LITERAL feature by implementing a flexible quoted text matcher. It elegantly handles different delimiter pairs by pushing the appropriate closing delimiter onto the stack based on the detected opening delimiter.
meta/src/ast.rs (3)
97-99: Good implementation with proper feature gating.The new
PushLiteralvariant is correctly added to theExprenum and properly feature-gated with#[cfg(feature = "grammar-extras")], which aligns with the PR feedback.
326-327: Display implementation is correct.The implementation of the
Displaytrait for the new variant follows the established pattern in the codebase and correctly formats the PushLiteral variant.
689-696: Comprehensive test case.The test case effectively verifies that the string representation works correctly, including proper handling of special characters and escaping of quotes within the string.
meta/src/parser.rs (6)
169-171: Good implementation with proper feature gating.The new
PushLiteralvariant is correctly added to theParserExprenum and properly feature-gated with#[cfg(feature = "grammar-extras")].
210-212: Proper conversion implementation.The conversion from
ParserExpr::PushLiteraltoExpr::PushLiteralis correctly implemented in theconvert_nodefunction.
235-235: Rule renaming is properly handled.The
rename_meta_rulefunction is updated to include a case for the new_push_literalrule, returning a user-friendly name.
402-422: Well-implemented parsing logic with proper error handling.The implementation for handling the
_push_literalrule is thorough and includes:
- Proper string unescaping
- Correct node creation
- Appropriate error handling when the feature is not enabled
The code follows the established patterns in the codebase for parsing rules.
1346-1346: Test error message updated correctly.The test error message is updated to include the new
_push_literalrule in the list of expected rules.
1543-1579: Comprehensive test coverage.The tests thoroughly cover both successful parsing of the PUSH_LITERAL syntax and error conditions when invalid input is provided. The error test specifically verifies that a descriptive error message is shown when the PUSH_LITERAL syntax is used incorrectly.
meta/src/optimizer/mod.rs (4)
73-73: Proper optimization handling.The
to_optimizedfunction is correctly updated to handle the conversion fromExpr::PushLiteraltoOptimizedExpr::PushLiteral.
156-158: Good implementation with proper feature gating.The new
PushLiteralvariant is correctly added to theOptimizedExprenum and properly feature-gated with#[cfg(feature = "grammar-extras")].
334-335: Display implementation is correct.The implementation of the
Displaytrait for the new variant follows the established pattern in the codebase and correctly formats the PushLiteral variant.
1096-1103: Comprehensive test case.The test case effectively verifies that the string representation works correctly, including proper handling of special characters and escaping of quotes within the string.
pest/src/parser_state.rs (13)
90-96: NewLiteralenum provides flexibility for static and dynamic strings.The new
Literalenum is well-designed, allowing for both static strings (preferred for derived parsers) and owned strings (for VM contexts). This approach optimizes memory usage by avoiding unnecessary string allocations when static strings are available.
98-108: Good implementation ofFromtraits forLiteral.The implementations of
From<&'static str>andFrom<String>forLiteralfollow best practices, enabling ergonomic conversion from both string types. This will make the API more convenient to use.
245-253: Well-implementedas_strmethod forBorrowedOrRc.The
as_strmethod is correctly implemented to return a string slice regardless of the variant, making it convenient to work with either borrowed or owned strings uniformly.
255-262: Efficient conversion fromLiteraltoBorrowedOrRc.The implementation of
From<Literal>forBorrowedOrRccorrectly maps the variants and wraps the owned string in anRcto avoid deep copying when cloned later.
264-268: Good design ofSpanOrLiteralenum.The
SpanOrLiteralenum effectively unifies handling of both spans from the input and literal strings, which is essential for the newPUSH_LITERALfunctionality.
270-278: Efficient implementation ofas_borrowed_or_rcforSpanOrLiteral.The
as_borrowed_or_rcmethod correctly handles both variants, converting a span to a borrowed string and cloning a literal reference efficiently without deep copying the string content when it's in anRc.
489-489: Updatedstackfield type to support literals.The
stackfield has been correctly updated fromStack<Span<'i>>toStack<SpanOrLiteral<'i>>to accommodate both spans and literals. This is a key change that enables the newPUSH_LITERALfunctionality.
1130-1161: Well-implementedstack_push_literalmethod.The new
stack_push_literalmethod is well-designed:
- It accepts any type that can be converted to a
Literalusingimpl Into<Literal>, providing flexibility.- It correctly pushes the literal onto the stack after converting it to a
SpanOrLiteral::Literal.- The implementation is concise and follows the existing code style.
The doc comments and examples are also thorough and follow the pattern of other methods.
1520-1520: Updatedstack_pushmethod to work withSpanOrLiteral.The
stack_pushmethod has been correctly updated to push aSpanOrLiteral::Spaninstead of just aSpan.
1552-1553: Updatedstack_peekmethod to work withSpanOrLiteral.The
stack_peekmethod has been correctly updated to useas_borrowed_or_rc()to get a string representation of either a span or literal. The change is minimal and maintains the existing logic.
1581-1582: Updatedstack_popmethod to work withSpanOrLiteral.The
stack_popmethod has been correctly updated to useas_borrowed_or_rc()to get a string representation of either a span or literal, maintaining consistency with thestack_peekmethod.
1628-1629: Updatedstack_match_peek_slicemethod to work withSpanOrLiteral.The matcher closure in
stack_match_peek_slicehas been correctly updated to useas_borrowed_or_rc()to get a string representation of either a span or literal. The change is minimal and preserves the existing logic.
1690-1690: Updatedstack_match_popmethod to work withSpanOrLiteral.The string matching in
stack_match_pophas been correctly updated to useas_borrowed_or_rc()to get a string representation of either a span or literal, maintaining consistency with other methods.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
derive/tests/surround.pest (1)
3-7: Consider making the grammar more extensible for future delimiter pairs.The current implementation works well for parentheses and angle brackets. If you plan to support more delimiter pairs in the future, you might want to consider a more maintainable structure.
_QuoteStart = _{ ( "(" ~ PUSH_LITERAL(")") ) | ( "<" ~ PUSH_LITERAL(">") ) + // Easy to add more delimiter pairs: + // | ( "[" ~ PUSH_LITERAL("]") ) + // | ( "{" ~ PUSH_LITERAL("}") ) + // | ( "/*" ~ PUSH_LITERAL("*/") ) }derive/tests/surround.rs (1)
35-72: Consider adding more comprehensive test cases.While the current tests cover basic functionality, consider adding tests for:
- Nested quotes (e.g.,
"(outer <inner>)").- Empty quotes (e.g.,
"()"and"<>").- Quotes containing their counterpart delimiters (e.g.,
"(contains > symbol)"or"<contains ) symbol>").- Edge cases with whitespace or special characters.
#[test] #[cfg(feature = "grammar-extras")] fn nested_quotes() { parses_to! { parser: SurroundParser, input: "(outer <inner>)", rule: Rule::Quote, tokens: [ QuoteChars(1, 14) ] } } #[test] #[cfg(feature = "grammar-extras")] fn empty_quotes() { parses_to! { parser: SurroundParser, input: "()", rule: Rule::Quote, tokens: [ QuoteChars(1, 1) ] } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
derive/tests/surround.pest(1 hunks)derive/tests/surround.rs(1 hunks)pest/src/parser_state.rs(10 hunks)vm/src/lib.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- vm/src/lib.rs
🧰 Additional context used
🧬 Code Definitions (2)
derive/tests/surround.rs (1)
pest/src/parser_state.rs (1)
rule(677-801)
pest/src/parser_state.rs (7)
pest/src/span.rs (4)
as_str(197-200)new(47-53)start(100-102)end(118-120)pest/src/iterators/pairs.rs (3)
as_str(116-125)as_str(570-574)new(45-87)pest/src/iterators/pair.rs (2)
as_str(114-120)new(48-60)pest/src/stack.rs (1)
new(51-57)pest/src/position.rs (2)
new(45-47)span(98-107)pest/src/iterators/tokens.rs (1)
new(33-58)pest/src/iterators/flat_pairs.rs (1)
new(32-46)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Unit, Style, and Lint Testing
- GitHub Check: check for semver compatibility
- GitHub Check: cargo hack check --feature-powerset
- GitHub Check: check for no_std compatibility
- GitHub Check: Test Coverage
- GitHub Check: Documentation check
- GitHub Check: Minimal Versions Testing
- GitHub Check: Fuzzing
🔇 Additional comments (20)
derive/tests/surround.pest (2)
1-10: Grammar implementation looks good and demonstrates the PUSH_LITERAL feature well.This grammar showcases a practical use case for the new
PUSH_LITERALfeature by implementing a quote parser that supports different delimiter pairs. The implementation is elegant and demonstrates howPUSH_LITERALcan simplify working with paired delimiters compared to alternative approaches.The grammar correctly:
- Captures opening delimiters and pushes their closing counterparts
- Uses
PEEKto match content until the closing delimiter- Consumes the closing delimiter with
POP
9-10: Verify PEEK usage with appropriate tests.The
!PEEK ~ ANYpattern is key to this grammar's functionality. It ensures characters are consumed only if they're not the expected closing delimiter. This is a good demonstration of combiningPUSH_LITERALwithPEEKto create a flexible matching pattern.Make sure to test this grammar with nested quotes and quotes containing delimiters, such as:
"(hello (world))""<text with > inside>"
to verify the grammar behaves as expected in edge cases.derive/tests/surround.rs (5)
14-17: Correctly feature-guarded the implementation.Good use of the
#[cfg(feature = "grammar-extras")]feature guard for the parser implementation as discussed in the PR comments. This will prevent compilation issues due to mixed pest dependencies.
18-33: Well-documented parser with clear examples.The documentation for the
SurroundParseris clear and concise. It effectively explains the purpose of the grammar and provides examples of the supported formats.
35-46: Test case validates parenthesis delimiter functionality.This test confirms the parser correctly handles text surrounded by parentheses, which is one of the key use cases for the new
PUSH_LITERALfeature.
48-59: Test case validates angle bracket delimiter functionality.This test confirms the parser correctly handles text surrounded by angle brackets, demonstrating the flexibility of the
PUSH_LITERALapproach for different delimiter pairs.
61-72: Test case validates error handling for mismatched delimiters.Good negative test case that verifies the parser correctly rejects input with mismatched opening and closing delimiters.
pest/src/parser_state.rs (13)
22-22: Good: Adding the needed import for DerefThe addition of
core::ops::Derefis appropriate for use in theBorrowedOrRc.as_strmethod.
90-96: Well-designed enum for handling string literalsThe
Literalenum provides a clean way to handle both static and owned strings, with proper documentation explaining when to use each variant.
98-108: Good implementations for conversionThese From implementations provide convenient, zero-cost conversions from both static string references and owned strings to the
Literalenum.
238-258: Well-designed helper for efficient string handlingThe
BorrowedOrRcenum provides an efficient way to handle string references, usingRc<String>to avoid unnecessary copying when cloning theOwnedvariant. The documentation clearly explains the rationale for this design choice.The
as_strmethod implementation correctly handles both variants, usingderef()on theRc<String>to get a string slice.
260-267: Clean conversion implementationThe conversion from
LiteraltoBorrowedOrRcis implemented correctly, with appropriate handling for both variants.
269-283: Good design for unifying spans and literalsThe
SpanOrLiteralenum allows the parser to handle both spans and literals in a uniform way. Theas_borrowed_or_rcmethod provides a clean interface for working with either variant.The use of
#[inline]is appropriate for this small, frequently-used method.
494-494: Correct stack type updateUpdating the stack type from
Stack<Span<'i>>toStack<SpanOrLiteral<'i>>is necessary to support the new literal functionality.
1135-1165: Well-implemented literal push methodThe
stack_push_literalmethod is well-designed and implemented, with clear documentation and examples. The use ofimpl Into<Literal>provides flexibility for callers.The method correctly returns
Ok(self)since the operation always succeeds.Since the PR description mentions you haven't added tests yet, consider adding unit tests for this new functionality to verify it works correctly with different input types.
1524-1525: Appropriate stack_push updateThe update to push
SpanOrLiteral::Spaninstead ofSpanis necessary to work with the new stack type.
1556-1558: Correct stack_peek adaptationThe method correctly adapts to the new
SpanOrLiteraltype, usingas_borrowed_or_rcto get a string representation regardless of whether the stack entry is a span or a literal.
1585-1587: Proper stack_pop adaptationSimilar to
stack_peek, this method correctly adapts to theSpanOrLiteraltype, maintaining the existing functionality while supporting the new literal capability.
1632-1634: Good matcher implementation for stack_match_peek_sliceThe matcher function is correctly updated to work with the
SpanOrLiteraltype, ensuring that the method works with both spans and literals.
1694-1695: Proper stack_match_pop adaptationThe method is correctly updated to work with the
SpanOrLiteraltype, maintaining the existing functionality.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
pest/src/lib.rs (1)
271-275: Consider adding feature guard information to documentationThe documentation clearly explains what the
PUSH_LITERALrule does and provides a helpful example. However, based on the PR comments, this feature should be guarded with#[cfg(feature = "grammar-extras")], but this constraint isn't mentioned in the documentation.Consider adding a note about the feature requirement:
`PUSH_LITERAL("a")` pushes the argument to the stack without considering the input. The argument must be a literal string. This is often useful in conjunction with another rule before it. For example, `"[" ~ PUSH_LITERAL("]")` will look for an opening bracket `[` and, if it finds one, will push a closing bracket `]` to the stack. + + Note: This feature requires the `grammar-extras` feature to be enabled.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
derive/tests/surround.rs(1 hunks)pest/src/lib.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- derive/tests/surround.rs
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: cargo hack check --feature-powerset
- GitHub Check: check for semver compatibility
- GitHub Check: Test Coverage
- GitHub Check: Minimal Versions Testing
- GitHub Check: check for no_std compatibility
- GitHub Check: Unit, Style, and Lint Testing
- GitHub Check: Documentation check
- GitHub Check: Fuzzing
🔇 Additional comments (2)
pest/src/lib.rs (2)
233-234: Documentation for the new PUSH_LITERAL rule looks good.The addition of the
PUSH_LITERALspecial rule documentation is clear and follows the same format as other special rules in this section. This makes it discoverable alongside other stack manipulation features.
266-266: Updated section header correctly includes the new rule.The section header now properly includes the new
PUSH_LITERALrule alongside the existing stack manipulation rules.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
generator/src/generator.rs (1)
1059-1070: Test covers basic functionality.The test verifies that the code generator produces the expected output for the
PushLiteralvariant. This ensures that the implementation is working correctly.Consider adding a test that combines
PushLiteralwith other expressions to verify more complex scenarios, such as the use case mentioned in the PR: matching one character and then pushing another.vm/tests/surround.rs (2)
1-37: Good test setup for the new functionality.The test correctly verifies a use case of the new
PUSH_LITERALrule by testing a quoted string with parentheses. The test structure follows the pattern used in other pest VM tests.Consider adding more test cases to verify the functionality with different delimiter pairs (like angle brackets) since the PR mentions this as a key use case. For example:
#[test] fn quote() { parses_to! { parser: vm(), input: "(abc)", rule: "Quote", tokens: [ QuoteChars(1, 4) ] }; } + +#[test] +fn angle_quote() { + parses_to! { + parser: vm(), + input: "<abc>", + rule: "AngleQuote", + tokens: [ + QuoteChars(1, 4) + ] + }; +}
18-18: Consider including the grammar content for clarity.It would be helpful to add a comment describing the grammar rules in
surround.pestor include a simplified version directly in this file to make the test more self-contained and easier to understand.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
generator/src/generator.rs(3 hunks)vm/tests/surround.pest(1 hunks)vm/tests/surround.rs(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- vm/tests/surround.pest
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: cargo hack check --feature-powerset
- GitHub Check: check for no_std compatibility
- GitHub Check: Test Coverage
- GitHub Check: Minimal Versions Testing
- GitHub Check: check for semver compatibility
- GitHub Check: Unit, Style, and Lint Testing
- GitHub Check: Documentation check
- GitHub Check: Fuzzing
🔇 Additional comments (2)
generator/src/generator.rs (2)
573-578: Good implementation with proper feature guarding.The implementation of the
PushLiteralvariant ingenerate_expris clean and properly feature-guarded with#[cfg(feature = "grammar-extras")]as discussed in the PR comments. This will prevent compilation issues with mixed pest dependencies.
764-769: Correctly implemented for atomic expressions as well.The implementation for
generate_expr_atomicfollows the same pattern as the regular variant, which is consistent with how other expression types are handled in the codebase.
tomtau
left a comment
There was a problem hiding this comment.
it seems there's a semver breaking change due to the auto-traits not being implemented:
type ParserState is no longer Send, in pest/src/parser_state.rs:448
type ParserState is no longer Sync, in pest/src/parser_state.rs:448
| stack: Stack<Span<'i>>, | ||
| stack: Stack<SpanOrLiteral<'i>>, |
There was a problem hiding this comment.
seems this is a semver-breaking change
There was a problem hiding this comment.
Just because of the Send/Sync, right? (I want to make sure I didn't miss any other aspect -- I wouldn't think so, since this is all just private to parser_state.rs)
There was a problem hiding this comment.
Yes, looks like it. There are of course semver breaking parts on the Rule enum but that's not usually externally consumed, so that should be ok (it's got semver broken since early version 2.X and noone reported any issues due to it)
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
pest/src/parser_state.rs (2)
218-265: Consider adding feature flag for grammar-extrasBased on the discussions in the PR comments, these new changes should be guarded by
#[cfg(feature = "grammar-extras")]to prevent compatibility issues with mixed pest dependencies. This will help maintain backward compatibility with existing codebases.+ #[cfg(feature = "grammar-extras")] enum BorrowedOrArc<'i> { Borrowed(&'i str), Owned(Arc<String>), } + #[cfg(feature = "grammar-extras")] enum SpanOrLiteral<'i> { Span(Span<'i>), Literal(BorrowedOrArc<'i>), } // And similar for the implementation blocks and the stack_push_literal methodAlso applies to: 476-476, 1117-1147, 1506-1507, 1538-1539, 1567-1568, 1614-1615, 1676-1676
476-476: Consider backward compatibility concernsChanging the stack field type from
Stack<Span<'i>>toStack<SpanOrLiteral<'i>>could affect API consumers that may be directly accessing this field through unsafe code. Consider whether this change needs to be communicated in release notes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pest/src/parser_state.rs(9 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
pest/src/parser_state.rs (10)
pest/src/error.rs (5)
fmt(719-721)fmt(725-730)from(83-85)from(89-92)start(498-503)pest/src/position.rs (3)
fmt(445-447)new(45-47)span(98-107)pest/src/iterators/pairs.rs (5)
fmt(456-458)fmt(462-471)as_str(116-125)as_str(570-574)new(45-87)pest/src/span.rs (5)
fmt(274-280)as_str(197-200)new(47-53)start(100-102)end(118-120)pest/src/iterators/pair.rs (4)
fmt(328-338)fmt(342-363)as_str(114-120)new(48-60)pest/src/iterators/flat_pairs.rs (2)
fmt(152-156)new(32-46)pest/src/iterators/tokens.rs (2)
fmt(129-131)new(33-58)pest/src/stack.rs (1)
new(51-57)pest/src/iterators/line_index.rs (1)
new(18-31)pest/src/pratt_parser.rs (1)
new(211-219)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: check for no_std compatibility
- GitHub Check: check for semver compatibility
- GitHub Check: Test Coverage
- GitHub Check: cargo hack check --feature-powerset
- GitHub Check: Unit, Style, and Lint Testing
- GitHub Check: Documentation check
- GitHub Check: Minimal Versions Testing
- GitHub Check: Fuzzing
🔇 Additional comments (12)
pest/src/parser_state.rs (12)
13-14: New imports added for literal handling supportThese new imports support the functionality to handle both borrowed string slices and owned Arc strings efficiently, with Deref being used to access the string content from Arc wrappers.
Also applies to: 18-19, 23-23
218-240: New BorrowedOrArc enum for efficient string handlingThe enum provides an elegant solution to avoid unnecessary string copying by using
Arc<String>instead ofCow<'i, str>when cloning. This design ensures thread safety by implementingSendandSynctraits.The comment clearly explains why this approach was chosen over
Cow- to avoid copying when theOwnedvariant is cloned, sinceArc::cloneonly increments a reference count.
242-249: Implementation of From for BorrowedOrArcThe implementation is concise and correctly handles both borrowed and owned variants of the input Cow, preserving the efficiency goal by wrapping owned strings in Arc.
251-265: SpanOrLiteral enum and implementationThis enum is the core addition that enables PUSH_LITERAL functionality by allowing the stack to hold both Spans (matched input) and Literals (pushed values). The
as_borrowed_or_rcmethod properly handles both variants and returns a cloned BorrowedOrArc to avoid borrowing issues when calling methods that require mutable access to the parser state.
476-476: Updated stack field type in ParserStateThe field type has been changed from
Stack<Span<'i>>toStack<SpanOrLiteral<'i>>to accommodate both matched spans and literal values on the stack.
1117-1147: New stack_push_literal methodThis method implements the core PUSH_LITERAL functionality by pushing a literal string onto the stack without consuming any input. The implementation is straightforward and well-documented with usage examples.
The method signature takes
impl Into<Cow<'static, str>>which provides flexibility while enforcing that literals must have a 'static lifetime.
1506-1507: Updated stack_push method to use SpanOrLiteralThe stack.push call has been updated to wrap the span in a SpanOrLiteral::Span variant to accommodate the new stack type.
1538-1539: Updated stack_peek methodThe method now correctly handles both Span and Literal variants by using the as_borrowed_or_rc method to get a string representation of the stack item.
1567-1568: Updated stack_pop methodSimilar to stack_peek, this method has been updated to handle both Span and Literal variants.
1614-1615: Updated stack_match_peek_slice methodThe matcher closure has been updated to work with SpanOrLiteral items by extracting the string representation using the as_borrowed_or_rc method.
1676-1676: Updated stack_match_pop methodThe string matching logic has been updated to handle both Span and Literal variants on the stack.
218-265:Details
❓ Verification inconclusive
Verify Send and Sync implications
Based on previous review comments, the change to use Arc instead of Rc may have semver-breaking implications due to the Send/Sync requirements. Ensure this is properly documented and communicated to users.
🏁 Script executed:
#!/bin/bash # Check for implementations of Send/Sync before this change grep -r "impl.*Send" --include="*.rs" . grep -r "impl.*Sync" --include="*.rs" .Length of output: 83
Action: Verify and Document the Send/Sync Behavior Impact
The change from using
Rcto usingArcfor theBorrowedOrArctype leverages Rust’s auto trait implementations forSendandSync. The grep searches did not reveal any explicit implementations ofSendorSync, which is expected since these are automatically derived by the compiler for types likeArc<String>. However, because this change might have semver implications when consumers rely on specific thread-safety guarantees, please ensure that this behavior is clearly documented in the changelog or migration guide.
- Confirm that the documentation or release notes detail that
Arcis used to guaranteeSendandSynctraits, contrasting with the previousRcusage.- If further confidence is needed, perform a manual review of the surrounding codebase to ensure no unintended consequences arise from this change.
|
Updated |
tomtau
left a comment
There was a problem hiding this comment.
@yshavit could you also update this book section: https://github.com/pest-parser/book/blob/master/src/grammars/syntax.md#the-stack-wip ?
|
Will do! |
* add documentation for PUSH_LITERAL See pest-parser/pest#1092 * fill out the rest of the stack docs
Add a rule
PUSH_LITERAL("a"). This rule always matches and never consumes any input, and it pushes"a"to the stack. The argument must be a literal.Note
I haven't added any unit tests or documentation yet; I wanted to get a sense of whether you'd be amenable to this new rule in principle, before I did that. But, I tested it locally on a grammar I have, and it does work.
This new rule accomplishes the goals of #880 (in a slightly different way). Instead of
PUSH_OTHER("-", " "), this is justPUSH_LITERAL(" "). If you wanted to match A and then push B, you just combine them via a sequence:"-" ~ PUSH_LITERAL(" ").This can be used to generalize quoted strings to a pair of matching but non-equal delimiters. For example, many programming languages let you express strings using either double or single quotes, and don't require escaping of the other:
This happens to work because the starting and ending char are the same, but that's not always desirable. For example, we may want to allow strings to be delimited by either
(...)or<...>, with similar rules as above, such that(a>)and<a)>are both valid.With
PUSH_LITERAL, this becomes:Happy to hear your thoughts!
Summary by CodeRabbit
New Features
Tests