Parsanol provides a declarative DSL for constructing parsers using PEG semantics. It offers excellent error reporting, memory efficiency through object pooling, and optional Rust native extensions for maximum performance. The library is designed as a drop-in replacement for Parslet while offering significant performance improvements.
Note: Parsanol is inspired by the Parslet library by Kaspar Schiess. While maintaining full API compatibility with Parslet, Parsanol is a completely independent implementation with additional performance optimizations and features.
- PEG-based Parser Construction - Declarative grammar definition
- Detailed Error Reporting - Precise failure location and context
- Rust Native Extension - Up to 29x faster parsing
- Slice Support - Source position preservation for linters and IDEs
- Tree Transformation - Pattern-based AST construction
- Streaming Builder API - Single-pass parsing with callbacks
- Parallel Parsing - Multi-core batch processing
- Infix Expression Parsing - Built-in operator precedence support
- Security Features - Input size and recursion limits
- Debug Tools - Tracing and grammar visualization
Add this line to your application’s Gemfile:
gem 'parsanol'

And then execute:
bundle install

Or install it yourself as:
gem install parsanol

Define parsers by creating a class that inherits from Parsanol::Parser and declaring rules:
require 'parsanol'
class MyParser < Parsanol::Parser
  rule(:keyword) { str('if') | str('while') }
  rule(:identifier) { match('[a-z]').repeat(1) }
  # The identifier alternative gives the recursive rule a base case
  rule(:expression) { keyword >> str('(') >> (expression | identifier) >> str(')') }
  root(:expression)
end
parser = MyParser.new
result = parser.parse('if(x)')

Parsanol provides detailed error messages when parsing fails:
begin
  parser.parse('invalid input')
rescue Parsanol::ParseFailed => e
  puts e.message
  # => "Expected 'if' at line 1 char 1."
end

Convert parse trees to AST using pattern-based transformations:
class MyTransform < Parsanol::Transform
  rule(keyword: simple(:k)) { KeywordNode.new(k) }
  rule(expression: subtree(:e)) { ExpressionNode.new(e) }
end
ast = MyTransform.new.apply(parse_tree)

For maximum performance, compile the Rust native extension:
# Install Rust toolchain first
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Compile the extension
bundle exec rake compile

All parse results include source position information through Parsanol::Slice objects:
# Parse returns results with position info
input = "hello world"
result = parser.parse(input, mode: :native)
name = result[:name]
# Access the value
name.to_s # => "hello"
# Access position information
name.offset # => 0 (byte offset in original input)
name.length # => 5
name.line_and_column # => [1, 1] (1-indexed)
# Compare with strings (Slice compares by content)
name == "hello" # => true
# Extract from original source
name.extract_from(input) # => "hello"

When using JSON mode, position information is included inline with each value:
result = parser.parse("hello", mode: :json)
# => {
#   "name": {
#     "value": "hello",
#     "offset": 0,
#     "length": 5,
#     "line": 1,
#     "column": 1
#   }
# }

This format ensures position information is available for all downstream consumers, including IDEs, linters, and error reporting tools.
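As a rough picture of the inline position format described above, the sketch below serializes a slice-like value with Ruby's standard json library. SketchSlice is a hypothetical stand-in written for this illustration, not Parsanol::Slice, and the line and column are passed in rather than computed.

```ruby
require 'json'

# Hypothetical stand-in for a parsed value; this is not Parsanol::Slice.
class SketchSlice
  attr_reader :value, :offset, :length, :line, :column

  def initialize(value, offset:, line:, column:)
    @value = value
    @offset = offset
    @length = value.bytesize
    @line = line
    @column = column
  end

  # Hash form with the position fields inline, as in the output above.
  def as_json
    { "value" => value, "offset" => offset, "length" => length,
      "line" => line, "column" => column }
  end

  # JSON string form; *args lets JSON.generate pass its generator state.
  def to_json(*args)
    as_json.to_json(*args)
  end
end

name = SketchSlice.new("hello", offset: 0, line: 1, column: 1)
json = JSON.generate({ "name" => name })
puts json
```

Because the position travels with each value, a consumer that only reads the JSON can still point back into the original input.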
class Parsanol::Slice
  # Core attributes
  def content # String content
  def offset # Byte offset in original input
  def length # Length of the slice
  def line_and_column # [line, column] tuple (requires line cache)
  # String compatibility
  def to_s # Returns content
  def to_str # Implicit string conversion
  def ==(other) # Compares content with String or Slice
  # JSON serialization
  def to_json # Returns { "value" => ..., "offset" => ..., ... }
  def as_json # Returns hash with position info
  # Utility
  def to_span(input) # Returns SourceSpan object
  def extract_from(input) # Extracts content from original input
end

This is essential for:
- Linters - Map errors back to source locations
- IDEs - Provide go-to-definition, hover info
- Comment attachment - Attach remarks to AST nodes by position
- Source extraction - Get original text for any parsed element
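The Slice listing notes that line_and_column requires a line cache. One common way to build such a cache, shown here only as a sketch and not as Parsanol's actual implementation, is to record the byte offset at which each line starts and binary-search that list. SketchLineCache is a hypothetical name, and columns are counted in bytes.

```ruby
# Hypothetical line cache: stores the byte offset at which each line starts,
# then binary-searches that list to turn an offset into [line, column].
class SketchLineCache
  def initialize(input)
    @line_starts = [0]
    input.each_byte.with_index do |byte, index|
      @line_starts << index + 1 if byte == 10 # 10 is "\n"
    end
  end

  # Returns a 1-indexed [line, column] pair for a byte offset.
  def line_and_column(offset)
    # First line start strictly greater than the offset; the line before it
    # is the one that contains the offset.
    next_line = @line_starts.bsearch_index { |start| start > offset }
    line_index = next_line ? next_line - 1 : @line_starts.size - 1
    [line_index + 1, offset - @line_starts[line_index] + 1]
  end
end

cache = SketchLineCache.new("hello world\nsecond line\n")
p cache.line_and_column(0)  # => [1, 1]
p cache.line_and_column(6)  # => [1, 7]
p cache.line_and_column(12) # => [2, 1]
```

Building the list is a single pass over the input, and each lookup afterwards is logarithmic in the number of lines, which is why a linter can afford to resolve positions for every diagnostic.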
Parsanol provides full Parslet API compatibility with two migration modes.
Simply replace the parslet gem with parsanol in your Gemfile:
# Gemfile
- gem 'parslet'
+ gem 'parsanol'

Your existing code works without modification:
# No changes needed!
require 'parslet' # Parsanol aliases itself
class MyParser < Parslet::Parser
  rule(:number) { match('[0-9]').repeat(1) }
  root(:number)
end
parser = MyParser.new
parser.parse('123') # Works exactly the same

| Parslet API | Status | Notes |
|---|---|---|
| str | ✅ | Literal string match |
| match | ✅ | Character class |
| any | ✅ | Any single character |
| >> | ✅ | Sequential composition |
| \| | ✅ | Ordered choice |
| repeat | ✅ | Repetition with bounds |
| maybe | ✅ | Optional (zero or one) |
| as | ✅ | Label capture |
| absent? | ✅ | Negative lookahead |
| present? | ✅ | Positive lookahead |
| infix_expression | ✅ | Precedence climbing |
| exp | ✅ | Treetop-style expression parsing |
| Parslet::Transform | ✅ | Tree transformation |
| simple | ✅ | Match simple value |
| sequence | ✅ | Match array of values |
| subtree | ✅ | Match any subtree |
| Parslet::Slice | ✅ | Parsanol::Slice compatible |
| capture | ✅ | Named capture extraction (NEW in 1.2.0) |
| scope | ✅ | Isolated capture context (NEW in 1.2.0) |
| dynamic | ✅ | Runtime-determined parsing (NEW in 1.2.0) |
Note: The new capture, scope, and dynamic atoms provide powerful extraction and context-sensitive parsing capabilities. See the Captures section for details.
┌─────────────────────────────────────┐
│             User Parser             │
│  (inherits from Parsanol::Parser)   │
└─────────────────┬───────────────────┘
                  │
┌─────────────────▼───────────────────┐
│           Parsing Backend           │
├─────────────────┬───────────────────┤
│    Pure Ruby    │    Rust Native    │
│    (default)    │    (optional)     │
└─────────────────┴───────────────────┘
                  │
┌─────────────────▼───────────────────┐
│             Parse Tree              │
│     (with Slice position info)      │
└─────────────────┬───────────────────┘
                  │
┌─────────────────▼───────────────────┐
│         Parsanol::Transform         │
│   (pattern-based transformation)    │
└─────────────────┬───────────────────┘
                  │
┌─────────────────▼───────────────────┐
│              User AST               │
└─────────────────────────────────────┘

Parsanol offers three parsing modes through the parse method. The :ruby and :native modes return Parsanol::Slice objects; the :json mode returns hashes with inline position information:
result = parser.parse(input, mode: :native) # mode is optional, :native is default

| Mode | Backend | Keys | Values | Best For |
|---|---|---|---|---|
| :ruby | Pure Ruby | Symbol | Slice | Debugging, fallback |
| :native | Rust FFI | Symbol | Slice | Production (DEFAULT) |
| :json | Rust FFI | String | Hash + position | APIs, serialization |
All modes include position info (offset, length, line, column) by default.
- Ruby Mode (:ruby) - Pure Ruby parsing engine. Use for debugging grammar issues or when native extension is unavailable.
- Native Mode (:native) - Rust parser via FFI with automatic transformation to Ruby-friendly format (Symbol keys). ~20x faster than pure Ruby. This is the default mode.
- JSON Mode (:json) - Rust parser that returns JSON-serializable output with inline position information. Use for APIs and when you need JSON-compatible output.
For maximum performance (~29x faster than pure Ruby), use the ZeroCopy interface which bypasses Ruby transformation:
# Low-level API: Direct Rust access, String keys
grammar = Parsanol::Native.serialize_grammar(parser.root)
result = Parsanol::Native.parse_to_ruby_objects(grammar, input)
# Returns: { "name" => Slice("hello", offset: 0, length: 5) }
# High-level ZeroCopy: Include module for direct Ruby objects
class FastParser < Parsanol::Parser
  include Parsanol::ZeroCopy
  rule(:number) { match('[0-9]').repeat(1) }
  root(:number)
  output_types(number: MyNumberClass) # Map to Ruby classes
end
parser = FastParser.new
expr = parser.parse("42") # Returns MyNumberClass instance directly

| Method | Keys | Use Case |
|---|---|---|
| parse_to_ruby_objects | String | Low-level, Slice objects directly from Rust |
| parse (with Parsanol::ZeroCopy included) | Ruby objects | Maximum performance, direct object construction |
|
Note: ZeroCopy requires the native extension and type mapping definitions.
| Your Need | Use This | Why |
|---|---|---|
| Building an API | JSON mode (:json) | Direct JSON serialization |
| Building a linter/IDE | Native mode (:native) | Position info for errors |
| Need position info | Parse Modes (not ZeroCopy) | ZeroCopy skips position tracking |
| High-throughput parsing | ZeroCopy | Maximum performance |
| Type-safe AST with methods | ZeroCopy | Direct typed object construction |
| Debugging grammar | Ruby mode (:ruby) | Pure Ruby, easier to trace |
# 1. Define your AST classes with methods
module Calculator
  class Expr
    def eval = raise NotImplementedError
  end

  class Number < Expr
    attr_reader :value
    def initialize(value) = @value = value
    def eval = @value
  end

  class BinOp < Expr
    attr_reader :left, :op, :right
    def initialize(left:, op:, right:)
      @left, @op, @right = left, op, right
    end

    def eval
      case @op
      when '+' then @left.eval + @right.eval
      when '-' then @left.eval - @right.eval
      when '*' then @left.eval * @right.eval
      when '/' then @left.eval / @right.eval
      end
    end
  end
end
# 2. Define parser with ZeroCopy and output_types
class CalculatorParser < Parsanol::Parser
  include Parsanol::ZeroCopy
  rule(:number) { match('[0-9]').repeat(1).as(:int) }
  # The operator rule supplies the :op attribute that BinOp expects
  rule(:operator) { match('[-+*/]').as(:op) }
  rule(:expression) { (number.as(:left) >> operator >> expression.as(:right)).as(:binop) | number }
  root(:expression)
  # Map rules to Ruby classes - Rust constructs these directly!
  output_types(
    number: Calculator::Number,
    binop: Calculator::BinOp
  )
end
# 3. Parse and evaluate - no transform needed!
parser = CalculatorParser.new
expr = parser.parse("2+3*4") # Returns Calculator::BinOp directly
puts expr.eval # => 14 (the right-recursive rule groups this as 2 + (3 * 4))

When you don’t need typed objects, use parse_to_ruby_objects for direct Slice access:
# Direct FFI call - bypasses transformation, String keys
grammar = Parsanol::Native.serialize_grammar(MyParser.new.root)
result = Parsanol::Native.parse_to_ruby_objects(grammar, input)
# Result structure (String keys, Slice values):
# { "name" => Slice("hello", offset: 0, length: 5),
# "value" => Slice("42", offset: 10, length: 2) }
# Access position info directly
result["name"].offset # => 0
result["name"].to_s # => "hello"

The ZeroCopy module requires:
- Native extension - Run bundle exec rake compile
- Type mapping - Define output_types in your parser
- Matching constructors - Your Ruby classes must accept the parsed attributes
For complex types, you may also need Rust-side type definitions with #[derive(RubyObject)] for full zero-copy FFI construction.
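The "matching constructors" requirement can be pictured in plain Ruby: each labelled attribute in the parsed result becomes a keyword argument of the mapped class. The sketch below is an assumption-laden illustration of that idea only. SketchNumber, SketchBinOp, SKETCH_TYPES, and build_node are hypothetical names, and the construction here happens in Ruby rather than in the Rust extension.

```ruby
# Hypothetical AST classes; their keyword arguments mirror the labels in the tree.
class SketchNumber
  def initialize(int:)
    @value = Integer(int)
  end

  def evaluate
    @value
  end
end

class SketchBinOp
  def initialize(left:, op:, right:)
    @left, @op, @right = left, op, right
  end

  def evaluate
    @left.evaluate.public_send(@op, @right.evaluate)
  end
end

# Maps a tag in the tree to the class built for it (an assumed convention).
SKETCH_TYPES = { number: SketchNumber, binop: SketchBinOp }.freeze

# Recursively turns { tag => { attribute => value } } hashes into objects.
def build_node(node)
  return node unless node.is_a?(Hash)

  tag, attrs = node.first
  SKETCH_TYPES.fetch(tag).new(**attrs.transform_values { |v| build_node(v) })
end

tree = {
  binop: {
    left: { number: { int: "2" } },
    op: "+",
    right: { number: { int: "3" } }
  }
}
puts build_node(tree).evaluate # => 5
```

If a class does not accept one of the labelled attributes, the keyword splat raises ArgumentError, which is the kind of mismatch the requirement above is meant to rule out.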
Behind the scenes, the Rust implementation uses one of two parsing backends:
| Backend | Use Case | Characteristics |
|---|---|---|
| Packrat (default) | Complex grammars | O(n) guaranteed, higher memory |
| Bytecode VM | Simple patterns | Lower memory, faster for linear patterns |
| Auto | Variable workloads | Analyzes grammar, selects best backend |
The Ruby bindings automatically use the best backend for your grammar:
- Uses Backend::Auto by default (same as parsanol-rs)
- Detects nested repetitions, overlapping choices
- Recommends Packrat for complex grammars
- Falls back to Bytecode for simple patterns
Note: The backend selection is transparent to Ruby users. The parser object automatically uses the optimal backend based on grammar analysis.
For more details on backend selection and grammar analysis, see the Parsing Backends documentation.
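To make the Packrat trade-off (linear time, higher memory) concrete, here is a minimal pure-Ruby sketch of packrat memoization. It is not the Rust backend: SketchPackrat and its two-rule grammar are hypothetical, chosen only to show that each (rule, position) pair is evaluated at most once and its result kept in a table.

```ruby
# Minimal packrat sketch: results are memoized per (rule, position) pair.
class SketchPackrat
  attr_reader :calls

  def initialize(input)
    @input = input
    @memo = {}
    @calls = 0 # counts rule evaluations that were not answered from the memo
  end

  # Memoizing wrapper: returns the end position on success, nil on failure.
  def apply(rule, pos)
    key = [rule, pos]
    return @memo[key] if @memo.key?(key)

    @calls += 1
    @memo[key] = send(rule, pos)
  end

  # sum <- num '+' sum / num
  def sum(pos)
    after_num = apply(:num, pos)
    if after_num && @input[after_num] == '+'
      rest = apply(:sum, after_num + 1)
      return rest if rest
    end
    apply(:num, pos) # second alternative; answered from the memo table
  end

  # num <- [0-9]+
  def num(pos)
    finish = pos
    finish += 1 while finish < @input.length && @input[finish].match?(/[0-9]/)
    finish > pos ? finish : nil
  end

  # True when the sum rule consumes the whole input.
  def accepts?
    apply(:sum, 0) == @input.length
  end
end

parser = SketchPackrat.new("1+22+333")
p parser.accepts? # => true
p parser.calls    # => 6
```

The memo table is what costs memory: it can hold one entry per rule per input position, which is the price paid for never re-parsing the same span after backtracking.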
Parsanol 1.2.0 introduces powerful new features for extracting and managing parsed data.
Extract named values from parsed input, similar to named groups in regular expressions:
require 'parsanol/parslet'
include Parsanol::Parslet
# Basic capture
parser = str('hello').capture(:greeting)
result = parser.parse("hello")
puts result[:greeting].to_s # => "hello"
# Multiple captures - parse key=value pairs
kv_parser = match('[a-z]').repeat(1).capture(:key) >>
  str('=') >>
  match('[a-zA-Z0-9]').repeat(1).capture(:value)
result = kv_parser.parse("name=Alice")
puts result[:key].to_s # => "name"
puts result[:value].to_s # => "Alice"

Create isolated capture contexts. Captures inside a scope are discarded when the scope exits:
# Without scope: inner captures leak out
parser = str('a').capture(:temp) >> str('b') >> str('c').capture(:temp)
# With scope: inner captures are discarded
parser = str('prefix').capture(:outer) >>
  scope { str('inner').capture(:inner) } >>
  str('suffix').capture(:outer_end)
# The grammar has no whitespace rule, so the input contains none
result = parser.parse("prefixinnersuffix")
p result[:inner] # => nil (discarded)
puts result[:outer] # => "prefix"

Scopes are essential for:
- Parsing nested structures without capture pollution
- Recursive parsing with isolated capture state
- Memory-bounded parsing of repeated structures
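One way to picture scope semantics is a stack of capture tables: entering a scope pushes a fresh table, and leaving it pops the table together with everything captured inside. The sketch below is a plain-Ruby illustration under that assumption; SketchCaptures is a hypothetical class, not part of Parsanol.

```ruby
# Hypothetical capture store: a stack of hashes, one per open scope.
class SketchCaptures
  def initialize
    @frames = [{}]
  end

  # Records a capture in the innermost open scope.
  def capture(name, value)
    @frames.last[name] = value
  end

  # Looks a name up from the innermost scope outward.
  def [](name)
    frame = @frames.reverse_each.find { |f| f.key?(name) }
    frame && frame[name]
  end

  # Runs the block in a fresh frame and drops that frame afterwards,
  # even if the block raises.
  def scope
    @frames.push({})
    yield self
  ensure
    @frames.pop
  end
end

captures = SketchCaptures.new
captures.capture(:outer, "prefix")
inner_seen = captures.scope do |c|
  c.capture(:inner, "inner")
  c[:inner] # visible while the scope is open
end
p inner_seen       # => "inner"
p captures[:inner] # => nil
p captures[:outer] # => "prefix"
```

Because each frame is dropped on exit, repeated or recursive structures never accumulate captures beyond their own nesting depth.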
Runtime-determined parsing via callbacks. The grammar can change based on context:
# Type-driven value parsing
class TypeParser < Parsanol::Parser
  include Parsanol::Parslet
  rule(:type) { match('[a-z]').repeat(1).capture(:type) }
  rule(:value) do
    dynamic do |ctx|
      case ctx[:type].to_s
      when 'int' then match('[0-9]').repeat(1)
      when 'str' then match('[a-z]').repeat(1)
      when 'bool' then str('true') | str('false')
      else match('[a-z]').repeat(1)
      end.capture(:value)
    end
  end
  rule(:declaration) { type >> str(':') >> match('[a-z]').repeat(1).capture(:name) >> str('=') >> value }
  root :declaration
end
parser = TypeParser.new
result = parser.parse("int:count=42")
puts result[:type].to_s # => "int"
puts result[:value].to_s # => "42"

The DynamicContext provides:
- ctx[:name] - Access captured values
- ctx.remaining - Remaining input from current position
- ctx.pos - Current byte position
- ctx.input - Full input string
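The idea behind dynamic parsing, choosing the next pattern from an earlier capture, can be sketched without any parser library using Ruby's standard StringScanner. The helper below is a hypothetical illustration of the technique, not Parsanol's implementation; parse_declaration, VALUE_PATTERNS, and DEFAULT_PATTERN are names invented for this example.

```ruby
require 'strscan'

# Value patterns selected at parse time from the previously captured type.
VALUE_PATTERNS = {
  "int" => /\d+/,
  "bool" => /true|false/
}.freeze
DEFAULT_PATTERN = /[a-z]+/

# Parses "type:name=value"; returns a hash of captures, or nil on failure.
def parse_declaration(input)
  scanner = StringScanner.new(input)

  type = scanner.scan(/[a-z]+/)
  return nil unless type && scanner.scan(/:/)

  name = scanner.scan(/[a-z]+/)
  return nil unless name && scanner.scan(/=/)

  # The "dynamic" step: the value pattern depends on an earlier capture.
  pattern = VALUE_PATTERNS.fetch(type, DEFAULT_PATTERN)
  value = scanner.scan(pattern)
  return nil unless value && scanner.eos?

  { type: type, name: name, value: value }
end

p parse_declaration("int:count=42")
# => {type: "int", name: "count", value: "42"} (hash inspect format varies by Ruby version)
p parse_declaration("int:count=abc") # => nil
```

The same input shape is accepted or rejected depending on the declared type, which is what a fixed, context-free grammar cannot express on its own.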
For maximum performance, use the streaming builder API which eliminates intermediate AST construction:
require 'parsanol'
class StringCollector
  include Parsanol::BuilderCallbacks

  def initialize
    @strings = []
  end

  def on_string(value, offset, length)
    @strings << value
  end

  def finish
    @strings
  end
end
grammar = Parsanol::Native.serialize_grammar(MyParser.new.root)
builder = StringCollector.new
result = Parsanol::Native.parse_with_builder(grammar, input, builder)
# result: ["hello", "world"]

| Method | Description | Default |
|---|---|---|
|  | Parsing started | No-op |
|  | Parsing succeeded | No-op |
|  | Parsing failed | No-op |
| on_string(value, offset, length) | String/slice matched | No-op |
|  | Integer matched | No-op |
|  | Float matched | No-op |
|  | Boolean matched | No-op |
|  | Nil matched | No-op |
|  | Entering a hash/object | No-op |
|  | Hash key encountered | No-op |
|  | Exiting a hash/object | No-op |
|  | Entering an array | No-op |
|  | Exiting an array | No-op |
| finish | Parsing complete | Returns nil |
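The callback style can be illustrated in isolation: a driver walks the input once and reports each match to a builder, so the caller never receives an intermediate tree. The sketch below uses only Ruby's standard StringScanner. scan_strings and SketchCollector are hypothetical, and only the on_string(value, offset, length) and finish callbacks from the example above are mirrored.

```ruby
require 'strscan'

# Hypothetical builder that records every reported string with its position.
class SketchCollector
  def initialize
    @seen = []
  end

  def on_string(value, offset, length)
    @seen << [value, offset, length]
  end

  def finish
    @seen
  end
end

# Hypothetical driver: finds double-quoted strings in a single pass and
# reports each one to the builder instead of building a tree.
def scan_strings(input, builder)
  scanner = StringScanner.new(input)
  until scanner.eos?
    if scanner.scan(/"([^"]*)"/)
      value = scanner[1]
      # Byte offset of the content, just past the opening quote.
      offset = scanner.pos - scanner.matched_size + 1
      builder.on_string(value, offset, value.bytesize)
    else
      scanner.getch # skip anything that is not a quoted string
    end
  end
  builder.finish
end

p scan_strings('["hello", "world"]', SketchCollector.new)
# => [["hello", 2, 5], ["world", 11, 5]]
```

The return value of finish becomes the result of the whole run, so each builder decides what shape of result the caller gets.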
Parse multiple inputs using all CPU cores:
require 'parsanol/parallel'
grammar = MyParser.new.serialize_grammar
inputs = Dir.glob("*.json").map { |f| File.read(f) }
# Parse all files in parallel
results = Parsanol::Parallel.parse_batch(grammar, inputs)
# With configuration
config = Parsanol::Parallel::Config.new
  .with_num_threads(4)
  .with_min_chunk_size(50)
results = Parsanol::Parallel.parse_batch(grammar, inputs, config: config)

Built-in support for parsing infix expressions with operator precedence:
class CalculatorParser < Parsanol::Parser
  rule(:number) { match('[0-9]').repeat(1).as(:int) }
  rule(:primary) { number | str('(') >> expr >> str(')') }
  rule(:expr) {
    infix_expression(primary,
      [str('*'), 2, :left],
      [str('/'), 2, :left],
      [str('+'), 1, :left],
      [str('-'), 1, :left],
      [str('^'), 3, :right] # Right-associative
    )
  }
  root(:expr)
end

Parsanol supports treetop-style expression strings for quick grammar definition:
# Using exp() for treetop-style expressions
class QuickParser < Parsanol::Parser
  rule(:word) { exp("'a' 'b' ?") } # 'a' followed by optional 'b'
  root(:word)
end
# Equivalent to:
rule(:word) { str('a') >> str('b').maybe }

| Syntax | Description |
|---|---|
| 'abc' | Literal string match |
| [a-z] | Character class |
| . | Any single character |
| 'a' 'b' | Sequence (concatenation) |
| 'a' / 'b' | Alternative (choice) |
| 'a' ? | Optional (zero or one) |
| 'a' * | Zero or more repetitions |
| 'a' + | One or more repetitions |
| 'a' {2, 5} | Between 2 and 5 repetitions |
| ( 'a' / 'b' ) | Grouping |
|
Note: Whitespace is required before operators:
The expression parser is pure Ruby (not Rust-accelerated) since it runs only at grammar definition time. The resulting atoms can still be used with Rust-accelerated parsing:
atom = Parsanol.exp("'a' +")
# Ruby parsing
atom.parse('aaa')
# Rust-accelerated parsing (if native extension available)
grammar = Parsanol::Native.serialize_grammar(atom)
Parsanol::Native.parse_to_ruby_objects(grammar, 'aaa')

For parsing untrusted input, use built-in limits:
result = Parsanol::Native.parse_with_limits(
  grammar_json,
  untrusted_input,
  max_input_size: 10 * 1024 * 1024, # 10 MB max
  max_recursion_depth: 100 # Limit recursion
)

Enable tracing for debugging grammars:
# Parse with trace
result, trace = Parsanol::Native.parse_with_trace(grammar_json, input)
puts trace
# Generate grammar visualization
mermaid = Parsanol::Native.grammar_to_mermaid(grammar_json)
dot = Parsanol::Native.grammar_to_dot(grammar_json)

# Run all tests
bundle exec rake spec
# Run unit tests only
bundle exec rake spec:unit
# Run specific test file
bundle exec rspec spec/parsanol/atoms/str_spec.rb

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Compile the native extension
bundle exec rake compile
# Verify native extension is working
ruby -I lib -e "require 'parsanol'; puts Parsanol::Native.available?"
# => true

Parsanol is inspired by the Parslet library. We thank Kaspar Schiess and all Parslet contributors for creating an excellent parser library that served as inspiration for this project.