fix(turtle): delegate parsing to oxttl for full Turtle syntax support#57
Conversation
The TurtleParser's parse_str() method used broken line-by-line processing with split_whitespace(), which couldn't handle standard Turtle syntax like semicolons, commas, quoted strings with spaces, blank node property lists, or RDF collections. Changes: - Replace parse_str() implementation with oxttl delegation - Make convert_quad helper pub(crate) for reuse - Remove broken private helper methods - Add comprehensive test suite (16 tests) All oxirs-core and oxirs-shacl tests pass. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This pull request fixes a broken Turtle parser implementation by delegating parsing to the oxttl library instead of using a flawed custom line-by-line parser. The previous implementation couldn't handle standard Turtle syntax features like semicolons, commas, quoted strings with spaces, blank nodes, and collections because it used simple split_whitespace() processing.
Changes:
- Replaced custom
parse_str()implementation with delegation to oxttl's TurtleParser for full Turtle syntax support - Removed 240+ lines of broken helper methods (
parse_prefix_directive,parse_base_directive,parse_triple_statement,parse_term,parse_predicate) - Added comprehensive test suite with 16 tests covering all Turtle syntax features
- Changed visibility of
convert_quadfrompub(super)topub(crate)to enable use fromformat::turtlemodule - Changed visibility of
helpersmodule from private topub(crate)to support the above
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| core/oxirs-core/tests/turtle_parser_tests.rs | New comprehensive test file with 16 tests covering semicolons, commas, literals, blank nodes, collections, comments, and SHACL syntax |
| core/oxirs-core/src/format/turtle.rs | Replaced broken custom parser with oxttl delegation; removed 240+ lines of flawed parsing logic; retained configuration API |
| core/oxirs-core/src/format/parser/mod.rs | Changed helpers module visibility to pub(crate) to allow access from sibling modules |
| core/oxirs-core/src/format/parser/helpers.rs | Changed convert_quad visibility to pub(crate) to enable use from format::turtle |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Parse and collect triples | ||
| let reader = Cursor::new(input.as_bytes()); |
There was a problem hiding this comment.
The parser's configured prefixes (set via with_prefix()) are not being passed to the oxttl parser. While standard Turtle documents define prefixes inline with @Prefix directives (which oxttl handles correctly), the existing API allows pre-configuring prefixes that would not be recognized during parsing. Consider whether prefixes configured via with_prefix() should be passed to oxttl or whether this field should only be used for serialization. If prefixes should be supported during parsing, you would need to check if oxttl provides a way to pre-configure them.
| // Parse and collect triples | |
| let reader = Cursor::new(input.as_bytes()); | |
| // Build effective input: configured prefixes (if any) + original input. | |
| // This ensures that prefixes set via `with_prefix` are visible to the oxttl parser, | |
| // which already understands standard `@prefix` Turtle directives. | |
| let mut content = String::new(); | |
| if !self.prefixes.is_empty() { | |
| for (prefix, iri) in &self.prefixes { | |
| if prefix.is_empty() { | |
| // Default prefix | |
| content.push_str("@prefix : <"); | |
| } else { | |
| content.push_str("@prefix "); | |
| content.push_str(prefix); | |
| content.push_str(": <"); | |
| } | |
| content.push_str(iri); | |
| content.push_str("> .\n"); | |
| } | |
| } | |
| content.push_str(input); | |
| // Parse and collect triples | |
| let reader = Cursor::new(content.as_bytes()); |
| triple.object, | ||
| oxrdf::GraphName::DefaultGraph, | ||
| ); | ||
| let oxirs_quad = convert_quad(quad)?; |
There was a problem hiding this comment.
In lenient mode, if convert_quad fails, the entire parsing operation will stop and return an error due to the ? operator. This differs from how oxttl parsing errors are handled (which are skipped in lenient mode). While this is consistent with the existing parser implementation in format/parser/turtle.rs, consider whether conversion errors should also be skipped in lenient mode for consistency. If conversion errors should stop parsing regardless of lenient mode (because they indicate internal conversion issues rather than malformed input), this is fine as-is.
| let oxirs_quad = convert_quad(quad)?; | |
| let oxirs_quad = match convert_quad(quad) { | |
| Ok(q) => q, | |
| Err(e) => { | |
| if !self.lenient { | |
| return Err(e); | |
| } | |
| // In lenient mode, skip conversion errors | |
| continue; | |
| } | |
| }; |
cool-japan
left a comment
There was a problem hiding this comment.
✅ Excellent fix! Replaces 247 lines of broken Turtle parsing with 35 lines delegating to oxttl. All 1,523 tests pass (16 new turtle tests + 865 oxirs-core + 642 oxirs-shacl). Critical bug fix that enables parsing real-world Turtle/SHACL files with semicolons, blank nodes, collections, and complex syntax.
Summary
parse_str()implementation with oxttl delegationsplit_whitespace()which couldn't handle standard Turtle syntax (semicolons, commas, quoted strings, blank nodes, collections)Test plan
🤖 Generated with Claude Code