fix(shacl): fix language-tagged string handling in SHACL shapes#58
Conversation
This commit includes two related fixes:
1. Turtle Parser (oxirs-core):
- Delegate parse_str() to oxttl for full Turtle syntax support
- The previous line-by-line implementation couldn't handle semicolons,
commas, quoted strings, blank nodes, or RDF collections
- Add comprehensive test suite with 16 tests
2. SHACL Language Tag Handling (oxirs-shacl):
- Fix validator to accept empty string as valid "no language tag"
- Add get_string_with_language() helpers to preserve language tags
- Fix sh:message parsing to use actual language tag as map key
- Previously, all messages were stored with "" key, then rejected
by validation which didn't accept empty strings
Fixes parsing of shapes with language-tagged strings like:
sh:message "Validation failed"@en
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug in oxirs-shacl where shapes with language-tagged strings (e.g., sh:message "Error"@en) failed to parse. The fix has two parts: (1) updating the validator to accept empty strings as valid "no language tag" markers, and (2) adding helper methods to preserve language tags when parsing literals. The PR also includes changes from PR #57 that replaces the broken Turtle parser with delegation to the mature oxttl library.
Changes:
- Fixed validator to accept empty string as "no language tag" placeholder
- Added
get_string_with_language()helpers to extract and preserve language tags from RDF literals - Replaced broken line-by-line Turtle parser with delegation to oxttl for proper Turtle syntax support
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| engine/oxirs-shacl/tests/language_tag_bug_test.rs | New test file with 3 tests covering language-tagged and plain messages |
| engine/oxirs-shacl/src/validator.rs | Updated validator to accept empty string as valid "no language tag" marker |
| engine/oxirs-shacl/src/shapes/parser.rs | Added helper methods to extract language tags and updated sh:message parsing to preserve them |
| core/oxirs-core/tests/turtle_parser_tests.rs | Comprehensive test suite (16 tests) for Turtle parser covering all major syntax features |
| core/oxirs-core/src/format/turtle.rs | Replaced broken parser with delegation to oxttl for full Turtle support |
| core/oxirs-core/src/format/parser/mod.rs | Changed helpers module visibility to pub(crate) for access from turtle module |
| core/oxirs-core/src/format/parser/helpers.rs | Changed convert_quad visibility to pub(crate) for use in turtle parser |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Parse sh:message (preserving language tag if present) | ||
| if let Some((message, lang_tag)) = | ||
| self.get_string_with_language(graph, &shape_subject, &SHACL_VOCAB.message)? | ||
| { | ||
| shape.messages.insert("".to_string(), message); // Default language | ||
| shape.messages.insert(lang_tag, message); | ||
| } |
There was a problem hiding this comment.
According to the SHACL specification, a shape can have multiple sh:message predicates with different language tags. The current implementation only handles the first sh:message predicate found. This means if a shape has multiple messages in different languages (e.g., "Error"@en and "Erreur"@fr), only one will be stored.
To fix this, the code should iterate over all sh:message triples, similar to how sh:targetNode is handled (lines 454-467), and insert each message with its corresponding language tag into the messages map.
| if let Some((message, lang_tag)) = | ||
| self.get_string_with_language_for_subject(graph, blank_subject, &SHACL_VOCAB.message)? | ||
| { | ||
| shape.messages.insert("".to_string(), message); | ||
| shape.messages.insert(lang_tag, message); | ||
| } |
There was a problem hiding this comment.
Similar to the issue above, this code should handle multiple sh:message predicates with different language tags. The current implementation only processes the first message found, which means shapes defined as blank nodes cannot have multi-language messages.
To fix this, iterate over all sh:message triples for the blank subject and insert each message with its corresponding language tag.
There was a problem hiding this comment.
Copilot can you fix this?
| #[test] | ||
| fn test_shape_with_language_tagged_name() { | ||
| let shapes = r#" | ||
| @prefix sh: <http://www.w3.org/ns/shacl#> . | ||
| @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . | ||
| @prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> . | ||
| @prefix shapes: <urn:continuum:shapes/> . | ||
|
|
||
| shapes:LatitudeProperty a sh:PropertyShape ; | ||
| sh:path wgs84:lat ; | ||
| sh:name "latitude"@en ; | ||
| sh:datatype xsd:double ; | ||
| sh:severity sh:Violation . | ||
| "#; | ||
|
|
||
| let mut validator = Validator::new(); | ||
| let result = validator.load_shapes_from_rdf(shapes, "turtle", None); | ||
|
|
||
| match &result { | ||
| Ok(count) => println!("SUCCESS: Loaded {} shapes", count), | ||
| Err(e) => println!("ERROR: {:?}", e), | ||
| } | ||
|
|
||
| assert!(result.is_ok(), "Should load shapes with language-tagged sh:name"); | ||
| } |
There was a problem hiding this comment.
This test uses sh:name in the shape definition, but sh:name doesn't appear to be parsed or supported by the SHACL validator (it's not in the ShaclVocabulary). The test only verifies that the shape loads without error, not that sh:name is actually parsed and stored. This test name is therefore misleading - it's actually testing whether the Turtle parser can handle language-tagged strings, not whether the SHACL parser handles sh:name correctly.
Consider either renaming this test to clarify what it's actually testing (e.g., test_turtle_parser_handles_language_tags) or removing it since the other two tests already cover the actual bug fix for sh:message.
| #[test] | ||
| fn test_shape_with_language_tagged_message() { | ||
| let shapes = r#" | ||
| @prefix sh: <http://www.w3.org/ns/shacl#> . | ||
| @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . | ||
| @prefix shapes: <urn:continuum:shapes/> . | ||
|
|
||
| shapes:TestShape a sh:NodeShape ; | ||
| sh:targetClass shapes:TestClass ; | ||
| sh:message "Validation failed"@en . | ||
| "#; | ||
|
|
||
| let mut validator = Validator::new(); | ||
| let result = validator.load_shapes_from_rdf(shapes, "turtle", None); | ||
|
|
||
| match &result { | ||
| Ok(count) => println!("SUCCESS: Loaded {} shapes", count), | ||
| Err(e) => println!("ERROR: {:?}", e), | ||
| } | ||
|
|
||
| assert!(result.is_ok(), "Should load shapes with language-tagged sh:message"); | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_shape_with_plain_message() { | ||
| let shapes = r#" | ||
| @prefix sh: <http://www.w3.org/ns/shacl#> . | ||
| @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . | ||
| @prefix shapes: <urn:continuum:shapes/> . | ||
|
|
||
| shapes:TestShape a sh:NodeShape ; | ||
| sh:targetClass shapes:TestClass ; | ||
| sh:message "Validation failed" . | ||
| "#; | ||
|
|
||
| let mut validator = Validator::new(); | ||
| let result = validator.load_shapes_from_rdf(shapes, "turtle", None); | ||
|
|
||
| match &result { | ||
| Ok(count) => println!("SUCCESS: Loaded {} shapes", count), | ||
| Err(e) => println!("ERROR: {:?}", e), | ||
| } | ||
|
|
||
| assert!(result.is_ok(), "Should load shapes with plain sh:message"); | ||
| } |
There was a problem hiding this comment.
The tests only verify that shapes can be loaded without error, but they don't actually verify that the language tags are correctly preserved in the parsed shapes. Consider adding assertions that check:
- The message is stored with the correct language tag key (e.g.,
shape.messages["en"]should equal "Validation failed") - For plain messages, the key should be an empty string (e.g.,
shape.messages[""]should equal "Validation failed") - Multiple messages with different language tags are all preserved (e.g., a shape with both @en and @fr messages)
This would provide stronger test coverage and catch regressions more effectively.
| if let Some(ref base) = self.base_iri { | ||
| oxttl_parser = oxttl_parser | ||
| .with_base_iri(base.as_str()) | ||
| .unwrap_or_else(|_| oxttl::TurtleParser::new()); |
There was a problem hiding this comment.
If oxttl_parser.with_base_iri() fails, the code silently falls back to creating a new parser without the base IRI (line 83). This means the user's configured base IRI will be silently ignored on error, which could lead to unexpected parsing behavior.
Consider either:
- Propagating the error instead of silently ignoring it
- Logging a warning when falling back
- Documenting this fallback behavior
The silent fallback could be confusing for users who expect their base IRI configuration to be respected.
| .unwrap_or_else(|_| oxttl::TurtleParser::new()); | |
| .map_err(|e| RdfParseError::syntax(format!("Invalid base IRI '{base}': {e}")))?; |
cool-japan
left a comment
There was a problem hiding this comment.
✅ Excellent fix! Resolves critical bug where SHACL shapes with language-tagged strings (@en, @ja, etc.) failed to load. Parser now preserves language tags, validator accepts empty string for plain literals. All 645 tests pass (642 existing + 3 new). Enables internationalized SHACL shapes and W3C SHACL compliance.
Summary
Fixes oxirs-shacl failing to parse shapes with language-tagged strings (e.g.,
sh:message "Validation failed"@en).Root cause: Two issues:
""was used to represent "no language tag"""instead of actual tag like"en"Changes:
get_string_with_language()helpers to preserve language tags from literalssh:messageparsing to use actual language tag as map keyDependencies
Test plan
Reproduction
Before fix:
After fix: ✅ Loads successfully
🤖 Generated with Claude Code