fix(shacl): fix language-tagged string handling in SHACL shapes #58

Merged: cool-japan merged 1 commit into cool-japan:master from temporaryfix:fix/shacl-language-tagged-strings on Feb 8, 2026

@temporaryfix
Contributor

Summary

Fixes oxirs-shacl failing to parse shapes with language-tagged strings (e.g., sh:message "Validation failed"@en).

Root cause: two issues:

  1. The validator rejected empty-string language tags, even though "" was the value used to represent "no language tag".
  2. The parser discarded the language tag and always stored messages under the key "" instead of the actual tag (for example "en").

Changes:

  • Fix validator to accept empty string as valid "no language tag"
  • Add get_string_with_language() helpers to preserve language tags from literals
  • Fix sh:message parsing to use actual language tag as map key
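
The convention behind these changes can be sketched in isolation. The following is a minimal, self-contained illustration and not the code from this PR: `Literal`, `string_with_language`, and `is_valid_language_key` are hypothetical names, and the tag check is a deliberately simplified approximation rather than a full BCP 47 validator. It assumes that a plain literal maps to the key "" and that the validator must treat "" as valid.

```rust
/// Hypothetical stand-in for an RDF literal: a lexical value plus an
/// optional language tag. Not the oxirs type.
pub struct Literal {
    pub value: String,
    pub language: Option<String>,
}

/// Returns (message, language key). Plain literals get "" as their key,
/// tagged literals keep their actual tag.
pub fn string_with_language(lit: &Literal) -> (String, String) {
    let key = lit.language.clone().unwrap_or_default();
    (lit.value.clone(), key)
}

/// Simplified check (an assumption, not the real validator): "" means
/// "no language tag"; otherwise require hyphen-separated ASCII
/// alphanumeric subtags of 1 to 8 characters each.
pub fn is_valid_language_key(key: &str) -> bool {
    key.is_empty()
        || key.split('-').all(|part| {
            (1..=8).contains(&part.len())
                && part.chars().all(|c| c.is_ascii_alphanumeric())
        })
}
```

With this convention, "Validation failed"@en would be stored under "en" and an untagged "Validation failed" under "", and both keys pass the check.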

Dependencies

⚠️ This PR depends on #57 (Turtle parser fix)

The Turtle parser fix is required for proper parsing of language-tagged literals. Please merge #57 first.

Test plan

  • New tests for language-tagged strings pass (3 tests)
  • All oxirs-shacl tests pass (645 tests)
  • Clippy passes with no warnings

Reproduction

Before fix:

let shapes = r#"
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix shapes: <urn:continuum:shapes/> .

shapes:Test a sh:NodeShape ;
    sh:message "Error"@en .
"#;
let mut validator = Validator::new();
let result = validator.load_shapes_from_rdf(shapes, "turtle", None);
// ERROR: ShapeParsing("Shape 'shapes:Test' has invalid language tag: ''")

After fix: ✅ Loads successfully

🤖 Generated with Claude Code

This commit includes two related fixes:

1. Turtle Parser (oxirs-core):
   - Delegate parse_str() to oxttl for full Turtle syntax support
   - The previous line-by-line implementation couldn't handle semicolons,
     commas, quoted strings, blank nodes, or RDF collections
   - Add comprehensive test suite with 16 tests

2. SHACL Language Tag Handling (oxirs-shacl):
   - Fix validator to accept empty string as valid "no language tag"
   - Add get_string_with_language() helpers to preserve language tags
   - Fix sh:message parsing to use actual language tag as map key
   - Previously, all messages were stored with "" key, then rejected
     by validation which didn't accept empty strings

Fixes parsing of shapes with language-tagged strings like:
  sh:message "Validation failed"@en

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR fixes a bug in oxirs-shacl where shapes with language-tagged strings (e.g., sh:message "Error"@en) failed to parse. The fix has two parts: (1) updating the validator to accept empty strings as valid "no language tag" markers, and (2) adding helper methods to preserve language tags when parsing literals. The PR also includes the changes from PR #57, which replace the broken Turtle parser with delegation to the mature oxttl library.

Changes:

  • Fixed validator to accept empty string as "no language tag" placeholder
  • Added get_string_with_language() helpers to extract and preserve language tags from RDF literals
  • Replaced broken line-by-line Turtle parser with delegation to oxttl for proper Turtle syntax support

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| engine/oxirs-shacl/tests/language_tag_bug_test.rs | New test file with 3 tests covering language-tagged and plain messages |
| engine/oxirs-shacl/src/validator.rs | Updated validator to accept empty string as valid "no language tag" marker |
| engine/oxirs-shacl/src/shapes/parser.rs | Added helper methods to extract language tags and updated sh:message parsing to preserve them |
| core/oxirs-core/tests/turtle_parser_tests.rs | Comprehensive test suite (16 tests) for Turtle parser covering all major syntax features |
| core/oxirs-core/src/format/turtle.rs | Replaced broken parser with delegation to oxttl for full Turtle support |
| core/oxirs-core/src/format/parser/mod.rs | Changed helpers module visibility to pub(crate) for access from turtle module |
| core/oxirs-core/src/format/parser/helpers.rs | Changed convert_quad visibility to pub(crate) for use in turtle parser |

Comment on lines +724 to 729
  // Parse sh:message (preserving language tag if present)
  if let Some((message, lang_tag)) =
      self.get_string_with_language(graph, &shape_subject, &SHACL_VOCAB.message)?
  {
-     shape.messages.insert("".to_string(), message); // Default language
+     shape.messages.insert(lang_tag, message);
  }

Copilot AI Jan 29, 2026

According to the SHACL specification, a shape can have multiple sh:message predicates with different language tags. The current implementation only handles the first sh:message predicate found. This means if a shape has multiple messages in different languages (e.g., "Error"@en and "Erreur"@fr), only one will be stored.

To fix this, the code should iterate over all sh:message triples, similar to how sh:targetNode is handled (lines 454-467), and insert each message with its corresponding language tag into the messages map.
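
A rough sketch of the iteration this comment asks for, written against simplified stand-in types rather than the oxirs graph API: `LiteralTriple` and `collect_messages` are hypothetical names introduced only for illustration, and the real parser would walk its own graph structure instead of a slice.

```rust
use std::collections::HashMap;

/// IRI of the sh:message predicate.
pub const SH_MESSAGE: &str = "http://www.w3.org/ns/shacl#message";

/// Hypothetical flattened triple whose object is a literal.
/// Not the oxirs triple type.
pub struct LiteralTriple<'a> {
    pub subject: &'a str,
    pub predicate: &'a str,
    pub value: &'a str,
    pub language: Option<&'a str>,
}

/// Collects every sh:message attached to `shape`, keyed by language tag.
/// Plain literals are stored under the empty-string key.
pub fn collect_messages(
    triples: &[LiteralTriple<'_>],
    shape: &str,
) -> HashMap<String, String> {
    triples
        .iter()
        // Keep only sh:message triples that belong to this shape.
        .filter(|t| t.subject == shape && t.predicate == SH_MESSAGE)
        // Use the actual tag as the key, falling back to "" when absent.
        .map(|t| (t.language.unwrap_or("").to_string(), t.value.to_string()))
        .collect()
}
```

Under this sketch a shape carrying both "Error"@en and "Erreur"@fr yields two entries. If two messages share the same tag, the later one overwrites the earlier; whether that is the desired behavior is a design decision for the real parser.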

Comment on lines +1041 to 1045
  if let Some((message, lang_tag)) =
      self.get_string_with_language_for_subject(graph, blank_subject, &SHACL_VOCAB.message)?
  {
-     shape.messages.insert("".to_string(), message);
+     shape.messages.insert(lang_tag, message);
  }

Copilot AI Jan 29, 2026

Similar to the issue above, this code should handle multiple sh:message predicates with different language tags. The current implementation only processes the first message found, which means shapes defined as blank nodes cannot have multi-language messages.

To fix this, iterate over all sh:message triples for the blank subject and insert each message with its corresponding language tag.

Contributor Author

Copilot can you fix this?

Comment on lines +5 to +29
#[test]
fn test_shape_with_language_tagged_name() {
    let shapes = r#"
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix shapes: <urn:continuum:shapes/> .

shapes:LatitudeProperty a sh:PropertyShape ;
    sh:path wgs84:lat ;
    sh:name "latitude"@en ;
    sh:datatype xsd:double ;
    sh:severity sh:Violation .
"#;

    let mut validator = Validator::new();
    let result = validator.load_shapes_from_rdf(shapes, "turtle", None);

    match &result {
        Ok(count) => println!("SUCCESS: Loaded {} shapes", count),
        Err(e) => println!("ERROR: {:?}", e),
    }

    assert!(result.is_ok(), "Should load shapes with language-tagged sh:name");
}

Copilot AI Jan 29, 2026

This test uses sh:name in the shape definition, but sh:name doesn't appear to be parsed or supported by the SHACL validator (it's not in the ShaclVocabulary). The test only verifies that the shape loads without error, not that sh:name is actually parsed and stored. This test name is therefore misleading - it's actually testing whether the Turtle parser can handle language-tagged strings, not whether the SHACL parser handles sh:name correctly.

Consider either renaming this test to clarify what it's actually testing (e.g., test_turtle_parser_handles_language_tags) or removing it since the other two tests already cover the actual bug fix for sh:message.

Comment on lines +31 to +75
#[test]
fn test_shape_with_language_tagged_message() {
    let shapes = r#"
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix shapes: <urn:continuum:shapes/> .

shapes:TestShape a sh:NodeShape ;
    sh:targetClass shapes:TestClass ;
    sh:message "Validation failed"@en .
"#;

    let mut validator = Validator::new();
    let result = validator.load_shapes_from_rdf(shapes, "turtle", None);

    match &result {
        Ok(count) => println!("SUCCESS: Loaded {} shapes", count),
        Err(e) => println!("ERROR: {:?}", e),
    }

    assert!(result.is_ok(), "Should load shapes with language-tagged sh:message");
}

#[test]
fn test_shape_with_plain_message() {
    let shapes = r#"
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix shapes: <urn:continuum:shapes/> .

shapes:TestShape a sh:NodeShape ;
    sh:targetClass shapes:TestClass ;
    sh:message "Validation failed" .
"#;

    let mut validator = Validator::new();
    let result = validator.load_shapes_from_rdf(shapes, "turtle", None);

    match &result {
        Ok(count) => println!("SUCCESS: Loaded {} shapes", count),
        Err(e) => println!("ERROR: {:?}", e),
    }

    assert!(result.is_ok(), "Should load shapes with plain sh:message");
}

Copilot AI Jan 29, 2026

The tests only verify that shapes can be loaded without error, but they don't actually verify that the language tags are correctly preserved in the parsed shapes. Consider adding assertions that check:

  1. The message is stored with the correct language tag key (e.g., shape.messages["en"] should equal "Validation failed")
  2. For plain messages, the key should be an empty string (e.g., shape.messages[""] should equal "Validation failed")
  3. Multiple messages with different language tags are all preserved (e.g., a shape with both @en and @fr messages)

This would provide stronger test coverage and catch regressions more effectively.

if let Some(ref base) = self.base_iri {
    oxttl_parser = oxttl_parser
        .with_base_iri(base.as_str())
        .unwrap_or_else(|_| oxttl::TurtleParser::new());

Copilot AI Jan 29, 2026

If oxttl_parser.with_base_iri() fails, the code silently falls back to creating a new parser without the base IRI (line 83). This means the user's configured base IRI will be silently ignored on error, which could lead to unexpected parsing behavior.

Consider either:

  1. Propagating the error instead of silently ignoring it
  2. Logging a warning when falling back
  3. Documenting this fallback behavior

The silent fallback could be confusing for users who expect their base IRI configuration to be respected.

Suggested change
- .unwrap_or_else(|_| oxttl::TurtleParser::new());
+ .map_err(|e| RdfParseError::syntax(format!("Invalid base IRI '{base}': {e}")))?;
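
The propagate-the-error option can be illustrated with a self-contained sketch. Everything here is a stand-in: `Parser`, `ParseError`, and `build_parser` are hypothetical names, and the toy validity check (requiring a `:` scheme separator) is an assumption that only mimics a fallible builder; it does not reproduce oxttl's behavior.

```rust
/// Hypothetical error type standing in for the crate's parse error.
#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

/// Hypothetical parser builder with a fallible base-IRI setter.
#[derive(Debug, Default, PartialEq)]
pub struct Parser {
    pub base_iri: Option<String>,
}

impl Parser {
    /// Toy validation (an assumption): accept only strings containing
    /// a scheme separator.
    pub fn with_base_iri(mut self, base: &str) -> Result<Self, String> {
        if base.contains(':') {
            self.base_iri = Some(base.to_string());
            Ok(self)
        } else {
            Err(format!("no scheme in '{base}'"))
        }
    }
}

/// Builds a parser and surfaces an invalid base IRI to the caller
/// instead of silently dropping it.
pub fn build_parser(base: Option<&str>) -> Result<Parser, ParseError> {
    let parser = Parser::default();
    match base {
        Some(b) => parser
            .with_base_iri(b)
            .map_err(|e| ParseError(format!("Invalid base IRI '{b}': {e}"))),
        None => Ok(parser),
    }
}
```

In this sketch a caller who passes a malformed base IRI gets an `Err` naming the offending value, whereas an `unwrap_or_else` fallback would hand back a parser with no base IRI and no indication that the configuration was ignored.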

Owner

@cool-japan left a comment

✅ Excellent fix! Resolves critical bug where SHACL shapes with language-tagged strings (@en, @ja, etc.) failed to load. Parser now preserves language tags, validator accepts empty string for plain literals. All 645 tests pass (642 existing + 3 new). Enables internationalized SHACL shapes and W3C SHACL compliance.

@cool-japan merged commit 1bd0a93 into cool-japan:master on Feb 8, 2026
6 checks passed

3 participants