-
Notifications
You must be signed in to change notification settings - Fork 367
Try Fixing XPath Bug #624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try Fixing XPath Bug #624
Conversation
Refactors XML XPath extraction in TextContainer to use XDocument and track node positions, ensuring correct sample and boundary values for XPath matches. Adds a regression test for GitHub issue #621 to verify that the extracted sample and boundary are accurate for XPath queries in XML files.
Replaces the previous EvaluateXPath method with a unified approach using XPathEvaluate, improving support for native engine results and namespace handling. Adds robust logic to extract value boundaries for elements, attributes, and strings, including fallbacks for missing position data. Enhances error handling and logging for XPath evaluation failures.
Renamed the test method from 'XPathBugFix621_VersionExtraction' to 'XPathVersionElementSampleBoundary' in XmlAndJsonTests.cs for improved clarity and alignment with test intent.
Introduces a new test class 'XPathComprehensiveTests' with a variety of cases to expand coverage of XPath handling in the rules engine. Tests include complex predicates, axis selectors, attribute formats, namespace conflicts, text nodes, invalid expressions, empty/whitespace nodes, CDATA/comments, large document performance, and boundary edge cases.
Simplifies XML XPath extraction by removing the custom node-to-position mapping logic and relying on line info and fallback string search to determine boundaries. This reduces complexity and improves maintainability while preserving the ability to map extracted values to their locations in the original document.
Enhanced the TextContainer to better handle XML namespaces by first extracting namespace declarations from the document root and then applying caller-supplied mappings, allowing for more robust and flexible XPath queries. Updated related test to use a more precise XPath and namespace mapping, ensuring correct matching of attributes in XML files.
Enhanced TextContainer to better disambiguate boundaries for duplicate XML element and attribute values, reducing collisions in boundary mapping. Added comprehensive unit tests in XPathBoundaryTests to validate correct boundary index/length mapping, especially for duplicate values and edge cases, and to document current limitations.
Removed documentation and handling of known boundary mapping limitations in XPathBoundaryTests. Now asserts that all boundaries are distinct for different element and attribute positions, reflecting improved or expected boundary disambiguation logic.
Extracted line-info and fallback XML boundary detection logic into separate methods for improved readability and maintainability. The main processing loop now delegates to these helpers, reducing code duplication and clarifying the boundary detection flow for XML elements and attributes.
Eliminated the unnecessary 'using System.Collections.Generic;' directive from XPathComprehensiveTests.cs to clean up unused imports.
Updated the BuildRule method in XPathBoundaryTests to construct rules using the object model instead of JSON strings. Added support for parsing XPath namespaces from JSON and mapping pattern type strings to the PatternType enum for improved test clarity and maintainability.
Enhanced attribute value boundary detection to handle namespaced attributes by checking both full and local names. Removed the TryProcessFallbackMethods method and its usage, simplifying the code and relying on improved primary detection logic.
Introduced RuleTestHelpers to centralize and simplify the creation of RuleProcessor and RuleSet instances in test code. Updated XPathBoundaryTests and XPathComprehensiveTests to use these helpers, reducing duplication and improving maintainability.
Replaces hardcoded numeric values for XML/XPath boundary search windows with named constants. This improves code readability and maintainability, and provides context for tuning performance and match accuracy.
Enhanced error handling in TextContainer by logging detailed trace messages when adding XML namespace mappings fails. Also, removed outdated comments from XPathBoundaryTests to clean up the test code.
Introduced a local variable 'targetVal' to store the value as a Span for reuse in the attribute value comparison loop, improving code clarity and efficiency.
Improves robustness by validating line and column numbers before mapping XML line info to content indices. Prevents out-of-bounds errors and ensures invalid or out-of-range line/column values are handled gracefully.
Introduces unit tests to verify that the rule processor handles malformed or out-of-range XML line and column information gracefully, ensuring correct fallback logic and value extraction without exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request significantly refactors the XML/XPath handling in the TextContainer class to improve boundary calculation accuracy and fix issues with XPath query result mapping. The primary goal is to resolve GitHub issue #621 where sample and excerpt extraction wasn't working properly with XPath.
Key changes:
- Migrated from
XPathDocumenttoXDocumentfor better line info support and XML manipulation capabilities - Implemented sophisticated boundary calculation logic using line information and fallback strategies
- Enhanced namespace handling for XPath queries with both document-declared and caller-supplied namespaces
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| AppInspector.RulesEngine/TextContainer.cs | Complete rewrite of XPath processing logic with new XDocument-based implementation and precise boundary calculation |
| AppInspector.Tests/RuleProcessor/XmlAndJsonTests.cs | Added comprehensive test for XPath boundary accuracy and updated existing namespace tests |
| AppInspector.Tests/RuleProcessor/XPathBoundaryTests.cs | New dedicated test file for validating boundary calculation edge cases |
| AppInspector.Tests/RuleProcessor/XPathComprehensiveTests.cs | New comprehensive test suite covering various XPath scenarios |
| AppInspector.Tests/RuleProcessor/XPathMalformedLineInfoTests.cs | New test file for defensive handling of malformed XML line info |
| AppInspector.Tests/RuleProcessor/RuleTestHelpers.cs | New helper class for creating test rules and processors |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| // Window/offset constants used for XML/XPath boundary reconstruction. | ||
| // These constrain local searches to avoid scanning entire large documents while | ||
| // remaining generous enough for typical formatting (attributes spread across a tag line, | ||
| // element content near reported line info, and a broader fallback when line info is | ||
| // imperfect). Adjust cautiously as they directly impact performance and match accuracy. | ||
| private const int XmlAttributeSearchWindow = 400; // Max chars to scan forward for an attribute value on the same line | ||
| private const int XmlElementSearchLookBehind = 100; // Chars to look back from reported line position when locating element start | ||
| private const int XmlElementSearchLookAhead = 500; // Chars to look forward from reported line position for element content | ||
| private const int XmlElementFallbackWindow = 800; // Fallback scan window when precise line-info based match fails |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These magic numbers should be made configurable or moved to a configuration class. Hard-coded constants make it difficult to tune performance for different XML document sizes and structures without recompiling.
| var line = li.LineNumber; // 1-based per IXmlLineInfo | ||
| var col = li.LinePosition; // 1-based per IXmlLineInfo | ||
| // Defensive: ensure line index is within our sentinel-based one-indexed arrays | ||
| if (line <= 0 || line >= LineStarts.Count) |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition line >= LineStarts.Count should be line > LineStarts.Count - 1 or use >= with the proper array bounds check. Since LineStarts is 1-indexed with a sentinel at position 0, valid line numbers should be 1 to LineStarts.Count-1.
| private static object CreateLineInfoAnnotation(int line, int col) | ||
| { | ||
| var annotationType = typeof(XObject).Assembly.GetType("System.Xml.Linq.LineInfoAnnotation") | ||
| ?? throw new InvalidOperationException("LineInfoAnnotation type not found."); | ||
| var inst = Activator.CreateInstance(annotationType, line, col) | ||
| ?? throw new InvalidOperationException("Failed to create LineInfoAnnotation instance."); | ||
| return inst; | ||
| } |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method uses reflection to access internal types which creates brittle code that may break with .NET updates. Consider using public APIs or mocking frameworks instead of relying on internal implementation details.
| var removeMethod = typeof(System.Xml.Linq.XObject) | ||
| .GetMethods(BindingFlags.Instance | BindingFlags.Public) | ||
| .FirstOrDefault(m => m.Name == "RemoveAnnotations" && m.IsGenericMethod && m.GetParameters().Length == 0); |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using reflection to find and invoke methods by name is fragile and may break with framework updates. Consider using a more stable approach or documenting the .NET version dependency.
| var attrAIndex = xml.IndexOf("a=\"DupVal\"") + "a=\"".Length; // Should be around 44 | ||
| var attrBIndex = xml.IndexOf("b=\"DupVal\"") + "b=\"".Length; // Should be around 55 | ||
| var attrCIndex = xml.IndexOf("c=\"DupVal\"") + "c=\"".Length; // Should be around 66 | ||
| var elemIndex = xml.IndexOf(">DupVal<") + 1; // Should be around 73 |
Copilot
AI
Sep 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These variables are calculated but never used in the test assertions. Either use these expected indices in the test validation or remove the unused variables to avoid confusion.
Eliminated unused index calculations and an unnecessary assertion in the test. This streamlines the test code and removes dead code that did not contribute to the test's outcome.
This pull request introduces significant improvements to XML/XPath boundary detection in the
TextContainerclass and adds new test helpers for rule creation. The main focus is on enhancing the accuracy and robustness of XML XPath string extraction, especially with respect to namespace handling and precise location mapping, while also making it easier to write and manage rule-based tests.Enhancements to XML/XPath processing:
XPathDocumentandXmlNavigatorwithXDocumentand LINQ to XML for XPath evaluation, enabling better support for line information and more robust namespace handling. (AppInspector.RulesEngine/TextContainer.cs) [1] [2] [3]AppInspector.RulesEngine/TextContainer.cs) [1] [2]AppInspector.RulesEngine/TextContainer.cs)Testing improvements:
RuleTestHelpersclass to simplify the creation ofRuleProcessorinstances and single-ruleRuleSets for tests, supporting both string and dictionary-based namespace inputs for XPath patterns. (AppInspector.Tests/RuleProcessor/RuleTestHelpers.cs)These changes should result in more accurate XML analysis and make it easier to write and maintain tests for rule processing.