Try Fixing XPath Bug #624

gfs · 2025-09-08T23:38:57Z

This pull request introduces significant improvements to XML/XPath boundary detection in the TextContainer class and adds new test helpers for rule creation. The main focus is on enhancing the accuracy and robustness of XML XPath string extraction, especially with respect to namespace handling and precise location mapping, while also making it easier to write and manage rule-based tests.

Enhancements to XML/XPath processing:

Replaced the use of XPathDocument and XmlNavigator with XDocument and LINQ to XML for XPath evaluation, enabling better support for line information and more robust namespace handling. (AppInspector.RulesEngine/TextContainer.cs) [1] [2] [3]
Added new logic for precise boundary detection using line info from XML objects, including configurable search windows for attributes and element values. This improves the accuracy of mapping XPath results back to their positions in the original XML document. (AppInspector.RulesEngine/TextContainer.cs) [1] [2]
Improved handling of XML namespaces: now merges namespaces declared in the document with those provided by the caller, and logs detailed errors if namespace mapping fails. (AppInspector.RulesEngine/TextContainer.cs)

Testing improvements:

Added a new RuleTestHelpers class to simplify the creation of RuleProcessor instances and single-rule RuleSets for tests, supporting both string and dictionary-based namespace inputs for XPath patterns. (AppInspector.Tests/RuleProcessor/RuleTestHelpers.cs)

These changes should result in more accurate XML analysis and make it easier to write and maintain tests for rule processing.

Refactors XML XPath extraction in TextContainer to use XDocument and track node positions, ensuring correct sample and boundary values for XPath matches. Adds a regression test for GitHub issue #621 to verify that the extracted sample and boundary are accurate for XPath queries in XML files.

Replaces the previous EvaluateXPath method with a unified approach using XPathEvaluate, improving support for native engine results and namespace handling. Adds robust logic to extract value boundaries for elements, attributes, and strings, including fallbacks for missing position data. Enhances error handling and logging for XPath evaluation failures.

Renamed the test method from 'XPathBugFix621_VersionExtraction' to 'XPathVersionElementSampleBoundary' in XmlAndJsonTests.cs for improved clarity and alignment with test intent.

Introduces a new test class 'XPathComprehensiveTests' with a variety of cases to expand coverage of XPath handling in the rules engine. Tests include complex predicates, axis selectors, attribute formats, namespace conflicts, text nodes, invalid expressions, empty/whitespace nodes, CDATA/comments, large document performance, and boundary edge cases.

Simplifies XML XPath extraction by removing the custom node-to-position mapping logic and relying on line info and fallback string search to determine boundaries. This reduces complexity and improves maintainability while preserving the ability to map extracted values to their locations in the original document.

Enhanced the TextContainer to better handle XML namespaces by first extracting namespace declarations from the document root and then applying caller-supplied mappings, allowing for more robust and flexible XPath queries. Updated related test to use a more precise XPath and namespace mapping, ensuring correct matching of attributes in XML files.

Enhanced TextContainer to better disambiguate boundaries for duplicate XML element and attribute values, reducing collisions in boundary mapping. Added comprehensive unit tests in XPathBoundaryTests to validate correct boundary index/length mapping, especially for duplicate values and edge cases, and to document current limitations.

Removed documentation and handling of known boundary mapping limitations in XPathBoundaryTests. Now asserts that all boundaries are distinct for different element and attribute positions, reflecting improved or expected boundary disambiguation logic.

Extracted line-info and fallback XML boundary detection logic into separate methods for improved readability and maintainability. The main processing loop now delegates to these helpers, reducing code duplication and clarifying the boundary detection flow for XML elements and attributes.

Eliminated the unnecessary 'using System.Collections.Generic;' directive from XPathComprehensiveTests.cs to clean up unused imports.

Updated the BuildRule method in XPathBoundaryTests to construct rules using the object model instead of JSON strings. Added support for parsing XPath namespaces from JSON and mapping pattern type strings to the PatternType enum for improved test clarity and maintainability.

Enhanced attribute value boundary detection to handle namespaced attributes by checking both full and local names. Removed the TryProcessFallbackMethods method and its usage, simplifying the code and relying on improved primary detection logic.

Introduced RuleTestHelpers to centralize and simplify the creation of RuleProcessor and RuleSet instances in test code. Updated XPathBoundaryTests and XPathComprehensiveTests to use these helpers, reducing duplication and improving maintainability.

Replaces hardcoded numeric values for XML/XPath boundary search windows with named constants. This improves code readability and maintainability, and provides context for tuning performance and match accuracy.

Enhanced error handling in TextContainer by logging detailed trace messages when adding XML namespace mappings fails. Also, removed outdated comments from XPathBoundaryTests to clean up the test code.

Introduced a local variable 'targetVal' to store the value as a Span for reuse in the attribute value comparison loop, improving code clarity and efficiency.

Improves robustness by validating line and column numbers before mapping XML line info to content indices. Prevents out-of-bounds errors and ensures invalid or out-of-range line/column values are handled gracefully.

Introduces unit tests to verify that the rule processor handles malformed or out-of-range XML line and column information gracefully, ensuring correct fallback logic and value extraction without exceptions.

Copilot

Pull Request Overview

This pull request significantly refactors the XML/XPath handling in the TextContainer class to improve boundary calculation accuracy and fix issues with XPath query result mapping. The primary goal is to resolve GitHub issue #621 where sample and excerpt extraction wasn't working properly with XPath.

Key changes:

Migrated from XPathDocument to XDocument for better line info support and XML manipulation capabilities
Implemented sophisticated boundary calculation logic using line information and fallback strategies
Enhanced namespace handling for XPath queries with both document-declared and caller-supplied namespaces

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
AppInspector.RulesEngine/TextContainer.cs	Complete rewrite of XPath processing logic with new `XDocument`-based implementation and precise boundary calculation
AppInspector.Tests/RuleProcessor/XmlAndJsonTests.cs	Added comprehensive test for XPath boundary accuracy and updated existing namespace tests
AppInspector.Tests/RuleProcessor/XPathBoundaryTests.cs	New dedicated test file for validating boundary calculation edge cases
AppInspector.Tests/RuleProcessor/XPathComprehensiveTests.cs	New comprehensive test suite covering various XPath scenarios
AppInspector.Tests/RuleProcessor/XPathMalformedLineInfoTests.cs	New test file for defensive handling of malformed XML line info
AppInspector.Tests/RuleProcessor/RuleTestHelpers.cs	New helper class for creating test rules and processors

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-09-09T20:48:53Z

AppInspector.RulesEngine/TextContainer.cs

+    // Window/offset constants used for XML/XPath boundary reconstruction.
+    // These constrain local searches to avoid scanning entire large documents while
+    // remaining generous enough for typical formatting (attributes spread across a tag line,
+    // element content near reported line info, and a broader fallback when line info is
+    // imperfect). Adjust cautiously as they directly impact performance and match accuracy.
+    private const int XmlAttributeSearchWindow = 400;     // Max chars to scan forward for an attribute value on the same line
+    private const int XmlElementSearchLookBehind = 100;   // Chars to look back from reported line position when locating element start
+    private const int XmlElementSearchLookAhead = 500;    // Chars to look forward from reported line position for element content
+    private const int XmlElementFallbackWindow = 800;     // Fallback scan window when precise line-info based match fails


These magic numbers should be made configurable or moved to a configuration class. Hard-coded constants make it difficult to tune performance for different XML document sizes and structures without recompiling.

Copilot · 2025-09-09T20:48:53Z

AppInspector.RulesEngine/TextContainer.cs

+        var line = li.LineNumber; // 1-based per IXmlLineInfo
+        var col = li.LinePosition; // 1-based per IXmlLineInfo
+        // Defensive: ensure line index is within our sentinel-based one-indexed arrays
+        if (line <= 0 || line >= LineStarts.Count)


The condition line >= LineStarts.Count should be line > LineStarts.Count - 1 or use >= with the proper array bounds check. Since LineStarts is 1-indexed with a sentinel at position 0, valid line numbers should be 1 to LineStarts.Count-1.

Copilot · 2025-09-09T20:48:54Z

AppInspector.Tests/RuleProcessor/XPathMalformedLineInfoTests.cs

+    private static object CreateLineInfoAnnotation(int line, int col)
+    {
+        var annotationType = typeof(XObject).Assembly.GetType("System.Xml.Linq.LineInfoAnnotation")
+            ?? throw new InvalidOperationException("LineInfoAnnotation type not found.");
+        var inst = Activator.CreateInstance(annotationType, line, col)
+            ?? throw new InvalidOperationException("Failed to create LineInfoAnnotation instance.");
+        return inst;
+    }


This method uses reflection to access internal types which creates brittle code that may break with .NET updates. Consider using public APIs or mocking frameworks instead of relying on internal implementation details.

Copilot · 2025-09-09T20:48:54Z

AppInspector.Tests/RuleProcessor/XPathMalformedLineInfoTests.cs

+        var removeMethod = typeof(System.Xml.Linq.XObject)
+            .GetMethods(BindingFlags.Instance | BindingFlags.Public)
+            .FirstOrDefault(m => m.Name == "RemoveAnnotations" && m.IsGenericMethod && m.GetParameters().Length == 0);


Using reflection to find and invoke methods by name is fragile and may break with framework updates. Consider using a more stable approach or documenting the .NET version dependency.

Copilot · 2025-09-09T20:48:54Z

AppInspector.Tests/RuleProcessor/XPathBoundaryTests.cs

+        var attrAIndex = xml.IndexOf("a=\"DupVal\"") + "a=\"".Length;  // Should be around 44
+        var attrBIndex = xml.IndexOf("b=\"DupVal\"") + "b=\"".Length;  // Should be around 55  
+        var attrCIndex = xml.IndexOf("c=\"DupVal\"") + "c=\"".Length;  // Should be around 66
+        var elemIndex = xml.IndexOf(">DupVal<") + 1;                    // Should be around 73


These variables are calculated but never used in the test assertions. Either use these expected indices in the test validation or remove the unused variables to avoid confusion.

Eliminated unused index calculations and an unnecessary assertion in the test. This streamlines the test code and removes dead code that did not contribute to the test's outcome.

gfs added 4 commits September 8, 2025 16:04

Rename test method for XPath version extraction

1998f33

Renamed the test method from 'XPathBugFix621_VersionExtraction' to 'XPathVersionElementSampleBoundary' in XmlAndJsonTests.cs for improved clarity and alignment with test intent.

gfs requested a review from Copilot September 8, 2025 23:40