Fix XPath for HTML documents with broken root #6774

Merged
Alkarex merged 1 commit into FreshRSS:edge from Alkarex:fix-xpath
Sep 4, 2024

Alkarex commented Sep 4, 2024

fix #6773

The default `.//` prefix for the XPath does not work for HTML documents that have content after the end of their main node.
The `//` prefix works in both cases.

Alkarex added this to the 1.24.3 milestone Sep 4, 2024
Alkarex added the "Feed problem 🗞️ Feeds that have issues while loading/reading in FreshRSS" label Sep 4, 2024

Alkarex commented Sep 4, 2024

Example of test:

<?php
// HTML with stray content after the closing </html> tag (the "broken root" case)
$htmlString = <<<HTML
<!DOCTYPE html>
<html>
	<body>
	</body>
</html>
<div>div1</div>
<div>div2</div>
HTML;

$html = new DOMDocument();
$html->loadHTML($htmlString);
$xpath = new DOMXPath($html);

// Relative prefix: matches no <div> in this document
$divs = $xpath->query('.//div');
foreach ($divs as $div) {
	echo 'Matched with .// ' . $div->nodeValue . PHP_EOL;
}

// Absolute prefix: matches both <div> elements
$divs = $xpath->query('//div');
foreach ($divs as $div) {
	echo 'Matched with // ' . $div->nodeValue . PHP_EOL;
}

Returns:

Matched with // div1
Matched with // div2
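
The result above is consistent with `DOMXPath` evaluating a relative expression against the document's root element (`<html>`) when no context node is given, so the stray `<div>` elements, which sit outside that element, would be out of reach. That reading is an assumption, not something stated in this PR. A minimal sketch to check it passes the document itself as the context node; `countMatches()` is a hypothetical helper for illustration, not FreshRSS code:

```php
<?php
// Hypothetical helper (not part of FreshRSS): count the nodes matched by an
// XPath expression, optionally relative to an explicit context node.
function countMatches(DOMXPath $xpath, string $expression, ?DOMNode $context = null): int {
	$nodes = $xpath->query($expression, $context);
	return $nodes === false ? 0 : $nodes->length;
}

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);

$htmlString = <<<HTML
<!DOCTYPE html>
<html>
	<body>
	</body>
</html>
<div>div1</div>
<div>div2</div>
HTML;

$html = new DOMDocument();
$html->loadHTML($htmlString);
$xpath = new DOMXPath($html);

// Relative query with the default context node
$relativeToDefault = countMatches($xpath, './/div');
// Relative query with the document node passed explicitly as context
$relativeToDocument = countMatches($xpath, './/div', $html);
// Absolute query, independent of the context node
$absolute = countMatches($xpath, '//div');

echo './/div (default context): ' . $relativeToDefault . PHP_EOL;
echo './/div (document as context): ' . $relativeToDocument . PHP_EOL;
echo '//div: ' . $absolute . PHP_EOL;
```

By XPath semantics the last two counts are always equal, because `.//div` relative to the document node selects the same nodes as `//div`. The first count is lower only when matching elements lie outside the default context node.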

Alkarex merged commit 91d0e50 into FreshRSS:edge Sep 4, 2024
Alkarex deleted the fix-xpath branch September 4, 2024 19:02

Development

Successfully merging this pull request may close these issues.

[Bug] Retrieve content from a specific website