feat: Support sitemap.xml #2071
Merged
thomas-zahner merged 16 commits on Mar 13, 2026
mre
reviewed
Mar 4, 2026
mre
left a comment
Member
You're on a roll! Great work! 🚀
I've added a few comments. 😊
mre
reviewed
Mar 5, 2026
Member
@cristiklein Regarding the clippy warnings/errors, by rebasing or merging master you should be able to resolve the issues. (edit: I did a rebase)
Force-pushed from c8e9129 to a4ec801
thomas-zahner
requested changes
Mar 11, 2026
Tested with https://validator.w3.org/feed/check.cgi. Result: "[Valid Atom 1.0] This is a valid Atom 1.0 feed."
Contributor
Author
@mre @thomas-zahner Thanks for the feedback on this PR. I believe I addressed all your comments. Otherwise, please let me know. 😄
thomas-zahner
approved these changes
Mar 13, 2026
Member
@cristiklein Thank you for this cool addition to lychee! 🚀
Contributor
Author
And thank you for the kind words and for shepherding this PR. ❤️
donbeave
added a commit
to jackin-project/jackin
that referenced
this pull request
Apr 25, 2026
The first Docs workflow run on main after #173 (commit f3f3e5e) failed in
deploy → "Check deployed docs links" with "No files found for this input
source".
Root cause: the previous step ran lychee --dump on the deployed sitemap URL,
but lychee 0.23.0 (the lycheeverse/lychee-action v2 default) only extracts
<a href> from HTML and matching patterns from markdown; it does not parse
<loc> entries from XML sitemaps. The dump produced an empty list and the
follow-up --files-from step had nothing to read.
Upstream already fixed this. lycheeverse/lychee#2071 (merged 2026-03-13,
tagged in v0.24.0 on 2026-04-24) adds <loc> extraction from sitemap.xml,
closing lycheeverse/lychee#2062 and #1819.
Verified locally on 0.24.0:
$ lychee --version
lychee 0.24.0
$ lychee --dump https://jackin.tailrocks.com/sitemap-0.xml | wc -l
45
Pin LYCHEE_VERSION at the workflow env level and reference it from every
lychee-action call so future bumps are one-line. v0.24.0's breaking changes
are in lychee-lib (the Rust API consumers); the CLI surface we use is
unchanged.
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
donbeave
added a commit
to jackin-project/jackin
that referenced
this pull request
Apr 25, 2026
Replace the previous v0.24.0 bump with the only combination that
actually works against the current lychee release pipeline:
- lycheeverse/lychee-action SHA 8646ba3 (tagged v2.8.0) → faea714
(post-v2.8.0 master). Adds subfolder-aware install needed for any
lychee 0.24.x tarball.
- LYCHEE_VERSION 'v0.24.0' → 'v0.24.1'.
Why both moves:
* lychee 0.24.0 added <loc> extraction from XML sitemaps
(lycheeverse/lychee#2071), which is what the deploy and check-deployed
jobs need to feed --files-from. lychee 0.23.0 dumps zero links from a
sitemap, which is what produced the "No files found for this input
source" failure on f3f3e5e.
* lychee 0.24.0's release tarball was repackaged with a top-level
subfolder AND the asset filename was renamed to
lychee-lychee-v0.24.0-{arch}-... — both incompatible with
lychee-action v2.8.0's hardcoded download URL and flat-extract logic.
* lychee 0.24.1 (released the same day) reverted to the original asset
filename but kept the subfolder layout AND kept the sitemap fix.
* lychee-action faea714 (unreleased; current HEAD of master) bumps the
default to 0.24.1 and adds subfolder-aware install. Pinning the SHA
is the same security model we already use for v2.8.0.
The combination 8646ba3 + 'latest' or 8646ba3 + 'v0.24.x' both fail.
The combination faea714 + 'v0.24.1' works.
Verified locally:
$ lychee-v0.24.1/lychee --version
lychee 0.24.1
$ lychee-v0.24.1/lychee --dump https://jackin.tailrocks.com/sitemap-0.xml | wc -l
45
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
donbeave
added a commit
to jackin-project/jackin
that referenced
this pull request
Apr 25, 2026
* ci(docs): bump lychee to v0.24.0 to fix sitemap URL extraction
* ci(docs): bump lychee-action and lychee for sitemap URL extraction
* ci(docs): add TODO(lychee-action-sha-pin) marker
Companion to #179, which establishes the convention. Mark the spot where
the SHA pin needs to be reverted once lycheeverse/lychee-action cuts a
tagged release at or after faea714, with a back-link to the tracked entry
in TODO.md so a single grep finds both ends.
Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Claude <noreply@anthropic.com>
donbeave
added a commit
to jackin-project/jackin
that referenced
this pull request
May 6, 2026
donbeave
added a commit
to jackin-project/jackin
that referenced
this pull request
May 7, 2026
* ci(docs): bump lychee to v0.24.0 to fix sitemap URL extraction
* ci(docs): bump lychee-action and lychee for sitemap URL extraction
* ci(docs): add TODO(lychee-action-sha-pin) marker
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Alexey Zhokhov <alexey@zhokhov.com>
Co-authored-by: Codex <codex@openai.com>
donbeave
added a commit
to jackin-project/jackin
that referenced
this pull request
May 7, 2026
donbeave
added a commit
to jackin-project/jackin
that referenced
this pull request
May 7, 2026
Fixes #2062
This PR adds support for extracting links from <loc> tags from sitemap.xml. The implementation is kept minimal, relying on a regex, similar to how links are extracted from CSS.
In future, the XML extractor could be extended with support for links in SVGs and perhaps a proper XML parser. However, those are out-of-scope for this PR.
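To illustrate the regex-based approach described above, here is a minimal sketch in Python. It is not the code from this PR (lychee is written in Rust, and its actual pattern is not shown here). The names LOC_PATTERN, extract_loc_links, and SAMPLE_SITEMAP are hypothetical and exist only for this example. The sketch assumes plain <loc>...</loc> elements without a namespace prefix or CDATA sections, which is the kind of case a proper XML parser would handle and a simple regex does not.

```python
import html
import re

# Hypothetical pattern for illustration only; NOT the regex used in lychee.
# It captures the non-whitespace text between <loc> and </loc>, tolerating
# surrounding whitespace and newlines inside the element.
LOC_PATTERN = re.compile(r"<loc>\s*([^<\s]+)\s*</loc>", re.IGNORECASE)


def extract_loc_links(xml_text: str) -> list[str]:
    """Return the URLs found in <loc> elements, in document order."""
    # Sitemaps escape characters such as '&' as '&amp;', so unescape them.
    return [html.unescape(m.group(1)) for m in LOC_PATTERN.finditer(xml_text)]


# Made-up sample input using the example.com placeholder domain.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url>
    <loc>
      https://example.com/docs/
    </loc>
  </url>
</urlset>
"""

if __name__ == "__main__":
    for link in extract_loc_links(SAMPLE_SITEMAP):
        print(link)
```

A regex keeps the extractor small and dependency-free, at the cost of ignoring XML details such as comments, CDATA, and prefixed element names.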