Skip to content

Update WM archiver script: add optional target parameter#3221

Merged
arkid15r merged 5 commits intovacanza:devfrom
pareshjoshij:links_archiver
Jan 21, 2026
Merged

Update WM archiver script: add optional target parameter#3221
arkid15r merged 5 commits intovacanza:devfrom
pareshjoshij:links_archiver

Conversation

@pareshjoshij
Copy link
Copy Markdown
Contributor

Proposed change

Closes #3220

I've updated scripts/archive_links.py to accept an optional path argument.

Right now, the script scans the whole holidays folder by default, which can take a while if we just want to verify references for a specific country file. With this update, you can point the archiver to a single file or sub-directory (e.g., python scripts/archive_links.py holidays/countries/estonia.py). If no path is provided, it still defaults to scanning everything, so existing workflows won't break.

Note on rate limits: I skipped adding a sleep timer to prioritize speed. If you'd prefer I add a delay to avoid potential API 429 errors, just let me know and I'll add it.

Type of change

  • New country/market holidays support (thank you!)
  • Supported country/market holidays update (calendar discrepancy fix, localization)
  • Existing code/documentation/test/process quality improvement (best practice, cleanup, refactoring, optimization)
  • Dependency update (version deprecation/pin/upgrade)
  • Bugfix (non-breaking change which fixes an issue)
  • Breaking change (a code change causing existing functionality to break)
  • New feature (new holidays functionality in general)

Checklist

Copilot AI review requested due to automatic review settings January 18, 2026 05:06
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 18, 2026

Caution

Review failed

Failed to post review comments

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

  • New Features

    • Added an optional path argument to target a specific file or directory; if omitted, the tool scans the original default location.
    • When a path is provided, the tool will process a single file or scan a directory accordingly.
    • Existing modes (show-only, check-liveness, archive) work with the new path-aware flow.
  • Bug Fixes

    • Adds error handling for non-existent paths (prints error and exits).

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

scripts/archive_links.py now accepts an optional positional path argument. If given, the script processes a single file or all files in a directory (errors on non-existent paths); if omitted, it scans the default holidays directory. Existing modes (show-only, check-liveness, archive) use the path-aware flow.

Changes

Cohort / File(s) Summary
CLI and main control flow
scripts/archive_links.py
Added optional positional path CLI argument; main() now branches to handle a single file, a directory, or the default holidays directory. Added error handling for non-existent paths and a local typed variable files_to_urls_data: dict[str, list[str]] = {} to route existing modes (show-only, check-liveness, archive) through the new flow.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • KJhellico
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main change: adding an optional target parameter to the WM archiver script.
Description check ✅ Passed The description clearly explains the change, references the linked issue #3220, and outlines the new functionality with examples.
Linked Issues check ✅ Passed The PR successfully implements the requirement from #3220: adding an optional path argument to run the archiver on a single file or subdirectory instead of scanning the entire library.
Out of Scope Changes check ✅ Passed All changes are scoped to the archiver script enhancement. The modifications add the requested path parameter functionality without introducing unrelated alterations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds an optional path argument to the archive_links.py script, enabling users to target specific files or directories for link archiving instead of scanning the entire holidays library. This addresses issue #3220 which requested the ability to run the link archiver on a single file for faster verification during country-specific updates.

Changes:

  • Added optional positional path argument to the CLI
  • Implemented logic to handle single file, directory, or default (all holidays) scanning modes
  • Maintained backward compatibility by defaulting to full library scan when no path is provided

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +445 to +449
if os.path.isfile(target_path):
print(f"Targeting single file: {target_path}")
urls = find_hyperlinks_in_file(target_path)
if urls:
files_to_urls_data = {target_path: urls}
Copy link

Copilot AI Jan 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When targeting a single file, the code should validate that the file has one of the allowed extensions defined in EXTENSIONS_TO_SCAN. Currently, scan_directory_for_links filters files by extension, but this single-file path bypasses that validation. This could lead to processing files with unsupported extensions or inconsistent behavior compared to directory scanning. Consider adding a check to ensure the file ends with one of the allowed extensions before processing.

Copilot uses AI. Check for mistakes.
@codecov
Copy link
Copy Markdown

codecov bot commented Jan 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (095e64d) to head (8412b26).
⚠️ Report is 1 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff            @@
##               dev     #3221   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          307       307           
  Lines        18309     18309           
  Branches      2330      2330           
=========================================
  Hits         18309     18309           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 18, 2026
@KJhellico
Copy link
Copy Markdown
Collaborator

Actually, this script needs more refactoring later (use pathlib.Path, class-based approach).

@KJhellico KJhellico changed the title feat: add path argument to archive_links.py Update WM archiver script: add optional target parameter Jan 18, 2026
coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 18, 2026
@pareshjoshij
Copy link
Copy Markdown
Contributor Author

@KJhellico sir, I’ve removed the extra comments and print statements !

KJhellico
KJhellico previously approved these changes Jan 21, 2026
Copy link
Copy Markdown
Collaborator

@KJhellico KJhellico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

PPsyrius
PPsyrius previously approved these changes Jan 21, 2026
Copy link
Copy Markdown
Collaborator

@PPsyrius PPsyrius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🛠️

I tried this script earlier and it works perfectly

@sonarqubecloud
Copy link
Copy Markdown

@arkid15r arkid15r enabled auto-merge January 21, 2026 19:13
@arkid15r arkid15r added this pull request to the merge queue Jan 21, 2026
Merged via the queue into vacanza:dev with commit 3608018 Jan 21, 2026
32 checks passed
@pareshjoshij pareshjoshij deleted the links_archiver branch January 23, 2026 11:21
@KJhellico KJhellico mentioned this pull request Feb 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add an option to run the links archiver on a single .py file

5 participants