Parallelize external URL checks #96

Merged
manuzhang merged 1 commit into manuzhang:main from sisp:perf/concurrent-requests on Feb 21, 2026

sisp commented Feb 19, 2026

I've introduced concurrent.futures.ThreadPoolExecutor, inspired by the official example, to check external URLs in parallel instead of serially, speeding up the build when many external URLs need to be validated.
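
For readers unfamiliar with the pattern, here is a minimal sketch of checking URLs in parallel with `ThreadPoolExecutor` and `as_completed`. It is not the plugin's code: `check_url` and `check_all` are hypothetical stand-ins, the URLs are placeholders, and latency is simulated with `time.sleep` instead of real network requests.

```python
import concurrent.futures
import time


def check_url(url: str) -> tuple[str, bool]:
    """Hypothetical stand-in for a network check; sleeps to simulate latency."""
    time.sleep(0.1)
    return url, not url.endswith("/broken")


def check_all(urls: list[str], max_workers: int = 8) -> dict[str, bool]:
    """Check every URL concurrently and return a mapping of URL to result."""
    results: dict[str, bool] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit all checks up front, then consume them as they finish.
        futures = [executor.submit(check_url, url) for url in urls]
        for future in concurrent.futures.as_completed(futures):
            url, ok = future.result()  # re-raises any exception from the worker
            results[url] = ok
    return results


# Placeholder URLs; nothing is actually fetched.
urls = [f"https://example.com/page{i}" for i in range(7)] + ["https://example.com/broken"]
start = time.perf_counter()
results = check_all(urls)
elapsed = time.perf_counter() - start
print(f"checked {len(results)} URLs in {elapsed:.2f} s")
```

Because the simulated checks spend their time waiting rather than computing, the threads overlap and the total wall time should be close to a single check rather than the sum of all of them.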

Here is a benchmark using hyperfine based on the test site in tests/integration/:

$ # Before:
$ hyperfine --warmup 3 --prepare 'rm -rf site/' --runs 20 'mkdocs build'
Benchmark 1: mkdocs build
  Time (mean ± σ):      2.682 s ±  0.127 s    [User: 0.400 s, System: 0.039 s]
  Range (min … max):    2.491 s …  2.962 s    20 runs

$ # After:
$ hyperfine --warmup 3 --prepare 'rm -rf site/' --runs 20 'mkdocs build'
Benchmark 1: mkdocs build
  Time (mean ± σ):      1.510 s ±  0.165 s    [User: 0.380 s, System: 0.042 s]
  Range (min … max):    0.919 s …  1.739 s    20 runs

The build time is almost cut in half on my machine.
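
As a quick check of that claim, the arithmetic on the two reported means can be sketched as follows (only the two mean values come from the benchmark output above; the variable names are illustrative):

```python
before_mean = 2.682  # seconds, mean of the "Before" run
after_mean = 1.510   # seconds, mean of the "After" run

reduction = 1 - after_mean / before_mean  # fraction of build time saved
speedup = before_mean / after_mean        # how many times faster the build is

print(f"time reduction: {reduction:.1%}")  # time reduction: 43.7%
print(f"speedup: {speedup:.2f}x")          # speedup: 1.78x
```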

Copilot AI left a comment

Pull request overview

This PR introduces parallelization for external URL validation using Python's concurrent.futures.ThreadPoolExecutor to improve build performance. The change replaces sequential URL checking with concurrent execution, reducing the mean build time by approximately 44% according to the provided benchmarks (from 2.682 s to 1.510 s).

Changes:

  • Added import for concurrent.futures module
  • Modified URL checking logic to collect URLs into a list before processing
  • Implemented parallel URL validation using ThreadPoolExecutor with as_completed() pattern

Comment on lines +142 to +145
for future in concurrent.futures.as_completed(
    executor.submit(self.check_url, url, page.file.src_path, all_element_ids, opt_files) for url in urls_to_check
):
    future.result()

Copilot AI Feb 20, 2026

The parallel execution doesn't explicitly handle exceptions that might be raised in worker threads. While future.result() will re-raise any exception that occurred in the thread, if one thread raises an exception, the other threads will continue executing. Additionally, if raise_error is True and multiple URLs fail simultaneously, only the first exception will be raised while others are lost.

Consider adding explicit exception handling logic to collect all failures before raising, or at minimum add a comment explaining the exception propagation behavior to make the intent clear.

Suggested change
- for future in concurrent.futures.as_completed(
-     executor.submit(self.check_url, url, page.file.src_path, all_element_ids, opt_files) for url in urls_to_check
- ):
-     future.result()
+ errors: List[PluginError] = []
+ for future in concurrent.futures.as_completed(
+     executor.submit(self.check_url, url, page.file.src_path, all_element_ids, opt_files) for url in urls_to_check
+ ):
+     try:
+         future.result()
+     except PluginError as exc:
+         # Collect all plugin errors raised during parallel URL checking.
+         errors.append(exc)
+ if errors and self.config['raise_error']:
+     if len(errors) == 1:
+         # Preserve the original single-error behavior.
+         raise errors[0]
+     combined_message = "Multiple link validation errors:\n" + "\n".join(str(e) for e in errors)
+     raise PluginError(combined_message)
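
To make the collect-then-raise idea easier to try outside the plugin, here is a self-contained sketch of the same approach. It rests on assumptions: `CheckError` is a hypothetical stand-in for `PluginError`, `check` replaces `self.check_url`, and the `raise_error` parameter replaces the plugin's config lookup. The messages are sorted only so the combined text does not depend on completion order.

```python
import concurrent.futures


class CheckError(Exception):
    """Hypothetical stand-in for the plugin's PluginError."""


def check(url: str) -> None:
    """Hypothetical check: any URL containing 'bad' counts as broken."""
    if "bad" in url:
        raise CheckError(f"broken link: {url}")


def check_all(urls: list[str], raise_error: bool = True) -> list[CheckError]:
    errors: list[CheckError] = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(check, url) for url in urls]
        for future in concurrent.futures.as_completed(futures):
            try:
                future.result()
            except CheckError as exc:
                # Keep going so every failure is recorded, not just the first.
                errors.append(exc)
    if errors and raise_error:
        if len(errors) == 1:
            raise errors[0]
        # Sort so the combined message is deterministic.
        details = "\n".join(sorted(str(e) for e in errors))
        raise CheckError("Multiple link validation errors:\n" + details)
    return errors


sample_urls = [
    "https://example.com/ok",
    "https://example.com/bad1",
    "https://example.com/bad2",
]
message = ""
try:
    check_all(sample_urls)
except CheckError as exc:
    message = str(exc)
print(message)
```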

Contributor Author

I decided to add a comment explaining the exception propagation behavior to make the intent clear. I don't think more complex logic is needed in this case.
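
The propagation behavior in question can be illustrated with a small sketch. It is not the plugin's code: `CheckError`, `check`, and the delays are hypothetical, chosen so that task "a" fails first. Under those assumptions, `future.result()` re-raises the first failure, leaving the `with` block still waits for the already submitted tasks to finish, and the second failure is never surfaced.

```python
import concurrent.futures
import time


class CheckError(Exception):
    """Hypothetical stand-in for the plugin's error type."""


completed: list[str] = []


def check(name: str, delay: float, fail: bool) -> str:
    time.sleep(delay)       # simulate a slow request
    completed.append(name)  # record that this task ran to the end
    if fail:
        raise CheckError(f"{name} failed")
    return name


# "a" and "b" both fail; "a" finishes well before "b".
tasks = [("a", 0.05, True), ("b", 0.2, True), ("c", 0.3, False)]
raised = None
try:
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(check, *task) for task in tasks]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # re-raises the first failure that completes
except CheckError as exc:
    raised = exc

print(raised)             # only the first failure is reported
print(sorted(completed))  # yet every task still ran to completion
```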

sisp force-pushed the perf/concurrent-requests branch 2 times, most recently from 774137c to 28081bc on February 20, 2026 15:03
Use `concurrent.futures.ThreadPoolExecutor` to check external URLs in
parallel instead of serially, speeding up the build when many
external URLs need to be validated.
sisp force-pushed the perf/concurrent-requests branch from 28081bc to 7c33936 on February 20, 2026 20:33
manuzhang merged commit e218e28 into manuzhang:main on Feb 21, 2026
12 checks passed
manuzhang (Owner) commented

@sisp Thanks for your contribution!

sisp deleted the perf/concurrent-requests branch on February 22, 2026 11:01

sisp commented Feb 22, 2026

Sure thing, thanks for the quick review! 🙇
