Parallelize external URL checks#96
Conversation
0e4e453 to
96f17c1
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces parallelization for external URL validation using Python's concurrent.futures.ThreadPoolExecutor to improve build performance. The change replaces sequential URL checking with concurrent execution, resulting in approximately 44% faster build times according to the provided benchmarks (from 2.682s to 1.510s mean).
Changes:
- Added import for
concurrent.futuresmodule - Modified URL checking logic to collect URLs into a list before processing
- Implemented parallel URL validation using ThreadPoolExecutor with
as_completed()pattern
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for future in concurrent.futures.as_completed( | ||
| executor.submit(self.check_url, url, page.file.src_path, all_element_ids, opt_files) for url in urls_to_check | ||
| ): | ||
| future.result() |
There was a problem hiding this comment.
The parallel execution doesn't explicitly handle exceptions that might be raised in worker threads. While future.result() will re-raise any exception that occurred in the thread, if one thread raises an exception, the other threads will continue executing. Additionally, if raise_error is True and multiple URLs fail simultaneously, only the first exception will be raised while others are lost.
Consider adding explicit exception handling logic to collect all failures before raising, or at minimum add a comment explaining the exception propagation behavior to make the intent clear.
| for future in concurrent.futures.as_completed( | |
| executor.submit(self.check_url, url, page.file.src_path, all_element_ids, opt_files) for url in urls_to_check | |
| ): | |
| future.result() | |
| errors: List[PluginError] = [] | |
| for future in concurrent.futures.as_completed( | |
| executor.submit(self.check_url, url, page.file.src_path, all_element_ids, opt_files) for url in urls_to_check | |
| ): | |
| try: | |
| future.result() | |
| except PluginError as exc: | |
| # Collect all plugin errors raised during parallel URL checking. | |
| errors.append(exc) | |
| if errors and self.config['raise_error']: | |
| if len(errors) == 1: | |
| # Preserve the original single-error behavior. | |
| raise errors[0] | |
| combined_message = "Multiple link validation errors:\n" + "\n".join(str(e) for e in errors) | |
| raise PluginError(combined_message) |
There was a problem hiding this comment.
I decided to add a comment explaining the exception propagation behavior to make the intent clear. I don't think more complex logic is needed in this case.
774137c to
28081bc
Compare
Use `concurrent.futures.ThreadPoolExecutor` to check external URLs in parallel instead of serially, speeding up the build when many external URLs need to be validated.
28081bc to
7c33936
Compare
|
@sisp Thanks for your contribution! |
|
Sure thing, thanks for the quick review! 🙇 |
I've introduced
concurrent.futures.ThreadPoolExecutor, inspired by the official example, to check external URLs in parallel instead of serially, speeding up the build when many external URLs need to be validated.Here is a benchmark using hyperfine based on the test site in
tests/integration/:The build time is almost cut in half on my machine.