Closed
Labels
enhancement, good first issue, help wanted
Description
Introduction
As of now we send each URI we'd like to check to a client pool in main.rs. Code is here:
Lines 128 to 135 in d2e349c
```rust
tokio::spawn(async move {
    for link in links {
        if let Some(pb) = &bar {
            pb.set_message(&link.to_string());
        };
        send_req.send(link).await.unwrap();
    }
});
```
This is not ideal for a few reasons:
- All links get extracted on startup. This is a slow process that can take up to a few seconds for long link lists. It's not necessary to block the client during this step, though, as we could lazy-load the links on demand from the inputs.
- There is no clear separation of concerns between `main` and the link extraction. Ideally the responsibilities should be split up to make testing and refactoring easier.
We already use a channel for sending the links to check to the client pool. We could use the same abstraction for extracting the links, too, in the form of an extractor pool.
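To make the idea more concrete, here is a minimal sketch of a single extractor worker that sits between two channels. It uses `std::sync::mpsc` and a plain thread instead of the tokio channels from `main.rs` so that it stays self-contained; `extract_links` and `spawn_extractor` are hypothetical names, and the extraction logic is a naive placeholder, not the real one.

```rust
use std::sync::mpsc::{Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for real link extraction: keeps the
/// whitespace-separated tokens that look like HTTP(S) URLs.
fn extract_links(input: &str) -> Vec<String> {
    input
        .split_whitespace()
        .filter(|t| t.starts_with("http://") || t.starts_with("https://"))
        .map(str::to_owned)
        .collect()
}

/// Spawns one extractor worker (assumed design, not the project's API).
/// Inputs are pulled from `inputs` one at a time, so extraction happens
/// lazily while the consumer of `links` is already doing its work.
fn spawn_extractor(inputs: Receiver<String>, links: Sender<String>) -> JoinHandle<()> {
    thread::spawn(move || {
        // The loop ends once every sender for `inputs` has been dropped.
        for input in inputs {
            for link in extract_links(&input) {
                // Stop early if the receiving side has gone away.
                if links.send(link).is_err() {
                    return;
                }
            }
        }
    })
}
```

The receiving end of `links` would be handed to the client pool. A real pool would probably run several such workers and use async tasks rather than threads.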
In the future this would allow implementing some advanced features in an extensible way:
- Recursively check links: Push newly discovered websites into the input channel of the extractor pool.
- Skip duplicate URLs: Filter input links with a `HashSet` or even a Bloom filter (for constant memory usage) that is maintained by the extractor pool before sending them to the client pool.
- Request throttling: Group requests per website and apply some throttling to not overload the server.
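As an illustration of the duplicate-skipping idea, the following sketch shows how a `HashSet` could act as the filter. `SeenUrls` and `unique_links` are hypothetical names. URLs are compared as plain strings here, so no normalization is assumed, and memory grows with the number of distinct URLs (which is the cost a Bloom filter would avoid).

```rust
use std::collections::HashSet;

/// Hypothetical filter that the extractor pool could own.
#[derive(Default)]
struct SeenUrls {
    seen: HashSet<String>,
}

impl SeenUrls {
    /// Returns `true` the first time a URL is offered and `false` on repeats.
    /// `HashSet::insert` already reports whether the value was newly added.
    fn is_new(&mut self, url: &str) -> bool {
        self.seen.insert(url.to_owned())
    }
}

/// Keeps only the first occurrence of each link, preserving input order.
fn unique_links<I: IntoIterator<Item = String>>(links: I) -> Vec<String> {
    let mut seen = SeenUrls::default();
    links.into_iter().filter(|link| seen.is_new(link)).collect()
}
```

In the pool, `is_new` would be called right before a link is sent to the client pool, so repeated URLs never reach the channel.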
How to contribute
- Create an extractor pool similar to our client pool.
- Spawn the pool inside `main` on startup, pass the channel to the pool, and start processing the inputs. (The other end of the channel is already passed to the client pool.)
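The wiring described in these steps could look roughly like the sketch below. It is only an outline under simplifying assumptions: it uses `std::sync::mpsc` and threads rather than tokio, `spawn_stage` and `run_pipeline` are hypothetical helpers, and both the extraction and the "check" are placeholders that do no network I/O.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical generic stage: applies `f` to every received item and
/// forwards each produced value to the next channel.
fn spawn_stage<In, Out, F>(rx: Receiver<In>, tx: Sender<Out>, f: F) -> JoinHandle<()>
where
    In: Send + 'static,
    Out: Send + 'static,
    F: Fn(In) -> Vec<Out> + Send + 'static,
{
    thread::spawn(move || {
        for item in rx {
            for out in f(item) {
                if tx.send(out).is_err() {
                    return; // downstream is gone, nothing left to do
                }
            }
        }
    })
}

/// Wires inputs -> extractor -> client and collects the client's results.
fn run_pipeline(inputs: Vec<String>) -> Vec<(String, bool)> {
    let (input_tx, input_rx) = channel::<String>();
    let (link_tx, link_rx) = channel::<String>();
    let (result_tx, result_rx) = channel::<(String, bool)>();

    // Extractor stage: placeholder that treats every token as a link.
    let extractor = spawn_stage(input_rx, link_tx, |input: String| {
        input.split_whitespace().map(str::to_owned).collect()
    });
    // Client stage: placeholder check that only looks at the URL scheme.
    let client = spawn_stage(link_rx, result_tx, |link: String| {
        let ok = link.starts_with("https://");
        vec![(link, ok)]
    });

    for input in inputs {
        input_tx.send(input).unwrap();
    }
    // Closing the input channel lets both stages drain and exit.
    drop(input_tx);

    let results: Vec<(String, bool)> = result_rx.iter().collect();
    extractor.join().unwrap();
    client.join().unwrap();
    results
}
```

With this shape, `main` only creates the channels and spawns the stages, so the extraction logic can be tested on its own.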