
Introduce extractor pool #55

@mre

Description


Introduction

As of now, we send each URI we'd like to check to a client pool in main.rs. The code is here:

lychee/src/main.rs

Lines 128 to 135 in d2e349c

tokio::spawn(async move {
for link in links {
if let Some(pb) = &bar {
pb.set_message(&link.to_string());
};
send_req.send(link).await.unwrap();
}
});

This is not ideal for a few reasons:

  • All links get extracted on startup. This is a slow process that can take up to a few seconds for long link lists.
    It's not necessary to block the client during this step, since we could lazy-load the links on demand from the inputs.
  • There is no clear separation of concerns between main and the link extraction. Ideally the responsibilities should be split up to make testing and refactoring easier.

We already use a channel for sending the links to check to the client pool. We could use the same abstraction for extracting the links, too, in the form of an extractor pool.
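A minimal sketch of what such an extractor pool could look like. This uses std channels and threads so it is self-contained; the real implementation would use tokio::sync::mpsc and tokio::spawn like the client pool does. All names here (`spawn_extractor_pool`, `extract_links`, `send_req`) are illustrative, not lychee's actual API:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Placeholder for lychee's real extraction logic; here we just pick out
// whitespace-separated tokens that look like URLs.
fn extract_links(input: &str) -> Vec<String> {
    input
        .split_whitespace()
        .filter(|w| w.starts_with("http"))
        .map(str::to_string)
        .collect()
}

// Spawn `workers` extractor threads. Each takes the next pending input
// from the shared receiver, extracts its links, and forwards them to the
// channel the client pool reads from.
fn spawn_extractor_pool(
    inputs: mpsc::Receiver<String>,
    send_req: mpsc::Sender<String>,
    workers: usize,
) -> Vec<thread::JoinHandle<()>> {
    let inputs = Arc::new(Mutex::new(inputs));
    (0..workers)
        .map(|_| {
            let inputs = Arc::clone(&inputs);
            let send_req = send_req.clone();
            thread::spawn(move || loop {
                // Take the next input from the shared receiver.
                let msg = inputs.lock().unwrap().recv();
                match msg {
                    Ok(doc) => {
                        for link in extract_links(&doc) {
                            // Links flow to the client pool as they are
                            // found, instead of all being extracted up front.
                            send_req.send(link).unwrap();
                        }
                    }
                    Err(_) => break, // input channel closed: no more work
                }
            })
        })
        .collect()
}
```

The key property is that extraction happens concurrently with checking: the client pool starts receiving links as soon as the first input is processed.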

In the future this would allow implementing some advanced features in an extensible way:

  • Recursively check links: Push newly discovered websites into the input channel of the extractor pool.
  • Skip duplicate URLs: Filter input links with a HashSet or even a Bloom filter (for constant memory usage) that is maintained by the extractor pool before sending them to the client pool.
  • Request throttling: Group requests per website and apply some throttling to avoid overloading the server.
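To illustrate the deduplication idea, here is a hedged sketch of a filter the extractor pool could maintain. `SeenFilter` and `first_sighting` are hypothetical names; a Bloom filter could replace the HashSet to bound memory, at the cost of occasionally skipping a link that was never actually seen (a false positive):

```rust
use std::collections::HashSet;

// Hypothetical dedup filter owned by the extractor pool: a link is
// forwarded to the client pool only the first time it is seen.
struct SeenFilter {
    seen: HashSet<String>,
}

impl SeenFilter {
    fn new() -> Self {
        SeenFilter {
            seen: HashSet::new(),
        }
    }

    // Returns true if the link is new and should be sent to the client pool.
    // `HashSet::insert` already returns false for duplicates, so the filter
    // is a single call per link.
    fn first_sighting(&mut self, link: &str) -> bool {
        self.seen.insert(link.to_string())
    }
}
```

Because the filter lives inside the extractor pool, recursive checking composes naturally with it: rediscovered pages are dropped before they ever reach the client pool.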

How to contribute

  1. Create an extractor pool similar to our client pool.
  2. Spawn the pool inside main on startup, pass the channel to the pool, and start processing the inputs.

(The other end of the channel is already passed to the client pool.)
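The wiring described in the steps above could look roughly like this end to end, again sketched with std channels instead of tokio ones and with illustrative names (`run_pipeline`, `extract`). The extractor end produces on the same channel whose receiving end the client pool already owns:

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for lychee's link extraction.
fn extract(doc: &str) -> Vec<String> {
    doc.split_whitespace()
        .filter(|w| w.starts_with("http"))
        .map(str::to_string)
        .collect()
}

fn run_pipeline(docs: Vec<String>) -> Vec<String> {
    let (input_tx, input_rx) = mpsc::channel::<String>();
    let (send_req, recv_req) = mpsc::channel::<String>();

    // Extractor end: consumes inputs, emits links on `send_req`.
    let extractor = thread::spawn(move || {
        for doc in input_rx {
            for link in extract(&doc) {
                send_req.send(link).unwrap();
            }
        }
        // `send_req` is dropped here, closing the channel for the client pool.
    });

    // Client-pool end: already owns `recv_req`; here it just collects the
    // links instead of checking them.
    let client = thread::spawn(move || recv_req.into_iter().collect::<Vec<_>>());

    for doc in docs {
        input_tx.send(doc).unwrap();
    }
    drop(input_tx); // signal that there are no more inputs

    extractor.join().unwrap();
    client.join().unwrap()
}
```

Note that main only owns the input side: once both pools are spawned, it feeds inputs and waits, which is exactly the separation of concerns the issue asks for.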
