While reading this PR, I noticed that the new preprocess value has to be threaded through many layers before it reaches the place where it is actually used. I think this comes from a very hierarchical architecture. Conceptually, it looks something like this:
inputs
|> collector(basic_auth, skip, include_verbatim, client, preprocess, ...)
Here, collector calls other helper functions and has to amalgamate all their arguments. It is responsible for a lot of functionality, from resolving inputs all the way to link extraction and request building.
If the architecture were more like a flat pipeline, it would reduce the need for this argument injection. Instead of one big "collector", it might look like this:
inputs
|> resolve_inputs(skip, glob_ignore_case)
|> preprocess_inputs(pre_cmd)
|> get_input_contents(basic_auth, retries, max_redirect)
|> extract_links(root_dir, base_url)
Hopefully, you can see how this reduces the parameters needed: each step only takes the parameters for its own functionality. A clear pipeline also makes it much easier to implement features like --dump or --dump-inputs, which just stop at certain points in the pipeline (I started thinking about this because of the dumping issues). It makes testing easier too, since each step can be tested in isolation.
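To make the idea a bit more concrete, here is a minimal Rust sketch of such a flat pipeline. Everything in it is hypothetical and only for illustration: `Input`, `Config`, `StopAfter`, `Output`, `run`, and the stage bodies are stand-ins, not the actual types or functions in the codebase. The content-fetching stage is left out so the sketch runs without network access, and a plain function stands in for the preprocess command.

```rust
// Hypothetical sketch: none of these types or functions are the real API.

#[derive(Debug, Clone, PartialEq)]
struct Input {
    name: String,
    content: String,
}

/// A plain function stands in for an external preprocess command.
type Preprocessor = fn(&str) -> String;

/// All options live in one place, but each stage receives only what it needs.
struct Config {
    skip: Vec<String>,
    pre: Option<Preprocessor>,
    base_url: String,
}

/// Point at which the pipeline stops early, in the spirit of --dump-inputs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StopAfter {
    ResolveInputs,
}

#[derive(Debug, PartialEq)]
enum Output {
    Inputs(Vec<String>),
    Links(Vec<String>),
}

/// Stage 1: needs only the skip patterns.
fn resolve_inputs(inputs: Vec<Input>, skip: &[String]) -> Vec<Input> {
    inputs
        .into_iter()
        .filter(|i| !skip.iter().any(|s| i.name.contains(s.as_str())))
        .collect()
}

/// Stage 2: needs only the optional preprocessor.
fn preprocess_inputs(inputs: Vec<Input>, pre: Option<Preprocessor>) -> Vec<Input> {
    match pre {
        Some(f) => inputs
            .into_iter()
            .map(|i| {
                let content = f(&i.content);
                Input { name: i.name, content }
            })
            .collect(),
        None => inputs,
    }
}

/// Stage 3: needs only the base URL; keeps absolute links, resolves "/path" links.
fn extract_links(inputs: &[Input], base_url: &str) -> Vec<String> {
    let mut links = Vec::new();
    for input in inputs {
        for token in input.content.split_whitespace() {
            if token.starts_with("http://") || token.starts_with("https://") {
                links.push(token.to_string());
            } else if token.starts_with('/') {
                links.push(format!("{}{}", base_url.trim_end_matches('/'), token));
            }
        }
    }
    links
}

/// The driver is the only place that sees the whole configuration.
fn run(config: &Config, inputs: Vec<Input>, stop: Option<StopAfter>) -> Output {
    let resolved = resolve_inputs(inputs, &config.skip);
    if stop == Some(StopAfter::ResolveInputs) {
        return Output::Inputs(resolved.into_iter().map(|i| i.name).collect());
    }
    let processed = preprocess_inputs(resolved, config.pre);
    Output::Links(extract_links(&processed, &config.base_url))
}

fn lowercase(s: &str) -> String {
    s.to_lowercase()
}

fn demo_config() -> Config {
    let pre: Preprocessor = lowercase;
    Config {
        skip: vec!["draft".to_string()],
        pre: Some(pre),
        base_url: "https://example.com/".to_string(),
    }
}

fn sample_inputs() -> Vec<Input> {
    vec![
        Input { name: "README.md".into(), content: "See HTTPS://A.example/x and /docs".into() },
        Input { name: "draft.md".into(), content: "https://skipped.example".into() },
    ]
}
```

In this sketch, calling `run(&demo_config(), sample_inputs(), Some(StopAfter::ResolveInputs))` returns just the resolved input names, while passing `None` runs through to link extraction. Adding a new option such as preprocess then only touches `Config`, one stage signature, and one line in `run`.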
Anyway, this is all theoretical at the moment. I don't know if this is possible or how hard it would be. There is Chain in the codebase, but it's limited to homogeneous pipeline functions. As I said, this is nothing that needs to affect this PR right now.
Originally posted by @katrinafyi in #1891 (review)