Flatten processing pipeline #1898

@thomas-zahner


When reading this PR, I had the thought that the new preprocess value has to be handled a lot and passed repeatedly to get to where it needs to be used. I think this is due to an architecture where it's all very hierarchical. It looks something like this (conceptually):

inputs
|> collector(basic_auth, skip, include_verbatim, client, preprocess, ...)

Here, collector calls other helper functions and has to amalgamate all their arguments. It is responsible for a lot of functionality, from resolving inputs all the way to link extraction and request building.

If the architecture were more like a flat pipeline, it would reduce the need for this argument injection. Instead of one big "collector", it might look like this:

inputs
|> resolve_inputs(skip, glob_ignore_case)
|> preprocess_inputs(pre_cmd)
|> get_input_contents(basic_auth, retries, max_redirect)
|> extract_links(root_dir, base_url)

Hopefully, you can see how this reduces the parameters needed - each step only needs the parameters for its own functionality. A clear pipeline makes it much easier to implement features like --dump or --dump-inputs, which are just stopping at certain points in the pipeline (I started thinking about this because of the dumping issues). It also makes testing easier.
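To make the idea a bit more concrete, here is a rough Rust sketch of such a flat pipeline. Everything in it is hypothetical: `InputContent`, `StopAfter`, `Output`, `run`, and the `read` callback are made up for illustration and are not lychee's actual types or functions. The preprocessing step is left out for brevity, and `read` stands in for the file and network access where options like basic auth or retries would be consumed.

```rust
// Hypothetical sketch: none of these types or functions are lychee's real API.

/// Contents read from one resolved input.
#[derive(Debug, Clone, PartialEq)]
struct InputContent {
    source: String,
    text: String,
}

/// Where the pipeline should stop, loosely mirroring `--dump-inputs` and `--dump`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StopAfter {
    Inputs,
    Links,
}

/// What the pipeline had produced at the point where it stopped.
#[derive(Debug, PartialEq)]
enum Output {
    Inputs(Vec<String>),
    Links(Vec<String>),
}

/// Stage 1: drop skipped inputs. Needs only `skip`.
fn resolve_inputs(inputs: &[&str], skip: &[&str]) -> Vec<String> {
    inputs
        .iter()
        .filter(|input| !skip.contains(*input))
        .map(|input| (*input).to_string())
        .collect()
}

/// Stage 2: read each input through the caller-supplied `read` callback.
fn get_input_contents(inputs: &[String], read: impl Fn(&str) -> String) -> Vec<InputContent> {
    inputs
        .iter()
        .map(|source| InputContent {
            source: source.clone(),
            text: read(source.as_str()),
        })
        .collect()
}

/// Stage 3: naive, whitespace-based link extraction. Needs only `base_url`.
fn extract_links(contents: &[InputContent], base_url: &str) -> Vec<String> {
    contents
        .iter()
        .flat_map(|content| content.text.split_whitespace())
        .filter_map(|token| {
            if token.starts_with("http://") || token.starts_with("https://") {
                Some(token.to_string())
            } else if token.starts_with('/') {
                // Resolve root-relative paths against the base URL.
                Some(format!("{}{}", base_url.trim_end_matches('/'), token))
            } else {
                None
            }
        })
        .collect()
}

/// The flat pipeline: each stage receives only its own options, and dumping
/// is just an early return between two stages.
fn run(
    inputs: &[&str],
    skip: &[&str],
    base_url: &str,
    read: impl Fn(&str) -> String,
    stop_after: StopAfter,
) -> Output {
    let resolved = resolve_inputs(inputs, skip);
    if stop_after == StopAfter::Inputs {
        return Output::Inputs(resolved);
    }
    let contents = get_input_contents(&resolved, read);
    Output::Links(extract_links(&contents, base_url))
}
```

Because each stage is a plain function with its own small parameter list, it can be tested in isolation, and something like `--dump-inputs` does not need to know anything about auth or base URLs.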

Anyway, this is all theoretical at the moment. I don't know if this is possible or how hard it would be. There is Chain in the codebase, but it's limited to homogeneous pipeline functions. Anyway, as I said, nothing that needs to affect this PR right now.
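For what a heterogeneous pipeline could look like, one option is plain generic function composition, where each stage may have a different input and output type. This is only a sketch under assumptions: `then`, `build_pipeline`, `Content`, and `Link` are invented names, the "read" stage is a placeholder that fabricates content instead of doing I/O, and none of this reflects how Chain is actually implemented.

```rust
// Hypothetical stand-ins for the data flowing between stages.
#[derive(Debug, PartialEq)]
struct Content(String);

#[derive(Debug, PartialEq)]
struct Link(String);

/// Compose two stages whose types line up: A -> B followed by B -> C.
/// The intermediate type B can differ from both A and C.
fn then<A, B, C>(first: impl Fn(A) -> B, second: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |input| second(first(input))
}

/// Build a pipeline in which every stage captures only its own options.
fn build_pipeline(skip: Vec<String>, base_url: String) -> impl Fn(Vec<String>) -> Vec<Link> {
    // Vec<String> -> Vec<String>: drop skipped inputs.
    let resolve = move |inputs: Vec<String>| -> Vec<String> {
        inputs
            .into_iter()
            .filter(|input| !skip.contains(input))
            .collect()
    };
    // Vec<String> -> Vec<Content>: placeholder for reading files or URLs;
    // here each input name is simply turned into a root-relative path.
    let read = |inputs: Vec<String>| -> Vec<Content> {
        inputs
            .into_iter()
            .map(|input| Content(format!("/{}", input)))
            .collect()
    };
    // Vec<Content> -> Vec<Link>: resolve each path against the base URL.
    let extract = move |contents: Vec<Content>| -> Vec<Link> {
        contents
            .into_iter()
            .map(|Content(path)| Link(format!("{}{}", base_url, path)))
            .collect()
    };
    then(then(resolve, read), extract)
}
```

The compiler checks that adjacent stages agree on their types, so reordering or removing a stage that breaks the flow becomes a compile error rather than a runtime surprise.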

Originally posted by @katrinafyi in #1891 (review)
