
Should we limit hooks that are processing too many files #1998

@hofbi

https://github.com/crate-ci/typos is an awesome hook and runs super fast. However, because it processes every file regardless of language, the number of files it has to handle at once can be very large in big codebases. On a small CI machine, this can kill the machine with an out-of-memory error.

I have already investigated a bit and compared the hook's log output when running prek run -a typos:

  • without require_serial:
    DEBUG: Running priority group with priority XX with concurrency N
    TRACE run{hook_id=typos language=python}: Running typos total_files=10000 concurrency=N
    
  • require_serial: true
    DEBUG: Running priority group with priority XX with concurrency N
    TRACE run{hook_id=typos language=python}: Running typos total_files=10000 concurrency=1
    

where N is, I assume, the number of available cores. On a small machine with 4 GB or 8 GB of memory, this can easily cause an out-of-memory failure.

I am not sure whether this is a typos problem or a prek problem, so I wanted to start here and maybe open an issue in https://github.com/crate-ci/typos as well.

Would it make sense to specify a file limit or something in prek so that the hook is called multiple times in sequence when the number of files exceeds the limit?
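
To make the idea concrete, here is a rough sketch of what I mean by a file limit. It is not how prek works internally and not a proposal for its actual API: the names chunked, run_in_batches, and max_files are all hypothetical, and I have not measured whether batching like this actually lowers peak memory for typos.

```python
import subprocess
from itertools import islice
from typing import Iterable, Iterator, Sequence


def chunked(files: Iterable[str], max_files: int) -> Iterator[list[str]]:
    """Yield successive batches of at most max_files paths (hypothetical helper)."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    it = iter(files)
    # islice pulls up to max_files items; an empty batch ends the loop.
    while batch := list(islice(it, max_files)):
        yield batch


def run_in_batches(cmd: Sequence[str], files: Iterable[str], max_files: int) -> int:
    """Run cmd once per batch, strictly one after another.

    Returns 0 if every invocation succeeded, 1 otherwise.
    """
    status = 0
    for batch in chunked(files, max_files):
        # Each call only sees one batch, so the tool never receives
        # more than max_files paths in a single invocation.
        result = subprocess.run([*cmd, *batch], check=False)
        if result.returncode != 0:
            status = 1
    return status
```

With something like this, run_in_batches(["typos"], all_files, max_files=1000) would call the hook ten times in sequence for the 10000 files in the logs above, instead of handing everything over at once. The limit of 1000 is only an illustrative value.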
