Build hangs when mkdocstrings paths directories contain virtual environments #391

@lucasrodes

Context

Using mkdocstrings to document Python packages in a monorepo where sub-libraries have their own virtual environments alongside their source code.

Bug description

The _list_sources function in config.py recursively scans every directory listed in mkdocstrings.handlers.python.paths to enumerate Python files for cache invalidation. There is no way to exclude subdirectories from this scan.

The paths option must point to the parent of the top-level package so that griffe can resolve the full module path. For example, to document mypkg, paths must include the directory containing the mypkg/ package — you cannot point deeper (e.g. to mypkg/ directly) because griffe would fail to resolve mypkg.hello (it looks for mypkg/ as a child of the search path).
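To make the constraint concrete, here is a deliberately simplified sketch of how a dotted module name maps onto a search path. It is not griffe's actual resolution code (it ignores namespace packages, stubs, and compiled modules), and the function name resolve_module is hypothetical; it only illustrates why the search path has to be the parent of mypkg/.

```python
from pathlib import Path
from typing import Optional


def resolve_module(dotted_name: str, search_paths: list[str]) -> Optional[Path]:
    """Return the package directory or module file for dotted_name, or None.

    Simplified illustration only: each search path is treated as the
    parent directory of the top-level package.
    """
    parts = dotted_name.split(".")
    for root in search_paths:
        base = Path(root).joinpath(*parts)
        # A regular package is a directory containing __init__.py.
        if (base / "__init__.py").is_file():
            return base
        # Otherwise look for a plain module file such as lib/mypkg/hello.py.
        module_file = base.with_suffix(".py")
        if module_file.is_file():
            return module_file
    return None
```

With paths = ["lib"], mypkg.hello maps to lib/mypkg/hello.py. With paths = ["lib/mypkg"], the same lookup would try lib/mypkg/mypkg/hello.py and find nothing.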

This means paths necessarily points to a broad directory that may contain non-source content. If that directory includes a virtual environment (.venv, venv, or any other name), _list_sources picks up tens of thousands of third-party .py files. This causes extreme memory usage (25+ GB observed) and the build hangs indefinitely.

There is currently no way for the user to work around this at the configuration level. A possible solution would be an exclude option for _list_sources, for example:

[project.plugins.mkdocstrings.handlers.python]
paths = ["lib"]
exclude = [".venv", "venv"]
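A minimal sketch of what an exclude-aware scan could look like, assuming the scan is a plain recursive walk over each entry in paths. The function name list_python_sources and its signature are hypothetical and do not reflect the current _list_sources implementation. In addition to matching directory names, it skips any directory that contains a pyvenv.cfg file, which python -m venv writes at the root of every environment it creates, so environments with arbitrary names such as myenv are covered too.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator


def list_python_sources(root: str, exclude: Iterable[str] = ()) -> Iterator[Path]:
    """Yield .py files under root, skipping excluded names and virtual environments."""
    excluded = set(exclude)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place: os.walk (top-down) will not descend into removed entries,
        # so the contents of a skipped directory are never enumerated.
        dirnames[:] = [
            d
            for d in dirnames
            if d not in excluded
            and not os.path.isfile(os.path.join(dirpath, d, "pyvenv.cfg"))
        ]
        for name in filenames:
            if name.endswith(".py"):
                yield Path(dirpath) / name
```

Pruning during the walk, rather than filtering the results afterwards, means the cost of the scan no longer depends on the size of the excluded directories.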

As a workaround, _list_sources can be monkeypatched before calling build() to filter out unwanted paths.
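A rough sketch of such a monkeypatch, under assumptions that should be checked against the installed version: that _list_sources returns an iterable of file paths (strings or Path objects) and that it is importable from the config module. The wrapper name without_dirs is hypothetical. Note that this only filters the result; if the cost lies in the directory traversal itself, the function would need to be replaced rather than wrapped.

```python
import functools
from pathlib import PurePath
from typing import Callable, Iterable


def without_dirs(list_sources: Callable[..., Iterable], excluded: Iterable[str]) -> Callable[..., list]:
    """Wrap a source-listing function so paths under excluded directory names are dropped."""
    excluded_names = frozenset(excluded)

    @functools.wraps(list_sources)
    def wrapper(*args, **kwargs):
        # Keep a path only if none of its components is an excluded name.
        return [
            p
            for p in list_sources(*args, **kwargs)
            if excluded_names.isdisjoint(PurePath(p).parts)
        ]

    return wrapper


# Hypothetical usage before calling build(); the module path is an assumption:
# import zensical.config as config
# config._list_sources = without_dirs(config._list_sources, {".venv", "venv", "myenv"})
```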

Reproduction

I have attached a .zip file with a minimal reproduction. The zip contains a zensical project with a lib/mypkg source package and mkdocstrings configured with paths = ["lib"]. The virtual environment is not included in the zip due to its size — it must be created as part of the steps below.

zensical-repro.zip

Steps to reproduce

  1. Unzip the attached file
  2. Install zensical and mkdocstrings: pip install "zensical>=0.0.23" "mkdocstrings[python]"
  3. Create a virtual environment inside lib/ to simulate a sub-library with its own environment:
    python -m venv lib/myenv
    lib/myenv/bin/pip install requests pandas numpy django flask sqlalchemy boto3 scipy matplotlib scikit-learn
  4. Run zensical build -f zensical.toml --clean
  5. Observe that the build hangs and memory usage grows continuously
  6. For comparison, remove the virtual environment and rebuild:
    rm -rf lib/myenv
    zensical build -f zensical.toml --clean
    The build completes in under a second.

Note: narrowing paths to lib/mypkg is not an option when the documented module is mypkg — griffe requires mypkg/ to be a direct child of the search path to resolve mypkg.hello.

Labels: bug (issue reports a bug), resolved (issue is resolved, yet unreleased if open)
