
♻️ refactor(deps): drop pip dependency, use stdlib for metadata #552

Open
gaborbernat wants to merge 14 commits into tox-dev:main from gaborbernat:no-pip

Conversation

@gaborbernat
Member

@gaborbernat gaborbernat commented Mar 10, 2026

pipdeptree depended on pip._internal APIs that are explicitly not part of pip's public interface. Every pip release risked breakage — pip 21.1.1 already required a compatibility fix (#150), and the project had to isolate all pip usage into a single module just to contain the blast radius (#402). 🔧 The dependency also blocked pipdeptree from working in pip-less environments (#335) and added ~50MB to installation size.

This replaces all pip._internal imports with a new _parser package built entirely on stdlib (importlib.metadata, json, urllib). It implements PEP 610 direct_url.json parsing with full validation, editable install detection for both modern PEP 660 and legacy .egg-link files, and VCS requirement string generation for git, hg, svn, and bzr. ✨ The VCS module is split into per-backend files with shared utilities, matching pip's output format for freeze-style requirement strings including credential redaction, SCP-style git URL normalization, and subdirectory fragments.

The pip>=25.2 dependency is removed from pyproject.toml and _freeze.py is deleted entirely. pipdeptree now works independently of pip's release cycle and in any Python environment with importlib.metadata available. All existing output formats are preserved.
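
As a rough illustration of the stdlib-only approach (not the actual `_parser` code from this PR), the sketch below reads `direct_url.json` through `importlib.metadata` and reports which PEP 610 info block it carries. The function names `classify_direct_url` and `load_direct_url` are hypothetical.

```python
from __future__ import annotations

import json
from importlib.metadata import PackageNotFoundError, distribution


def classify_direct_url(text: str) -> tuple[str, str]:
    """Return (info block name, url) for a PEP 610 direct_url.json document."""
    data = json.loads(text)
    kinds = [key for key in ("vcs_info", "archive_info", "dir_info") if key in data]
    if len(kinds) != 1:
        # PEP 610 expects exactly one of the three info blocks.
        raise ValueError(f"expected exactly one info block, found {kinds}")
    return kinds[0], data["url"]


def load_direct_url(dist_name: str) -> tuple[str, str] | None:
    """Return None when the distribution is absent or has no direct_url.json."""
    try:
        text = distribution(dist_name).read_text("direct_url.json")
    except PackageNotFoundError:
        return None
    return classify_direct_url(text) if text else None
```

Calling `load_direct_url("pipdeptree")` in an environment with an editable install would be expected to report a `dir_info` block; no pip import is involved at any point.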

@gaborbernat gaborbernat force-pushed the no-pip branch 2 times, most recently from 039cad4 to d0f3e9d Compare March 10, 2026 00:59
@gaborbernat gaborbernat requested a review from kemzeb March 10, 2026 00:59
@gaborbernat gaborbernat force-pushed the no-pip branch 3 times, most recently from 2e92354 to 19babe7 Compare March 10, 2026 01:27
Relying on pip's internal APIs created fragility as these APIs change
frequently between pip versions without deprecation warnings. This caused
pipdeptree to break unexpectedly in CI/CD pipelines and user environments.

Replaced pip internals with a standalone PEP 610 implementation using only
Python's standard library (importlib.metadata, json, urllib). The new
_parser package handles direct URL parsing, editable detection (both modern
PEP 610 and legacy .egg-link), and requirement string formatting.

This makes pipdeptree more stable, reduces its installed footprint by ~50MB,
and allows it to work in pip-less environments.

Signed-off-by: Bernát Gábor <bgabor8@bloomberg.net>
@kemzeb
Collaborator

kemzeb commented Mar 10, 2026

Going to take a look at pip's tests for freezing (looks like it's stored here) and the PEPs to try to see if we need to implement something.

I did notice the following after running pip install -e . to make pipdeptree an editable package:
Old

vscode ➜ /workspaces/pipdeptree (main) $ pipdeptree -p pipdeptree -o freeze
Warning!!! Duplicate package metadata found:
"/usr/local/lib/python3.10/site-packages"
  pip                              23.0.1           (using 26.0, "/home/vscode/.local/lib/python3.10/site-packages")
NOTE: This warning isn't a failure warning.
------------------------------------------------------------------------
-e git+https://github.com/kemzeb/pipdeptree.git@205cca8b02bb7804216410f57e88f14985e7a17a#egg=pipdeptree
  packaging==26.0
  pip==26.0
vscode ➜ /workspaces/pipdeptree (main) $

New

vscode ➜ /workspaces/pipdeptree (no-pip) $ pipdeptree -p pipdeptree -o freeze
Warning!!! Duplicate package metadata found:
"/usr/local/lib/python3.10/site-packages"
  pip                              23.0.1           (using 26.0, "/home/vscode/.local/lib/python3.10/site-packages")
NOTE: This warning isn't a failure warning.
------------------------------------------------------------------------
-e /workspaces/pipdeptree
  packaging==26.0
  pip==26.0
vscode ➜ /workspaces/pipdeptree (no-pip) $ 

However, pipdeptree's direct_url.json is missing VCS info:

vscode ➜ /workspaces/pipdeptree (no-pip) $ cat /home/vscode/.local/lib/python3.10/site-packages/pipdeptree-2.30.1.dev29+g205cca8b0.dist-info/direct_url.json 
{"dir_info": {"editable": true}, "url": "file:///workspaces/pipdeptree"}

This makes me believe that our older pip API usage may have been searching the external environment to get VCS info

@kemzeb
Collaborator

kemzeb commented Mar 10, 2026

This makes me believe that our older pip API usage may have been searching the external environment to get VCS info

Yes, this is the case. The old code eventually calls this pip function, which in turn calls the "git" VersionControl backend to detect whether it's a git repository.

The question seems to be whether we should scan the external environment to get the VCS info, or whether the metadata file should be the only place we look. If we go with the former, it may be better to have the freeze requirement generation as its own standalone library that we maintain.

…for editables

The initial pip dependency removal introduced several functional regressions and
structural bugs that caused pipdeptree to diverge from pip freeze behavior. Editable
VCS installs were showing -e /path instead of -e git+url@commit#egg=name, losing
critical provenance information. The DirectUrl implementation had structural
mismatches with PEP 610 and pip's reference implementation.

Restructured DirectUrl to use a single info field (Union[VcsInfo, ArchiveInfo, DirInfo])
matching pip's implementation, enforcing exactly-one semantics. Fixed ArchiveInfo to
use hash field per PEP 610 spec. Added VCS detection module that probes filesystem
for git repositories and extracts remote URL and commit hash, falling back to local
path when VCS is unavailable. Fixed egg-link fallback to only run when direct_url.json
is missing, preventing incorrect editable detection with stale .egg-link files. Added
legacy version support using === operator for non-PEP-440 versions. Strengthened
type validation for dir_info.editable and required VCS fields.
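
A minimal sketch of the single-info-field shape, assuming plain dataclasses. The field names come from the commit message and PEP 610; the class bodies and the `is_editable` helper are illustrative and not taken from the PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class VcsInfo:
    vcs: str
    commit_id: str
    requested_revision: Optional[str] = None


@dataclass(frozen=True)
class ArchiveInfo:
    hash: Optional[str] = None  # "algo=value" form


@dataclass(frozen=True)
class DirInfo:
    editable: bool = False


InfoType = Union[VcsInfo, ArchiveInfo, DirInfo]


@dataclass(frozen=True)
class DirectUrl:
    url: str
    info: InfoType  # exactly one info object, by construction
    subdirectory: Optional[str] = None

    def is_editable(self) -> bool:
        # Hypothetical helper: only a dir_info block can mark an editable install.
        return isinstance(self.info, DirInfo) and self.info.editable
```

Holding one `info` value instead of three optional fields makes the "two blocks at once" and "no block at all" states unrepresentable, so callers can dispatch with `isinstance`.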

Refactored direct_url.json parsing to eliminate complexity violations and
add comprehensive type validation matching pip's strict validation model.

Extracted parse_direct_url_json into helper functions to reduce complexity:
- _load_and_validate_json: JSON loading and structure validation
- _validate_required_fields: Type validation for url/subdirectory
- _parse_info_block: Info block selection and validation

Enhanced VCS detection to handle repository subdirectories using git
rev-parse --show-toplevel, allowing editable packages in subdirectories.
Added URL normalization for SCP-style (git@host:path), git://, and local
paths. Added subdirectory fragment support for monorepo packages.
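
The SCP-style case can be sketched as below. This is an illustration only: `scp_to_ssh_url` is a hypothetical name, the regex is a simplification, and local paths (including Windows drive letters) would need separate handling that is not shown.

```python
import re

# Matches "user@host:path" remotes that carry no URL scheme.
_SCP_RE = re.compile(
    r"""
    ^
    (?P<user>[\w.-]+@)?     # optional user part, including the trailing @
    (?P<host>[^/:@\s]+)     # host name
    :
    (?P<path>[^/\s].*)      # path that does not start with a slash
    $
    """,
    re.VERBOSE,
)


def scp_to_ssh_url(remote: str) -> str:
    """Rewrite git@host:path as ssh://git@host/path; leave other remotes untouched."""
    if "://" in remote:
        return remote
    match = _SCP_RE.match(remote)
    if match is None:
        return remote
    user = match.group("user") or ""
    return f"ssh://{user}{match.group('host')}/{match.group('path')}"
```

For example, `scp_to_ssh_url("git@github.com:tox-dev/pipdeptree.git")` yields `ssh://git@github.com/tox-dev/pipdeptree.git`, which can then be used in a requirement string like any other URL.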

Fixed file:// URL conversion to match pip's implementation exactly,
handling localhost URLs, UNC paths on Windows, and Windows drive letter
corrections. Rejects non-local file URIs on non-Windows platforms.

Added comprehensive type validation for all direct_url.json fields:
- Validates url and subdirectory are strings
- Validates all info blocks are dicts
- Validates vcs_info fields (vcs, commit_id, requested_revision) are strings
- Validates archive_info.hash matches algo=value pattern
- Validates dir_info.editable is boolean

Changed DirectUrlValidationError to inherit from ValueError instead of
Exception, following Python conventions for value validation errors.
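
The validation rules listed above could look roughly like this. Only the `DirectUrlValidationError` name and its `ValueError` base come from the commit message; `validate_direct_url` and `_require_str` are hypothetical, and the real checks in the PR may be stricter or structured differently.

```python
from __future__ import annotations

from typing import Any


class DirectUrlValidationError(ValueError):
    """Raised when direct_url.json content has the wrong shape or types."""


def _require_str(block: dict[str, Any], key: str, *, optional: bool = False) -> None:
    value = block.get(key)
    if value is None and optional:
        return
    if not isinstance(value, str):
        raise DirectUrlValidationError(f"{key!r} must be a string, got {type(value).__name__}")


def validate_direct_url(data: Any) -> None:
    if not isinstance(data, dict):
        raise DirectUrlValidationError("top level must be a JSON object")
    _require_str(data, "url")
    _require_str(data, "subdirectory", optional=True)
    for name in ("vcs_info", "archive_info", "dir_info"):
        if name in data and not isinstance(data[name], dict):
            raise DirectUrlValidationError(f"{name!r} must be an object")
    vcs = data.get("vcs_info")
    if vcs is not None:
        _require_str(vcs, "vcs")
        _require_str(vcs, "commit_id")
        _require_str(vcs, "requested_revision", optional=True)
    archive_hash = data.get("archive_info", {}).get("hash")
    if archive_hash is not None and (not isinstance(archive_hash, str) or "=" not in archive_hash):
        raise DirectUrlValidationError("'archive_info.hash' must look like algo=value")
    editable = data.get("dir_info", {}).get("editable", False)
    if not isinstance(editable, bool):
        raise DirectUrlValidationError("'dir_info.editable' must be a boolean")
```

Because the error subclasses `ValueError`, callers that already catch `ValueError` around JSON handling also catch validation failures without extra code.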

All 84 parser tests pass with 95.63% coverage maintained.

Fixed git:// URLs being incorrectly prefixed with git+ (should be git://... not
git+git://...). Now only adds git+ prefix when URL doesn't already start with git:,
matching pip's behavior per versioncontrol.py:262.

Fixed url_to_path double-decoding issue where unquote was called before url2pathname,
causing file:///x%2520y to become x y instead of x%20y like pip. Removed unquote since
url2pathname handles percent-decoding.
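
The double-decoding issue is easy to reproduce with the stdlib alone. The snippet works on the path part of the URL from the commit message and is a demonstration only, not the PR's `url_to_path`.

```python
from urllib.parse import unquote
from urllib.request import url2pathname

encoded_path = "/x%2520y"  # path component of file:///x%2520y

# url2pathname already percent-decodes once: %25 becomes %, leaving "%20" in the name.
single = url2pathname(encoded_path)

# Calling unquote first decodes twice, so the literal "%20" collapses into a space.
double = url2pathname(unquote(encoded_path))

print(single)
print(double)
```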

Refactored _get_git_requirement to eliminate complexity violations by extracting helper
functions: _get_repo_root, _get_remote_url, _get_commit_id, _build_vcs_requirement.
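
A sketch of how such a split might look, assuming three plain git invocations. The names `run_git` and `git_requirement` and the injectable `run` parameter are illustrative; the exact commands the PR uses (pip, for instance, inspects all remotes rather than only `origin`) may differ.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str], str], Optional[str]]


def run_git(args: Sequence[str], cwd: str) -> Optional[str]:
    """Run git and return stripped stdout, or None on any failure."""
    try:
        result = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git missing, bad cwd, or not a repository
    return result.stdout.strip() or None


def git_requirement(location: str, name: str, run: Runner = run_git) -> Optional[str]:
    """Build a freeze-style editable line, or None when git data is unavailable."""
    repo_root = run(["rev-parse", "--show-toplevel"], location)
    if repo_root is None:
        return None
    remote = run(["config", "--get", "remote.origin.url"], repo_root)
    commit = run(["rev-parse", "HEAD"], repo_root)
    if remote is None or commit is None:
        return None
    return f"-e git+{remote}@{commit}#egg={name}"
```

Passing the runner in as a parameter keeps each step small and lets tests substitute canned output for real subprocess calls.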

Fixed hash validation to accept any algo=value format (including md5, uppercase) to
match pip's lenient validation instead of strict regex.

Added platform-specific pragma comments for Windows-only code paths (UNC paths, drive
letter handling) that can't be tested on non-Windows platforms.

Added PLR2004 to global ruff ignore since magic number checks are disabled project-wide.
Added pytest-subprocess>=1.5 to test dependencies for declarative subprocess mocking.

All 99 tests pass with 100% diff coverage.
@gaborbernat gaborbernat marked this pull request as draft March 10, 2026 05:00
Tests now only import and exercise public functions, treating private
helper functions (prefixed with _) as implementation details. This
improves test maintainability by reducing coupling to internal structure.

Switched from manual mocking (Mock, monkeypatch) to declarative test
helpers (pytest-subprocess for subprocess calls, pytest-mock for patches).
This makes subprocess interaction tests more readable and eliminates
the need for complex mock.side_effect chains.

Reduced test count from 84 to 81 while maintaining coverage by combining
related test cases into parametrized tests. Coverage remains at 97.44%
with only Windows-specific code paths untested.

The pip-free parser diverged from pip in several areas that affected
editable install representation and metadata fidelity. This brings
full compatibility.

Multi-VCS detection (hg/svn/bzr) joins the existing git backend,
using pip's innermost-repo-root selection logic. Each backend extracts
remote URL and revision via the same subprocess commands pip uses.

VcsResult replaces the raw string return so callers can emit
pip-compatible diagnostic comments for missing VCS, missing remotes,
and invalid remote URIs. Subdirectory calculation now walks up from
the install location looking for pyproject.toml/setup.py, matching
pip's find_path_to_project_root_from_repo_root.
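
The walk-up described here can be sketched with `pathlib`. `subdirectory_fragment` is a hypothetical name, and the edge-case handling (a marker found outside the repository, no marker at all) is an assumption rather than a copy of pip's logic.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

_PROJECT_MARKERS = ("pyproject.toml", "setup.py")


def subdirectory_fragment(location: Path, repo_root: Path) -> Optional[str]:
    """Path from repo_root to the nearest project root at or above location."""
    location, repo_root = location.resolve(), repo_root.resolve()
    for candidate in (location, *location.parents):
        if any((candidate / marker).is_file() for marker in _PROJECT_MARKERS):
            if candidate == repo_root:
                return None  # project sits at the repository root, no fragment needed
            try:
                return candidate.relative_to(repo_root).as_posix()
            except ValueError:
                return None  # marker found outside the repository
        if candidate == repo_root:
            break  # never search above the repository root
    return None
```

A non-`None` result is what would end up in the `&subdirectory=...` fragment of the requirement string for a monorepo package.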

Egg-link search now scans sys.path with safe-name normalization
before falling back to site-packages, matching pip's
egg_link_path_from_sys_path. ArchiveInfo gains a hashes dict and
DirectUrl gains credential redaction for URL output.
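
A simplified sketch of an egg-link lookup over a list of search paths. The helper names and the set of candidate file names are assumptions; pip's `egg_link_path_from_sys_path` is not reproduced exactly.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Iterable, Optional


def _egg_link_names(raw_name: str) -> list[str]:
    # Safe-name normalization: collapse runs of unusual characters into "-".
    safe = re.sub(r"[^A-Za-z0-9.]+", "-", raw_name)
    return [f"{safe.replace('-', '_')}.egg-link", f"{safe}.egg-link", f"{raw_name}.egg-link"]


def find_egg_link_target(raw_name: str, search_paths: Iterable[str]) -> Optional[str]:
    """Return the project path stored on the first line of a matching .egg-link file."""
    for entry in search_paths:
        for file_name in _egg_link_names(raw_name):
            candidate = Path(entry) / file_name
            if candidate.is_file():
                lines = candidate.read_text(encoding="utf-8").splitlines()
                return lines[0].strip() if lines else None
    return None
```

In practice the search paths would be `sys.path` first and the site-packages directories second, in line with the order the commit describes.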
- Redact token-only credentials (no colon) to ****@host
- Always prefix git URLs with git+ (including git:// protocol)
- Skip subdirectory fragment for svn and bzr backends
- Convert hg/bzr local-path remotes to file:// URLs
- Return COMMAND_NOT_FOUND instead of NO_REMOTE for missing
  VCS binaries

Resolve relative-path remotes (e.g. ../repo.git) against repo_root
in _normalize_git_url, matching pip's behavior of treating any
existing path as local.

Pip's DirectUrl._remove_auth_from_netloc preserves git@ only when
vcs_info.vcs == "git", strips all other auth entirely (no ****@),
and preserves ${VAR} patterns. Align our implementation by passing
info to _redact_url.

Use re.VERBOSE with named groups for all regex patterns. Tighten
env-var detection to match pip's ENV_VAR_RE (only ${VAR} or
${VAR}:${VAR}), preventing credential leak in mixed patterns
like ${TOKEN}:pass@.
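
The redaction rules from the last two commits can be sketched as follows. `redact_url` and `_ENV_VAR_RE` are hypothetical names, and the pattern is written from the description above rather than copied from pip.

```python
from __future__ import annotations

import re
from typing import Final
from urllib.parse import urlsplit, urlunsplit

# Accepts only ${VAR} or ${VAR}:${VAR}; any other auth string is treated as a secret.
_ENV_VAR_RE: Final[re.Pattern[str]] = re.compile(
    r"""
    ^
    \$\{ (?P<user>[A-Za-z0-9_-]+) \}
    (?: : \$\{ (?P<password>[A-Za-z0-9_-]+) \} )?
    $
    """,
    re.VERBOSE,
)


def redact_url(url: str, vcs: str | None = None) -> str:
    """Strip credentials from the netloc, keeping git@ (git only) and env-var placeholders."""
    parts = urlsplit(url)
    auth, sep, host = parts.netloc.rpartition("@")
    if not sep:
        return url  # no credentials present
    keep = (vcs == "git" and auth == "git") or _ENV_VAR_RE.match(auth) is not None
    netloc = f"{auth}@{host}" if keep else host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Anchoring the pattern at both ends is what rejects mixed forms such as `${TOKEN}:pass`, so the literal password is dropped along with the placeholder instead of leaking into the output.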
@gaborbernat gaborbernat marked this pull request as ready for review March 10, 2026 23:43
Reorganize _vcs.py (449 lines) into a package with one module
per VCS backend, improving navigability. Tests restructured to
mirror the src layout with shared helpers in conftest.py.
- Make cross-module imports public with __all__ re-exports
- Compile regex as module-level Final[re.Pattern[str]]
- Support hg shared repos (.hg as file) via dir_only param
- Format tests with setup/run/assert blank line separation
- Rename test helpers and test names to reference public API
@gaborbernat
Member Author

@kemzeb this is now actually ready 👍

@gaborbernat gaborbernat changed the title ♻️ refactor(deps): drop pip dependency, use stdlib for metadata parsing ♻️ refactor(deps): drop pip dependency, use stdlib for metadata Mar 11, 2026
uv's direct_url.json has commit_id as Optional<String>. Handle
gracefully instead of rejecting the entire direct_url.json.
@kemzeb
Collaborator

kemzeb commented Mar 11, 2026

Good stuff, I'll be able to dive deep into this on Thursday

2 participants