♻️ refactor(deps): drop pip dependency, use stdlib for metadata #552
gaborbernat wants to merge 14 commits into tox-dev:main
Relying on pip's internal APIs created fragility as these APIs change frequently between pip versions without deprecation warnings. This caused pipdeptree to break unexpectedly in CI/CD pipelines and user environments.

Replaced pip internals with a standalone PEP 610 implementation using only Python's standard library (importlib.metadata, json, urllib). The new _parser package handles direct URL parsing, editable detection (both modern PEP 610 and legacy .egg-link), and requirement string formatting.

This makes pipdeptree more stable, reduces its installed footprint by ~50MB, and allows it to work in pip-less environments.

Signed-off-by: Bernát Gábor <bgabor8@bloomberg.net>
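For readers unfamiliar with PEP 610, reading direct_url.json through the standard library can be sketched roughly as follows. This is a minimal illustration, not the code in the _parser package: the function names read_direct_url and is_editable are hypothetical, and only Distribution.read_text and json.loads are real stdlib calls.

```python
import json
from importlib.metadata import Distribution
from typing import Any, Optional


def read_direct_url(dist: Distribution) -> Optional[dict[str, Any]]:
    """Return the parsed direct_url.json of a distribution, or None if absent.

    Hypothetical helper: the real parser performs much stricter validation.
    """
    raw = dist.read_text("direct_url.json")  # None when the file does not exist
    if raw is None:
        return None
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("direct_url.json must contain a JSON object")
    return data


def is_editable(data: Optional[dict[str, Any]]) -> bool:
    """PEP 610 marks editable installs with dir_info.editable set to true."""
    if data is None:
        return False
    dir_info = data.get("dir_info")
    return isinstance(dir_info, dict) and dir_info.get("editable") is True
```

In practice the distribution would come from importlib.metadata.distribution(name) or importlib.metadata.distributions(); neither needs pip to be installed.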
|
Going to take a look at pip's tests for freezing (looks like it's stored here) and the PEPs to try to see if we need to implement something. I did notice the following after running New However, pipdeptree's This makes me believe that our older |
Yes, this is the case. The old code eventually ends up calling this pip function, where it will end up calling the "git" Looks like the question is if we should attempt to scan the external environment to get the VCS, or should the metadata file be the only place we look? If we are looking to do the former, it may be better to just have the freeze requirement generation as its own standalone library that we maintain. |
…for editables

The initial pip dependency removal introduced several functional regressions and structural bugs that caused pipdeptree to diverge from pip freeze behavior. Editable VCS installs were showing -e /path instead of -e git+url@commit#egg=name, losing critical provenance information. The DirectUrl implementation had structural mismatches with PEP 610 and pip's reference implementation.

Restructured DirectUrl to use single info field (Union[VcsInfo, ArchiveInfo, DirInfo]) matching pip's implementation, enforcing exactly-one semantics. Fixed ArchiveInfo to use hash field per PEP 610 spec. Added VCS detection module that probes filesystem for git repositories and extracts remote URL and commit hash, falling back to local path when VCS unavailable. Fixed egg-link fallback to only run when direct_url.json missing, preventing incorrect editable detection with stale .egg-link files. Added legacy version support using === operator for non-PEP-440 versions. Strengthened type validation for dir_info.editable and required VCS fields.
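The single-info-field shape described in this commit might look something like the sketch below. The class and field names follow the commit message and PEP 610, but the defaults and the is_editable property are assumptions made for illustration, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class VcsInfo:
    vcs: str
    commit_id: str
    requested_revision: Optional[str] = None


@dataclass(frozen=True)
class ArchiveInfo:
    hash: Optional[str] = None  # "algo=value", per PEP 610


@dataclass(frozen=True)
class DirInfo:
    editable: bool = False


@dataclass(frozen=True)
class DirectUrl:
    url: str
    # One field holding one of three types: a DirectUrl cannot carry
    # two info blocks at once, which is the exactly-one guarantee.
    info: Union[VcsInfo, ArchiveInfo, DirInfo]
    subdirectory: Optional[str] = None

    @property
    def is_editable(self) -> bool:
        # Hypothetical convenience property, added for this example only.
        return isinstance(self.info, DirInfo) and self.info.editable
```

Modelling info as a union moves the "exactly one of vcs_info, archive_info, dir_info" rule out of runtime checks on the object and into its construction.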
Refactored direct_url.json parsing to eliminate complexity violations and add comprehensive type validation matching pip's strict validation model.

Extracted parse_direct_url_json into helper functions to reduce complexity:
- _load_and_validate_json: JSON loading and structure validation
- _validate_required_fields: Type validation for url/subdirectory
- _parse_info_block: Info block selection and validation

Enhanced VCS detection to handle repository subdirectories using git rev-parse --show-toplevel, allowing editable packages in subdirectories. Added URL normalization for SCP-style (git@host:path), git://, and local paths. Added subdirectory fragment support for monorepo packages.

Fixed file:// URL conversion to match pip's implementation exactly, handling localhost URLs, UNC paths on Windows, and Windows drive letter corrections. Rejects non-local file URIs on non-Windows platforms.

Added comprehensive type validation for all direct_url.json fields:
- Validates url and subdirectory are strings
- Validates all info blocks are dicts
- Validates vcs_info fields (vcs, commit_id, requested_revision) are strings
- Validates archive_info.hash matches algo=value pattern
- Validates dir_info.editable is boolean

Changed DirectUrlValidationError to inherit from ValueError instead of Exception, following Python conventions for value validation errors.

All 84 parser tests pass with 95.63% coverage maintained.
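To make the validation rules above concrete, here is a rough sketch of info-block selection with type checks. DirectUrlValidationError and its ValueError base come from the commit message; the function name select_info_block and the exact error wording are hypothetical and do not reproduce the private helpers listed above.

```python
from typing import Any


class DirectUrlValidationError(ValueError):
    """Raised when direct_url.json has the wrong shape or field types."""


_INFO_KEYS = ("vcs_info", "archive_info", "dir_info")


def select_info_block(data: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Validate top-level fields and return the single info block present."""
    if not isinstance(data.get("url"), str):
        raise DirectUrlValidationError("url must be a string")
    subdirectory = data.get("subdirectory")
    if subdirectory is not None and not isinstance(subdirectory, str):
        raise DirectUrlValidationError("subdirectory must be a string")

    present = [key for key in _INFO_KEYS if key in data]
    if len(present) != 1:
        raise DirectUrlValidationError(f"expected exactly one of {_INFO_KEYS}, found {present}")

    key = present[0]
    block = data[key]
    if not isinstance(block, dict):
        raise DirectUrlValidationError(f"{key} must be an object")
    if key == "dir_info" and not isinstance(block.get("editable", False), bool):
        raise DirectUrlValidationError("dir_info.editable must be a boolean")
    return key, block
```

Because the error subclasses ValueError, callers that already catch ValueError around parsing keep working without knowing about the more specific type.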
Fixed git:// URLs being incorrectly prefixed with git+ (should be git://... not git+git://...). Now only adds git+ prefix when URL doesn't already start with git:, matching pip's behavior per versioncontrol.py:262.

Fixed url_to_path double-decoding issue where unquote was called before url2pathname, causing file:///x%2520y to become x y instead of x%20y like pip. Removed unquote since url2pathname handles percent-decoding.

Refactored _get_git_requirement to eliminate complexity violations by extracting helper functions: _get_repo_root, _get_remote_url, _get_commit_id, _build_vcs_requirement.

Fixed hash validation to accept any algo=value format (including md5, uppercase) to match pip's lenient validation instead of strict regex.

Added platform-specific pragma comments for Windows-only code paths (UNC paths, drive letter handling) that can't be tested on non-Windows platforms. Added PLR2004 to global ruff ignore since magic number checks are disabled project-wide. Added pytest-subprocess>=1.5 to test dependencies for declarative subprocess mocking.

All 99 tests pass with 100% diff coverage.
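The double-decoding issue is easy to reproduce with the standard library alone. The sketch below contrasts the two behaviours; both function names are hypothetical, and the example deliberately ignores the localhost, UNC and drive-letter handling mentioned in the earlier commit.

```python
from urllib.parse import unquote, urlsplit
from urllib.request import url2pathname


def file_url_to_path(url: str) -> str:
    """Convert a local file:// URL to a path, decoding percent-escapes once."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError(f"not a file URL: {url!r}")
    # url2pathname already percent-decodes, so no explicit unquote is needed.
    return url2pathname(parts.path)


def file_url_to_path_double_decoded(url: str) -> str:
    """The buggy variant: unquote runs first, then url2pathname decodes again."""
    return url2pathname(unquote(urlsplit(url).path))
```

With file:///x%2520y, the first function keeps the literal %20 in the file name, while the second collapses it to a space and would point at a different file.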
Tests now only import and exercise public functions, treating private helper functions (prefixed with _) as implementation details. This improves test maintainability by reducing coupling to internal structure. Switched from manual mocking (Mock, monkeypatch) to declarative test helpers (pytest-subprocess for subprocess calls, pytest-mock for patches). This makes subprocess interaction tests more readable and eliminates the need for complex mock.side_effect chains. Reduced test count from 84 to 81 while maintaining coverage by combining related test cases into parametrized tests. Coverage remains at 97.44% with only Windows-specific code paths untested.
The pip-free parser diverged from pip in several areas that affected editable install representation and metadata fidelity. This brings full compatibility. Multi-VCS detection (hg/svn/bzr) joins the existing git backend, using pip's innermost-repo-root selection logic. Each backend extracts remote URL and revision via the same subprocess commands pip uses. VcsResult replaces the raw string return so callers can emit pip-compatible diagnostic comments for missing VCS, missing remotes, and invalid remote URIs. Subdirectory calculation now walks up from the install location looking for pyproject.toml/setup.py, matching pip's find_path_to_project_root_from_repo_root. Egg-link search now scans sys.path with safe-name normalization before falling back to site-packages, matching pip's egg_link_path_from_sys_path. ArchiveInfo gains a hashes dict and DirectUrl gains credential redaction for URL output.
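The subdirectory walk described above could be approximated as follows. This is a sketch of the idea only: find_subdirectory is a hypothetical name, and the real code (like pip's find_path_to_project_root_from_repo_root) may treat edge cases such as symlinks or missing markers differently.

```python
from pathlib import Path
from typing import Optional

_PROJECT_MARKERS = ("pyproject.toml", "setup.py")


def find_subdirectory(location: Path, repo_root: Path) -> Optional[str]:
    """Return the project directory relative to repo_root.

    Returns None when the project lives at the repository root or when no
    project marker is found while walking up from location.
    """
    current = location.resolve()
    root = repo_root.resolve()
    while not any((current / marker).is_file() for marker in _PROJECT_MARKERS):
        parent = current.parent
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent
    if current == root:
        return None  # no subdirectory fragment needed
    try:
        return current.relative_to(root).as_posix()
    except ValueError:  # project directory is outside the repository
        return None
```

The returned value is what would end up in the subdirectory fragment of the requirement string for a package that lives inside a monorepo.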
- Redact token-only credentials (no colon) to ****@host
- Always prefix git URLs with git+ (including git:// protocol)
- Skip subdirectory fragment for svn and bzr backends
- Convert hg/bzr local-path remotes to file:// URLs
- Return COMMAND_NOT_FOUND instead of NO_REMOTE for missing VCS binaries
Resolve relative-path remotes (e.g. ../repo.git) against repo_root in _normalize_git_url, matching pip's behavior of treating any existing path as local.
Pip's DirectUrl._remove_auth_from_netloc preserves git@ only when
vcs_info.vcs == "git", strips all other auth entirely (no ****@),
and preserves ${VAR} patterns. Align our implementation by passing
info to _redact_url.
Use re.VERBOSE with named groups for all regex patterns. Tighten
env-var detection to match pip's ENV_VAR_RE (only ${VAR} or
${VAR}:${VAR}), preventing credential leak in mixed patterns
like ${TOKEN}:pass@.
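The redaction rules from the last two commits can be sketched as below. This is an illustration under stated assumptions: redact_url and its vcs parameter are hypothetical stand-ins for passing info to _redact_url, and the pattern is written to match only ${VAR} or ${VAR}:${VAR} as described, not copied from pip.

```python
import re
from typing import Final, Optional
from urllib.parse import urlsplit, urlunsplit

# Matches only "${VAR}" or "${VAR}:${VAR}"; mixed forms such as
# "${TOKEN}:pass" do not match and are therefore stripped.
ENV_VAR_RE: Final[re.Pattern[str]] = re.compile(
    r"""
    ^
    \$\{ (?P<user> [A-Za-z0-9_-]+ ) \}
    (?: : \$\{ (?P<password> [A-Za-z0-9_-]+ ) \} )?
    $
    """,
    re.VERBOSE,
)


def redact_url(url: str, vcs: Optional[str] = None) -> str:
    """Strip credentials from a URL, keeping git@ for git and env-var placeholders."""
    parts = urlsplit(url)
    auth, sep, host = parts.netloc.rpartition("@")
    if not sep:
        return url  # no credentials in the netloc
    keep = (vcs == "git" and auth == "git") or ENV_VAR_RE.match(auth) is not None
    netloc = f"{auth}@{host}" if keep else host
    return urlunsplit(parts._replace(netloc=netloc))
```

Anchoring the pattern at both ends is what prevents a partially templated credential from being treated as safe to print.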
Reorganize _vcs.py (449 lines) into a package with one module per VCS backend, improving navigability. Tests restructured to mirror the src layout with shared helpers in conftest.py.
- Make cross-module imports public with __all__ re-exports
- Compile regex as module-level Final[re.Pattern[str]]
- Support hg shared repos (.hg as file) via dir_only param
- Format tests with setup/run/assert blank line separation
- Rename test helpers and test names to reference public API
|
@kemzeb this is now actually ready 👍 |
uv's direct_url.json has commit_id as Optional<String>. Handle gracefully instead of rejecting the entire direct_url.json.
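Treating commit_id as optional could look roughly like this. The helper name parse_commit_id is hypothetical, and what the requirement string should contain when the commit is missing is not covered here.

```python
from typing import Any, Optional


def parse_commit_id(vcs_info: dict[str, Any]) -> Optional[str]:
    """Return commit_id if present, tolerating its absence or a JSON null."""
    commit_id = vcs_info.get("commit_id")
    if commit_id is None:
        return None  # missing or null: accept instead of rejecting the file
    if not isinstance(commit_id, str):
        raise ValueError("vcs_info.commit_id must be a string when present")
    return commit_id
```

A wrongly typed value is still rejected, so the relaxation only applies to the missing or null case.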
|
Good stuff, I'll be able to dive deep into this on Thursday |
pipdeptree depended on pip._internal APIs that are explicitly not part of pip's public interface. Every pip release risked breakage: pip 21.1.1 already required a compatibility fix (#150), and the project had to isolate all pip usage into a single module just to contain the blast radius (#402). 🔧 The dependency also blocked pipdeptree from working in pip-less environments (#335) and added ~50MB to installation size.

This replaces all pip._internal imports with a new _parser package built entirely on stdlib (importlib.metadata, json, urllib). It implements PEP 610 direct_url.json parsing with full validation, editable install detection for both modern PEP 660 and legacy .egg-link files, and VCS requirement string generation for git, hg, svn, and bzr. ✨ The VCS module is split into per-backend files with shared utilities, matching pip's output format for freeze-style requirement strings including credential redaction, SCP-style git URL normalization, and subdirectory fragments.

The pip>=25.2 dependency is removed from pyproject.toml and _freeze.py is deleted entirely. pipdeptree now works independently of pip's release cycle and in any Python environment with importlib.metadata available. All existing output formats are preserved.