The "standard" installation use pyproject.toml in UV rather than dynamic dependencies via build hooks (comparing to PIP) #2130

@potiuk

When you install a package from a remote URL and specify extras, the --editable version of the extras is used rather than the dependencies used in the wheel. I don't think it's very well specified which dependencies should be used.

uv pip install 'apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io, \
docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure, \
mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake, \
ssh,statsd,uv,virtualenv] @ https://github.com/apache/airflow/archive/main.tar.gz'
Here is the output of `uv pip install`:

Resolved 359 packages in 2.94s

Downloaded 154 packages in 2.44s
Installed 257 packages in 700ms

  • adal==1.2.7
  • adlfs==2024.2.0
  • aiobotocore==2.12.0
  • aiofiles==23.2.1
  • aiohttp==3.9.3
  • aioitertools==0.11.0
  • aiosignal==1.3.1
  • amqp==5.2.0
  • annotated-types==0.6.0
  • apispec==6.5.0
  • asn1crypto==1.5.1
  • async-timeout==4.0.3
  • asyncssh==2.14.2
  • authlib==1.3.0
  • aws-sam-translator==1.85.0
  • aws-xray-sdk==2.12.1
  • azure-batch==14.1.0
  • azure-common==1.1.28
  • azure-core==1.30.1
  • azure-cosmos==4.5.1
  • azure-datalake-store==0.0.53
  • azure-identity==1.15.0
  • azure-keyvault-secrets==4.8.0
  • azure-kusto-data==4.3.1
  • azure-mgmt-containerinstance==10.1.0
  • azure-mgmt-containerregistry==10.3.0
  • azure-mgmt-core==1.4.0
  • azure-mgmt-cosmosdb==9.4.0
  • azure-mgmt-datafactory==5.0.0
  • azure-mgmt-datalake-nspkg==3.0.1
  • azure-mgmt-datalake-store==0.5.0
  • azure-mgmt-nspkg==3.0.2
  • azure-mgmt-resource==23.0.1
  • azure-mgmt-storage==21.1.0
  • azure-nspkg==3.0.2
  • azure-servicebus==7.11.4
  • azure-storage-blob==12.19.0
  • azure-storage-file-datalake==12.14.0
  • azure-storage-file-share==12.15.0
  • azure-synapse-artifacts==0.18.0
  • azure-synapse-spark==0.7.0
  • babel==2.14.0
  • bcrypt==4.1.2
  • beautifulsoup4==4.12.3
  • billiard==4.2.0
  • boto3==1.34.51
  • botocore==1.34.51
  • cachetools==5.3.3
  • cattrs==23.2.3
  • celery==5.3.6
  • cfn-lint==0.85.3
  • chardet==5.2.0
  • click-didyoumean==0.3.0
  • click-plugins==1.1.1
  • click-repl==0.3.0
  • colorama==0.4.6
  • cryptography==42.0.5
  • cryptography==41.0.7
  • db-dtypes==1.2.0
  • decorator==5.1.1
  • distlib==0.3.8
  • dnspython==2.6.1
  • docker==7.0.0
  • elastic-transport==8.12.0
  • elasticsearch==8.12.1
  • email-validator==1.3.1
  • eventlet==0.35.2
  • filelock==3.13.1
  • flask-appbuilder==4.3.11
  • flask-babel==2.0.0
  • flask-jwt-extended==4.6.0
  • flask-limiter==3.5.1
  • flask-login==0.6.3
  • flask-sqlalchemy==2.5.1
  • flower==2.0.1
  • frozenlist==1.4.1
  • gcloud-aio-auth==4.2.3
  • gcloud-aio-bigquery==7.1.0
  • gcloud-aio-storage==9.2.0
  • gcsfs==2024.2.0
  • gevent==24.2.1
  • google-ads==23.1.0
  • google-analytics-admin==0.22.6
  • google-api-core==2.17.1
  • google-api-python-client==2.120.0
  • google-auth==2.28.1
  • google-auth-httplib2==0.2.0
  • google-auth-oauthlib==1.2.0
  • google-cloud-aiplatform==1.43.0
  • google-cloud-appengine-logging==1.4.2
  • google-cloud-audit-log==0.2.5
  • google-cloud-automl==2.13.2
  • google-cloud-batch==0.17.12
  • google-cloud-bigquery==3.17.2
  • google-cloud-bigquery-datatransfer==3.15.0
  • google-cloud-bigquery-storage==2.24.0
  • google-cloud-bigtable==2.23.0
  • google-cloud-build==3.23.2
  • google-cloud-compute==1.17.0
  • google-cloud-container==2.41.0
  • google-cloud-core==2.4.1
  • google-cloud-datacatalog==3.18.2
  • google-cloud-dataflow-client==0.8.9
  • google-cloud-dataform==0.5.8
  • google-cloud-dataplex==1.12.2
  • google-cloud-dataproc==5.9.2
  • google-cloud-dataproc-metastore==1.15.2
  • google-cloud-dlp==3.15.2
  • google-cloud-kms==2.21.2
  • google-cloud-language==2.13.2
  • google-cloud-logging==3.9.0
  • google-cloud-memcache==1.9.2
  • google-cloud-monitoring==2.19.2
  • google-cloud-orchestration-airflow==1.12.0
  • google-cloud-os-login==2.14.2
  • google-cloud-pubsub==2.19.7
  • google-cloud-redis==2.15.2
  • google-cloud-resource-manager==1.12.2
  • google-cloud-run==0.10.4
  • google-cloud-secret-manager==2.18.2
  • google-cloud-spanner==3.42.0
  • google-cloud-speech==2.25.0
  • google-cloud-storage==2.14.0
  • google-cloud-storage-transfer==1.11.2
  • google-cloud-tasks==2.16.2
  • google-cloud-texttospeech==2.16.2
  • google-cloud-translate==3.15.2
  • google-cloud-videointelligence==2.13.2
  • google-cloud-vision==3.7.1
  • google-cloud-workflows==1.14.2
  • google-crc32c==1.5.0
  • google-resumable-media==2.7.0
  • graphql-core==3.2.3
  • graphviz==0.20.1
  • greenlet==3.0.3
  • grpc-google-iam-v1==0.13.0
  • grpc-interceptor==0.15.4
  • grpcio-gcp==0.2.2
  • grpcio-status==1.62.0
  • httplib2==0.22.0
  • humanize==4.9.0
  • hvac==2.1.0
  • ijson==3.2.3
  • isodate==0.6.1
  • jmespath==1.0.1
  • joserfc==0.9.0
  • jschema-to-python==1.2.3
  • json-merge-patch==0.2
  • jsondiff==2.0.0
  • jsonpatch==1.33
  • jsonpath-ng==1.6.1
  • jsonpickle==3.0.3
  • jsonpointer==2.4
  • jsonschema-path==0.3.2
  • junit-xml==1.9
  • kombu==5.3.5
  • kubernetes==29.0.0
  • kubernetes-asyncio==29.0.0
  • ldap3==2.9.1
  • limits==3.9.0
  • looker-sdk==24.2.0
  • lxml==5.1.0
  • marshmallow-sqlalchemy==0.26.1
  • more-itertools==10.2.0
  • moto==5.0.2
  • mpmath==1.3.0
  • msal==1.27.0
  • msal-extensions==1.1.0
  • msrest==0.7.1
  • msrestazure==0.6.4
  • multidict==6.0.5
  • mypy-boto3-appflow==1.34.0
  • mypy-boto3-rds==1.34.50
  • mypy-boto3-redshift-data==1.34.0
  • mypy-boto3-s3==1.34.14
  • mysql-connector-python==8.3.0
  • mysqlclient==2.2.4
  • networkx==3.1
  • numpy==1.24.4
  • oauthlib==3.2.2
  • openapi-schema-validator==0.6.2
  • openapi-spec-validator==0.7.1
  • openlineage-integration-common==1.9.1
  • openlineage-python==1.9.1
  • openlineage-sql==1.9.1
  • ordered-set==4.1.0
  • pandas==2.0.3
  • pandas-gbq==0.21.0
  • paramiko==3.4.0
  • pathable==0.4.3
  • pbr==6.0.0
  • platformdirs==3.11.0
  • ply==3.11
  • portalocker==2.8.2
  • prison==0.2.1
  • prometheus-client==0.20.0
  • prompt-toolkit==3.0.43
  • proto-plus==1.23.0
  • psycopg2-binary==2.9.9
  • py-partiql-parser==0.5.1
  • pyarrow==15.0.0
  • pyasn1==0.5.1
  • pyasn1-modules==0.3.0
  • pyathena==3.3.0
  • pydantic==2.6.3
  • pydantic-core==2.16.3
  • pydata-google-auth==1.8.2
  • pynacl==1.5.0
  • pyodbc==5.1.0
  • pyopenssl==24.0.0
  • pyparsing==3.1.1
  • pyspnego==0.10.2
  • python-dotenv==1.0.1
  • python-http-client==3.3.7
  • python-ldap==3.4.4
  • pywinrm==0.4.3
  • redis==4.6.0
  • redshift-connector==2.1.0
  • referencing==0.33.0
  • referencing==0.31.1
  • regex==2023.12.25
  • requests-ntlm==1.2.0
  • requests-oauthlib==1.3.1
  • requests-toolbelt==1.0.0
  • responses==0.25.0
  • rsa==4.9
  • s3fs==2024.2.0
  • s3transfer==0.10.0
  • sarif-om==1.0.4
  • scramp==1.4.4
  • sendgrid==6.11.0
  • shapely==2.0.3
  • slack-sdk==3.27.1
  • snowflake-connector-python==3.7.1
  • snowflake-sqlalchemy==1.5.1
  • sortedcontainers==2.4.0
  • soupsieve==2.5
  • sqlalchemy-bigquery==1.10.0
  • sqlalchemy-redshift==0.8.14
  • sqlalchemy-spanner==1.6.2
  • sqlalchemy-utils==0.41.1
  • sqlparse==0.4.4
  • sshtunnel==0.4.0
  • starkbank-ecdsa==2.2.0
  • statsd==4.0.1
  • sympy==1.12
  • tomlkit==0.12.4
  • tornado==6.4
  • uritemplate==4.1.1
  • urllib3==2.2.1
  • urllib3==1.26.18
  • vine==5.1.0
  • virtualenv==20.25.1
  • watchtower==3.0.1
  • wcwidth==0.2.13
  • websocket-client==1.7.0
  • xmltodict==0.13.0
  • yarl==1.9.4
  • zope-event==5.0
  • zope-interface==6.2

Compare it with the equivalent pip result:

pip install 'apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io, \
docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure, \
mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake, \
ssh,statsd,uv,virtualenv] @ https://github.com/apache/airflow/archive/main.tar.gz'
Result of `pip install`

Installing collected packages: wcwidth, unicodecsv, text-unidecode, statsd, starkbank-ecdsa, sortedcontainers, pytz, ply, lockfile, json-merge-patch, ijson, distlib, cron-descriptor, colorlog, azure-nspkg, azure-common, asn1crypto, zope.interface, zope.event, zipp, wrapt, websocket-client, vine, urllib3, uritemplate, uc-micro-py, tzdata, typing-extensions, tornado, tomlkit, termcolor, tenacity, tabulate, sqlparse, sqlalchemy, soupsieve, sniffio, slack_sdk, six, setproctitle, scramp, rpds-py, PyYAML, python-slugify, python-http-client, python-dotenv, pyparsing, pyodbc, pyjwt, pygments, pycparser, pyasn1, psycopg2-binary, psutil, protobuf, prompt-toolkit, prometheus-client, portalocker, pluggy, platformdirs, pkgutil-resolve-name, pathspec, packaging, ordered-set, opentelemetry-semantic-conventions, openlineage-sql, oauthlib, numpy, mysqlclient, mysql-connector-python, multidict, more-itertools, mdurl, markupsafe, lxml, lazy-object-proxy, jsonpath_ng, jmespath, itsdangerous, inflection, idna, humanize, h11, grpcio, greenlet, graphviz, google-re2, google-crc32c, fsspec, frozenlist, filelock, exceptiongroup, docutils, dnspython, dill, decorator, configupdater, colorama, click, charset-normalizer, chardet, certifi, cachetools, cachelib, blinker, billiard, bcrypt, backports.zoneinfo, backoff, Babel, azure-mgmt-nspkg, attrs, async-timeout, argcomplete, aiofiles, yarl, wtforms, werkzeug, virtualenv, universal-pathlib, sqlalchemy-utils, sqlalchemy_redshift, sqlalchemy-jsonfield, shapely, sendgrid, rsa, rfc3339-validator, requests, referencing, redis, python-dateutil, python-daemon, pyasn1-modules, pyarrow, proto-plus, prison, opentelemetry-proto, marshmallow, markdown-it-py, Mako, linkify-it-py, ldap3, jinja2, isodate, importlib-resources, importlib-metadata, httplib2, httpcore, gunicorn, grpcio-gcp, grpc-interceptor, googleapis-common-protos, google-resumable-media, gevent, eventlet, email-validator, elastic-transport, deprecated, clickclick, click-repl, click-plugins, 
click-didyoumean, cffi, cattrs, beautifulsoup4, azure-mgmt-datalake-nspkg, asgiref, apispec, anyio, amqp, aiosignal, aioitertools, time-machine, rich, requests_toolbelt, requests-oauthlib, python-nvd3, python-ldap, pynacl, pandas, opentelemetry-exporter-otlp-proto-common, opentelemetry-api, openlineage-python, mdit-py-plugins, marshmallow-sqlalchemy, marshmallow-oneofschema, looker-sdk, limits, kombu, jsonschema-specifications, hvac, httpx, grpcio-status, google-cloud-audit-log, google-auth, flask, elasticsearch, docker, cryptography, croniter, botocore, azure-core, alembic, aiohttp, s3transfer, rich-argparse, PyOpenSSL, pendulum, paramiko, opentelemetry-sdk, openlineage-integration-common, msrest, kubernetes_asyncio, kubernetes, jsonschema, grpc-google-iam-v1, google-auth-oauthlib, google-auth-httplib2, google-api-core, gcloud-aio-auth, flask-wtf, Flask-SQLAlchemy, flask-session, flask-login, Flask-Limiter, Flask-JWT-Extended, flask-caching, Flask-Babel, db-dtypes, celery, azure-storage-file-share, azure-storage-blob, azure-servicebus, azure-mgmt-core, azure-keyvault-secrets, azure-cosmos, authlib, asyncssh, aiobotocore, adal, sshtunnel, snowflake-connector-python, pydata-google-auth, opentelemetry-exporter-otlp-proto-http, opentelemetry-exporter-otlp-proto-grpc, msrestazure, msal, google-cloud-core, google-api-python-client, google-ads, gcloud-aio-storage, gcloud-aio-bigquery, flower, flask-appbuilder, connexion, boto3, azure-synapse-spark, azure-synapse-artifacts, azure-storage-file-datalake, azure-mgmt-storage, azure-mgmt-resource, azure-mgmt-datafactory, azure-mgmt-cosmosdb, azure-mgmt-containerregistry, azure-mgmt-containerinstance, watchtower, snowflake-sqlalchemy, redshift_connector, PyAthena, opentelemetry-exporter-otlp, msal-extensions, google-cloud-workflows, google-cloud-vision, google-cloud-videointelligence, google-cloud-translate, google-cloud-texttospeech, google-cloud-tasks, google-cloud-storage-transfer, google-cloud-storage, google-cloud-speech, 
google-cloud-spanner, google-cloud-secret-manager, google-cloud-run, google-cloud-resource-manager, google-cloud-redis, google-cloud-pubsub, google-cloud-os-login, google-cloud-orchestration-airflow, google-cloud-monitoring, google-cloud-memcache, google-cloud-language, google-cloud-kms, google-cloud-dlp, google-cloud-dataproc-metastore, google-cloud-dataproc, google-cloud-dataplex, google-cloud-dataform, google-cloud-dataflow-client, google-cloud-datacatalog, google-cloud-container, google-cloud-compute, google-cloud-build, google-cloud-bigtable, google-cloud-bigquery-storage, google-cloud-bigquery-datatransfer, google-cloud-bigquery, google-cloud-batch, google-cloud-automl, google-cloud-appengine-logging, google-analytics-admin, azure-mgmt-datalake-store, azure-datalake-store, azure-batch, sqlalchemy-spanner, sqlalchemy-bigquery, pandas-gbq, google-cloud-logging, google-cloud-aiplatform, gcsfs, azure-identity, azure-kusto-data, adlfs, apache-airflow-providers-smtp, apache-airflow-providers-imap, apache-airflow-providers-http, apache-airflow-providers-ftp, apache-airflow-providers-fab, apache-airflow-providers-common-sql, apache-airflow-providers-common-io, apache-airflow-providers-sqlite, apache-airflow-providers-ssh, apache-airflow-providers-snowflake, apache-airflow-providers-slack, apache-airflow-providers-sftp, apache-airflow-providers-sendgrid, apache-airflow-providers-redis, apache-airflow-providers-postgres, apache-airflow-providers-openlineage, apache-airflow-providers-odbc, apache-airflow-providers-mysql, apache-airflow-providers-microsoft-azure, apache-airflow-providers-hashicorp, apache-airflow-providers-grpc, apache-airflow-providers-google, apache-airflow-providers-elasticsearch, apache-airflow-providers-docker, apache-airflow-providers-cncf-kubernetes, apache-airflow-providers-celery, apache-airflow-providers-amazon

Note: all the apache-airflow-providers-* packages are missing in the case of uv pip install.
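One way to confirm the difference is to diff the two package lists by normalized name. The following is a minimal sketch, assuming both outputs have been reduced to lists of names; the helper names `normalize` and `missing_from` are hypothetical, and the sample lists are short excerpts of the outputs above, not the full lists.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_from(installed_by_pip, installed_by_uv):
    """Return normalized names that pip installed but uv did not."""
    # Drop any "==version" suffix before comparing names.
    uv_names = {normalize(n.split("==")[0]) for n in installed_by_uv}
    pip_names = {normalize(n.split("==")[0]) for n in installed_by_pip}
    return sorted(pip_names - uv_names)


# Short excerpts of the two outputs above, not the full lists.
pip_sample = ["celery", "flower", "apache-airflow-providers-celery", "slack_sdk"]
uv_sample = ["celery==5.3.6", "flower==2.0.1", "slack-sdk==3.27.1"]

print(missing_from(pip_sample, uv_sample))
```

Name normalization matters here because pip prints `slack_sdk` while uv prints `slack-sdk`; without it, such packages would show up as false differences.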

The problem is likely that uv reads the dependencies directly from pyproject.toml. However, for such a remote installation (which is not an --editable install, and --editable would make no sense for a remote install even if it were specified), the dependencies should be the same as in the packaged .whl file. This makes uv's installation non-compliant with PEP 517 in this case.

A bit more context: Airflow uses the hatchling build backend and utilizes PEP 517 compliant build hooks (https://peps.python.org/pep-0517/#build-wheel) to convert the --editable extras into wheel extras on the fly. For example, the [celery] extra in pyproject.toml (https://github.com/apache/airflow/blob/main/pyproject.toml#L641) is this:

celery = [ # source: airflow/providers/celery/provider.yaml
  "celery>=5.3.0,<6,!=5.3.3,!=5.3.2",
  "flower>=1.0.0",
  "google-re2>=1.0",
]

However, our hatchling build hook, when preparing the wheel package, replaces this extra with:

"apache-airflow-providers-celery"

This is how we deal with our monorepo: the --editable "extra" installs only the dependencies of our providers, while the "wheel" extra installs the actual provider (and, transitively, the dependencies of that provider).
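The substitution described above can be sketched as a plain function. This is only an illustration under stated assumptions, not Airflow's actual hook code: the function name `to_wheel_extras`, the `provider_ids` parameter, and the `example-extra` entry are hypothetical, and the real hook runs inside hatchling rather than on a bare dictionary.

```python
def to_wheel_extras(editable_extras, provider_ids):
    """Replace each provider extra's raw dependencies with the provider package.

    Hypothetical helper for illustration; not Airflow's actual hook code.
    """
    wheel_extras = {}
    for extra, requirements in editable_extras.items():
        if extra in provider_ids:
            # Wheel extra: depend on the provider distribution itself,
            # which brings in its own dependencies transitively.
            wheel_extras[extra] = [f"apache-airflow-providers-{extra}"]
        else:
            # Non-provider extras are passed through unchanged.
            wheel_extras[extra] = list(requirements)
    return wheel_extras


editable = {
    "celery": [
        "celery>=5.3.0,<6,!=5.3.3,!=5.3.2",
        "flower>=1.0.0",
        "google-re2>=1.0",
    ],
    # Made-up non-provider extra, included only to show the pass-through case.
    "example-extra": ["example-package>=1.0"],
}

print(to_wheel_extras(editable, provider_ids={"celery"}))
```

An installer that reads the extras straight from pyproject.toml sees the `editable` mapping, while an installer that builds the wheel first sees the transformed mapping; that is the difference between the uv and pip results in this report.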

I believe that the PEP 517 compliant way of installing a package from a remote URL is to first build the wheel file using the build backend the project has defined in pyproject.toml, and only then install that wheel file. This is exactly what pip does under the hood when installing a package from a remote URL: it treats it the same way as installing an sdist package, which the remote URL is equivalent to.
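To check which dependencies a built wheel actually declares, one can read the `Requires-Dist` entries from its METADATA file. Below is a minimal sketch using only the standard library; the helper name `wheel_requires_dist` is hypothetical, and the wheel is a made-up stand-in created in memory so the example runs without building Airflow.

```python
import io
import zipfile
from email.parser import Parser


def wheel_requires_dist(wheel_file):
    """Return the Requires-Dist entries recorded in a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_file) as archive:
        metadata_name = next(
            name for name in archive.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = archive.read(metadata_name).decode("utf-8")
    # Wheel metadata uses an email-header style format.
    return Parser().parsestr(text).get_all("Requires-Dist") or []


# Build a tiny stand-in wheel in memory so the sketch runs offline.
# The name, version, and metadata below are made up for the demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "demo-0.0.1.dist-info/METADATA",
        "Metadata-Version: 2.1\n"
        "Name: demo\n"
        "Version: 0.0.1\n"
        "Provides-Extra: celery\n"
        "Requires-Dist: apache-airflow-providers-celery; extra == 'celery'\n",
    )
buffer.seek(0)

print(wheel_requires_dist(buffer))
```

Pointing the same function at a wheel produced by the project's build backend would show the extras as the backend wrote them, which is what an installer that builds the wheel first ends up resolving.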

Labels: bug (Something isn't working), compatibility (Compatibility with a specification or another tool)