Split providers out of the main "airflow/" tree into a UV workspace project#42505
Split providers out of the main "airflow/" tree into a UV workspace project#42505ashb merged 1 commit intoapache:mainfrom
Conversation
7171386 to
9331a5a
Compare
|
Oh, prepare provider packages is failing for prod image build. I guess that's going to have to be in this PR then |
056127e to
2e952e5
Compare
|
Smalll comment (I am returning back from holidays) and slowly going through the two week's history.
I tried various things before (including There are two problems with using uv cache:
UV cache benefits from the fact that the same cache is potentially reused by multiple projects you have installed on the same machine. The UV cache is stored in HOME dir and reused between all venvs used in the same machine - so in docker it's nearly useless because it multiplies the space used for both - installed and cached files but cached files are not reusable in any way for other venvs (because we have only one venv in docker container). And since layers in docker image cannot be removed, even if we delete the cache after final installation step - the cache will remain in the docker image layer additionally to installed packages - increasing significantly size of the final image. Of course - you could potentially use locally mounted cache volumes in docker to keep the UV cache on, in order to not pollute docke image, but that would only be useful after you build your image locally once and you won't be able to use remotely cached layers which are currently used when you build the image. Using remote cache in the way we do it now, is much nicer, because the cache is actually the same as most installed packages (with the exception of the new/ updated packages change after the original cached layer is created. This has the nice property that every time new patchlevel base image is released by python (every two-three weeks or so) the cache is rebuit from scratch, and we get a new "fresh" main cache with latest package versions.
Whatever that command will produce as cache will be invalidated the moment the source dependency files (pyproject.toml, hatch_build.py, provider.yaml files, generated/provider_dependencies.json) - so if you want to effectively use cache, you need to build it BEFORE you copy any of those files into the new image. This is precisely what "our" caching is doing - it does not use any "local file" and does not copy them to the image, but uses "https://github" URL to install the dependencies for the first time - which means that next time that layer will not be invalidated when pyproject.toml or hatch_build.py or any other files that decide about dependencies will change. Now - I'd love to make it simpler, but so far all the attempts to find a better solution failed because the one we have is the only one that has good "caching" properties. |
|
But maybe you can come up with a better approach that I have not thought about and tried already, of course :) |
|
Yeah, that comment was mostly a note to myself, and not something I plan on even looking at as part of this work. |
ecf9f37 to
7cb016b
Compare
d7e5864 to
ddc7402
Compare
ddc7402 to
9cd5774
Compare
|
I think I'm getting closer now 🤞🏻 . Oddly enough a lot of the time the er diagram static check is failing in CI , even though I've run Edit: this turned out to be a (purposefully) untracked file in the migrations folder in my local checkout. |
0956d95 to
36ab177
Compare
Follow-up after #42505 fixing teething problem with tests_common. Originally in #42505 common test code was moved to "dev" folder, but the "dev" folder is really dedicated to "build" scripts and the problem with moving "tests_common" to the folder was that the whole "dev" folder is replaced (for non-committer PRs) with the content from the target branch. This is done for security reasons, because we can accidentally use any of the scripts from dev in the CI build scripts and we might not notice, which will open us to a security issue where a file in "dev" coming from PR could be accidentally executed during the "pull_request_target" workflow - which would expose our secrets and GitHub Package write permissions to a contributor coming from a fork. This change moves the files, fixes pre-commit specification and docs, also fixes a number of "doc" issues detected by "ruff" in the tests_common folder as they were detected after the move. The tests_common folder is added to folders mounted when breeze is executed with local folders mounted (in order to avoid accidental mounting of randomly generated files to inside the breeze container). All imports for the common tests were updated to reflect this move.
This is a no-op change right now, but as part of the provider re-org in apache#42505 this sets us up to be able to load the providers code in the tests The reason this change is done separately is that changes to breeze code form forks doesn't take effect, and this small change makes it easier to land on main without having to re-create that large PR.
…roject (apache#42505) This is only a partial split so far. It moves all the code and tests, but leaves the creation of `core/` to a separate PR as this is already large enough. In addition to the straight file rename the other changes I had to make here are: - Some mypy/typing fixes. Mypy can be fragile about what it picks up when, so maybe some of those changes were caused by that. But the typing changes aren't large. - Improve typing in common.sql type stub Again, likely a mypy file oddity, but the types should be safe - Removed the `check-providers-init-file-missing` check This isn't needed now that airflow/providers shouldn't exist at all in the main tree. - Create a "dev.tests_common" package that contains helper files and common pytest fixtures Since the provider tests are no longer under tests/ they don't automatically share the fixtures from the parent `tests/conftest.py` so they needed extracted. Ditto for `tests.test_utils` -- they can't be easily imported in provider tests anymore, so they are moved to a more explicit shared location. In future we should switch how the CI image is built to make better use of UV caching than our own approach as that would remvoe a lot of custom code.
…roject (apache#42505) (apache#42624) This is only a partial split so far. It moves all the code and tests, but leaves the creation of `core/` to a separate PR as this is already large enough. In addition to the straight file rename the other changes I had to make here are: - Some mypy/typing fixes. Mypy can be fragile about what it picks up when, so maybe some of those changes were caused by that. But the typing changes aren't large. - Improve typing in common.sql type stub Again, likely a mypy file oddity, but the types should be safe - Removed the `check-providers-init-file-missing` check This isn't needed now that airflow/providers shouldn't exist at all in the main tree. - Create a "dev.tests_common" package that contains helper files and common pytest fixtures Since the provider tests are no longer under tests/ they don't automatically share the fixtures from the parent `tests/conftest.py` so they needed extracted. Ditto for `tests.test_utils` -- they can't be easily imported in provider tests anymore, so they are moved to a more explicit shared location. In future we should switch how the CI image is built to make better use of UV caching than our own approach as that would remvoe a lot of custom code. Co-authored-by: Ash Berlin-Taylor <ash@apache.org> Co-authored-by: Ryan Hatter <25823361+RNHTTR@users.noreply.github.com>
Follow-up after #42505 fixing teething problem with tests_common. Originally in #42505 common test code was moved to "dev" folder, but the "dev" folder is really dedicated to "build" scripts and the problem with moving "tests_common" to the folder was that the whole "dev" folder is replaced (for non-committer PRs) with the content from the target branch. This is done for security reasons, because we can accidentally use any of the scripts from dev in the CI build scripts and we might not notice, which will open us to a security issue where a file in "dev" coming from PR could be accidentally executed during the "pull_request_target" workflow - which would expose our secrets and GitHub Package write permissions to a contributor coming from a fork. This change moves the files, fixes pre-commit specification and docs, also fixes a number of "doc" issues detected by "ruff" in the tests_common folder as they were detected after the move. The tests_common folder is added to folders mounted when breeze is executed with local folders mounted (in order to avoid accidental mounting of randomly generated files to inside the breeze container). All imports for the common tests were updated to reflect this move.
* Move tests common without changes * Fix docstrings in tests_common * Move tests_common from "dev" to top-level. Follow-up after #42505 fixing teething problem with tests_common. Originally in #42505 common test code was moved to "dev" folder, but the "dev" folder is really dedicated to "build" scripts and the problem with moving "tests_common" to the folder was that the whole "dev" folder is replaced (for non-committer PRs) with the content from the target branch. This is done for security reasons, because we can accidentally use any of the scripts from dev in the CI build scripts and we might not notice, which will open us to a security issue where a file in "dev" coming from PR could be accidentally executed during the "pull_request_target" workflow - which would expose our secrets and GitHub Package write permissions to a contributor coming from a fork. This change moves the files, fixes pre-commit specification and docs, also fixes a number of "doc" issues detected by "ruff" in the tests_common folder as they were detected after the move. The tests_common folder is added to folders mounted when breeze is executed with local folders mounted (in order to avoid accidental mounting of randomly generated files to inside the breeze container). All imports for the common tests were updated to reflect this move.
…roject (apache#42505) This is only a partial split so far. It moves all the code and tests, but leaves the creation of `core/` to a separate PR as this is already large enough. In addition to the straight file rename the other changes I had to make here are: - Some mypy/typing fixes. Mypy can be fragile about what it picks up when, so maybe some of those changes were caused by that. But the typing changes aren't large. - Improve typing in common.sql type stub Again, likely a mypy file oddity, but the types should be safe - Removed the `check-providers-init-file-missing` check This isn't needed now that airflow/providers shouldn't exist at all in the main tree. - Create a "dev.tests_common" package that contains helper files and common pytest fixtures Since the provider tests are no longer under tests/ they don't automatically share the fixtures from the parent `tests/conftest.py` so they needed extracted. Ditto for `tests.test_utils` -- they can't be easily imported in provider tests anymore, so they are moved to a more explicit shared location. In future we should switch how the CI image is built to make better use of UV caching than our own approach as that would remvoe a lot of custom code.
|
#protm |
This was never supposed to work. the parameters that you can pass to "testing tests" is really "extra pytest paramaters" you can pass - and it's not really inteded to use "pytest files" there. That's why for example the extra parameters are not autocomplete'able with files/paths. The "tests" command is really meant to run the whole "test type" and you can easily use it now. For example running all airbyte tests can be achieved by: breeze testing tests --test-type "Providers[airbyte]"And it nicely works now. |
|
(so I think indeed all known "teething" problems" are solved - and i can try to attempt to split out individual providers). |
closes #42857
As discussed https://lists.apache.org/thread/dyv5jhvt65xs6l5o2byc2b67f4wlwf6r, this is the first part of the new layout.
This is only a partial split so far. It moves all the code and tests, but the provider release generation code hasn't moved yet as it's a few weeks until the next batch, and I wan't to try and keep this already huge PR as small as possible. Likewise the creation of
core/will be a follow on PRIn addition to the straight file rename the other changes I had to make here are:
Some mypy/typing fixes.
Mypy can be fragile about what it picks up when, so maybe some of those
changes were caused by that. But the typing changes aren't large.
Improve typing in common.sql type stub
Again, likely a mypy file oddity, but the types should be safe
Removed the
check-providers-init-file-missingcheckThis isn't needed now that airflow/providers shouldn't exist at all in the
main tree.
Create a "dev.tests_common" package that contains helper files and common
pytest fixtures
Since the provider tests are no longer under tests/ they don't automatically
share the fixtures from the parent
tests/conftest.pyso they neededextracted.
Ditto for
tests.test_utils-- they can't be easily imported in providertests anymore, so they are moved to a more explicit shared location.
In future we should switch how the CI image is built to make better use of UV caching than our own approach as that would remvoe a lot of custom code.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.