[AIRFLOW-6817] - Move airflow __init__.py imports to sub-packages #7456
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #7456      +/-   ##
==========================================
- Coverage   86.79%   86.62%   -0.18%
==========================================
  Files         887      887
  Lines       41976    41991      +15
==========================================
- Hits        36432    36373      -59
- Misses       5544     5618      +74

Continue to review full report at Codecov.
potiuk left a comment:
Love it! Sorry for the slight delay in reviewing.
I think you will need to rebase (and there are a few changes coming from @nuclearpinguin with small refactorings which are probably going to be merged faster).
You definitely need to add a note to UPDATING.md. This is a potentially breaking change, as a number of DAGs will have to be adjusted, so we need a small table describing which imports were removed and how they should be changed.
The process we follow is that in UPDATING.md we describe how to do it manually; closer to the release date we will add a migration assistant which will make those modifications automatically for those who want it. But that is a separate step, and for now we need a short description of "how to modify your DAGs after this change".
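For illustration, one entry in such an UPDATING.md table could look like the following (the `airflow.models.DAG` target is the one discussed later in this thread; the exact wording and the full set of rows in UPDATING.md may differ):

```
| Old (implicit) import   | New (explicit) import          |
|-------------------------|--------------------------------|
| from airflow import DAG | from airflow.models import DAG |
```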
@potiuk Great to hear you like it. I have provided an …
potiuk left a comment:
Plus you will need to rebase it @msb217!
tests/jobs/test_local_task_job.py (Outdated)
Should we import DAG explicitly?
Yes I think that would be good for consistency.
Would it be worth considering a similar change to airflow/models/__init__.py? Removing the implicit imports of models?
The implicit imports of models are used quite frequently
This is a bit tricky, because we are relying on it in the "resetdb" case. The models, when imported, register themselves in SQLAlchemy, and resetdb uses that information to know which tables to delete. However, we already have some exceptions to that: some models were moved out of the models package's __init__.py and have to be added manually to resetdb.
I think it would make more sense to have a separate models/all_models.py which imports all models, and this one can be used by resetdb.
But I think we should make a separate PR for that.
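The registration mechanism the comment relies on can be illustrated with a self-contained toy (pure Python, no Airflow or SQLAlchemy; all names here are made up): defining, i.e. importing the module containing, a model class records its table in a shared registry, which a reset routine can then iterate. This mirrors how SQLAlchemy's declarative base collects tables on its MetaData.

```python
# Shared registry, standing in for SQLAlchemy's MetaData.tables.
REGISTERED_TABLES = []

class ModelMeta(type):
    """Records each subclass that declares a __tablename__."""
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        table = namespace.get("__tablename__")
        if table:
            REGISTERED_TABLES.append(table)

class Base(metaclass=ModelMeta):
    pass

# Merely defining these classes (which happens when their module is
# imported) registers their tables -- hence an all_models.py that imports
# every model module would make the registry complete.
class DagModel(Base):
    __tablename__ = "dag"

class TaskInstance(Base):
    __tablename__ = "task_instance"

def resetdb():
    """Return a DROP statement for every registered table."""
    return [f"DROP TABLE {t}" for t in REGISTERED_TABLES]
```

A model defined outside the imported set would silently be missed by `resetdb()`, which is exactly the problem the proposed all_models.py avoids.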
I created an issue for that https://issues.apache.org/jira/browse/AIRFLOW-6870
> I think what would make more sense is to have a separate models/all_models.py which would import all models and this one can be used by resetdb.
I am happy to do that as I am pylinting core code :D
Should we import DAG directly from airflow.models.dag?
I love this change! <3
…plicit imports with explicit imports, added entry to `UPDATING.MD` - squashed/rebased
Awesome work, congrats on your first merged pull request!
Thanks @msb217! First commit and so important for us!
Thanks @potiuk, I am happy to contribute and hope that I can help with more issues in the future!
This is going to need a change to the vast majority of DAGs in existence - our own docs http://airflow.apache.org/docs/stable/tutorial.html say … Are there any clever tricks we can do to have that import still work, but only import models etc. when first accessed?
I like the change, but I am also with @ashb here: it's going to break a lot of workflows which import … @ashb I don't understand your question? I know of the magic …
For Python 3 we can possibly use https://www.python.org/dev/peps/pep-0562/ to have airflow.DAG still work, but issue a deprecation warning. I'll take a look at adding a warning on Monday.
One thing to consider: I would like to see if we can think slightly differently here. This is a change that is super easy to add to the automated "migration assistant" tool that we will have to build anyway. I think we will have far more issues to deal with in terms of backwards compatibility (for example, bash_operator and python_operator have moved).

I personally think that with the 2.0 migration we should simply make people aware that they have to migrate using the migration tool (or manually). Otherwise they will have a false sense of having "compatible" DAGs. Telling people they will be able to move their DAGs to 2.0 and that those will continue to work as usual is a bit of miscommunication (or even a bit of a lie - we know it's not going to work in a number of cases). Migration to 2.0 will be a pain for many people anyway, but they can deal with one pain at a time - i.e. DAG migration first, then migration to the 2.0 Airflow infrastructure.

We can make it easier for people to migrate to 2.0: if they have the new "back-ported" providers packages installed, the migration to DAGs compatible with "both 1.10 and 2.0" might be done automatically even BEFORE migrating to 2.0, and our tool should be able to tell how "ready" you are to migrate. For now, I believe none of the changes we've done prevents us from having DAGs that are both 1.10 and 2.0 compatible (and if we find one that does, we can still easily fix it in master to keep that possibility).

The "migration assistant" can produce DAGs compatible with both 1.10 and 2.0 (for example, imports changed to "airflow.models.DAG" etc.). This should be very easy with tools like Bowler: https://pybowler.io/ - we used it successfully to refactor tons of Python code (basically all providers code!) and we can easily produce a DAG-migration assistant that can be used by people.
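The mechanics of such a migration assistant can be sketched without Bowler (a real tool would rewrite the syntax tree rather than raw text, and would carry a far larger mapping; the two entries below are only moves mentioned in this thread, so treat the table as illustrative):

```python
# Toy DAG-migration rewriter: maps deprecated import lines to their new
# locations, line by line, preserving indentation.
IMPORT_MOVES = {
    "from airflow import DAG":
        "from airflow.models import DAG",
    "from airflow.operators.bash_operator import BashOperator":
        "from airflow.operators.bash import BashOperator",
}

def migrate_dag_source(source: str) -> str:
    """Return the DAG source with known deprecated imports rewritten."""
    out = []
    for line in source.splitlines():
        replacement = IMPORT_MOVES.get(line.strip())
        if replacement is not None:
            # Keep the original leading whitespace of the line.
            indent = line[: len(line) - len(line.lstrip())]
            line = indent + replacement
        out.append(line)
    return "\n".join(out)
```

Because the rewritten imports are valid on both 1.10 and 2.0, such a tool can be run while users are still on 1.10, which is the incremental path described above.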
I think we have the easiest migration path for our users to follow: … This way, while migrating, the users will have to deal with one problem at a time, and the most difficult part (adapting DAGs) can be mostly automated and can be done incrementally while they are still running 1.10. The fact that this is incremental and that they do not have to migrate everything at once is really appealing for many users, I think.
I've thought about this a bit more, and I'm coming at this from a different point of view than simple migration - that of the user-facing API and what we should expose to them. Also, the easier we make the migration, the more likely it is not to leave users behind. So if they can run old DAGs with new Airflow, so much the better.

Let's take a lesson from Django here; they've thought long and hard about how to do migrations and upgrades. In summary: if code works in one version without warnings, it MUST work in the next version, but that version can issue deprecation warnings, and the code can only be removed in the release after that.

Also, to me, … My proposal here: …
I'm raising the point about what our API should be on the mailing list. Plus, this PR didn't (yet) have much of an impact on import times - #7505 :)
Not yet indeed :). Let's talk on the devlist indeed. I understand where you are coming from, but I don't think it's a good idea for future users to switch to that. While I understand (and concur) that we might want to add PEP-562 behavior for backwards compatibility, I don't think it should be the default behavior for examples and future users:
I think a lot of our users are not super-hackers, and they use IDEs to write the code. If PEP-562 results in having no autocomplete/verification of proper imports, then it's a no-go for me.
…plicit imports with explicit imports, added entry to `UPDATING.MD` - squashed/rebased (apache#7456)
Issue link: AIRFLOW-6817
Moved airflow/__init__.py imports to respective sub-packages, with the exception of settings.initialize() and integrate_plugins(). All breeze unit and integration tests are passing locally.
Make sure to mark the boxes below before creating PR:
- [x] Commit message starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID. For document-only changes the commit message can start with [AIRFLOW-XXXX].
- In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
- In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
- In case of backwards incompatible changes please leave a note in UPDATING.md.

Read the Pull Request Guidelines for more information.