
Conversation

Contributor

@msb217 msb217 commented Feb 18, 2020


Issue link: AIRFLOW-6817

Moved airflow/__init__.py imports to respective sub-packages with the exception of the settings.initialize() and integrate_plugins()

All breeze unit and integration tests are passing locally.

Make sure to mark the boxes below before creating PR:

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added area:CLI area:Scheduler including HA (high availability) scheduler provider:Apache provider:amazon AWS/Amazon - related issues provider:google Google (including GCP) related issues labels Feb 18, 2020
@msb217 msb217 changed the title from "[AIRFLOW 6817] - Move airflow __init__.py imports to sub-packages" to "[AIRFLOW-6817] - Move airflow __init__.py imports to sub-packages" Feb 18, 2020
@msb217 msb217 closed this Feb 19, 2020
@msb217 msb217 reopened this Feb 19, 2020
@msb217 msb217 force-pushed the feature/AIRFLOW-6817 branch from 0f1e626 to c6f60c8 on February 19, 2020 19:29
@msb217 msb217 changed the title from "[AIRFLOW-6817] - Move airflow __init__.py imports to sub-packages" to "[AIRFLOW-6817] [WIP] - Move airflow __init__.py imports to sub-packages" Feb 19, 2020
@msb217 msb217 force-pushed the feature/AIRFLOW-6817 branch 2 times, most recently from 4e22c14 to 18c9202 on February 20, 2020 16:59
@codecov-io

codecov-io commented Feb 20, 2020

Codecov Report

Merging #7456 into master will decrease coverage by 0.17%.
The diff coverage is 99%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7456      +/-   ##
==========================================
- Coverage   86.79%   86.62%   -0.18%     
==========================================
  Files         887      887              
  Lines       41976    41991      +15     
==========================================
- Hits        36432    36373      -59     
- Misses       5544     5618      +74
Impacted Files Coverage Δ
airflow/__init__.py 100% <ø> (ø) ⬆️
airflow/executors/dask_executor.py 5.88% <0%> (ø) ⬆️
airflow/providers/imap/hooks/imap.py 94.95% <100%> (+0.04%) ⬆️
airflow/models/baseoperator.py 96.52% <100%> (ø) ⬆️
airflow/providers/dingding/hooks/dingding.py 66.66% <100%> (ø) ⬆️
...ow/providers/docker/example_dags/example_docker.py 100% <100%> (ø) ⬆️
.../providers/google/cloud/hooks/kubernetes_engine.py 95.94% <100%> (+0.05%) ⬆️
airflow/providers/opsgenie/hooks/opsgenie_alert.py 100% <100%> (ø) ⬆️
airflow/executors/local_executor.py 92.48% <100%> (ø) ⬆️
.../cloud/operators/cloud_storage_transfer_service.py 95.63% <100%> (ø) ⬆️
... and 77 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e0e2f0...0306223. Read the comment docs.

Member

@potiuk potiuk left a comment


Love it! Sorry for the slight delay with reviewing.
I think you will need to rebase (there are a few changes coming from @nuclearpinguin with small refactorings which will probably be merged faster).

You definitely need to add a note in UPDATING.md. This is a potentially breaking change, as a number of DAGs will have to be adjusted, so we need a small table describing which imports were removed and how they should be changed.

The process we follow is to describe in UPDATING.md how to make the changes manually; closer to the release date we will add a migration assistant which will apply those modifications automatically for those who want it. But that is a separate step, and for now we need a short description of how to modify your DAGs after this change.

@msb217 msb217 force-pushed the feature/AIRFLOW-6817 branch 2 times, most recently from 5e80c08 to 34c65a8 on February 21, 2020 16:52
@msb217
Contributor Author

msb217 commented Feb 21, 2020

@potiuk Great to hear you like it. I have added a note about these changes to UPDATING.md and will squash/rebase my commits before the merge.

Member

@potiuk potiuk left a comment


LGTM.

But I would love other committers to take a look. It's a huge one and super important and I might have missed something. @ashb @kaxil @nuclearpinguin @feluelle @BasPH @mik-laj ?

@potiuk
Member

potiuk commented Feb 21, 2020

Plus you will need to rebase it @msb217 !

Member


Should we import DAG explicitly?

Contributor Author

@msb217 msb217 Feb 21, 2020


Yes I think that would be good for consistency.

Would it be worth considering a similar change to airflow/models/__init__.py? Removing the implicit imports of models?

The implicit imports of models are used quite frequently

Member


This is a bit tricky, because we rely on it in the "resetdb" case. The models, when imported, register themselves in SQLAlchemy, and resetdb uses that information to know which tables to delete. However, we already have some exceptions to that: some models have been moved out of the models package `__init__.py` and have to be added to resetdb manually.

I think it would make more sense to have a separate models/all_models.py which imports all models and can be used by resetdb.

But I think we should make a separate PR for that.
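The registration side effect described above can be sketched with a minimal, self-contained analogy (plain Python standing in for SQLAlchemy's declarative machinery; the class and function names are hypothetical, not Airflow code): a model only becomes visible to a resetdb-like routine if the module defining it has actually been imported.

```python
# Minimal analogy for SQLAlchemy's declarative registration side effect.

class Base:
    # Stands in for SQLAlchemy's MetaData: a registry filled as a side
    # effect of class creation, i.e. of importing the defining module.
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Base.registry.append(cls.__name__)

# Defining (importing) model classes registers them automatically...
class DagModel(Base):
    pass

class TaskInstance(Base):
    pass

def reset_db():
    # ...so a resetdb-like routine only "sees" models whose modules
    # were actually imported before it runs.
    return sorted(Base.registry)

print(reset_db())  # ['DagModel', 'TaskInstance']
```

In this picture, a hypothetical `all_models.py` would be nothing more than a module that imports every model module for this side effect, giving resetdb one well-defined place to trigger all registrations.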

Member


I think what would make more sense is to have a separate models/all_models.py which would import all models and this one can be used by resetdb,

I am happy to do that as I am pylinting core code :D

Member


Should we import DAG directly from airflow.models.dag?

@turbaszek
Member

I love this change! <3

@msb217 msb217 force-pushed the feature/AIRFLOW-6817 branch from e5b541a to ae33730 on February 21, 2020 19:03
@msb217 msb217 changed the title from "[AIRFLOW-6817] [WIP] - Move airflow __init__.py imports to sub-packages" to "[AIRFLOW-6817] - Move airflow __init__.py imports to sub-packages" Feb 21, 2020
@msb217 msb217 force-pushed the feature/AIRFLOW-6817 branch 3 times, most recently from 7ca8779 to 53d2a10 on February 21, 2020 19:24
…plicit imports with explicit imports, added entry to `UPDATING.MD` - squashed/rebased
@msb217 msb217 force-pushed the feature/AIRFLOW-6817 branch from 53d2a10 to 0306223 on February 21, 2020 21:50
@turbaszek turbaszek requested review from ashb and mik-laj February 21, 2020 23:15
@potiuk potiuk merged commit 4d03e33 into apache:master Feb 22, 2020
@boring-cyborg

boring-cyborg bot commented Feb 22, 2020

Awesome work, congrats on your first merged pull request!

@potiuk
Member

potiuk commented Feb 22, 2020

Thanks @msb217! First commit and so important for us!

@msb217
Contributor Author

msb217 commented Feb 22, 2020

Thanks @potiuk, I am happy to contribute and hope that I can help with more issues in the future!

@ashb
Member

ashb commented Feb 22, 2020

This is going to need a change to the vast majority of DAGs in existence - our own docs http://airflow.apache.org/docs/stable/tutorial.html say `from airflow import DAG`.

Are there any clever tricks we can do to have that import still work, but only import models etc. when first accessed?

@BasPH
Contributor

BasPH commented Feb 22, 2020

I like the change, but I am also with @ashb here: it's going to break a lot of workflows that use `from airflow import DAG`. We should at least show deprecation warnings.

@ashb I don't understand your question. I know of the magic `__import__` function for changing import behaviour, but I don't know if it can be applied here.

@ashb
Member

ashb commented Feb 22, 2020

For Python 3 we can possibly use https://www.python.org/dev/peps/pep-0562/ to have `airflow.DAG` still work but issue a deprecation warning. I'll take a look at adding a warning on Monday.

@potiuk
Member

potiuk commented Feb 22, 2020

One thing to consider.

I would like to see if we can think slightly differently here.

This is a change that is super easy to add to automated "migration assistant" tool that we will have to do anyway. I think we will have far more issues to deal with in terms of backwards compatibility (for example bash_operator and python_operator have moved).

I personally think that with the 2.0 migration we should simply make people aware that they have to migrate using the migration tool (or manually). Otherwise they will have a false sense of having "compatible" DAGs. Telling people they will be able to move their DAGs to 2.0 and they will continue to work as usual is a bit of a miscommunication (or even a bit of a lie - we know it's not going to work in a number of cases).

Migration to 2.0 will be a pain for many people anyway, but they can deal with one pain at a time - i.e. DAG migration first, then migration to the 2.0 Airflow infrastructure.

We can make it easier for people to migrate to 2.0: if they have the new "back-ported" providers packages installed, the migration to DAGs compatible with both 1.10 and 2.0 can be done automatically even BEFORE migrating to 2.0, and our tool should be able to tell how "ready" you are to migrate. For now, I believe none of the changes we've made prevents us from having DAGs that are both 1.10 and 2.0 compatible (and if we find one, we can still easily fix it in master to keep that possibility).

The "migration assistant" can produce both - 1.10 and 2.0 compatible DAGs (for example imports changed to "airflow.models.DAG" etc. This should be very easy with tools like Bowler : https://pybowler.io/ - we used it successfully to refactor tons of python code (basically all providers code!) and we can easily produce dag-migration assistant that can be used by people.

@potiuk
Member

potiuk commented Feb 22, 2020

I think we have the easiest migration path for our users to follow:

[1.10] py2 (those still on py2) -> 
     [1.10] py3 -> 
         [1.10] py3 + providers (incremental) -> 
            [1.10] py3 + providers + 2.0-compatible DAGs (incremental) -> 
                [2.0] 

This way while migrating the users will have to deal with one problem at a time and the most difficult part (adapting DAGs) can be mostly automated and can be done incrementally while they are still running 1.10. The fact that this is incremental and that they do not have to migrate everything at once is really appealing for many users I think.

@ashb
Member

ashb commented Feb 22, 2020

I've thought about this a bit more, and I'm coming at it from a different point of view than simple migration: that of the user-facing API and what we should expose to users.

Also, the easier we make the migration, the less likely it is to leave users behind. So if they can run old DAGs with new Airflow, so much the better. Let's take a lesson from Django here: they've thought long and hard about how to do migrations and upgrades. In summary:

If code works in one version without warnings, it MUST work in the next version but can issue deprecation warnings, and can only be removed in the release after that.

Also, to me, `from airflow import DAG` is nicer -- why should DAG authors have to care about the exact internal location we've chosen to move this class to now? Airflow and DAGs go hand in hand -- with PEP-562 we can have lazily imported classes but still present a nice interface to users. Plus we can then rearrange things later under the hood without affecting our users. After all, what is the harm, once the cost of importing everything is solved?

My proposal here:

  • Add a `__getattr__`-based lazy import for DAG to the airflow/__init__.py module
  • Do NOT issue a deprecation warning for this
  • Revert the change to the imports in example DAGs etc. so that those do `from airflow import DAG`

@ashb
Member

ashb commented Feb 22, 2020

I'm raising the point about what our API should be on the mailing list.

Plus this PR didn't (yet) have much of an impact on import times - #7505 :)

@potiuk
Member

potiuk commented Feb 22, 2020

Not yet indeed :).

Let's talk on the devlist indeed. I understand where you are coming from, but I think switching to that is not a good idea for future users.

While I understand (and concur) that we might want to add PEP-562 behavior for backwards compatibility, I don't think it should be the default behavior for examples and future users:

  • explicit is better than implicit
  • IDE support and mypy will go crazy (I believe - I would like to check once it's done)

I think a lot of our users are not super-hackers and they use IDEs to write their code. If PEP-562 results in no autocomplete/verification of proper imports, then it's a no-go for me.

galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
…plicit imports with explicit imports, added entry to `UPDATING.MD` - squashed/rebased (apache#7456)