Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented May 24, 2020


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@potiuk potiuk requested review from BasPH, ashb, kaxil and mik-laj May 24, 2020 08:09
@boring-cyborg boring-cyborg bot added provider:Apache provider:amazon AWS/Amazon - related issues labels May 24, 2020
@potiuk
Copy link
Member Author

potiuk commented May 24, 2020

@BasPH @kaxil @ashb as discussed -> I've added automated import for all classes in providers in 1.10. Few fixes were neeed but otherwise it looks good :)

@potiuk potiuk force-pushed the import_all_backport_classes branch from f030935 to 2812bb2 Compare May 24, 2020 08:17
@potiuk potiuk force-pushed the import_all_backport_classes branch 2 times, most recently from 8422b01 to 441b604 Compare May 24, 2020 15:18
@potiuk
Copy link
Member Author

potiuk commented May 24, 2020

@ashb @kaxil @BasPH @mik-laj

I think I nailed it with import tests for backport packages. I removed all the README changes from code changes so that it will be easier to review that one. I will regenerate the READMES again in the separate commit on top of that one.

I implemented all the refactorings that were needed in automated way using Bowler. I actually started to like Bowler :).

I got to the point however that I had to refactor it - we had one huge method that did everything - so I separated all the refactoring code out from setup_backport_packages and I converted it into a separate class (this way Query was easier to build as a field in the class rather than passing the query around. @turbaszek -> you might want to take a look and see if you are ok with that, but I think the way it is done now is much more readable - we have separate refactoring methods for separate classes.

Now - we also have two separate scripts for testing packages - one is to install them one-by-one for airflow 1.10 and see if they can be installed and the second one installs all of them, finds all the relevant classes and imports them (and fails if some of those imports fail). It's done rather neatly and for us It is I think a great tool to keep backport packages usable in the future - as those tests will be running continuously in CI with every PR. @BasPH -> it was a really good idea. I see some of the changes coming (like functional DAGs) might unknowingly destroy the backward compatibility of provider packages (similar to lineage changes for Papermill) - but with this test, it will be a lot harder to introduce such problems.

One last comment for Papermill - it's AUTO lineage feature caused import of examples to fail (yes - we are also checking the examples!), So I added info that AUTO lineage will not work in 1.10 (and I updated the example so that it behaves correctly in Airflow 1.10 as well) @bolkedebruin - you might want to take a look at this part to see if I did not introduce any problems.

@potiuk potiuk force-pushed the import_all_backport_classes branch 2 times, most recently from 571c445 to f2e885b Compare May 24, 2020 18:56
@potiuk potiuk force-pushed the import_all_backport_classes branch 4 times, most recently from 6b9debc to d25d854 Compare May 25, 2020 08:51
@potiuk potiuk force-pushed the import_all_backport_classes branch from d25d854 to 290d2eb Compare May 25, 2020 12:59
@potiuk potiuk force-pushed the import_all_backport_classes branch 2 times, most recently from 8a078a0 to 0fadc3e Compare May 25, 2020 15:26
@potiuk
Copy link
Member Author

potiuk commented May 25, 2020

Hey @ashb I think you were the last one with comments - I pushed the last fixup with all the resulting explanations and changes and I hope the build will succeed this time - I had only one datacatalog example change that was failing. Please take a look - I have two more changes that are follow - up from that one - the sooner we merge it, the earlier I can cut RC3.

The only unresolved comment is about using stdout instead of print - which I am not sure why. IMHO print in python3 is much better than stdout or stderr because it's now a function and it even has the file parameter where you can pass file=sys.stderr to it, and you do not have use all those unneeded "\n"s. I understand it is not a blocker, but I would still love to know what's the rationale for using stdout now that we switched to python 3 ?

@potiuk potiuk force-pushed the import_all_backport_classes branch from 0fadc3e to 71c7cf7 Compare May 25, 2020 17:02
@potiuk
Copy link
Member Author

potiuk commented May 25, 2020

All Green @ashb

@ashb
Copy link
Member

ashb commented May 25, 2020

Hey @ashb I think you were the last one with comments - I pushed the last fixup with all the resulting explanations and changes and I hope the build will succeed this time - I had only one datacatalog example change that was failing. Please take a look - I have two more changes that are follow - up from that one - the sooner we merge it, the earlier I can cut RC3.

The only unresolved comment is about using stdout instead of print - which I am not sure why. IMHO print in python3 is much better than stdout or stderr because it's now a function and it even has the file parameter where you can pass file=sys.stderr to it, and you do not have use all those unneeded "\n"s. I understand it is not a blocker, but I would still love to know what's the rationale for using stdout now that we switched to python 3 ?

I I mean we should do print(..., file=sys.stdout) for anything that is an "error". Not a blocker

@kaxil
Copy link
Member

kaxil commented May 25, 2020

Hey @ashb I think you were the last one with comments - I pushed the last fixup with all the resulting explanations and changes and I hope the build will succeed this time - I had only one datacatalog example change that was failing. Please take a look - I have two more changes that are follow - up from that one - the sooner we merge it, the earlier I can cut RC3.
The only unresolved comment is about using stdout instead of print - which I am not sure why. IMHO print in python3 is much better than stdout or stderr because it's now a function and it even has the file parameter where you can pass file=sys.stderr to it, and you do not have use all those unneeded "\n"s. I understand it is not a blocker, but I would still love to know what's the rationale for using stdout now that we switched to python 3 ?

I I mean we should do print(..., file=sys.stdout) for anything that is an "error". Not a blocker

Based on https://docs.python.org/3/library/functions.html#print , the default value of file argument is sys.stdout.

The file argument must be an object with a write(string) method; if it is not present or None, sys.stdout will be used.

@ashb
Copy link
Member

ashb commented May 25, 2020

Hey @ashb I think you were the last one with comments - I pushed the last fixup with all the resulting explanations and changes and I hope the build will succeed this time - I had only one datacatalog example change that was failing. Please take a look - I have two more changes that are follow - up from that one - the sooner we merge it, the earlier I can cut RC3.
The only unresolved comment is about using stdout instead of print - which I am not sure why. IMHO print in python3 is much better than stdout or stderr because it's now a function and it even has the file parameter where you can pass file=sys.stderr to it, and you do not have use all those unneeded "\n"s. I understand it is not a blocker, but I would still love to know what's the rationale for using stdout now that we switched to python 3 ?

I I mean we should do print(..., file=sys.stdout) for anything that is an "error". Not a blocker

Based on https://docs.python.org/3/library/functions.html#print , the default value of file argument is sys.stdout.

The file argument must be an object with a write(string) method; if it is not present or None, sys.stdout will be used.

I am saying, we are printing errors to stdout, which is "wrong". Errors should be printed on stderr.

@kaxil
Copy link
Member

kaxil commented May 25, 2020

I mean we should do print(..., file=sys.stdout) for anything that is an "error"

I haven't followed the PR just read the line :D "I mean we should do print(..., file=sys.stdout) for anything that is an "error", hence my comment ;)

@ashb
Copy link
Member

ashb commented May 25, 2020

@potiuk Argh, sorry, my previous comment said stdout when I meant stderr.

@potiuk potiuk force-pushed the import_all_backport_classes branch from 3a4b2cd to 8010554 Compare May 25, 2020 22:37
@potiuk potiuk force-pushed the import_all_backport_classes branch from 8010554 to 22ce507 Compare May 25, 2020 22:39
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good Jarek!

@potiuk potiuk merged commit cdb3f25 into apache:master May 26, 2020
@potiuk potiuk deleted the import_all_backport_classes branch May 26, 2020 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

provider:amazon AWS/Amazon - related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants