-
Notifications
You must be signed in to change notification settings - Fork 16.3k
[AIRFLOW-6883] Speed up import of airflow.utils.log.file_task_hanlder #7505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This module imports form airflow.models, which in turn means we impor all the other modules (which we can fix in a later issue) -- but there is still no need to import that module here as it is only used for type checking. So lets use the already existing guard built in to the typing module. On my machine this speeds up `import airflow` from 0.74s to 0.44s
| """File logging handler for tasks.""" | ||
| import logging | ||
| import os | ||
| from typing import Optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be handled by moving stuff out of models/init.py rather by hacking stuff for TYPE_CHECKING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not both? This is what the TYPE_CHECKING was designed for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the #7484 for discussion/proposal of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if importing this one module was quick, it's still an extra import that is not needed at run time. That's a number of directory searches looking for the this one file, and a opening a file, parsing it etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really - it's a hack to hide architectural problems in legacy code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And the number of directory searches goes up if we use implicit namespaces.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I know what the documentation says, it's just my view on what TYPE_CHECKING is often used for)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if importing this one module was quick, it's still an extra import that is not needed at run time. That's a number of directory searches looking for the this one file, and a opening a file, parsing it etc.
I believe removing init.py stuff is different (and much better) way of solving the long import time.
When we properly move stuff out of init.py's for models (and other places) no single import will be expensive because we will only import what is needed and nothing more. Yes some "types" will be imported needlessly but those will really be types that will be immediately loaded if anything from this function is used and they are probably already imported elsewhere.
By proper tree-like/lack-of-cycles/no implicit imports structure of the code we will achieve what we want to do now by manual TYPE_CHECKING removal of some stuf - avoid expensive imports where not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And the number of directory searches goes up if we use implicit namespaces.)
Yeah. I already fully abandoned this idea. I simply stopped being fan of implicit packages. We should not use them. There were many problems in our code that we would have to solve and gain nothing in return, behavior of some code changes when there are implicit packages and there are still many tools/packages (like sphinx but many others) that do not work well with implicit packages yet. It makes little sense to use implicit packages now that we can install providers packages without them.
|
Fixed in other ways |
This module imports form airflow.models, which in turn means we impor
all the other modules (which we can fix in a later issue) -- but there
is still no need to import that module here as it is only used for type
checking.
So lets use the already existing guard built in to the typing module.
On my machine this speeds up
import airflowfrom 0.74s to 0.44sIssue link: AIRFLOW-6883
Make sure to mark the boxes below before creating PR: [x]
[AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID** For document-only changes commit message can start with
[AIRFLOW-XXXX].In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.