Skip to content

Append filename and lineno for logs#54433

Closed
yaming-github wants to merge 1 commit into
apache:mainfrom
yaming-github:log_format
Closed

Append filename and lineno for logs#54433
yaming-github wants to merge 1 commit into
apache:mainfrom
yaming-github:log_format

Conversation

@yaming-github

@yaming-github yaming-github commented Aug 13, 2025

Copy link
Copy Markdown
Contributor

related: #54145

After using structlog, we didn't log the callsite info. In this pr, we added a processor to collect filename and lineno callsite info.

image

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@yaming-github

Copy link
Copy Markdown
Contributor Author

Hi @ashb I could verify that the both test failure is because of the format change of event section in json. Do you think it's reasonable to directly ovewrite event here?

@ashb ashb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for picking this up, but this approach is not right.

This is combining capturing and presentation into the capture phase which is not right. This needs to be re-worked to a) not be a performance hit, and b) To correctly take advantage of having structured logs.

In short: log_format should be a presentation time that the UI deals with, not something that actually affects how the JSON logs are written.

Comment thread airflow-core/src/airflow/config_templates/config.yml Outdated
Comment thread providers/amazon/tests/unit/amazon/aws/log/test_cloudwatch_task_handler.py Outdated
Comment thread task-sdk/src/airflow/sdk/execution_time/supervisor.py Outdated
Comment thread task-sdk/src/airflow/sdk/log.py Outdated
Comment thread task-sdk/src/airflow/sdk/log.py Outdated
Comment thread task-sdk/tests/task_sdk/execution_time/test_supervisor.py Outdated
@ashb

ashb commented Aug 14, 2025

Copy link
Copy Markdown
Member

In short -- I'm tempted to say that log_format should actually be removed and retconned as a breaking change in 3.0.

@yaming-github yaming-github changed the title Respect log_format for supervisor logs Append filename and lineno for logs Aug 15, 2025
@yaming-github

Copy link
Copy Markdown
Contributor Author

Thank you for picking this up, but this approach is not right.

This is combining capturing and presentation into the capture phase which is not right. This needs to be re-worked to a) not be a performance hit, and b) To correctly take advantage of having structured logs.

In short: log_format should be a presentation time that the UI deals with, not something that actually affects how the JSON logs are written.

Hi @ashb thanks so much for taking a look at this! Instead of replying all comments, I'll reply you here:

  1. Totally agree on all your comments. I think I make the logic over complicated here and didn't consider the performance. We should not relate this change with log_format. That's actually why I need to overwrite LOG_FORMAT in tests to pass the tests. I've removed all of them.
  2. I changed the pr to apply only a simple processor to add filename and lineno.
  3. This updated change only apply for std log and warnings but not stdout and structlog itself. I think for stdout it's fine because even though we added filename, they all come from 1 specific place which is supervisor.py. For structlog, if end user uses structlog in their dag, we also didn't log the callsite here. However, that's doable, please let me know if you think we should also consider that.
    Thanks!

@yaming-github yaming-github requested a review from ashb August 15, 2025 05:30
Comment thread task-sdk/src/airflow/sdk/log.py Outdated
@yaming-github

Copy link
Copy Markdown
Contributor Author

Hi @ashb I updated this pr. Could you please take a look again? Thanks!

Comment on lines +122 to +127
def add_callsite_parameter(logger: Any, method_name: Any, event_dict: EventDict) -> EventDict:
record = event_dict.get("_record", None)
if record:
event_dict["filename"] = record.__dict__["pathname"]
event_dict["lineno"] = record.__dict__["lineno"]
return event_dict

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only for for log messages that come from stdlib logging.

Instead we should use https://www.structlog.org/en/stable/api.html#structlog.processors.CallsiteParameterAdder which will handle that case, and the native structlog ones for us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants