Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented May 23, 2023

After #29482 triggerer logs are not printed directly to the stdout, but they are queued in memory and written by a separate thread. This causes a slight flakiness in case of CI tests for test_trigger_job - especially on MSSQL.

This simple workaround should give more time to the logs to propgate during the test


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

After apache#29482 triggerer logs are not printed directly to the
stdout, but they are queued in memory and written by a separate
thread. This causes a slight flakiness in case of CI tests
for test_trigger_job - especially on MSSQL.

This simple workaround should give more time to the logs to
propgate during the test
@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label May 23, 2023
@potiuk
Copy link
Member Author

potiuk commented May 23, 2023

Example flaky test:

https://github.com/apache/airflow/actions/runs/5054738339/jobs/9070519469?pr=31469

_____________________ test_trigger_logging_sensitive_info ______________________

session = <sqlalchemy.orm.session.Session object at 0x7f2cb04f6880>
capsys = <_pytest.capture.CaptureFixture object at 0x7f2cf9c99ca0>

    def test_trigger_logging_sensitive_info(session, capsys):
        """
        Checks that when a trigger fires, it doesn't log any sensitive
        information from arguments
        """
    
        class SensitiveArgOperator(BaseOperator):
            def __init__(self, password, **kwargs):
                self.password = password
                super().__init__(**kwargs)
    
        # Use a trigger that will immediately succeed
        trigger = SuccessTrigger()
        op = SensitiveArgOperator(task_id="sensitive_arg_task", ***"some_password")
        create_trigger_in_db(session, trigger, operator=op)
        triggerer_job = Job()
        triggerer_job_runner = TriggererJobRunner(triggerer_job)
        triggerer_job_runner.load_triggers()
        # Now, start TriggerRunner up (and set it as a daemon thread during tests)
        triggerer_job_runner.daemon = True
        triggerer_job_runner.trigger_runner.start()
        try:
            # Wait for up to 3 seconds for it to fire and appear in the event queue
            for _ in range(30):
                if triggerer_job_runner.trigger_runner.events:
                    assert list(triggerer_job_runner.trigger_runner.events) == [(1, TriggerEvent(True))]
                    break
                time.sleep(0.1)
            else:
                pytest.fail("TriggerRunner never sent the trigger event out")
        finally:
            # We always have to stop the runner
            triggerer_job_runner.trigger_runner.stop = True
        stdout = capsys.readouterr().out
>       assert "test_dag/test_run/sensitive_arg_task/-1/1 (ID 1) starting" in stdout
E       AssertionError: assert 'test_dag/test_run/sensitive_arg_task/-1/1 (ID 1) starting' in '[\x1b[34m2023-05-23T08:35:13.404+0000\x1b[0m] {\x1b[34mtriggerer_job_runner.py:\x1b[0m172} INFO\x1b[0m - Setting up T...g up logging queue listener with handlers [<RedirectStdHandler (NOTSET)>, <TriggererHandlerWrapper (NOTSET)>]\x1b[0m\n'

tests/jobs/test_triggerer_job.py:147: AssertionError

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it enough to ensure the success of the test?

@potiuk
Copy link
Member Author

potiuk commented May 23, 2023

Is it enough to ensure the success of the test?

It's magic number. In case you have REALLY busy system when the writing thread gets starved, I guess it might still happen. We could likely implement some "complete" solution that would not have to wait for it, but it would take quite some time to figure out how to tap into the in-memory queue of the writing thread.

Adding 0.5 s will of course increase time of running for the tests, and it will also likely decrease the probability of such event to a REALLY LOW one. And it's one line.

¯\_(ツ)_/¯

@potiuk potiuk merged commit ba220b0 into apache:main May 23, 2023
@potiuk potiuk deleted the decrease-flakiness-of-trigger-log-test branch May 23, 2023 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Scheduler including HA (high availability) scheduler

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants