This repository was archived by the owner on Mar 6, 2026. It is now read-only.

combination of freezegun and parallel tests causes unit tests to flake #2264

@tswast

Problem

Our retry-related unit tests occasionally flake. For an example, see: https://github.com/googleapis/python-bigquery/actions/runs/17044794898/job/48317652621?pr=2250

FAILED tests/unit/test_client.py::TestClient::test__call_api_applying_custom_retry_on_timeout - google.api_core.exceptions.RetryError: Timeout of 1.0s exceeded, last exception:
__________ TestClient.test__call_api_applying_custom_retry_on_timeout __________
[gw1] linux -- Python 3.11.13 /home/runner/work/python-bigquery/python-bigquery/.nox/unit-3-11/bin/python

target = functools.partial(functools.partial(<MagicMock name='api_request' id='140254749447632'>, foo='bar'))
predicate = <function TestClient.test__call_api_applying_custom_retry_on_timeout.<locals>.<lambda> at 0x7f8f9c037060>
sleep_generator = <generator object exponential_sleep_generator at 0x7f8f9bd9af20>
timeout = 1, on_error = None
exception_factory = <function build_retry_error at 0x7f8fbb85d300>, kwargs = {}
deadline = 471.796770038, error_list = [TimeoutError()]
sleep_iter = <generator object exponential_sleep_generator at 0x7f8f9bd9af20>

    def retry_target(
        target: Callable[[], _R],
        predicate: Callable[[Exception], bool],
        sleep_generator: Iterable[float],
        timeout: float | None = None,
        on_error: Callable[[Exception], None] | None = None,
        exception_factory: Callable[
            [list[Exception], RetryFailureReason, float | None],
            tuple[Exception, Exception | None],
        ] = build_retry_error,
        **kwargs,
    ):
        """Call a function and retry if it fails.
    
        This is the lowest-level retry helper. Generally, you'll use the
        higher-level retry helper :class:`Retry`.
    
        Args:
            target(Callable): The function to call and retry. This must be a
                nullary function - apply arguments with `functools.partial`.
            predicate (Callable[Exception]): A callable used to determine if an
                exception raised by the target should be considered retryable.
                It should return True to retry or False otherwise.
            sleep_generator (Iterable[float]): An infinite iterator that determines
                how long to sleep between retries.
            timeout (Optional[float]): How long to keep retrying the target.
                Note: timeout is only checked before initiating a retry, so the target may
                run past the timeout value as long as it is healthy.
            on_error (Optional[Callable[Exception]]): If given, the on_error
                callback will be called with each retryable exception raised by the
                target. Any error raised by this function will *not* be caught.
            exception_factory: A function that is called when the retryable reaches
                a terminal failure state, used to construct an exception to be raised.
                It takes a list of all exceptions encountered, a retry.RetryFailureReason
                enum indicating the failure cause, and the original timeout value
                as arguments. It should return a tuple of the exception to be raised,
                along with the cause exception if any. The default implementation will raise
                a RetryError on timeout, or the last exception encountered otherwise.
            deadline (float): DEPRECATED: use ``timeout`` instead. For backward
                compatibility, if specified it will override ``timeout`` parameter.
    
        Returns:
            Any: the return value of the target function.
    
        Raises:
            ValueError: If the sleep generator stops yielding values.
            Exception: a custom exception specified by the exception_factory if provided.
                If no exception_factory is provided:
                    google.api_core.RetryError: If the timeout is exceeded while retrying.
                    Exception: If the target raises an error that isn't retryable.
        """
    
        timeout = kwargs.get("deadline", timeout)
    
        deadline = time.monotonic() + timeout if timeout is not None else None
        error_list: list[Exception] = []
        sleep_iter = iter(sleep_generator)
    
        # continue trying until an attempt completes, or a terminal exception is raised in _retry_error_helper
        # TODO: support max_attempts argument: https://github.com/googleapis/python-api-core/issues/535
        while True:
            try:
>               result = target()
                         ^^^^^^^^

.nox/unit-3-11/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py:147: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/unittest/mock.py:1124: in __call__
    return self._mock_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/unittest/mock.py:1128: in _mock_call
    return self._execute_mock_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <MagicMock name='api_request' id='140254749447632'>, args = ()
kwargs = {'foo': 'bar'}, effect = <list_iterator object at 0x7f8f9a845750>
result = <class 'TimeoutError'>

    def _execute_mock_call(self, /, *args, **kwargs):
        # separate from _increment_mock_call so that awaited functions are
        # executed separately from their call, also AsyncMock overrides this method
    
        effect = self.side_effect
        if effect is not None:
            if _is_exception(effect):
                raise effect
            elif not _callable(effect):
                result = next(effect)
                if _is_exception(result):
>                   raise result
E                   TimeoutError

/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/unittest/mock.py:1187: TimeoutError

The above exception was the direct cause of the following exception:

self = <tests.unit.test_client.TestClient testMethod=test__call_api_applying_custom_retry_on_timeout>

    def test__call_api_applying_custom_retry_on_timeout(self):
        from concurrent.futures import TimeoutError
        from google.cloud.bigquery.retry import DEFAULT_RETRY
    
        creds = _make_credentials()
        client = self._make_one(project=self.PROJECT, credentials=creds)
    
        api_request_patcher = mock.patch.object(
            client._connection,
            "api_request",
            side_effect=[TimeoutError, "result"],
        )
        retry = DEFAULT_RETRY.with_deadline(1).with_predicate(
            lambda exc: isinstance(exc, TimeoutError)
        )
    
        with api_request_patcher as fake_api_request:
>           result = client._call_api(retry, foo="bar")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

tests/unit/test_client.py:333: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
google/cloud/bigquery/client.py:863: in _call_api
    return call()
           ^^^^^^
.nox/unit-3-11/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py:294: in retry_wrapped_func
    return retry_target(
.nox/unit-3-11/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py:156: in retry_target
    next_sleep = _retry_error_helper(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

exc = TimeoutError(), deadline = 471.796770038
sleep_iterator = <generator object exponential_sleep_generator at 0x7f8f9bd9af20>
error_list = [TimeoutError()]
predicate_fn = <function TestClient.test__call_api_applying_custom_retry_on_timeout.<locals>.<lambda> at 0x7f8f9c037060>
on_error_fn = None
exc_factory_fn = <function build_retry_error at 0x7f8fbb85d300>
original_timeout = 1

    def _retry_error_helper(
        exc: Exception,
        deadline: float | None,
        sleep_iterator: Iterator[float],
        error_list: list[Exception],
        predicate_fn: Callable[[Exception], bool],
        on_error_fn: Callable[[Exception], None] | None,
        exc_factory_fn: Callable[
            [list[Exception], RetryFailureReason, float | None],
            tuple[Exception, Exception | None],
        ],
        original_timeout: float | None,
    ) -> float:
        """
        Shared logic for handling an error for all retry implementations
    
        - Raises an error on timeout or non-retryable error
        - Calls on_error_fn if provided
        - Logs the error
    
        Args:
           - exc: the exception that was raised
           - deadline: the deadline for the retry, calculated as a diff from time.monotonic()
           - sleep_iterator: iterator to draw the next backoff value from
           - error_list: the list of exceptions that have been raised so far
           - predicate_fn: takes `exc` and returns true if the operation should be retried
           - on_error_fn: callback to execute when a retryable error occurs
           - exc_factory_fn: callback used to build the exception to be raised on terminal failure
           - original_timeout_val: the original timeout value for the retry (in seconds),
               to be passed to the exception factory for building an error message
        Returns:
            - the sleep value chosen before the next attempt
        """
        error_list.append(exc)
        if not predicate_fn(exc):
            final_exc, source_exc = exc_factory_fn(
                error_list,
                RetryFailureReason.NON_RETRYABLE_ERROR,
                original_timeout,
            )
            raise final_exc from source_exc
        if on_error_fn is not None:
            on_error_fn(exc)
        # next_sleep is fetched after the on_error callback, to allow clients
        # to update sleep_iterator values dynamically in response to errors
        try:
            next_sleep = next(sleep_iterator)
        except StopIteration:
            raise ValueError("Sleep generator stopped yielding sleep values.") from exc
        if deadline is not None and time.monotonic() + next_sleep > deadline:
            final_exc, source_exc = exc_factory_fn(
                error_list,
                RetryFailureReason.TIMEOUT,
                original_timeout,
            )
>           raise final_exc from source_exc
E           google.api_core.exceptions.RetryError: Timeout of 1.0s exceeded, last exception:

.nox/unit-3-11/lib/python3.11/site-packages/google/api_core/retry/retry_base.py:229: RetryError
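
The final frame above comes down to a single comparison: the retry gives up when the current monotonic time plus the next backoff sleep would pass the deadline. Below is a minimal sketch of that check, assuming a hypothetical helper name (`would_time_out`) and made-up elapsed and sleep values. The log does not show the actual `next_sleep`, and the start time is only inferred from `deadline = 471.796770038` and `timeout = 1`.

```python
import time


def would_time_out(deadline, next_sleep, now=None):
    """Mirror the check in the traceback: time.monotonic() + next_sleep > deadline.

    Hypothetical helper for illustration only; it is not part of google-api-core.
    """
    if now is None:
        now = time.monotonic()
    return deadline is not None and now + next_sleep > deadline


# Inferred from the log: deadline = 471.796770038 with timeout = 1.
start = 470.796770038
deadline = start + 1

# Assumed values: the first attempt fails after 0.01s and the backoff is 0.5s.
print(would_time_out(deadline, next_sleep=0.5, now=start + 0.01))  # False

# Assumed values: the same backoff, but 0.6s have already elapsed.
print(would_time_out(deadline, next_sleep=0.5, now=start + 0.6))  # True
```

With a 1 second budget, any combination of elapsed time and backoff that sums to more than 1 second raises `RetryError` before the second attempt runs, so the mocked `"result"` is never returned.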

Background

We run unit tests in parallel, with eight workers:

"-n=8",

We use freezegun to mock out times, especially in retry tests:

https://github.com/search?q=repo%3Agoogleapis%2Fpython-bigquery%20freezegun&type=code

Proposed solution

Based on https://betterstack.com/community/guides/testing/time-machine-vs-freezegun/, time-machine may be a better fit than freezegun.

Alternative

Wherever we use freezegun, protect those tests with a lock. See: spulec/freezegun#503 (comment)

Metadata

Labels

api: bigquery (Issues related to the googleapis/python-bigquery API.)
type: process (A process-related concern. May include testing, release, or the like.)
