xDS interop: Improve retry logic and logging for the k8s retry operations #30607

Merged
sergiitk merged 9 commits into grpc:master from sergiitk:xds-interop-k8s-pod-wait
Aug 23, 2022

sergiitk commented Aug 16, 2022

  • Changes the order of waiting for pods to start: wait for the pods first, then for the deployment to transition to active. This should provide more useful information in the logs, showing exactly why the pod didn't start instead of the generic "Replicas not available", ref b/200293121. This is also needed for xDS interop: collect pod logs #30594

  • Adds support for a check_result callback in the retryer helpers

  • Completely replaces retrying with tenacity, ref b/200293121. Retrying is no longer maintained.

  • Improves the readability of timeout errors: they now contain the timeout (or the attempt number) that was exceeded, and the reason the retry failed (exception/check function):
    Before:

    tenacity.RetryError: RetryError[<Future at 0x7f8ce156bc18 state=finished returned dict>]

    After:

    framework.helpers.retryers.RetryError: Retry error calling framework.infrastructure.k8s.KubernetesNamespace.get_pod: timeout 0:01:00 exceeded. Check result callback returned False.

  • Improves the readability of the k8s wait operation errors: the log now includes a colorized and formatted status of the k8s object being watched, instead of a dump of the full k8s object. For example, here's how an error caused by using an incorrect TD bootstrap image looks:
    [Screenshots: "Screen Shot 2022-08-22 at 3 12 15 PM", "Screen Shot 2022-08-22 at 3 13 44 PM"]
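
The idea of logging a short formatted status instead of the full k8s object can be sketched roughly as follows. This is not the code from this PR: `summarize_pod` is a hypothetical helper, the pod is a plain dict shaped like the JSON form of a Kubernetes Pod (`status.phase`, `status.containerStatuses[].state.waiting`), the sample values are made up, and colorization is left out.

```python
from typing import Any, Dict


def summarize_pod(pod: Dict[str, Any]) -> str:
    """Return a short, readable status summary for a Pod-shaped dict.

    Hypothetical helper for illustration only; the real framework code
    may format the status differently.
    """
    status = pod.get("status", {})
    lines = [f"Pod {pod['metadata']['name']}: phase={status.get('phase', 'Unknown')}"]
    for container in status.get("containerStatuses", []):
        # Only containers stuck in the "waiting" state explain a failed start.
        waiting = container.get("state", {}).get("waiting")
        if waiting:
            lines.append(
                f"  container {container['name']}: waiting, "
                f"reason={waiting.get('reason')}, message={waiting.get('message')}"
            )
    return "\n".join(lines)


# Made-up sample data, not taken from a real cluster.
sample_pod = {
    "metadata": {"name": "example-pod"},
    "status": {
        "phase": "Pending",
        "containerStatuses": [
            {
                "name": "app",
                "state": {
                    "waiting": {
                        "reason": "ImagePullBackOff",
                        "message": "Back-off pulling image",
                    }
                },
            }
        ],
    },
}
print(summarize_pod(sample_pod))
```

A summary like this keeps the log focused on the fields that explain why a pod has not started, which is the motivation given above for not dumping the whole object.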

Should also fix b/217861014
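
For illustration, a `check_result` callback and the more readable timeout message described above could look roughly like this. It is a stdlib-only sketch, not the framework's implementation and not built on `tenacity`; `retry_until`, its parameters, and this `RetryError` class are hypothetical names.

```python
import datetime
import time
from typing import Any, Callable, Optional


class RetryError(Exception):
    """Hypothetical error type that carries a human-readable failure reason."""


def retry_until(
    fn: Callable[[], Any],
    *,
    timeout: datetime.timedelta,
    wait: datetime.timedelta = datetime.timedelta(milliseconds=10),
    check_result: Optional[Callable[[Any], bool]] = None,
) -> Any:
    """Call fn until check_result accepts its return value or timeout expires."""
    name = getattr(fn, "__qualname__", repr(fn))
    deadline = time.monotonic() + timeout.total_seconds()
    while True:
        try:
            result = fn()
            if check_result is None or check_result(result):
                return result
            reason = "Check result callback returned False."
        except Exception as err:  # Treat any exception as a retryable failure.
            reason = f"Last exception: {type(err).__name__}: {err}"
        if time.monotonic() >= deadline:
            # Name the call, the limit that was exceeded, and why it failed.
            raise RetryError(
                f"Retry error calling {name}: timeout {timeout} exceeded. {reason}"
            )
        time.sleep(wait.total_seconds())
```

A caller might use it as `retry_until(get_pod, timeout=datetime.timedelta(minutes=1), check_result=lambda pod: pod["phase"] == "Running")`, where `get_pod` is a hypothetical zero-argument function. Because `str(datetime.timedelta(minutes=1))` is `0:01:00`, the error text takes the same general shape as the "After" example above.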

sergiitk force-pushed the xds-interop-k8s-pod-wait branch 2 times, most recently from 5e222c2 to 9c0a7b5 August 17, 2022 21:26
sergiitk force-pushed the xds-interop-k8s-pod-wait branch from 9c0a7b5 to cc736aa August 19, 2022 21:23
sergiitk added the release notes: no and area/psm interop labels Aug 20, 2022
sergiitk changed the title from "xDS interop: wait for pods to start, then for the deployment" to "xDS interop: Improve retry logic and logging for the k8s retry operations" Aug 22, 2022
sergiitk requested a review from gnossen August 22, 2022 19:38
sergiitk marked this pull request as ready for review August 22, 2022 19:40
sergiitk merged commit 5abe970 into grpc:master Aug 23, 2022
sergiitk deleted the xds-interop-k8s-pod-wait branch August 23, 2022 00:24
copybara-service bot added the imported label Aug 23, 2022
sergiitk added a commit to sergiitk/grpc that referenced this pull request Dec 8, 2023
…ions (grpc#30607)
