xDS interop: Improve retry logic and logging for the k8s retry operations#30607
Merged
sergiitk merged 9 commits into grpc:master on Aug 23, 2022
Author comment: Rerunning the jobs against the last commit.
gnossen approved these changes on Aug 22, 2022
Author comment: Rerunning with the check_result exception handling.
sergiitk added a commit to sergiitk/grpc that referenced this pull request on Dec 8, 2023:

…ions (grpc#30607)

- Changes the order of waiting for pods to start: wait for the pods first, then for the deployment to transition to active. This should provide more useful information in the logs, showing exactly why the pod didn't start instead of the generic "Replicas not available", ref b/200293121. This is also needed for grpc#30594.
- Adds support for the `check_result` callback in the retryer helpers.
- Completely replaces `retrying` with `tenacity`, ref b/200293121. Retrying is no longer maintained.
- Improves the readability of timeout errors: they now contain the timeout (or the attempt number) that was exceeded, and information on why the operation failed (exception or check function).

  Before:
  > `tenacity.RetryError: RetryError[<Future at 0x7f8ce156bc18 state=finished returned dict>]`

  After:
  > `framework.helpers.retryers.RetryError: Retry error calling framework.infrastructure.k8s.KubernetesNamespace.get_pod: timeout 0:01:00 exceeded. Check result callback returned False.`
- Improves the readability of the k8s wait operation errors: the log now includes a colorized and formatted status of the k8s object being watched, instead of a dump of the full k8s object. For example, here's how an error caused by using an incorrect TD bootstrap image looks:

  [screenshot: formatted k8s object status in the error log]
- Changes the order of waiting for pods to start: wait for the pods first, then for the deployment to transition to active. This should provide more useful information in the logs, showing exactly why the pod didn't start instead of the generic "Replicas not available", ref b/200293121. This is also needed for xDS interop: collect pod logs #30594.
- Adds support for the `check_result` callback in the retryer helpers.
- Completely replaces `retrying` with `tenacity`, ref b/200293121. Retrying is no longer maintained.
- Improves the readability of timeout errors: they now contain the timeout (or the attempt number) that was exceeded, and information on why the operation failed (exception or check function).

  Before:
  > `tenacity.RetryError: RetryError[<Future at 0x7f8ce156bc18 state=finished returned dict>]`

  After:
  > `framework.helpers.retryers.RetryError: Retry error calling framework.infrastructure.k8s.KubernetesNamespace.get_pod: timeout 0:01:00 exceeded. Check result callback returned False.`
- Improves the readability of the k8s wait operation errors: the log now includes a colorized and formatted status of the k8s object being watched, instead of a dump of the full k8s object. For example, here's how an error caused by using an incorrect TD bootstrap image looks:

  [screenshot: formatted k8s object status in the error log]


Should also fix b/217861014
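
To illustrate the `check_result` callback and the more readable timeout error described in this PR, here is a minimal standard-library sketch. It is not the code from `framework.helpers.retryers` (which is built on `tenacity`); the names `retry_until`, `wait`, and the local `RetryError` class are hypothetical, and the message format only imitates the "After" example.

```python
import datetime
import time
from typing import Any, Callable, Optional


class RetryError(Exception):
    """Hypothetical stand-in for a retryer error with a readable message."""


def retry_until(
    func: Callable[[], Any],
    *,
    timeout: datetime.timedelta,
    wait: datetime.timedelta = datetime.timedelta(seconds=1),
    check_result: Optional[Callable[[Any], bool]] = None,
) -> Any:
    """Call func until check_result accepts its result or timeout expires.

    Illustrative sketch only: exceptions raised by func are retried too,
    and the final error reports why the last attempt failed.
    """
    deadline = time.monotonic() + timeout.total_seconds()
    name = getattr(func, "__qualname__", repr(func))
    while True:
        last_err: Optional[BaseException] = None
        try:
            result = func()
            if check_result is None or check_result(result):
                return result
            reason = "Check result callback returned False."
        except Exception as err:  # retry on any error raised by func
            last_err = err
            reason = f"Last exception: {type(err).__name__}: {err}"
        if time.monotonic() >= deadline:
            # str(timedelta) renders as H:MM:SS, e.g. 0:01:00 for one minute.
            raise RetryError(
                f"Retry error calling {name}: timeout {timeout} exceeded. "
                f"{reason}"
            ) from last_err
        time.sleep(wait.total_seconds())
```

A caller would pass the polled operation as `func` and a predicate such as `lambda pod: pod["phase"] == "Running"` as `check_result` (the dictionary shape here is an assumption for illustration). Keeping the failure reason in the message is what makes the error actionable compared to the opaque `RetryError[<Future ...>]` shown in the "Before" example.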