xDS interop: collect pod logs by sergiitk · Pull Request #30594 · grpc/grpc

sergiitk · 2022-08-16T16:39:30Z

Added support for pod log collection. To enable it, set the --collect_app_logs flag and specify --log_dir.
Added support and helpers for operating on the --log_dir (natively provided by absl).
Added support for --follow to bin/run_test_server.py and bin/run_test_client.py to follow pod logs printed to stdout.
Moved PortForwarder from k8s.py to its own file.

The collection itself will be enabled per-suite in #30735.

…ions (#30607)

- Changes the order of waiting for pods to start: wait for the pods first, then for the deployment to transition to active. This should provide more useful information in the logs, showing exactly why the pod didn't start instead of the generic "Replicas not available" (ref b/200293121). This is also needed for #30594.
- Adds support for a `check_result` callback in the retryer helpers.
- Completely replaces `retrying` with `tenacity` (ref b/200293121). `retrying` is no longer maintained.
- Improves the readability of timeout errors: they now contain the timeout (or the attempt number) exceeded, and information on why the timeout failed (exception/check function).

  Before:
  > `tenacity.RetryError: RetryError[<Future at 0x7f8ce156bc18 state=finished returned dict>]`

  After:
  > `framework.helpers.retryers.RetryError: Retry error calling framework.infrastructure.k8s.KubernetesNamespace.get_pod: timeout 0:01:00 exceeded. Check result callback returned False.`
- Improves the readability of the k8s wait operation errors: the log now includes the colorized and formatted status of the k8s object being watched, instead of dumping the full k8s object. For example, here's how an error caused by using an incorrect TD bootstrap image looks:
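Returning to the `check_result` callback and the timeout error format from the list above: the following is a minimal standard-library sketch of the idea, not the framework's actual retryer code (which, per the commit message, is built on `tenacity`). The names `retry_until` and the local `RetryError` class are hypothetical and exist only for illustration.

```python
import datetime
import time
from typing import Any, Callable, Optional


class RetryError(Exception):
    """Raised when the retry budget is exhausted (hypothetical stand-in)."""


def retry_until(
    fn: Callable[[], Any],
    *,
    check_result: Callable[[Any], bool],
    timeout: datetime.timedelta,
    wait: datetime.timedelta = datetime.timedelta(seconds=0.01),
) -> Any:
    """Calls fn until check_result(result) is truthy or the timeout passes."""
    deadline = time.monotonic() + timeout.total_seconds()
    last_error: Optional[BaseException] = None
    while True:
        try:
            result = fn()
            last_error = None
            if check_result(result):
                return result
        except Exception as error:  # This sketch retries on any exception.
            last_error = error
        if time.monotonic() >= deadline:
            break
        time.sleep(wait.total_seconds())

    # Report what was exceeded and why the last attempt did not pass,
    # instead of surfacing an opaque future object.
    if last_error is not None:
        reason = f"Last exception: {last_error!r}"
    else:
        reason = "Check result callback returned False."
    name = getattr(fn, "__qualname__", repr(fn))
    raise RetryError(
        f"Retry error calling {name}: timeout {timeout} exceeded. {reason}")
```

A caller would pass something like `check_result=lambda pod: pod["phase"] == "Running"`. When the deadline passes, the error message names the function, the timeout, and the reason, which is the readability improvement the commit message describes.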

sergiitk · 2022-08-24T17:27:55Z

Logging not enabled, confirming the framework still works:

gnossen

Awesome work! This is going to make troubleshooting issues so much easier!

gnossen · 2022-08-25T22:28:55Z

tools/run_tests/xds_k8s_test_driver/framework/xds_flags.py


+COLLECT_APP_LOGS = flags.DEFINE_bool(
+    'collect_app_logs',
+    default=False,


Why not turn this on by default?

I don't think we need this in most cases on local dev. It'll write to /tmp by default (this is an absl thing). It won't make sense for pre-provisioned pods (like url map tests) either.
Also, --log_dir is different for each environment (kokoro/staging/local), so I want setting collect_app_logs to be a conscious choice.
And there are always flagfiles to set these flags if needed.

tools/run_tests/xds_k8s_test_driver/framework/test_app/runners/base_runner.py

gnossen · 2022-08-25T22:54:13Z

tools/run_tests/xds_k8s_test_driver/framework/infrastructure/k8s_internal/k8s_log_collector.py

+logger = logging.getLogger(__name__)
+
+
+class PodLogCollector(threading.Thread):


There are a bunch of landmines when subclassing threading.Thread rather than just passing a callable to the threading.Thread constructor, but it looks like you've managed to avoid them all. No complaints on my end as long as you've confirmed that it's all working as intended.

Yeah, I had the impression this was the case, but couldn't find any strong advice against it. Thanks for noticing! I'll give it another look, just in case. It won't be a difficult refactoring to not extend a thread, but manage it as a field.
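The "manage it as a field" alternative mentioned above could look roughly like the sketch below. It is an illustration only, not the `PodLogCollector` from this PR: the class name `PodLogCollectorSketch`, its constructor arguments, and the plain iterable standing in for a Kubernetes log stream are all assumptions. Holding the thread as a field avoids pitfalls of subclassing such as forgetting to call `super().__init__()` or shadowing attributes that `threading.Thread` uses internally.

```python
import threading
from typing import Iterable, Optional


class PodLogCollectorSketch:
    """Drains a stream of log lines into a file on a background thread."""

    def __init__(self, lines: Iterable[str], log_path: str):
        # `lines` stands in for a pod log stream (an assumption of this sketch).
        self._lines = lines
        self._log_path = log_path
        self._stop_event = threading.Event()
        # The thread is a field, not a base class.
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        """Starts draining the stream in a daemon thread."""
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def wait(self, timeout: Optional[float] = None) -> None:
        """Waits for the stream to be exhausted without requesting a stop."""
        if self._thread is not None:
            self._thread.join(timeout)

    def stop(self, timeout: Optional[float] = None) -> None:
        """Asks the drain loop to exit, then waits for the thread."""
        self._stop_event.set()
        self.wait(timeout)

    def _drain(self) -> None:
        # Open the file as UTF-8 explicitly, independent of the locale.
        with open(self._log_path, "w", encoding="utf-8") as out:
            for line in self._lines:
                if self._stop_event.is_set():
                    break
                out.write(line.rstrip("\n") + "\n")
                out.flush()
```

The stop flag is only checked between lines, so a stream that blocks while waiting for the next line would also need to be closed from the outside. That detail is left out of this sketch.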

- Enables pod log collection in all PSM interop jobs implemented in grpc#30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab Cherry-pick conflicts resolved: - Removed authz test

- Enables pod log collection in all PSM interop jobs implemented in grpc#30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab Cherry-pick conflicts resolved: - Removed authz test - Removed set command from the xlang script that doesn't exist in the following branches

- Enables pod log collection in all PSM interop jobs implemented in grpc#30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab Cherry-pick conflicts resolved: - Removed authz test - Ignore xlang script as it hasn't been created

…30860) - Enables pod log collection in all PSM interop jobs implemented in #30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab Cherry-pick conflicts resolved: - Removed authz test

…30861) - Enables pod log collection in all PSM interop jobs implemented in #30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab Cherry-pick conflicts resolved: - Removed authz test

…30862) - Enables pod log collection in all PSM interop jobs implemented in #30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab Cherry-pick conflicts resolved: - Removed authz test

…30863) - Enables pod log collection in all PSM interop jobs implemented in #30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab Cherry-pick conflicts resolved: - Removed authz test

…30864) - Enables pod log collection in all PSM interop jobs implemented in #30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab Cherry-pick conflicts resolved: - Removed authz test - Removed set command from the xlang script that doesn't exist in the following branches

…30865) - Enables pod log collection in all PSM interop jobs implemented in #30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab Cherry-pick conflicts resolved: - Removed authz test - Ignore xlang script as it hasn't been created

…30857) - Enables pod log collection in all PSM interop jobs implemented in #30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab

…30856) - Enables pod log collection in all PSM interop jobs implemented in #30594. - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab

* Bump 1.49.x branch to 1.49.0.pre1 (#30615)
* bump version to 1.49.0-pre1
* regenerate projects
* [backport][v1.49.x] forkable fixes (#30646)
* Fix forkable globals (#30608)
* Fix forkable repeated registration (#30642)

  This fixes a bug that could occur on repeated grpc initialization (after a complete shutdown). Fixes #30640
* Drop support for ruby 2.5 (#30699) (#30762)
* Drop ruby 2.5 support
* Backport: "stabilize the C2P resolver URI scheme" to v1.49.x (#30654)
* stabilize the C2P resolver URI scheme
* Bump 1.49 branch to 1.49.0.pre2 (#30786)
* bump version to 1.49.0-pre2
* regenerate projects
* Update protobuf on ancillary packages (#30795) (#30805)
* Bump release version on 1.49 to 1.49.0.pre3 (#30814)
* bump version to 1.49.0-pre3
* regenerate projects
* xDS interop: enable pod log collection in the buildscripts (#30735) (#30856)
  - Enables pod log collection in all PSM interop jobs implemented in #30594.
  - Associate test suite runs with their own log file, so it's displayed on "Target Log" tab
* xDS interop: buildscripts: fix run_test return status (#30768) (#30875)

  To capture the return status of the test in run_test, the last command must be the call to the test itself. This removes `set +x`, which makes run_test always return success and not propagate the test status. I can't find it, but this exact error bit us before. Looks like it leaked to other scripts. The good thing is that if the test was executed, its failure would still be picked up from the result xml. However, if the test framework didn't start in the first place, the result will be a false positive.

  Example: https://source.cloud.google.com/results/invocations/98d3e679-ec8a-40bd-9f36-88179747b0d6/targets
  ```
  /home/kbuilder/.pyenv/versions/k8s_xds_test_runner/bin/python3: Error while finding module specification for 'tests.authz_test' (ModuleNotFoundError: No module named 'tests')
  + set +x
  Failed test suites: 0
  [ID: 3548168] Command finished after 625 secs, exit value: 0
  ```
* client_channel: fix crash when cancelling a watch after SHUTDOWN (#30885) (#30928)
* Support Python 3.11 (#30818) (#30944)
* Support Python 3.11
* Update build images for 3.11
* Whoopsie
* The architecture of this thing is garbage
* Silence ownership warning
* Account for change in git behavior
* Fix directory
* I am in great pain
* Update Windows and arm linux
* Agh
* Clean up
* Bump 1.49 branch to 1.49.0 (#30974)
* bump version to 1.49.0
* regenerate projects
* Update protobuf to v21.6 on 1.49.x (#31028)
* Update third_party/protobuf
* run tools/distrib/python/make_grpcio_tools.py
* update build_handwritten.yaml
* regenerate projects
* Bump v1.49.x to v1.49.1 (#31037)
* bump version to 1.49.1
* regenerate projects
* Automated change: Fix sanity tests

  Co-authored-by: gnossen <gnossen@users.noreply.github.com>
* Fix ruby windows ucrt build (#31053)

Co-authored-by: apolcyn <apolcyn@google.com>
Co-authored-by: AJ Heller <hork@google.com>
Co-authored-by: Richard Belleville <rbellevi@google.com>
Co-authored-by: Sergii Tkachenko <sergiitk@google.com>
Co-authored-by: Mark D. Roth <roth@google.com>
Co-authored-by: Richard Belleville <gnossen@gmail.com>
Co-authored-by: gnossen <gnossen@users.noreply.github.com>

sergiitk force-pushed the xds-interop-pod-logs branch 7 times, most recently from 59f45cd to 987caf8, August 18, 2022 21:09

sergiitk mentioned this pull request Aug 22, 2022

xDS interop: Improve retry logic and logging for the k8s retry operations #30607

Merged

sergiitk force-pushed the xds-interop-pod-logs branch 2 times, most recently from 2902ada to 5e1eeb9, August 23, 2022 21:19

sergiitk added 6 commits August 24, 2022 10:15

xDS interop: collect Test App logs

a8782c1

Move out k8s port forwarder to own file

17fb9d3

Implement PodLogCollector

88d747c

Improve cleanup logic

b8e84cf

Log drain

e3db5a2

Add client logging

c056bc2

sergiitk force-pushed the xds-interop-pod-logs branch from 5e1eeb9 to c056bc2, August 24, 2022 17:15

sergiitk added the "release notes: no" and "area/psm interop" labels Aug 24, 2022

sergiitk added 3 commits August 24, 2022 11:19

Log thread id instead, open file in utf-8

acd4d73

Fix lint

bcf04ad

Disable timestamps in pod logs by default

24510fc

sergiitk marked this pull request as ready for review August 25, 2022 00:18

sergiitk requested a review from gnossen August 25, 2022 00:20

Enable timestamps because not all language implementations log them

6d23fe8

gnossen approved these changes Aug 25, 2022

sergiitk added 2 commits August 25, 2022 16:22

maybe_stop_logging

966b50c

Merge branch 'master' into xds-interop-pod-logs

5fbaa97

sergiitk mentioned this pull request Sep 7, 2022

xDS interop: enable pod log collection in the buildscripts (v1.44.x backport) #30861

Merged

sergiitk mentioned this pull request Sep 7, 2022

xDS interop: enable pod log collection in the buildscripts (v1.43.x backport) #30862

Merged

sergiitk mentioned this pull request Sep 7, 2022

xDS interop: enable pod log collection in the buildscripts (v1.42.x backport) #30863

Merged

sergiitk mentioned this pull request Sep 7, 2022

xDS interop: enable pod log collection in the buildscripts (v1.41.x backport) #30864

Merged

This was referenced Sep 7, 2022

test/kokoro: enable pod log collection in the buildscripts (v1.41.x backport) grpc/grpc-go#5641

Merged

test/kokoro: enable pod log collection in the buildscripts (v1.40.x backport) grpc/grpc-go#5642

Merged

sergiitk mentioned this pull request Sep 7, 2022

xDS interop: enable pod log collection in the buildscripts (@grpc/grpc-js@1.6.x backport) grpc/grpc-node#2219

Merged

sergiitk mentioned this pull request Sep 7, 2022

xDS interop: enable pod log collection in the buildscripts (@grpc/grpc-js@1.5.x backport) grpc/grpc-node#2220

Merged

sergiitk merged 13 commits into grpc:master from sergiitk:xds-interop-pod-logs
