[Tests] Add pod log collection for model monitoring test failures #9000

alxtkr77 · 2025-12-01T14:31:24Z

Summary

Add automatic pod log collection when model monitoring system tests fail to help debug CI failures.

Changes Made

Add collect_monitoring_pod_logs() and helper methods to TestMLRunSystemModelMonitoring
Add pytest hook in conftest.py to trigger log collection on test failure
Collect logs from monitoring pods (stream, controller, writer, serving)
Collect filtered error logs from mlrun-api pods mentioning test project
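The hook wiring described above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: the description does not say which pytest hook is used, so the sketch assumes `pytest_exception_interact`, and it assumes the test class instance exposes `collect_monitoring_pod_logs()` as a no-argument method.

```python
# conftest.py (sketch): trigger pod log collection when a test fails.
# Assumption: the PR does not name the hook; pytest_exception_interact is
# one pytest hook that fires when a test raises an unexpected exception.


def pytest_exception_interact(node, call, report):
    # React only to failures in the test body, not setup/teardown or collection.
    if getattr(report, "when", None) != "call" or not report.failed:
        return

    # For test methods, pytest exposes the test class instance on the item.
    instance = getattr(node, "instance", None)
    collect = getattr(instance, "collect_monitoring_pod_logs", None)
    if not callable(collect):
        return

    try:
        collect()
    except Exception as exc:
        # Log collection is best effort; it must never mask the real failure.
        print(f"pod log collection failed: {exc}")
```

Wrapping the call in `try`/`except` keeps a broken cluster connection from replacing the original test failure with a secondary error.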

Testing

Lint passes
Manual verification of log collection on test failure

Reference

Jira: ML-11480

Add automatic collection and logging of pod logs when model monitoring system tests fail, to help debug CI failures.

Changes:
- Add collect_monitoring_pod_logs() to TestMLRunSystemModelMonitoring
- Collect logs from monitoring pods (stream, controller, writer, serving)
- Collect filtered error logs from mlrun-api pods mentioning test project
- Add pytest hook in conftest.py to trigger on test failure
- Requires MLRUN_SYSTEM_TEST_KUBECONFIG_PATH env var to enable

ML-11480

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
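To illustrate the opt-in switch and the filtering of mlrun-api logs mentioned in this commit, here is a small sketch. The helper names `log_collection_enabled` and `filter_error_lines` are hypothetical, and the list of error markers is an assumption; the PR text only says that error logs mentioning the test project are kept.

```python
import os

# Environment variable named in the commit message.
KUBECONFIG_ENV_VAR = "MLRUN_SYSTEM_TEST_KUBECONFIG_PATH"


def log_collection_enabled(environ=None):
    """Return True only when the kubeconfig path variable is set and non-empty."""
    environ = os.environ if environ is None else environ
    return bool(environ.get(KUBECONFIG_ENV_VAR, "").strip())


def filter_error_lines(log_text, project_name,
                       markers=("error", "exception", "traceback")):
    """Keep lines that mention the project and look like errors.

    The markers tuple is an assumption made for this sketch.
    """
    project = project_name.lower()
    selected = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if project in lowered and any(marker in lowered for marker in markers):
            selected.append(line)
    return selected
```

Filtering by project name keeps the output short, since mlrun-api pods serve every project in the cluster.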

liranbg

I think this can be generalized so that the rest of the test suites also dump their logs on error. Overall, you could collect logs from all system pods that carry certain labels (the ones mlrun sets across its resources), which would be helpful when debugging any failing test.

assaf758 · 2025-12-02T06:51:02Z

> I think this can be generalized so that the rest of the test suites also dump their logs on error. Overall, you could collect logs from all system pods that carry certain labels (the ones mlrun sets across its resources), which would be helpful when debugging any failing test.

Yep, I second that!
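The label-based approach suggested in this thread could be sketched as below. This is an illustration only: `collect_logs_by_label` is a hypothetical helper, the thread does not name the label keys mlrun sets, and `core_v1` is assumed to behave like `kubernetes.client.CoreV1Api` (it is passed in so that the sketch has no hard dependency on the client package).

```python
def collect_logs_by_label(core_v1, namespace, labels, since_seconds=None):
    """Return {pod_name: log_text} for every pod matching all given labels.

    core_v1 is expected to provide list_namespaced_pod() and
    read_namespaced_pod_log(), as kubernetes.client.CoreV1Api does.
    """
    # Kubernetes equality-based selector: "key1=value1,key2=value2".
    selector = ",".join(f"{key}={value}" for key, value in sorted(labels.items()))
    pods = core_v1.list_namespaced_pod(namespace, label_selector=selector)

    # Only pass since_seconds when a time bound was requested.
    kwargs = {} if since_seconds is None else {"since_seconds": since_seconds}

    logs = {}
    for pod in pods.items:
        name = pod.metadata.name
        try:
            logs[name] = core_v1.read_namespaced_pod_log(name, namespace, **kwargs)
        except Exception as exc:
            # Best effort: e.g. multi-container pods need an explicit container.
            logs[name] = f"<failed to read logs: {exc}>"
    return logs
```

With the official Python client, `core_v1` would typically be created by calling `kubernetes.config.load_kube_config(config_file=...)` and then `kubernetes.client.CoreV1Api()`; the label keys to pass depend on what mlrun actually sets and are not specified in this thread.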

Move pod log collection from model_monitoring-specific to all system tests:
- Add collect_pod_logs_on_failure() to TestMLRunSystem base class
- Project pods (name contains project_name): collect full logs
- System pods (mlrun-api-*): collect time-bounded logs via since_seconds
- Add autouse fixture to track test start time for duration calculation
- Remove duplicate code from model_monitoring/__init__.py
- Delete model_monitoring/conftest.py (functionality moved to system level)

This addresses reviewer feedback to make pod log collection available for debugging failures in any system test suite, not just model monitoring.
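The pod classification and time bounding described in this commit can be sketched as a pure function. This is an assumption-laden illustration, not the merged code: `log_request_for_pod` and `margin_seconds` are hypothetical, and the test start time is assumed to be a `time.time()` value recorded by the autouse fixture.

```python
import math
import time


def log_request_for_pod(pod_name, project_name, test_start,
                        now=None, margin_seconds=30):
    """Return keyword arguments for a pod log read, or None to skip the pod.

    test_start is assumed to be a time.time() timestamp recorded when the
    test began; margin_seconds is a hypothetical safety margin.
    """
    now = time.time() if now is None else now

    # Project pods exist only for this test's project: take the full log.
    if project_name in pod_name:
        return {"name": pod_name}

    # Shared system pods: restrict the log to the duration of the test.
    if pod_name.startswith("mlrun-api-"):
        elapsed = max(0.0, now - test_start)
        return {"name": pod_name,
                "since_seconds": math.ceil(elapsed) + margin_seconds}

    # Anything else is out of scope for this sketch.
    return None
```

The returned dictionary is shaped so that it could be unpacked into a call such as `CoreV1Api.read_namespaced_pod_log(namespace=..., **request)`, which accepts `name` and `since_seconds`.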

alxtkr77 requested a review from liranbg as a code owner December 1, 2025 14:31

github-actions bot added area/tests area/system-tests labels Dec 1, 2025

liranbg reviewed Dec 1, 2025

assaf758 approved these changes Dec 2, 2025

assaf758 merged commit bf8a3b4 into mlrun:development Dec 2, 2025
13 checks passed

3 participants
