Modularized E2E Test Infrastructure by chrisgleissner · Pull Request #87 · chrisgleissner/c64stream

chrisgleissner · 2026-01-11T15:11:50Z

No description provided.

- Introduced `scenarios.sh` for loading and validating scenario configurations from YAML files.
- Created `system.sh` for resource monitoring, including CPU, memory, and disk usage tracking.
- Implemented `test.sh` to run E2E tests with scenario-specific assertions and logging.
- Added utility functions in `util.sh` for logging, formatting, and managing C64 device streaming.
- Enhanced resource management with functions to ensure adequate UDP buffer sizes and process priority capabilities.
- Structured the framework to support verbose logging and scenario-specific configurations.

Copilot

Pull request overview

This PR modularizes the E2E test infrastructure by extracting functionality from monolithic scripts into focused shell library modules, improving maintainability and reusability.

Changes:

- Extracts E2E test functionality into 9 modular shell libraries (util, test, system, scenarios, report, packets, deps, build, args)
- Adds Python framework package structure with `__init__.py` files
- Updates test results with new validation data and adds new artifacts (playback.csv, README.md)
- Removes obsolete build-docker.sh script

Reviewed changes

Copilot reviewed 16 out of 22 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `tests/e2e/shell_lib/util.sh` | Utility functions for logging, formatting, and system operations |
| `tests/e2e/shell_lib/test.sh` | E2E test execution and scenario assertion logic |
| `tests/e2e/shell_lib/system.sh` | System resource monitoring and configuration (UDP buffers, perf permissions) |
| `tests/e2e/shell_lib/scenarios.sh` | Scenario loading and configuration management |
| `tests/e2e/shell_lib/report.sh` | Test report generation with detailed metrics and visualizations |
| `tests/e2e/shell_lib/packets.sh` | Test packet generation logic |
| `tests/e2e/shell_lib/deps.sh` | Dependency checking and installation automation |
| `tests/e2e/shell_lib/build.sh` | Plugin build and installation logic |
| `tests/e2e/shell_lib/args.sh` | Command-line argument parsing and validation |
| `tests/e2e/results/ntsc_default/*` | Updated test results and new artifacts (validation, resource, playback data) |
| `tests/e2e/framework/__init__.py` | Python framework package initialization |
| `tests/e2e/framework/obs/__init__.py` | Python OBS integration package initialization |
| `build-docker.sh` | Removed obsolete Docker build script |

- Added OBSProcessManager for managing the OBS Studio lifecycle, including starting, stopping, and checking process health.
- Introduced OBSWebsocketClient for interacting with the OBS WebSocket API, enabling remote control of OBS functionalities.
- Created E2EOrchestrator to coordinate end-to-end testing, including environment setup, OBS configuration, and result validation.
- Developed validation modules for recording output, A/V sync, and network timing metrics.
- Integrated XvfbController for headless testing environments.
- Updated test results and logs for improved tracking and debugging.

This commit fixes 4 failing E2E scenarios by addressing two separate issues:

1. Effect scenarios (amber_monitor, phosphor_glow, vintage_tv):
   - frame_logic.py was rejecting 'warning' status as failure
   - Fixed by accepting both 'pass' and 'warning' as successful states
   - Effects can trigger warnings due to visual analysis variations
2. Full-frame-pop scenario (ntsc_default_avsync):
   - Main branch skips av_sync and frame_logic validation for these
   - Added full_frame_pop parameter to ResultValidator
   - Skip av_sync validation (matches main branch behavior)
   - Skip frame_logic validation (frame_sequence_box set to null)
   - Estimate frame_processing from video packets received
   - Fixed report_generator.py to handle None frame_sequence_box

Changes:
- tests/e2e/framework/validation/frame_logic.py: Accept 'warning' status
- tests/e2e/framework/validation/results.py: Skip checks for full_frame_pop
- tests/e2e/framework/orchestrator.py: Pass full_frame_pop flag
- tests/e2e/util/report_generator.py: Handle None frame_sequence_box

Results:
- ntsc_amber_monitor: PASS
- ntsc_phosphor_glow: PASS
- ntsc_vintage_tv: PASS
- ntsc_default_avsync: PASS (av_sync/frame_logic skipped as expected)

All validation_results.json structures now match main branch behavior.
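The two rules in this commit message can be sketched roughly as follows. This is only an illustration: the function names and the list of check names are hypothetical, and only the 'pass'/'warning' statuses and the full_frame_pop flag are taken from the message above.

```python
# Hypothetical sketch; not the code in frame_logic.py or results.py.
SUCCESS_STATUSES = {"pass", "warning"}


def frame_logic_ok(status: str) -> bool:
    """Treat both 'pass' and 'warning' as successful outcomes."""
    return status in SUCCESS_STATUSES


def checks_to_run(full_frame_pop: bool) -> list:
    """Return the validation checks to run for a scenario.

    The check names are placeholders chosen for this example.
    """
    checks = ["udp_reception", "frame_processing", "av_sync", "frame_logic"]
    if full_frame_pop:
        # Full-frame-pop scenarios skip av_sync and frame_logic validation.
        checks = [c for c in checks if c not in ("av_sync", "frame_logic")]
    return checks
```

Keeping the accepted statuses in one set means a later change (for example, demoting 'warning' again) touches a single line.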

Changes:
- Enable record_av_sync=true in both properties_e2e_local.ini and properties_e2e_ci.ini
- Make AV sync failures non-critical (warnings instead of errors) for non-avsync scenarios
  - Heavy effects (amber tint, afterglow) cause unreliable pop detection
  - Pass 'warnings' parameter to _check_av_sync() method in validation/results.py
- Report generator improvements:
  - Show ALL AV pops in Sync Details section, including ignored pops with reason
  - Extract sample frame at first audio pop time when av_sync data is available
    - Previously extracted at 50% mark, which often missed pops
  - Frame progression metrics now visible in all scenario READMEs

Note: Short tests (5s) will show AV sync warnings due to 4s skip window in pop detector, but this is expected. Longer tests (10s+) should show proper AV sync when effects are light.

Changed from 'framework.util.network_analysis' to 'util.network_analysis' since network_analysis.py is located in tests/e2e/util/, not framework/util/.

Python unit tests need PyYAML since e2e.py now imports yaml. This was causing CI Python unit test failures.

Full-frame-pop scenarios (like ntsc_default_avsync) now run post-analysis on the MP4 recording to detect AV pops for the README, even though they skip the av-sync.csv validation (which tests the plugin's runtime detection). This ensures all scenarios report AV pops in their README.md files.

This fixes the AV sync timing issue by ensuring we don't start packet replay until the plugin has requested BOTH video and audio streams. Starting replay early (after only video start) can create artificial A/V offset in the recording. Matches the main branch behavior.
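One way to express "do not replay until both streams were requested" is a small gate built on two events. The class and method names below are hypothetical and are not taken from the framework; how the test harness actually observes the plugin's stream-start requests is not shown in this PR text.

```python
import threading
import time


class StreamStartGate:
    """Blocks replay until both the video and audio stream starts were seen."""

    def __init__(self) -> None:
        self._video = threading.Event()
        self._audio = threading.Event()

    def mark_started(self, stream: str) -> None:
        """Record that the plugin requested the given stream."""
        if stream == "video":
            self._video.set()
        elif stream == "audio":
            self._audio.set()
        else:
            raise ValueError(f"unknown stream: {stream}")

    def wait_for_both(self, timeout_s: float) -> bool:
        """Return True only if both streams were requested within the timeout."""
        deadline = time.monotonic() + timeout_s
        if not self._video.wait(timeout_s):
            return False
        remaining = max(0.0, deadline - time.monotonic())
        return self._audio.wait(remaining)
```

A caller would start packet replay only when `wait_for_both()` returns True, and fail the scenario otherwise.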

When all detected pops are out of sync (none meet the 30ms tolerance), treat this as a critical error rather than a warning. This indicates a fundamental timing issue in packet generation, replay, or plugin processing. Partial sync failures remain warnings as they may be due to effects or minor timing jitter.
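The severity rule described above might look like this in outline. The function name is hypothetical, the tolerance is passed in explicitly rather than assumed, and the case of no detected pops is deliberately left out because the commit message does not describe it.

```python
def classify_av_sync(offsets_ms, tolerance_ms):
    """Return 'pass', 'warning', or 'critical' for a list of A/V pop offsets.

    - every pop within tolerance  -> 'pass'
    - some pops out of tolerance  -> 'warning' (effects, minor timing jitter)
    - all pops out of tolerance   -> 'critical' (fundamental timing issue)
    """
    if not offsets_ms:
        # Not covered by the commit message; handled elsewhere in a real validator.
        raise ValueError("no pops detected")
    in_sync = [abs(offset) <= tolerance_ms for offset in offsets_ms]
    if all(in_sync):
        return "pass"
    if not any(in_sync):
        return "critical"
    return "warning"
```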

…start times

The packet replayer runs separate udp_replay processes for video and audio. Without synchronized start times, thread scheduling delays (100-200ms) between the two process launches caused audio packets to arrive significantly earlier than video packets, creating a systematic A/V offset of ~145ms at the network level and ~162ms at the OBS level.

Fix: Use --start-at-us with a shared future timestamp (8-10 seconds ahead). Both processes preload packets and then start sending at the exact same absolute monotonic time, eliminating the scheduling-induced offset.

Results (ntsc_default_avsync):
- Before: obs_offset=-162ms, net_offset=-146ms (FAIL)
- After: obs_offset=-18ms, net_offset=-2ms (PASS)

This matches the main branch behavior and keeps A/V sync within the 40ms tolerance required for passing tests.
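A minimal sketch of the shared-timestamp idea follows. Only the `udp_replay` name and the `--start-at-us` flag come from the commit message; the positional capture-file argument and the helper names are placeholders, and the sketch assumes the replayer reads the same monotonic clock that Python's `time.monotonic_ns()` exposes.

```python
import time


def shared_start_at_us(lead_s: float = 8.0) -> int:
    """Absolute monotonic timestamp in microseconds, lead_s seconds ahead."""
    return time.monotonic_ns() // 1_000 + int(lead_s * 1_000_000)


def build_replay_commands(video_capture: str, audio_capture: str, start_at_us: int):
    """Build one command line per stream, both carrying the same start time.

    The capture-file argument is a placeholder for whatever udp_replay expects.
    """
    return [
        ["udp_replay", video_capture, "--start-at-us", str(start_at_us)],
        ["udp_replay", audio_capture, "--start-at-us", str(start_at_us)],
    ]
```

Because the timestamp is computed once and passed to both processes, the delay between launching them no longer affects when the first packets go out.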

OBS creates recordings with timestamped filenames (e.g., '2026-01-12 16-41-33.mp4'). The recording validator was copying these to 'c64_recording.mp4' in the output directory, leaving both files and wasting disk space. Fix: Move instead of copy. If the recording is already in the output directory with a wrong name, rename it. Otherwise, move it from the external location.
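The move-instead-of-copy behaviour can be sketched like this. The function name is hypothetical; only the target name `c64_recording.mp4` and the rename-or-move rule come from the commit message.

```python
import shutil
from pathlib import Path


def finalize_recording(recording: Path, output_dir: Path,
                       target_name: str = "c64_recording.mp4") -> Path:
    """Place the OBS recording at output_dir/target_name without duplicating it."""
    target = output_dir / target_name
    if recording.resolve() == target.resolve():
        return target  # already in place under the expected name
    output_dir.mkdir(parents=True, exist_ok=True)
    if recording.parent.resolve() == output_dir.resolve():
        # Already in the output directory under a timestamped name: rename it.
        recording.rename(target)
    else:
        # Recorded elsewhere: move it so that no second copy is left behind.
        shutil.move(str(recording), str(target))
    return target
```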

Sending a machine reset to the real C64 Ultimate for every E2E test is highly disruptive when the device is being used for other purposes. Most tests use mocked packet replay and don't need the reset. Fix: Only call stop_real_c64_streaming() for the ntsc_default_avsync_device scenario at the beginning and end of the test. This prevents unnecessary resets during the normal test suite while still ensuring clean state for device testing.

Device tests now use ports 11000 (video) and 11001 (audio) while synthetic tests continue using 21000/21001. This prevents cross-pollution between real C64 Ultimate device streams and mock packet replay, ensuring complete test isolation.
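Taken together, the reset and port rules for device scenarios amount to a lookup keyed on the scenario name. The sketch below is illustrative only: the helper names and the `DEVICE_SCENARIOS` set are hypothetical, while the port numbers and the `ntsc_default_avsync_device` scenario name come from the commit messages.

```python
# Hypothetical set; the commit messages name only this one device scenario.
DEVICE_SCENARIOS = {"ntsc_default_avsync_device"}

DEVICE_PORTS = {"video": 11000, "audio": 11001}
SYNTHETIC_PORTS = {"video": 21000, "audio": 21001}


def needs_device_reset(scenario: str) -> bool:
    """Only device scenarios should stop streaming on the real C64 Ultimate."""
    return scenario in DEVICE_SCENARIOS


def ports_for(scenario: str) -> dict:
    """Return the UDP ports for a scenario, keeping device and mock traffic apart."""
    ports = DEVICE_PORTS if scenario in DEVICE_SCENARIOS else SYNTHETIC_PORTS
    return dict(ports)
```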

High jitter scenarios (100ms) legitimately degrade A/V sync beyond the default 40ms tolerance. The validation framework now respects per-scenario tolerance settings specified in scenario.yaml (e.g., av_sync_tolerance_ms: 100).

Changes:
- Added av_sync_tolerance_ms parameter through validation chain
  - E2EOrchestrator → ResultValidator → AVSyncValidator → verify_av_sync()
- Loaded from scenario.yaml with 40ms default
- Fixes ntsc_delay_buffer500_jitter100 false failures
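As a rough sketch of the per-scenario tolerance, assume the scenario.yaml content has already been parsed into a dict. The helper names are hypothetical, and `within_tolerance` merely stands in for the final comparison; it is not the signature of the real `verify_av_sync()`.

```python
DEFAULT_AV_SYNC_TOLERANCE_MS = 40.0


def av_sync_tolerance_ms(scenario_config: dict) -> float:
    """Read av_sync_tolerance_ms from a parsed scenario, defaulting to 40 ms."""
    value = scenario_config.get("av_sync_tolerance_ms")
    return DEFAULT_AV_SYNC_TOLERANCE_MS if value is None else float(value)


def within_tolerance(offset_ms: float, tolerance_ms: float) -> bool:
    """True when the absolute A/V offset does not exceed the tolerance."""
    return abs(offset_ms) <= tolerance_ms
```

With this in place, an offset of -85 ms fails under the default but passes for a scenario that sets `av_sync_tolerance_ms: 100`.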

The extracted still frame should show the white square/frame (video pop) to verify visual sync markers. Previously extracted at audio_pop_time_ms which could miss the visual indicator if there was any A/V offset. Now uses closest_video_pop_ms as primary source, with fallback to audio_pop_time_ms if video pop timing unavailable.
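The fallback order can be sketched as below. The function name is hypothetical and the av_sync data is assumed to be available as a dict; the two keys are the field names mentioned in the commit message.

```python
from typing import Optional


def sample_frame_time_ms(av_sync: dict) -> Optional[float]:
    """Pick the extraction time: video pop first, then audio pop, else None."""
    for key in ("closest_video_pop_ms", "audio_pop_time_ms"):
        value = av_sync.get(key)
        if value is not None:
            return float(value)
    return None
```

Checking `is not None` rather than truthiness keeps a legitimate pop time of 0 ms from being skipped.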

Heavy CRT effects (green/amber monitor with afterglow) significantly dim the white video pop marker, preventing detection when using absolute brightness threshold (224+). The delta-based spike detection correctly finds the pops, but they were being rejected by the brightness check. Fix: Use adaptive brightness threshold based on the baseline median rather than absolute 224+ requirement. Allows detection of dimmed whites (150-200 range) while still filtering out false positives in dark areas. This matches behavior on main where green monitor tests passed reliably.
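To illustrate the idea of a baseline-relative threshold, here is a simplified sketch. It covers only the brightness check, not the delta-based spike detection, and the `margin` and `floor` constants are invented for this example; the PR text does not state the actual formula.

```python
from statistics import median


def adaptive_threshold(brightness, margin=60.0, floor=120.0):
    """Threshold relative to the baseline median instead of an absolute 224.

    margin and floor are hypothetical values chosen for illustration.
    """
    return max(floor, median(brightness) + margin)


def detect_pops(brightness, margin=60.0, floor=120.0):
    """Return indices of frames whose brightness reaches the adaptive threshold."""
    threshold = adaptive_threshold(brightness, margin, floor)
    return [i for i, value in enumerate(brightness) if value >= threshold]
```

In this sketch a dimmed pop at brightness 170 over a baseline of 40 is accepted, while a spike to 90 in a dark sequence stays below the floor and is rejected.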

- Introduced new JSON and CSV files to capture resource usage metrics during tests, including CPU, RAM, and GPU statistics.
- Added detailed validation results in JSON format, covering aspects such as UDP reception, frame processing accuracy, video recording size, and network timing.
- Updated the report generation script to handle the new validation results and resource usage files, ensuring proper logging and success messages.

- Updated README.md to clarify the conditions under which device scenarios run.
- Removed the environment variable check for device tests in scenarios.sh, allowing device tests to run automatically when applicable.
- Enhanced report_generator.py to format generated report information with bullet points for better readability.
- Modified verify_output.py to include a frame limit option for ffmpeg commands and improved process cleanup handling.
- Updated verify_tint.py to add frame limit support for ffmpeg commands and improved process cleanup handling.

Copilot AI review requested due to automatic review settings January 11, 2026 15:11

chrisgleissner changed the title ~~Modularize E2E Tests~~ Modularize E2E Test Infrastructure Jan 11, 2026

chrisgleissner changed the title ~~Modularize E2E Test Infrastructure~~ Modularized E2E Test Infrastructure Jan 11, 2026

Copilot started reviewing on behalf of chrisgleissner January 11, 2026 15:12

Copilot AI reviewed Jan 11, 2026

chrisgleissner added 24 commits January 11, 2026 15:44

fix(e2e): Remove unnecessary blank line in E2EOrchestrator class

a1a543a

feat(e2e): Enhance environment setup for OBS stability and Xvfb confi…

739d6e8

…guration

feat(e2e): Enhance environment setup and OBS process management for i…

c5468aa

…mproved stability and logging

E2E: Fix import path for network_analysis module

8e5c649

Changed from 'framework.util.network_analysis' to 'util.network_analysis' since network_analysis.py is located in tests/e2e/util/, not framework/util/.

E2E: Add PyYAML to requirements.txt

cc964c9

Python unit tests need PyYAML since e2e.py now imports yaml. This was causing CI Python unit test failures.

Merge remote-tracking branch 'origin/main' into test/modularize-e2e

1e82abc

Merge main to get latest README updates

7aee4b3

E2E: Clean up whitespace in video pop detection function

1c8c63b

Merge main into test/modularize-e2e

7c21482

Fix e2e report generation: add audio packet counting, correct scenari…

4d356a5

…o names, and project field

chrisgleissner added 3 commits January 13, 2026 15:26

Stabilize E2E reporting and device scenarios

27dd99e

Add packet loss details and env metadata to reports

f4f3cd8

chrisgleissner merged commit 6d45cfd into main Jan 13, 2026
36 checks passed

chrisgleissner deleted the test/modularize-e2e branch January 13, 2026 19:11

chrisgleissner merged 28 commits into main from test/modularize-e2e

2 participants
