Update E2E Test Results #101
- Fix result deletion: prevent `--e2e=foo` from deleting other test results
- Add a 3s preamble to media source MP4s to account for recording delay
- Adjust the orchestrator wait time for media source playback
- Configure `ntsc_effects_green_monitor` tolerances for tint/preamble effects
- Add a `green_monitor` afterglow threshold to preset assertions

Fixes the `ntsc_effects_green_monitor` test; all assertions now pass.
Pull request overview
Updates the E2E baselines and adjusts media-mode timing/thresholds so recorded output aligns better with expected playback behavior (especially for the Green Monitor effects scenario).
Changes:
- Add a media-file preroll (black video + silence) and extend media-mode wait time to cover preroll + content.
- Tune Green Monitor assertion thresholds/tolerances to reduce flakiness in local runs.
- Refresh stored E2E result artifacts (logs/metrics/READMEs) across multiple scenarios.
Reviewed changes
Copilot reviewed 35 out of 329 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/e2e/util/preset_assertions.py | Adds Green Monitor afterglow threshold override to reduce false failures. |
| tests/e2e/util/generate_media_source.py | Prepends a preroll to generated media files to better match OBS recording delay characteristics. |
| tests/e2e/scenarios/ntsc_effects_green_monitor/scenario.yaml | Adjusts tolerances for media-source preroll bleed and green-tint afterglow variability. |
| tests/e2e/results/pal_default/resource_usage.csv | Updates stored resource metrics for the PAL Default scenario run. |
| tests/e2e/results/pal_default/resource.json | Updates aggregated resource stats for the PAL Default scenario run. |
| tests/e2e/results/pal_default/obs_stdout.log | Updates captured OBS stdout log for the PAL Default scenario run. |
| tests/e2e/results/pal_default/network.json | Updates stored network timing stats for the PAL Default scenario run. |
| tests/e2e/results/pal_default/README.md | Updates rendered results summary for the PAL Default scenario run. |
| tests/e2e/results/ntsc_vintage_tv/resource_usage.csv | Updates stored resource metrics for the NTSC Vintage TV scenario run. |
| tests/e2e/results/ntsc_vintage_tv/resource.json | Updates aggregated resource stats for the NTSC Vintage TV scenario run. |
| tests/e2e/results/ntsc_vintage_tv/obs_stdout.log | Updates captured OBS stdout log for the NTSC Vintage TV scenario run. |
| tests/e2e/results/ntsc_vintage_tv/network.json | Updates stored network timing stats for the NTSC Vintage TV scenario run. |
| tests/e2e/results/ntsc_vintage_tv/README.md | Updates rendered results summary for the NTSC Vintage TV scenario run. |
| tests/e2e/results/ntsc_green_monitor/validation_results.json | Updates stored validation output for the NTSC Green Monitor scenario run. |
| tests/e2e/results/ntsc_green_monitor/resource_usage.csv | Updates stored resource metrics for the NTSC Green Monitor scenario run. |
| tests/e2e/results/ntsc_green_monitor/resource.json | Updates aggregated resource stats for the NTSC Green Monitor scenario run. |
| tests/e2e/results/ntsc_green_monitor/playback.csv | Updates playback timeline artifact for the NTSC Green Monitor scenario run. |
| tests/e2e/results/ntsc_green_monitor/obs_stdout.log | Updates captured OBS stdout log for the NTSC Green Monitor scenario run. |
| tests/e2e/results/ntsc_green_monitor/network.json | Updates stored network timing stats for the NTSC Green Monitor scenario run. |
| tests/e2e/results/ntsc_green_monitor/README.md | Updates rendered results summary for the NTSC Green Monitor scenario run. |
| tests/e2e/results/ntsc_default_avsync/validation_results.json | Updates stored validation output for the NTSC Default A/V Sync scenario run. |
| tests/e2e/results/ntsc_default_avsync/resource_usage.csv | Updates stored resource metrics for the NTSC Default A/V Sync scenario run. |
| tests/e2e/results/ntsc_default_avsync/resource.json | Updates aggregated resource stats for the NTSC Default A/V Sync scenario run. |
| tests/e2e/results/ntsc_default_avsync/network.json | Updates stored network timing stats for the NTSC Default A/V Sync scenario run. |
| tests/e2e/results/ntsc_default_avsync/av-sync.csv | Updates stored A/V sync CSV artifact for the NTSC Default A/V Sync scenario run. |
| tests/e2e/results/ntsc_default_avsync/README.md | Updates rendered results summary for the NTSC Default A/V Sync scenario run. |
| tests/e2e/results/ntsc_default_720p/resource_usage.csv | Updates stored resource metrics for the NTSC Default 720p scenario run. |
| tests/e2e/results/ntsc_default_720p/resource.json | Updates aggregated resource stats for the NTSC Default 720p scenario run. |
| tests/e2e/results/ntsc_default_720p/network.json | Updates stored network timing stats for the NTSC Default 720p scenario run. |
| tests/e2e/results/ntsc_default_720p/README.md | Updates rendered results summary for the NTSC Default 720p scenario run. |
| tests/e2e/results/ntsc_default/validation_results.json | Updates stored validation output for the NTSC Default scenario run. |
| tests/e2e/results/ntsc_default/resource_usage.csv | Updates stored resource metrics for the NTSC Default scenario run. |
| tests/e2e/results/ntsc_default/resource.json | Updates aggregated resource stats for the NTSC Default scenario run. |
| tests/e2e/results/ntsc_default/playback.csv | Updates playback timeline artifact for the NTSC Default scenario run. |
| tests/e2e/results/ntsc_default/network.json | Updates stored network timing stats for the NTSC Default scenario run. |
| tests/e2e/results/ntsc_default/README.md | Updates rendered results summary for the NTSC Default scenario run. |
| tests/e2e/framework/orchestrator.py | Extends media-mode wait time to include the added preroll. |
| local-build.sh | Changes result archiving behavior when copying E2E outputs into scenario result directories. |
`tests/e2e/util/generate_media_source.py`:

```python
"""Add black video frames and silent audio to the beginning to match UDP preamble.

In UDP mode, there's a ~9-10s preamble showing the C64 logo while waiting for packets.
For media mode, OBS starts recording ~3-4s after playback starts (natural delay).
Use 3s preamble so the natural recording delay skips most black frames.
"""
```
Copilot AI, Jan 23, 2026:
✅ Fixed: Updated docstring to clarify this is a black preroll to account for OBS start/recording delays, not matching the UDP logo screen.
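The preroll described in the docstring can be sketched as below. This is a minimal illustration under stated assumptions, not the code in `generate_media_source.py`: it assumes video frames are a NumPy array shaped `(num_frames, height, width, channels)` and audio is a NumPy array shaped `(num_samples,)` or `(num_samples, num_channels)`. The function name `prepend_preroll` and its parameters are hypothetical.

```python
import math

import numpy as np


def prepend_preroll(frames_rgb, audio, fps, sample_rate, preroll_s=3.0):
    """Prepend black frames and matching silence (hypothetical helper).

    frames_rgb: array shaped (num_frames, height, width, channels).
    audio: array shaped (num_samples,) or (num_samples, num_channels).
    """
    # Round up so the black preroll lasts at least preroll_s seconds.
    preroll_frames = math.ceil(preroll_s * fps)
    height, width, channels = frames_rgb.shape[1:4]
    black = np.zeros((preroll_frames, height, width, channels), dtype=frames_rgb.dtype)

    # Size the silence from the video preroll's actual duration so A/V stay aligned.
    preroll_samples = round(preroll_frames / fps * sample_rate)
    silence = np.zeros((preroll_samples,) + audio.shape[1:], dtype=audio.dtype)

    return np.concatenate([black, frames_rgb]), np.concatenate([silence, audio])
```

Deriving the silence length from the rounded-up frame count, rather than directly from `preroll_s`, keeps the audio and video prerolls the same length, so content starts at the same timestamp in both streams.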
```python
preamble_frames = int(preamble_duration_s * fps)
height, width, channels = frames_rgb.shape[1], frames_rgb.shape[2], frames_rgb.shape[3]
```
Copilot AI, Jan 23, 2026:
✅ Fixed: Changed to use math.ceil() to ensure the full preamble duration is met (e.g., NTSC 59.826 Hz now produces 180 frames = 3.0087s instead of 179 frames = 2.992s).
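The arithmetic behind this fix, as a small sketch: `int()` truncates the fractional frame, so the preroll comes out slightly shorter than requested, while `math.ceil()` rounds up so it is never shorter. The helper name `preroll_frames` is hypothetical; the frame rates are the PAL/NTSC values used in `orchestrator.py`.

```python
import math

PAL_FPS = 50.125
NTSC_FPS = 59.826


def preroll_frames(duration_s, fps, round_up=True):
    """Frames needed to cover duration_s at fps (hypothetical helper)."""
    exact = duration_s * fps
    # ceil guarantees frames / fps >= duration_s; int() truncates and can fall short.
    return math.ceil(exact) if round_up else int(exact)


for name, fps in (("PAL", PAL_FPS), ("NTSC", NTSC_FPS)):
    truncated = preroll_frames(3.0, fps, round_up=False)
    rounded_up = preroll_frames(3.0, fps)
    print(f"{name}: int -> {truncated} frames ({truncated / fps:.4f}s), "
          f"ceil -> {rounded_up} frames ({rounded_up / fps:.4f}s)")
```

The same shortfall affects PAL: 3.0 s at 50.125 Hz is 150.375 frames, so truncation yields 150 frames and rounding up yields 151.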
`tests/e2e/framework/orchestrator.py`:

```diff
 logger.info("🎞️ Media mode: Waiting for OBS media playback...")
 frame_rate = 50.125 if self.format == 'PAL' else 59.826
 duration = self.frames / frame_rate
-time.sleep(duration + 2)
+# Media files have a 3s preamble, so wait for preamble + content + buffer
+time.sleep(3.0 + duration + 2)
```
Copilot AI, Jan 23, 2026:
✅ Fixed: Created shared constant MEDIA_PREAMBLE_DURATION_S in tests/e2e/util/constants.py. Both generate_media_source.py and orchestrator.py now import and use this constant to prevent drift.
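A sketch of how one shared constant keeps the generator and the orchestrator in agreement. Only the name `MEDIA_PREAMBLE_DURATION_S` and the module `tests/e2e/util/constants.py` come from the comment above; `MEDIA_WAIT_BUFFER_S`, `FRAME_RATES`, and both helper functions are hypothetical, and the "modules" are collapsed into a single block so it runs standalone.

```python
import math

# --- tests/e2e/util/constants.py (constant named in the review reply) ---
MEDIA_PREAMBLE_DURATION_S = 3.0

# --- hypothetical names below, for illustration only ---
MEDIA_WAIT_BUFFER_S = 2.0  # slack after playback, mirroring the original "+ 2"
FRAME_RATES = {"PAL": 50.125, "NTSC": 59.826}


def preamble_frame_count(fps):
    """Generator side: black frames to prepend, rounded up."""
    return math.ceil(MEDIA_PREAMBLE_DURATION_S * fps)


def media_wait_seconds(video_format, content_frames):
    """Orchestrator side: time to wait for preamble + content + buffer."""
    content_s = content_frames / FRAME_RATES[video_format]
    return MEDIA_PREAMBLE_DURATION_S + content_s + MEDIA_WAIT_BUFFER_S
```

In the orchestrator this would replace the literal `3.0`, e.g. `time.sleep(media_wait_seconds(self.format, self.frames))`. Because the generator rounds the preamble up to whole frames, the real preamble can run a few milliseconds past 3.0 s (about 3.0087 s for NTSC); the 2 s buffer absorbs that difference.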
`local-build.sh`:

```diff
 if [[ "$has_test_output" == "true" ]]; then
-  rm -rf "$results_root_dir"
+  # Create results directory if it doesn't exist (don't delete to preserve other test results)
+  mkdir -p "$results_root_dir"
```
Copilot AI, Jan 23, 2026:
✅ Fixed: Changed to use rsync -a --delete to properly sync test output and remove stale files from previous runs, ensuring consistent and clean results.
```bash
# Copy test output, overwriting only the files that were generated
cp -a "$test_output_dir/." "$results_root_dir/"
```
Copilot AI, Jan 23, 2026:
✅ Fixed: Replaced cp -a with rsync -a --delete to ensure obsolete files from previous runs are removed, making committed/archived results internally consistent.
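For readers less familiar with `rsync -a --delete`: it makes the destination mirror the source by copying new and changed files and removing anything in the destination that no longer exists in the source. The following is a rough standard-library sketch of those semantics only. `mirror_tree` is a hypothetical name, this is not how `local-build.sh` does it, and it ignores rsync details such as ownership, symlink handling, file-type changes, and delta transfer.

```python
import filecmp
import shutil
from pathlib import Path


def mirror_tree(src, dst):
    """Make dst mirror src: copy new/changed files, delete stale ones."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)

    # Copy new or changed files; copy2 keeps mtime and permission bits (roughly -a).
    for path in src.rglob("*"):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif not target.exists() or not filecmp.cmp(path, target, shallow=False):
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)

    # Remove entries with no counterpart in src (roughly --delete).
    # Deepest paths first, so a stale directory's contents go before the directory.
    for path in sorted(dst.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if not (src / path.relative_to(dst)).exists():
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path)
            else:
                path.unlink()
```

The difference from `cp -a "$test_output_dir/." "$results_root_dir/"` is the second loop: `cp -a` only adds and overwrites, so files left by an earlier run stay behind.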