@davidgamez
Member

@davidgamez davidgamez commented Mar 10, 2025

Summary:

Feeds that failed to download or process with runtime exceptions were not marked as corrupted.

Expected behavior:

All feeds that failed to download or process with system error notices are marked as corrupted.

Closes: #2005

From our AI friend

This pull request introduces several changes to the output-comparator module, primarily focused on enhancing the handling of system errors in validation reports. The key modifications include adding new command-line parameters for system error files, updating methods to accommodate these parameters, and improving error handling logic.

Enhancements to command-line arguments and parameters:

Improved error handling and validation logic:

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". The title must follow the Conventional Commits specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Link all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@github-actions
Contributor

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit 8df6f0b
Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (0 out of 1802 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Errors (0 out of 1802 datasets, ~0%) ✅

No changes were detected due to the code change.

New Warnings (0 out of 1802 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (0 out of 1802 datasets, ~0%) ✅

No changes were detected due to the code change.

🛡️ Corruption Check

25 out of 1827 sources (~1 %) are corrupted.
Dataset Ref Report Exists Ref Report Readable Latest Report Exists Latest Report Readable
de-baden-wuttemberg-nvbw-gtfs-2393.json
de-bayern-munchner-verkehrs--und-tarifverbund-gmbh-mvv-gtfs-2252
de-sachsen-mitteldeutscher-verkehrsverbund-gmbh-mdv-gtfs-2360
ro-buzau-transbus-buzau-gtfs-2106
ro-dambovita-servicii-publice-municipale-targoviste-gtfs-2107
ro-prahova-transport-calatori-express-ploiesti-gtfs-2108
us-california-regional-transportation-commission-of-southern-nevada-rtc-gtfs-110
us-california-santa-maria-area-transit-gtfs-26
us-california-south-county-transit-link-gtfs-2203
us-california-taft-maricopa-area-transit-gtfs-821
us-california-tri-delta-transit-gtfs-1974
us-illinois-danville-mass-transit-gtfs-2363
us-kansas-salina-gtfs-1867
us-ohio-stark-area-regional-transit-authority-sarta-gtfs-2001
us-utah-cache-valley-gtfs-1906
us-virginia-arlington-transit-gtfs-485
us-washington-coast-transportation-gtfs-2162
us-washington-columbia-county-public-transportation-gtfs-2168
us-washington-community-in-motion-gtfs-2163
us-washington-eastside-friends-of-seniors-gtfs-2166
us-washington-hopelink-transportation-gtfs-2167
us-washington-intercity-transit-gtfs-2289
us-washington-paratransit-services-gtfs-2176
us-washington-puget-sound-educational-service-district-gtfs-2177
us-washington-sound-generations-hyde-shuttle-gtfs-2183
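As a quick sanity check on the figures in this report, the sketch below recomputes the corrupted share and compares the source count with the dataset count used in the Notices Comparison. The variable names are illustrative only. The result is consistent with corrupted sources being left out of the notice comparison, although the report does not state this explicitly.

```python
# Figures copied from the report above.
corrupted = 25            # sources flagged in the Corruption Check
total_sources = 1827      # sources considered by the Corruption Check
compared_datasets = 1802  # datasets counted in the Notices Comparison

# Share of corrupted sources, which the report rounds to "~1 %".
share_percent = corrupted / total_sources * 100
print(f"Corrupted share: {share_percent:.2f}%")

# Sources that are not corrupted; this matches the Notices Comparison count.
remaining = total_sources - corrupted
print(f"Non-corrupted sources: {remaining}")
```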

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
|---|---|---|---|---|
| Average | -- | 3.71 | 3.84 | ⬆️+0.12 |
| Median | -- | 1.36 | 1.45 | ⬆️+0.08 |
| Standard Deviation | -- | 11.09 | 11.14 | ⬆️+0.05 |
| Minimum in Reference Reports | us-california-city-of-wasco-gtfs-1788 | 0.46 | 0.47 | ⬆️+0.01 |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 297.99 | 300.25 | ⬆️+2.26 |
| Minimum in Latest Reports | us-california-city-of-wasco-gtfs-1788 | 0.46 | 0.47 | ⬆️+0.01 |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 297.99 | 300.25 | ⬆️+2.26 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
|---|---|---|---|---|
| Average | -- | 457.10 MiB | 460.52 MiB | ⬆️+3.43 MiB |
| Median | -- | 335.92 MiB | 335.92 MiB | ⬇️0 bytes |
| Standard Deviation | -- | 738.98 MiB | 734.33 MiB | ⬇️-4.64 MiB |
| Minimum in Reference Reports | us-colorado-town-of-telluride-gtfs-2050 | 39.04 MiB | 43.73 MiB | ⬆️+4.69 MiB |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 11.19 GiB | 10.68 GiB | ⬇️-522.62 MiB |
| Minimum in Latest Reports | us-iowa-citibus-gtfs-2304 | 56.50 MiB | 39.70 MiB | ⬇️-16.80 MiB |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 11.19 GiB | 10.68 GiB | ⬇️-522.62 MiB |

@davidgamez
Member Author

The acceptance test report shows an increased number of corrupted files. This is expected: previously, the report missed the feeds that failed to download. Those feeds have a report JSON generated without notices and a system errors JSON that does contain notices.
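To make the rule described in this comment concrete, here is a minimal Python sketch of how a source could be classified. It is not the output-comparator's actual code (that module is not shown in this conversation). The function names, the top-level `notices` array, and the choice to treat a missing system errors file as "no system errors" are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Optional


def load_notices(path: Path) -> Optional[list]:
    """Return the notices list of a JSON file, or None if it is missing or unreadable."""
    try:
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Missing file, I/O failure, or invalid JSON.
        return None
    if not isinstance(data, dict):
        return None
    # Assumption: notices live in a top-level "notices" array.
    notices = data.get("notices", [])
    return notices if isinstance(notices, list) else None


def is_corrupted(report_path: Path, system_errors_path: Path) -> bool:
    """Hypothetical rule: corrupted if the report is unusable or any system error was recorded."""
    if load_notices(report_path) is None:
        return True
    system_errors = load_notices(system_errors_path)
    # Assumption: an absent or unreadable system errors file means no system errors.
    return bool(system_errors)
```

With a rule like this, a feed whose report JSON exists and is readable but whose system errors file contains at least one notice is counted as corrupted, which is the case the earlier check missed.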

@cka-y
Contributor

cka-y commented Mar 11, 2025

Should we add to the Corruption Check section an extra field that says whether it's corrupted because we couldn't download the file? Maybe out of scope for this PR.

@davidgamez
Member Author

> Should we add to the Corruption Check section an extra field that says whether it's corrupted because we couldn't download the file? Maybe out of scope for this PR.

The cause might be something other than a failed download: any runtime exception that exits the validation process counts as "corrupted". We can certainly add more information to the table, perhaps the notice description from system_errors.json.
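One way the extra table column suggested here could be populated is sketched below. This is only an illustration: the function name is hypothetical, and the assumed layout (a top-level `notices` array whose entries carry a `code` field) and the sample codes are assumptions, not taken from this PR.

```python
import json
from collections import Counter


def summarize_system_errors(system_errors_json: str) -> str:
    """Build a short 'code (count)' summary from the text of a system errors file."""
    data = json.loads(system_errors_json)
    # Assumption: each notice is an object with a "code" field.
    codes = Counter(
        notice.get("code", "unknown") for notice in data.get("notices", [])
    )
    # Sort by code so the summary is stable across runs.
    return ", ".join(f"{code} ({count})" for code, count in sorted(codes.items()))


# Example input with made-up notice codes.
sample = json.dumps(
    {
        "notices": [
            {"code": "runtime_exception"},
            {"code": "io_error"},
            {"code": "runtime_exception"},
        ]
    }
)
print(summarize_system_errors(sample))
```

A summary string like this would let a reader of the Corruption Check table tell a failed download apart from other runtime failures without opening each system errors file.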

Contributor

@qcdyx qcdyx left a comment

LGTM!

@davidgamez
Member Author

> The acceptance test report shows an increased number of corrupted files. This is expected: previously, the report missed the feeds that failed to download. Those feeds have a report JSON generated without notices and a system errors JSON that does contain notices.

Created a follow-up issue

@github-actions
Contributor

📝 Acceptance Test Report

📋 Summary

✅ The rule acceptance has passed for commit f1dfa6a
Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (0 out of 1802 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Errors (0 out of 1802 datasets, ~0%) ✅

No changes were detected due to the code change.

New Warnings (0 out of 1802 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Warnings (0 out of 1802 datasets, ~0%) ✅

No changes were detected due to the code change.

🛡️ Corruption Check

25 out of 1827 sources (~1 %) are corrupted.
Dataset Ref Report Exists Ref Report Readable Latest Report Exists Latest Report Readable
de-baden-wuttemberg-nvbw-gtfs-2393.json
de-bayern-munchner-verkehrs--und-tarifverbund-gmbh-mvv-gtfs-2252
de-sachsen-mitteldeutscher-verkehrsverbund-gmbh-mdv-gtfs-2360
ro-buzau-transbus-buzau-gtfs-2106
ro-dambovita-servicii-publice-municipale-targoviste-gtfs-2107
ro-prahova-transport-calatori-express-ploiesti-gtfs-2108
us-california-regional-transportation-commission-of-southern-nevada-rtc-gtfs-110
us-california-santa-maria-area-transit-gtfs-26
us-california-south-county-transit-link-gtfs-2203
us-california-taft-maricopa-area-transit-gtfs-821
us-california-tri-delta-transit-gtfs-1974
us-illinois-danville-mass-transit-gtfs-2363
us-kansas-salina-gtfs-1867
us-ohio-stark-area-regional-transit-authority-sarta-gtfs-2001
us-utah-cache-valley-gtfs-1906
us-virginia-arlington-transit-gtfs-485
us-washington-coast-transportation-gtfs-2162
us-washington-columbia-county-public-transportation-gtfs-2168
us-washington-community-in-motion-gtfs-2163
us-washington-eastside-friends-of-seniors-gtfs-2166
us-washington-hopelink-transportation-gtfs-2167
us-washington-intercity-transit-gtfs-2289
us-washington-paratransit-services-gtfs-2176
us-washington-puget-sound-educational-service-district-gtfs-2177
us-washington-sound-generations-hyde-shuttle-gtfs-2183

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
|---|---|---|---|---|
| Average | -- | 3.71 | 3.81 | ⬆️+0.10 |
| Median | -- | 1.38 | 1.46 | ⬆️+0.08 |
| Standard Deviation | -- | 10.78 | 10.78 | ⬇️-0.00 |
| Minimum in Reference Reports | us-california-flex-v2-developer-test-feed-1-gtfs-1817 | 0.50 | 0.66 | ⬆️+0.16 |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 280.36 | 279.59 | ⬇️-0.77 |
| Minimum in Latest Reports | us-california-city-of-wasco-gtfs-1788 | 0.52 | 0.49 | ⬇️-0.03 |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 280.36 | 279.59 | ⬇️-0.77 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
|---|---|---|---|---|
| Average | -- | 463.84 MiB | 461.10 MiB | ⬇️-2.74 MiB |
| Median | -- | 335.92 MiB | 335.92 MiB | ⬇️0 bytes |
| Standard Deviation | -- | 761.77 MiB | 745.67 MiB | ⬇️-16.09 MiB |
| Minimum in Reference Reports | ro-vrancea-consiliul-judetean-vrancea-gtfs-1984 | 38.73 MiB | 38.13 MiB | ⬇️-611.95 KiB |
| Maximum in Reference Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.73 GiB | 10.70 GiB | ⬇️-34.57 MiB |
| Minimum in Latest Reports | ro-vrancea-consiliul-judetean-vrancea-gtfs-1984 | 38.73 MiB | 38.13 MiB | ⬇️-611.95 KiB |
| Maximum in Latest Reports | gb-unknown-uk-aggregate-feed-gtfs-2014 | 10.73 GiB | 10.70 GiB | ⬇️-34.57 MiB |

@davidgamez davidgamez merged commit d9329f0 into master Mar 11, 2025
134 checks passed
@davidgamez davidgamez deleted the fix/acceptance_test_inconsistent branch March 11, 2025 15:32

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Acceptance tests keep dropping errors for missing_required_column

4 participants