fix: fix logic to check for nested GTFS files in ZIP #1972
Hey @skalexch, could you take a look at the 14 datasets that contain new errors? (You can see all of them by clicking on the arrow.) New Errors (14 out of 1808 datasets, ~1%) ✅ Details of new errors due to code change, which is less than the provided threshold of 1%.
@qcdyx The screenshot below shows the affected datasets and, above them, the folders that I extracted from them. I also included mdb-2854 as a control. It does seem that, for all of the affected datasets, the GTFS files exist within a subfolder. For the control dataset, the extracted folder has the same name as the ZIP file, which means that the files reside in the root directory. Please note that I could not download mdb-612 and mdb-1324.
c1342e4 to 13bf3f5
qcdyx left a comment
LGTM. Thanks for your contribution!

Summary:
This PR fixes a bug in our logic for checking whether a ZIP file we're loading has GTFS files in a subfolder. It looks like `ZipInputStream.getNextEntry` doesn't always return subfolders as entries, depending on how the ZIP file was created. The subfolder and ZIP file having the same name in #1912 was a red herring.
Details:
```
$ unzip -l piercetransit-wa-us--flex-v2.zip
Archive:  piercetransit-wa-us--flex-v2.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      170  11-28-2023 15:22   piercetransit-wa-us--flex-v2/timetables.txt
       81  11-28-2023 15:22   piercetransit-wa-us--flex-v2/fare_attributes.txt
       18  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stop_attributes.txt
       56  11-28-2023 15:22   piercetransit-wa-us--flex-v2/transfers.txt
      183  11-28-2023 15:22   piercetransit-wa-us--flex-v2/agency.txt
       12  11-28-2023 15:22   piercetransit-wa-us--flex-v2/areas.txt
       54  11-28-2023 15:22   piercetransit-wa-us--flex-v2/fare_rules.txt
      437  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar_dates.txt
     4367  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stop_times.txt
      374  11-28-2023 15:22   piercetransit-wa-us--flex-v2/location_groups.txt
      137  11-28-2023 15:22   piercetransit-wa-us--flex-v2/directions.txt
       53  11-28-2023 15:22   piercetransit-wa-us--flex-v2/frequencies.txt
       18  11-28-2023 15:22   piercetransit-wa-us--flex-v2/farezone_attributes.txt
      895  11-28-2023 15:22   piercetransit-wa-us--flex-v2/shapes.txt
      983  11-28-2023 15:22   piercetransit-wa-us--flex-v2/trips.txt
      355  11-28-2023 15:22   piercetransit-wa-us--flex-v2/feed_info.txt
     2051  11-28-2023 15:22   piercetransit-wa-us--flex-v2/locations.geojson
      104  11-28-2023 15:22   piercetransit-wa-us--flex-v2/runcut.txt
     2170  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stops.txt
      117  11-28-2023 15:22   piercetransit-wa-us--flex-v2/linked_datasets.txt
      131  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar_attributes.txt
       62  11-28-2023 15:22   piercetransit-wa-us--flex-v2/timetable_stop_order.txt
     1745  11-28-2023 15:22   piercetransit-wa-us--flex-v2/booking_rules.txt
      265  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar.txt
      520  11-28-2023 15:22   piercetransit-wa-us--flex-v2/routes.txt
---------                     -------
    15358                     25 files
$ mv piercetransit-wa-us--flex-v2.zip foobar.zip
$ unzip -l foobar.zip
Archive:  foobar.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      170  11-28-2023 15:22   piercetransit-wa-us--flex-v2/timetables.txt
       81  11-28-2023 15:22   piercetransit-wa-us--flex-v2/fare_attributes.txt
       18  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stop_attributes.txt
       56  11-28-2023 15:22   piercetransit-wa-us--flex-v2/transfers.txt
      183  11-28-2023 15:22   piercetransit-wa-us--flex-v2/agency.txt
       12  11-28-2023 15:22   piercetransit-wa-us--flex-v2/areas.txt
       54  11-28-2023 15:22   piercetransit-wa-us--flex-v2/fare_rules.txt
      437  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar_dates.txt
     4367  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stop_times.txt
      374  11-28-2023 15:22   piercetransit-wa-us--flex-v2/location_groups.txt
      137  11-28-2023 15:22   piercetransit-wa-us--flex-v2/directions.txt
       53  11-28-2023 15:22   piercetransit-wa-us--flex-v2/frequencies.txt
       18  11-28-2023 15:22   piercetransit-wa-us--flex-v2/farezone_attributes.txt
      895  11-28-2023 15:22   piercetransit-wa-us--flex-v2/shapes.txt
      983  11-28-2023 15:22   piercetransit-wa-us--flex-v2/trips.txt
      355  11-28-2023 15:22   piercetransit-wa-us--flex-v2/feed_info.txt
     2051  11-28-2023 15:22   piercetransit-wa-us--flex-v2/locations.geojson
      104  11-28-2023 15:22   piercetransit-wa-us--flex-v2/runcut.txt
     2170  11-28-2023 15:22   piercetransit-wa-us--flex-v2/stops.txt
      117  11-28-2023 15:22   piercetransit-wa-us--flex-v2/linked_datasets.txt
      131  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar_attributes.txt
       62  11-28-2023 15:22   piercetransit-wa-us--flex-v2/timetable_stop_order.txt
     1745  11-28-2023 15:22   piercetransit-wa-us--flex-v2/booking_rules.txt
      265  11-28-2023 15:22   piercetransit-wa-us--flex-v2/calendar.txt
      520  11-28-2023 15:22   piercetransit-wa-us--flex-v2/routes.txt
---------                     -------
    15358                     25 files
```
Closes #1912
Expected behavior:
We get an invalid_input_files_in_subfolder notice even if the subfolder is not treated as a standalone entry.
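For illustration only, here is a minimal sketch of the idea behind the expected behavior. It is not the code in this PR: the class and method names (`NestedGtfsCheck`, `hasFilesInSubfolder`, `buildZip`) are hypothetical, and the check is simplified to "any file entry whose name contains a `/`". It decides from the file entry names rather than waiting for a standalone directory entry, which some archives never contain.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the validator code base.
final class NestedGtfsCheck {

  /** Returns true if any file entry in the ZIP lives below the archive root. */
  static boolean hasFilesInSubfolder(byte[] zipBytes) {
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        // Do not wait for a directory entry: some archives list only files
        // such as "feed/agency.txt" and no standalone "feed/" entry.
        if (!entry.isDirectory() && entry.getName().contains("/")) {
          return true;
        }
      }
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Builds an in-memory ZIP containing exactly the given entry names. */
  static byte[] buildZip(String... entryNames) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      for (String name : entryNames) {
        zip.putNextEntry(new ZipEntry(name));
        if (!name.endsWith("/")) {
          // Placeholder file content; names ending in "/" are directory entries.
          zip.write("agency_id\n".getBytes(StandardCharsets.UTF_8));
        }
        zip.closeEntry();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    // Archive with nested files but no directory entry, like the one listed above.
    System.out.println(hasFilesInSubfolder(buildZip("feed/agency.txt", "feed/stops.txt")));
    // Archive with files at the root.
    System.out.println(hasFilesInSubfolder(buildZip("agency.txt", "stops.txt")));
  }
}
```

A real check would likely need to be stricter than this sketch, for example by ignoring tool-generated folders or by only flagging known GTFS file names.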
Testing:
Before:

After:

Please make sure these boxes are checked before submitting your pull request - thanks!
Run `gradle test` to make sure you didn't break anything