Use smaller integer field types for GTFS entity classes #1273
👋 Welcome back!
asvechnikov2
left a comment
Thanks for the tests!
...or/src/main/java/org/mobilitydata/gtfsvalidator/processor/EntityImplementationGenerator.java
.../tests/src/test/java/org/mobilitydata/gtfsvalidator/processor/tests/EnumSizesSchemaTest.java
...sor/tests/src/main/java/org/mobilitydata/gtfsvalidator/processor/tests/ManyFieldsSchema.java
Nice
GTFS enums usually fit into one byte, so we don't need an int. GTFS
tables usually have fewer than 16 fields, so a byte or a short is
enough to store the bitmask of assigned fields.
Smaller fields save memory for the largest table, stop_times.txt,
which may have 30 M lines.
Instead of:
class GtfsStopTime {
    // ...
    private int pickupType;
    private int dropOffType;
    private int continuousPickup;
    private int continuousDropOff;
    private int timepoint;
    private int bitField0_;
}
we have now:
class GtfsStopTime {
    // ...
    private byte pickupType;
    private byte dropOffType;
    private byte continuousPickup;
    private byte continuousDropOff;
    private byte timepoint;
    private short bitField0_;
}
which is 5 * (4 - 1) + 2 = 17 bytes smaller (five enum fields shrink
from 4 bytes to 1, and the bitmask from 4 bytes to 2). Java aligns
objects to 8 bytes, so we actually save 16 bytes per line.
Total savings: about 0.5 GiB for 30 M lines in stop_times.txt
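As a rough illustration of the idea described above, here is a minimal sketch of a byte-backed enum field guarded by a short presence bitmask, plus the savings arithmetic. The class name CompactStopTime, the bit positions, and the helper savedBytes are hypothetical and are not taken from the generated GtfsStopTime code; the 16 bytes per row figure is the one stated in the description, not something this sketch measures.

```java
/** Hypothetical sketch only; not the class produced by EntityImplementationGenerator. */
final class CompactStopTime {
    // Bit positions inside the presence mask (assumed numbering for this sketch).
    private static final int PICKUP_TYPE_BIT = 0;
    private static final int TIMEPOINT_BIT = 1;

    // Enum numbers stored in one byte each instead of a 4-byte int.
    private byte pickupType;
    private byte timepoint;

    // One bit per field; a short has room for up to 16 fields.
    private short bitField0_;

    boolean hasPickupType() {
        return (bitField0_ & (1 << PICKUP_TYPE_BIT)) != 0;
    }

    boolean hasTimepoint() {
        return (bitField0_ & (1 << TIMEPOINT_BIT)) != 0;
    }

    int pickupType() {
        return pickupType;
    }

    int timepoint() {
        return timepoint;
    }

    void setPickupType(int value) {
        pickupType = toByte(value);
        // Compound assignment narrows the int result back to short.
        bitField0_ |= 1 << PICKUP_TYPE_BIT;
    }

    void setTimepoint(int value) {
        timepoint = toByte(value);
        bitField0_ |= 1 << TIMEPOINT_BIT;
    }

    // Rejects values that would be silently truncated by a plain (byte) cast.
    private static byte toByte(int value) {
        if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("Enum number does not fit into a byte: " + value);
        }
        return (byte) value;
    }

    // Total bytes saved for a table with the given number of rows.
    static long savedBytes(long rows, int savedBytesPerRow) {
        return rows * savedBytesPerRow;
    }
}
```

With the figures from the description, CompactStopTime.savedBytes(30_000_000L, 16) gives 480,000,000 bytes, which is roughly 0.45 GiB and matches the "about 0.5 GiB" estimate.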
4fd9ac8 to 2168d70
ed5c615 to d441c66
Thanks!
I missed GTFS Validator :)
asvechnikov2
left a comment
LGTM! Thanks!