@aababilov (Collaborator) commented Oct 14, 2022

GTFS enums usually fit into one byte, so we don't need an int to store them. GTFS tables usually have fewer than 16 fields, so a byte or a short is enough to store the bitmask of assigned fields.

Smaller fields save memory for the largest table, stop_times.txt, which may have 30 M lines.
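As an illustration of the idea, here is a minimal sketch of an enum stored in a byte with a short bitmask recording which fields were assigned. The class name `CompactFieldsSketch`, the `PickupType` enum, its `forNumber` helper, and the bit constants are hypothetical and chosen for this example; they are not taken from the validator's generated code.

```java
// Hypothetical sketch; names are illustrative, not the validator's generated classes.
class CompactFieldsSketch {
    // GTFS pickup_type values 0..3, each small enough for one byte.
    enum PickupType {
        REGULAR(0), NOT_AVAILABLE(1), MUST_PHONE(2), MUST_COORDINATE_WITH_DRIVER(3);

        final int number;

        PickupType(int number) {
            this.number = number;
        }

        static PickupType forNumber(int number) {
            for (PickupType type : values()) {
                if (type.number == number) {
                    return type;
                }
            }
            throw new IllegalArgumentException("Unknown pickup_type: " + number);
        }
    }

    // One bit per field; a short has room for up to 16 of them.
    private static final int PICKUP_TYPE_BIT = 1 << 0;
    private static final int TIMEPOINT_BIT = 1 << 1;

    private byte pickupType;   // enum number stored in 1 byte instead of 4
    private byte timepoint;    // never assigned in this sketch
    private short bitField0_;  // bitmask of assigned fields

    void setPickupType(PickupType type) {
        pickupType = (byte) type.number;
        bitField0_ |= PICKUP_TYPE_BIT;  // compound assignment narrows back to short
    }

    boolean hasPickupType() {
        return (bitField0_ & PICKUP_TYPE_BIT) != 0;
    }

    boolean hasTimepoint() {
        return (bitField0_ & TIMEPOINT_BIT) != 0;
    }

    PickupType pickupType() {
        return PickupType.forNumber(pickupType);
    }

    public static void main(String[] args) {
        CompactFieldsSketch row = new CompactFieldsSketch();
        row.setPickupType(PickupType.MUST_PHONE);
        System.out.println(row.pickupType() + " " + row.hasPickupType() + " " + row.hasTimepoint());
    }
}
```

The accessors hide the narrow storage, so callers still work with the enum type and never see the byte or the bitmask.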

Instead of:

class GtfsStopTime {
  // ...
  private int pickupType;
  private int dropOffType;
  private int continuousPickup;
  private int continuousDropOff;
  private int timepoint;
  private int bitField0_;
}

we have now:

class GtfsStopTime {
  // ...
  private byte pickupType;
  private byte dropOffType;
  private byte continuousPickup;
  private byte continuousDropOff;
  private byte timepoint;
  private short bitField0_;
}

which is 5 * (4 - 1) + (4 - 2) = 17 bytes smaller. The JVM typically aligns objects to 8 bytes, so we actually save 16 bytes per line.

Total savings: about 0.45 GiB (16 bytes * 30 M = 480 MB) for 30 M lines in stop_times.txt.
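The arithmetic above can be checked with a short sketch. The class `StopTimeSavingsEstimate`, the 8-byte alignment constant, and the example unpadded object size of 80 bytes are assumptions made for illustration; the real size of a GtfsStopTime instance depends on its other fields and on the JVM.

```java
// Back-of-the-envelope estimate; the 80-byte object size is a made-up example, not a measurement.
class StopTimeSavingsEstimate {
    static final long ALIGNMENT = 8;        // typical JVM object alignment (assumption)
    static final long LINES = 30_000_000L;  // stop_times.txt size from the description

    // Bytes saved by the narrower field types alone, before alignment.
    static long rawFieldSavings() {
        long enumFields = 5L * (Integer.BYTES - Byte.BYTES);  // five int -> byte fields
        long bitField = Integer.BYTES - Short.BYTES;          // bitField0_: int -> short
        return enumFields + bitField;
    }

    // Rounds a size up to the next multiple of ALIGNMENT.
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Bytes saved per object once both layouts are padded to the alignment.
    static long savedPerObject(long oldRawSize) {
        return alignUp(oldRawSize) - alignUp(oldRawSize - rawFieldSavings());
    }

    public static void main(String[] args) {
        long oldRawSize = 80;  // hypothetical unpadded size of the old layout
        long perObject = savedPerObject(oldRawSize);
        long totalBytes = perObject * LINES;
        System.out.printf("raw=%d B, aligned=%d B, total=%.2f GiB%n",
            rawFieldSavings(), perObject, totalBytes / (double) (1L << 30));
    }
}
```

With these assumptions the per-object saving is 16 bytes for most starting sizes; if the old layout already carried 7 bytes of padding, the same change would cross one more alignment boundary and save 24.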

@CLAassistant commented Oct 14, 2022

CLA assistant check
All committers have signed the CLA.

@isabelle-dr (Contributor)

👋 Welcome back!

@asvechnikov2 (Collaborator) left a comment

Thanks for the tests!

@nackko (Contributor) commented Oct 21, 2022

Nice

@aababilov (Collaborator, Author)

> Nice

Thanks!

@aababilov (Collaborator, Author)

> 👋 Welcome back!

I missed GTFS Validator :)

@asvechnikov2 (Collaborator) left a comment

LGTM! Thanks!

@aababilov aababilov merged commit bd68666 into MobilityData:master Oct 27, 2022

5 participants