
Static spec doesn't differentiate between primary and foreign IDs #266

@botanize


Nowhere in the spec is it clear what the rules are for ID fields. The only "documentation" is in the validators.

For example, it seems reasonable that trip_id would only need to be unique within the combination of service and route IDs in the trips.txt file (in SQL terms, PRIMARY KEY (service_id, route_id, trip_id)). However, validators tell me that trip_id is required to be unique within trips.txt regardless of the combination of other ID fields (PRIMARY KEY (trip_id)), but the other ID fields, route_id and service_id, are not expected to be unique.
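To make the two readings concrete, here is a minimal Python sketch that checks a candidate key for duplicates. The helper `duplicate_keys` and the sample rows are hypothetical illustrations, not part of any validator or of the spec.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical trips.txt rows: trip_id "T1" is reused on two routes.
trips = [
    {"route_id": "R1", "service_id": "WKDY", "trip_id": "T1"},
    {"route_id": "R2", "service_id": "WKDY", "trip_id": "T1"},
    {"route_id": "R2", "service_id": "WKDY", "trip_id": "T2"},
]

# Reading 1: PRIMARY KEY (service_id, route_id, trip_id)
composite = duplicate_keys(trips, ["service_id", "route_id", "trip_id"])
# Reading 2: PRIMARY KEY (trip_id), which is what validators enforce
single = duplicate_keys(trips, ["trip_id"])

print(composite)  # duplicates under the composite reading
print(single)     # duplicates under the trip_id-only reading
```

The same feed passes under the composite reading and fails under the single-column one, which is exactly the ambiguity the spec text leaves open.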

Relying on implied rules causes further confusion. For example, service_id is the only ID in calendar_dates.txt, yet it is not expected to be unique (FOREIGN KEY (service_id)). You might expect it to be unique if you assumed that an ID must be unique when it is the only ID in a file, or when its name closely matches the file name or file purpose.
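As a rough illustration of why the "only ID in the file" heuristic fails, the sketch below reports, for each column whose name ends in `_id`, whether its values are unique. The function name and the sample rows are made up for this example.

```python
def id_column_uniqueness(rows):
    """Map each column ending in '_id' to True if its values never repeat."""
    id_columns = [name for name in rows[0] if name.endswith("_id")]
    return {
        name: len({row[name] for row in rows}) == len(rows)
        for name in id_columns
    }


# Hypothetical calendar_dates.txt rows: one service has several exceptions.
calendar_dates = [
    {"service_id": "WKDY", "date": "20240101", "exception_type": "2"},
    {"service_id": "WKDY", "date": "20240527", "exception_type": "2"},
    {"service_id": "SAT", "date": "20240101", "exception_type": "1"},
]

uniqueness = id_column_uniqueness(calendar_dates)
print(uniqueness)
```

Here service_id is the only ID column and still repeats, so nothing in the column name or file layout tells a reader whether it acts as a primary or a foreign ID.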

Maybe adapting language from relational databases, e.g. primary ID and foreign ID, would clarify the role of each ID in each file. If there are any examples where there is a multi-column primary key (multiple ID fields used to uniquely identify a row) and that constraint is expected by validators, it should be noted in the file information.

For example, the spec for stop_times.txt could look like this:

### stop_times.txt

File: **Required**

PRIMARY KEY(trip_id, stop_sequence)

|  Field Name | Type | Required | Description |
|  ------ | ------ | ------ | ------ |
|  `trip_id` | foreign ID referencing `trips.trip_id` | **Required** | Identifies a trip.  |
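For comparison, the constraints proposed above can be written down as an actual schema. This is only a sketch using Python's built-in `sqlite3` module; the table layout is reduced to the ID columns discussed in this issue, and the helper `try_insert` and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE trips (
        trip_id    TEXT PRIMARY KEY,
        route_id   TEXT NOT NULL,
        service_id TEXT NOT NULL
    );
    CREATE TABLE stop_times (
        trip_id       TEXT NOT NULL REFERENCES trips (trip_id),
        stop_sequence INTEGER NOT NULL,
        PRIMARY KEY (trip_id, stop_sequence)
    );
""")


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


ok_trip = try_insert("INSERT INTO trips VALUES (?, ?, ?)", ("T1", "R1", "WKDY"))
ok_first = try_insert("INSERT INTO stop_times VALUES (?, ?)", ("T1", 1))
# Same (trip_id, stop_sequence) pair again: violates the composite primary key.
dup_key = try_insert("INSERT INTO stop_times VALUES (?, ?)", ("T1", 1))
# trip_id "T9" does not exist in trips: violates the foreign key.
dangling = try_insert("INSERT INTO stop_times VALUES (?, ?)", ("T9", 1))

print(ok_trip, ok_first, dup_key, dangling)
```

With the keys stated explicitly, a duplicate (trip_id, stop_sequence) pair and a reference to a missing trip are both rejected without anyone having to infer the rule from a validator.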

Labels: GTFS Schedule
