This folder contains several files of sample data in various formats including the GraphML file for the air-routes graph. The air-routes graph is referenced throughout the book.
Two versions (sizes) of the air-routes data set files are provided. The file air-routes.graphml contains the full graph, with over 57,000 routes and over 3,700 airports. The file air-routes-small.graphml contains only routes between 47 airports, all located in the US.
All the examples in the book are based on air-routes.graphml, which is considered the 1.0 version of the dataset. This way, even as the data is updated (see "Latest data" below), the examples in the book can always be verified against the original version.
As of TinkerPop 3.8.0, the Air Routes 1.0 dataset is officially included among the sample datasets provided in the distribution. It is packaged with TinkerGraph and can be loaded like the other packaged datasets, for example from the Gremlin console:
gremlin> graph = TinkerFactory.createAirRoutes()
tinkergraph[vertices:3749 edges:57645]
gremlin> g = traversal().with(graph)
graphtraversalsource[tinkergraph[vertices:3749 edges:57645], standard]
There is also a small Groovy script in the '/sample-code' folder called graph-stats.groovy that can be run from within the Gremlin console to produce some statistics about the graph similar to those found in the book and in the README-air-routes.txt file.
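For a quick sanity check outside the Gremlin console, the sketch below counts nodes and edges per label directly from a GraphML file using only the Python standard library. It is not the graph-stats.groovy script. The function name graphml_stats is hypothetical, and the labelV and labelE data keys are an assumption about how labels are stored in the file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML XML namespace, in ElementTree's {uri}tag notation.
NS = "{http://graphml.graphdrawing.org/xmlns}"


def _label(elem, key):
    """Return the text of the <data> child with the given key, if any."""
    for data in elem.findall(NS + "data"):
        if data.get("key") == key:
            return data.text
    return "(unlabeled)"


def graphml_stats(source):
    """Count nodes and edges per label in one streaming pass.

    `source` may be a file name or a binary file object.
    Assumes labels live in <data key="labelV"> and <data key="labelE">.
    """
    nodes, edges = Counter(), Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS + "node":
            nodes[_label(elem, "labelV")] += 1
            elem.clear()  # release the element's children to keep memory low
        elif elem.tag == NS + "edge":
            edges[_label(elem, "labelE")] += 1
            elem.clear()
    return nodes, edges
```

Calling graphml_stats("air-routes.graphml") returns two counters, one keyed by vertex label and one by edge label, which can be compared against the totals reported by the console.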
The 'aircraft.csv' file is intended to be used with the add-aircraft.groovy sample that is located in the '/sample-code' folder.
The edges.csv file is intended to be used with the GraphFromCSV.java sample that is located in the '/sample-code' folder.
The two files air-routes-latest.graphml and air-routes-small-latest.graphml were provided for anyone who wants to experiment with routes and airports added after the original version was uploaded. As of the 1.0 version of the dataset, no further changes are expected, so air-routes.graphml and air-routes-latest.graphml are now identical and both at 1.0 (the same applies to the "small" versions).
The two files air-routes-edition-1.graphml and air-routes-small-edition-1.graphml contain the 0.77 version of the dataset, a snapshot of the data used for all the first edition book examples.
This folder also contains two CSV files to go along with the GraphML files. They are called air-routes-latest-nodes.csv and air-routes-latest-edges.csv. The CSV files were produced using the open source GraphML2CSV tool. You may need to edit the first line (header) of each CSV file depending on your graph database and toolset.
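As an illustration of the header edit mentioned above, here is a minimal Python sketch that rewrites only the first line of a CSV file and copies every other row unchanged. The function name rename_header and the column names in the usage note are hypothetical; check the actual headers in the two CSV files and the format your target database expects before using anything like this.

```python
import csv


def rename_header(src, dst, renames):
    """Copy CSV rows from src to dst, renaming columns in the header row.

    `src` and `dst` are text streams (open files with newline="").
    `renames` maps an existing column name to its replacement;
    columns not listed are left as they are.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow([renames.get(name, name) for name in header])
    writer.writerows(reader)  # data rows pass through untouched
```

For example, with open("in.csv", newline="") as src and open("out.csv", "w", newline="") as dst, a call such as rename_header(src, dst, {"id": "vertex_id"}) would rename a column called id (a made-up name here) and leave the rest of the file as it was.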
It is also possible to turn the CSV files into batches of Gremlin addV and addE commands using the csv-gremlin tool. This provides yet another way that the air-routes data set can be loaded into a TinkerPop compatible graph store.
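To give a feel for what turning CSV rows into addV and addE commands involves, here is a rough Python sketch. It is not the csv-gremlin tool and makes no attempt to match its output. The function names and the column names (id, label, from, to) are hypothetical, and every value is emitted as a string, which may not match the ID and property types your graph store expects.

```python
import csv


def gremlin_string(value):
    """Quote a value as a single-quoted Gremlin (Groovy) string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def add_vertex_commands(node_csv, id_col="id", label_col="label"):
    """Yield one g.addV(...) command per row of a node CSV stream."""
    for row in csv.DictReader(node_csv):
        cmd = (f"g.addV({gremlin_string(row[label_col])})"
               f".property(T.id, {gremlin_string(row[id_col])})")
        for name, value in row.items():
            # Remaining non-empty columns become vertex properties.
            if name not in (id_col, label_col) and value:
                cmd += f".property({gremlin_string(name)}, {gremlin_string(value)})"
        yield cmd


def add_edge_commands(edge_csv, from_col="from", to_col="to", label_col="label"):
    """Yield one g.V(...).addE(...) command per row of an edge CSV stream."""
    for row in csv.DictReader(edge_csv):
        cmd = (f"g.V({gremlin_string(row[from_col])})"
               f".addE({gremlin_string(row[label_col])})"
               f".to(__.V({gremlin_string(row[to_col])}))")
        for name, value in row.items():
            if name not in (from_col, to_col, label_col) and value:
                cmd += f".property({gremlin_string(name)}, {gremlin_string(value)})"
        yield cmd
```

The generated lines can be written to a file in batches and submitted to the graph. Vertices need to be created before the edges that refer to them.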
To allow for some interesting comparisons in different ways of modeling the data, this folder also contains sub-folders containing RDF and SQL versions of the air-routes dataset.
The RDF data was created using the Ruby script that can be found in the sample-data/RDF folder. You may also be interested in the AWS CSV2RDF tool, which can turn CSV files into RDF files in N-Quads format.
Please check back periodically to find any additional updates.