This is the guide for evaluating our CGO 2020 paper, Optimizing Ordered Graph Algorithms with GraphIt. It walks step by step through reproducing Figure 9 (generating C++ code for Single Source Shortest Paths with Delta Stepping using different schedules) and Table 4 (GraphIt's performance on our 2-socket machine). The schedules we used here are very likely NOT the fastest schedules for your machine. Please use these instructions only as examples for writing and compiling different schedules, and tune the schedules to best fit your machine's features.

## Setup GraphIt

This work is implemented as an ordered extension to GraphIt that supports ordered parallelism, as described in the paper. You do NOT need any additional setup beyond setting up GraphIt itself.

Please first follow the Getting Started Guide to set up GraphIt. That guide walks the user through compiling GraphIt and testing the correctness of a series of GraphIt algorithms.

## Examine the algorithm source files used in the paper

The programs used in the paper (sssp_delta_stepping.gt, ppsp_delta_stepping.gt, astar.gt, k_core.gt, and set_cover.gt) are stored in the graphit/apps directory.

## Reproducing SSSP with Delta Stepping with different schedules

This section demonstrates GraphIt's ability to generate code for lazy and eager schedules and to combine them with other schedules. Figure 9 in the paper shows the different C++ code generated by applying different schedules to Single Source Shortest Paths (SSSP) with Delta Stepping. We have built a script that generates the code for Delta Stepping with the schedules shown in the paper, plus a few even more sophisticated ones, and verifies that the generated C++ code compiles successfully (the script only compiles the programs; it does not run them). Note that generating and compiling all the different schedules takes some time, and some warning messages are expected.

```shell
# graphit is the root directory of the repository
cd graphit/graphit_eval/priority_graph_cgo2020_eval
cd sssp_delta_stepping_example
python compile_sssp_delta_stepping_fig9.py
```

The program should print information on each schedule, print the generated C++ code to stdout, and save the generated code as .cpp files in the directory. The schedules we used are stored in sssp_delta_stepping_example/schedules. We also added an eager schedule with bucket fusion that was not included in the paper due to space constraints. The code in Figure 9 (c) is wrapped inside the runtime library OrderedProcessingOperator in graphit/src/runtime_lib/infra_gapbs/ordered_processing.h.

## Running the GraphIt-generated programs (with the ordered extension) with a single thread on small graphs

### Generating the C++ files from GraphIt programs

Here we provide a script that compiles (displaying the commands used) the GraphIt programs (with .gt extensions) in the graphit/apps directory — SSSP with Delta Stepping, PPSP, A* Search, KCore, and SetCover — into C++ programs using the schedules shown in the paper.

```shell
# start from the graphit root directory
cd graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval

# compile the graphit files (.gt) into C++ files with the schedules used in the paper
# (from the graphit/test/input_with_schedules directory)
make graphit_files

# compile the generated C++ files with the serial implementation (not parallel)
make cpps
```

The generated cpp files are in the perf_eval/cpps directory, and the compiled binaries are in the perf_eval/bin directory.

### Running GraphIt Programs on Small Graphs

The following commands run the serial versions of the GraphIt-compiled programs (from the earlier section) on two small graphs, testGraph and Monaco. Both the unweighted and weighted versions are in the graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval/graphs directory.

```shell
# start from the graphit root directory
cd graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval

# run and benchmark the performance on testGraph for SSSP with delta stepping, ppsp, wBFS, kcore, and setcover
python table4_priority_graph.py

# run astar on monaco (needs special data on coordinates)
python table4_priority_graph.py -a astar -g monaco
```

The script table4_priority_graph.py first runs the benchmarks and saves the outputs to the graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval/benchmark_logs directory; the benchmark script chooses the binary based on the graph. A separate script then parses the outputs to generate the final performance table in the following form: the application and graph are shown in the leftmost column, and the running times, in seconds, are shown in the second column.

```
#performance results from running ds, ppsp, wBFS, kcore, and setcover on testGraph
 -------------------
frameworks: graphit
 -------------------
ds
testGraph, 2e-06
ppsp
testGraph, 1e-06
wBFS
testGraph, 2e-06
kcore
testGraph, 2.4e-05
setcover
testGraph, 8e-05
Done parsing the run outputs

#performance results from running astar on monaco graph
 -------------------
frameworks: graphit
 -------------------
astar
monaco, 7.6e-05
Done parsing the run output
```
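The parsed output alternates application lines with "graph, time" lines. If you want one row per result for further processing, a small awk one-liner can reshape it; the snippet below runs on hard-coded sample rows taken from the listing above (the real log also contains header and footer lines you would need to filter out first).

```shell
# Reshape the alternating "app / graph, time" lines into "app graph time" rows.
# The input here is hard-coded sample data from the listing above.
printf 'ds\ntestGraph, 2e-06\nppsp\ntestGraph, 1e-06\n' | \
  awk 'NR % 2 == 1 { app = $1; next } { gsub(",", "", $1); print app, $1, $2 }'
```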

These runs should complete very quickly. This test on the small graphs demonstrates that the compiler can compile and run GraphIt programs that use all of the ordered extensions. We verified the correctness of the compiled programs in the compiler's test suite, which is run as part of the Getting Started Guide.

To see more options for running specific graphs and algorithms, simply type the following command.

```shell
# see options for running a specific graph or application / algorithm
python table4_priority_graph.py -h
```

## Reproducing the GraphIt extension's performance on 2-socket Intel Xeon E5-2695 v3 CPUs with 30 MB LLC and THP enabled (OPTIONAL)

The sections above showed that the compiler can compile GraphIt programs into efficient C++ implementations for all six algorithms and run them. Below are the optional instructions for setting up the large graphs and replicating the performance numbers in Table 4. Your system needs to support numactl, taskset, Cilk, and OpenMP to perform the following evaluations. We use both OpenMP- and Cilk-based parallelism: OpenMP is required for ds, ppsp, wBFS, and astar, while Cilk performs better for setcover. We provide step-by-step graph setup, compilation, and running instructions below.
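As a quick sanity check before benchmarking, you can verify that the tools are on your PATH and set the standard thread-count environment variables (OMP_NUM_THREADS for OpenMP, CILK_NWORKERS for Cilk Plus). This is only a sketch: the worker count of 24 is an arbitrary example, not a tuned value.

```shell
# Report whether the required tools are installed (does not fail if they are missing)
for tool in numactl taskset; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done

# Standard environment variables controlling OpenMP / Cilk Plus parallelism
# (24 is just an example; use your machine's core count)
export OMP_NUM_THREADS=24
export CILK_NWORKERS=24
echo "OpenMP threads: $OMP_NUM_THREADS, Cilk workers: $CILK_NWORKERS"
```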

### Setting up and running algorithms on larger graphs

We have provided a few larger graphs for testing in the Dropbox folder additional_graphit_graphs. There are five folders: road-usad_origweights (USA road graph with original weights), socLive_logn (LiveJournal graph with log(num_vertices)-distributed weights), socLive_rand1000 (LiveJournal graph with weights distributed between 1 and 1000), twitter_logn (Twitter graph with log(num_vertices) weight distribution), and twitter_rand1000 (Twitter graph with weights distributed between 1 and 1000). Running the experiments on the Twitter graph can take a significant amount of time if your machine does not have at least 100 GB of memory and multiple cores. Note that running these large graphs with the serial C++ implementations will be very slow; please use the parallel implementations if possible (shown in the code snippets below).

Below we first show the instructions for running the socLive_rand1000 and socLive_logn graph.

```shell
# copy the files to the data directories.
# The directory names have to be socLive_logn, socLive_rand1000, twitter_logn,
# twitter_rand1000, road-usad_origweights as we used hard-coded names in the scripts.
cp -r socLive_logn graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval/graphs
cp -r socLive_rand1000 graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval/graphs

# start from the graphit root directory
cd graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval/

# if you have not already done so in an earlier step, first compile the graphit files to cpp
make graphit_files

# compile cpp files with gcc parallelism (enabled with GCC_PAR=1)
make GCC_PAR=1 cpps

# run and benchmark delta stepping (ds), ppsp, kcore, and setcover on socLive_rand1000 as in the paper
python table4_priority_graph.py -a ds ppsp kcore setcover -g socLive_rand1000

# run and benchmark weighted BFS (wBFS) on socLive_logn as in the paper
python table4_priority_graph.py -a wBFS -g socLive_logn
```

On our machine with two sockets, we get the following results from the script, which match the data in Table 4. Note that we only use the log num_vertices (logn) weight-distribution graphs for wBFS because that distribution is more suitable for the algorithm (as discussed in the paper).

```
 -------------------
frameworks: graphit
 -------------------
ds
socLive_rand1000, 0.0900572
ppsp
socLive_rand1000, 0.04255
kcore
socLive_rand1000, 0.634528
setcover
socLive_rand1000, 0.495883

 -------------------
frameworks: graphit
 -------------------
wBFS
socLive_logn, 0.0699865
```

The road graph and Twitter graphs must be placed in directories named twitter_logn, twitter_rand1000, and road-usad_origweights. We have included some instructions below.

```shell
# copy the files to the data directories.
# The directory names have to be socLive_rand1000, socLive_logn, road-usad_origweights,
# twitter_rand1000, twitter_logn as we used hard-coded names in the scripts.

# copy over the directories (these can also be soft links)
cp -r twitter_logn graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval/graphs
cp -r twitter_rand1000 graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval/graphs
cp -r road-usad_origweights graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval/graphs

# start from the graphit root directory
cd graphit/graphit_eval/priority_graph_cgo2020_eval/perf_eval/

# if you have not already generated the cpp files, first do so
make graphit_files

# GCC_PAR=1 enables builds for parallel programs
make GCC_PAR=1 cpps

# run and benchmark the performance on the USA road graph
python table4_priority_graph.py -g road-usad_origweights -a ds ppsp kcore setcover astar

# run and benchmark the performance on the twitter_rand1000 graph
python table4_priority_graph.py -g twitter_rand1000 -a ds ppsp wBFS kcore setcover

# run and benchmark wBFS on the twitter_logn graph
python table4_priority_graph.py -g twitter_logn -a wBFS
```

Some of the output is shown here, which matches Table 4.

```
#road network with original weights for ds, ppsp, kcore, setcover, and astar
 -------------------
frameworks: graphit
 -------------------
ds
road-usad_origweights, 0.2121662
ppsp
road-usad_origweights, 0.0424522
kcore
road-usad_origweights, 0.305012
setcover
road-usad_origweights, 0.868475
astar
road-usad_origweights, 0.0717215
Done parsing the run outputs

#twitter random (1,1000) weight distribution results for ds, ppsp, kcore, and setcover
 -------------------
frameworks: graphit
 -------------------
ds
twitter_rand1000, 2.978356
ppsp
twitter_rand1000, 2.5108738
kcore
twitter_rand1000, 8.55509
setcover
twitter_rand1000, 5.31656

#twitter log num_vertices weight distribution results for wBFS (we only use this graph for wBFS in the paper)
 -------------------
frameworks: graphit
 -------------------
wBFS
twitter_logn, 1.766514
```

## Run the comparison frameworks (OPTIONAL)

### For GAPBS

You can easily run the SSSP algorithm from GAPBS to verify the performance comparisons for Delta Stepping and wBFS (wBFS is just Delta Stepping with delta fixed to one). GAPBS is the easiest to set up because it uses the same graph formats as GraphIt (.sg and .wsg are GAPBS formats). Please clone GAPBS from https://github.com/sbeamer/gapbs

```shell
make sssp
numactl -i all ./sssp -f socLive_rand1000_gapbs.wsg -d 100
```

The deltas we tuned for GAPBS Delta Stepping (SSSP) on our machine are:

- socLive_rand1000: 100
- road-usad_origweights: 100000
- twitter_rand1000: 8
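If you script these runs, a tiny helper can map graph names to the tuned deltas above. `delta_for` is a hypothetical convenience function, not part of GAPBS; its fallback of 1 matches the wBFS setting described below.

```shell
# Map a graph name to the GAPBS delta we tuned for it; anything else
# (including the wBFS runs) falls back to delta = 1.
delta_for() {
  case "$1" in
    socLive_rand1000)      echo 100 ;;
    road-usad_origweights) echo 100000 ;;
    twitter_rand1000)      echo 8 ;;
    *)                     echo 1 ;;
  esac
}

delta_for socLive_rand1000
```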

For wBFS, run sssp with delta set to 1 for all graphs.

Using socLive as an example:

```shell
numactl -i all ./sssp -f socLive_rand1000_gapbs.wsg -d 1
```

Note that the starting points are a bit different from ours, so treat the results with a grain of salt. However, the starting points should not have a large impact on the results if you average over 6 to 7 of them. You can find the starting points we used here: https://github.com/GraphIt-DSL/graphit/blob/master/graphit_eval/priority_graph_cgo2020_eval/perf_eval/benchmark.py#L130
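To average over several start vertices, you can use GAPBS's `-n` (number of trials) and `-r` (source vertex) flags, as documented in the GAPBS README. The vertex IDs below are placeholders, and this sketch only echoes the commands so you can inspect them before running the real binary.

```shell
# Sketch: benchmark SSSP from several start vertices. The vertex IDs are
# placeholders; we echo the commands instead of executing them.
for r in 14 37 1024; do
  echo numactl -i all ./sssp -f socLive_rand1000_gapbs.wsg -d 100 -n 1 -r "$r"
done
```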

### For Julienne

You can replicate the results for SSSP, wBFS, KCore, and SetCover.

First clone and compile the Julienne repository (it lives in the Ligra repository) at https://github.com/jshun/ligra. Go into the bucketing directory (https://github.com/jshun/ligra/tree/master/apps/bucketing) and compile the Julienne programs with the Makefile.

#### SSSP and wBFS

We have added graphs for Julienne (files ending in _ligra.wadj) inside the graph folders here: https://www.dropbox.com/work/CGO_Artifact_Eval/additional_graphit_graphs

Using socLive as an example:

```shell
numactl -i all ./DeltaStepping -delta 100 socLive_rand1000_ligra.wadj
```

The deltas we used for our machine:

- socLive_rand1000: 100
- road-usad_origweights: 10000
- twitter_rand1000: 4

Again, delta is 1 for wBFS.

#### KCore and SetCover

Use the .sadj (symmetrized) graphs already in the repository.

Using socLive as an example:

```shell
numactl -i all ./KCore socLive_rand1000_ligra.sadj
numactl -i all ./SetCover ~/graphs/socLive_rand1000/socLive_rand1000_ligra.sadj
```

(NOTE: only the FIRST run's running time is valid for SetCover, as the graph is modified after the first run.)

### For Galois

You can replicate the SSSP results for Galois (https://github.com/IntelligentSoftwareSystems/Galois). Follow its instructions to build the software. We have added the Galois graphs to the graph folders as before (https://www.dropbox.com/work/CGO_Artifact_Eval/additional_graphit_graphs).

The compiled binary is at:

```
build/lonestar/sssp
```

Using socLive as an example:

```shell
numactl -i all ./sssp socLive_rand1000_galois.gr -t=48 -algo=deltaStep -startNode=14 -delta=2
```

The -delta argument here is an exponent (the effective delta is a power of 2). The values we used are:

- socLive_rand1000: 2
- twitter_rand1000: 0
- road-usad_origweights: 14
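Reading the values above as exponents, the effective deltas can be computed with plain shell arithmetic. This assumes, as stated above, that Galois interprets -delta as a power of two.

```shell
# Effective delta = 2^exponent for each of the Galois settings above
for e in 2 0 14; do
  echo "exponent $e -> delta $((1 << e))"
done
```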

Galois does not support wBFS, since it only offers approximate priority ordering.

This concludes our artifact evaluation guide.