Link to streamspot-stream.gz: https://drive.google.com/file/d/1EUHiHYyitSrMuVQGvd8ns5ZIvJT7q7bb/view?usp=sharing
The original dataset was obtained from StreamSpot. A copy is
stored in this repository as streamspot-raw.gz. Refer to the original repository for details.
Each line (system call) of the original dataset corresponds to
shingles that are added and removed from a graph. These shingles are obtained by modifying the
StreamSpot shingle construction code
with the chunk-length parameter set to 50.
Each shingle is mapped to a shingle ID for convenient representation in the SVM-light format, though this is not
a requirement for xStream. The raw shingles are present in shingles.gz, with the shingle ID implied by the line
number (starting from zero).
The evolving stream as incoming and outgoing shingles for each system call is contained in streamspot-stream.gz.
The file is in a modified SVM-light format; each line is formatted as follows:
graph_id incoming_shingle_id:1 incoming_shingle_id:1 ... outgoing_shingle_id:-1 outgoing_shingle_id:-1 ...
A row-stream version of the dataset with complete graphs (shingle vectors) is contained in streamspot-static.gz
for convenience (to test with iForest or other static methods). The graph ID is implied by the line number
(starting from zero). The file is in the SVM-light format, with each line formatted as:
anomaly_label shingle_id:count shingle_id:count ...
Graph ID's correspond to scenarios as follows:
- YouTube (graph ID's 0 - 99)
- GMail (graph ID's 100 - 199)
- VGame (graph ID's 200 - 299)
- Flash drive-by-download attack (anomaly) (graph ID's 300 - 399)
- Download (graph ID's 400 - 499)
- CNN (graph ID's 500 - 599)
- Java attack (anomaly) (graph ID's 600 - 699)
The included test_iforest_streamspot.py script runs iForest on the StreamSpot datasets, testing detection of either
the Flash or Java attack against one or all benign scenarios. See the above section for the mapping from
scenario numbers to benign scenarios.
python test_iforest_streamspot.py <path to streamspot-static> <attack, "flash" or "java"> <scenario, index or "all">
Example runs:
python test_iforest_streamspot.py ./streamspot-static java all # Java vs. all
python test_iforest_streamspot.py ./streamspot-static flash 5 # Flash vs. Download