TgraphSpot: Fast and Effective Anomaly Detection for Time-Evolving Graphs
Authors: Mirela T. Cazzolato1,2, Saranya Vijayakumar1, Xinyi Zheng1, Namyong Park1, Meng-Chieh Lee1, Pedro Fidalgo3,4, Bruno Lages3, Agma J. M. Traina2, Christos Faloutsos1.
Affiliations: 1 Carnegie Mellon University (CMU), 2 University of São Paulo (USP), 3 Mobileum, 4 ISCTE-IUL
Conference: IEEE International Conference on Big Data (Big Data), 2022 @ Osaka, Japan.
Please cite the paper as:
@inproceedings{cazzolato2022tgraphspot,
title={{TgraphSpot:} Fast and Effective Anomaly Detection for Time-Evolving Graphs},
author={Cazzolato, M.T. and Vijayakumar, S. and Zheng, X. and Park, N. and Lee, M-C. and Fidalgo, P. and Lages, B. and Traina, A.J.M. and Faloutsos, C.},
booktitle={2022 IEEE International Conference on Big Data (Big Data)},
year={2022},
organization={IEEE},
}
Code Updates:
May 10-12, 2023
-- Organizing modules with tabs
-- Adding a single module for data input
-- Making "MEASURE" and "TIMESTAMP" columns optional
-- Updating requirements.txt file
See the file requirements.txt for the required packages.
To create and use a virtual environment, type:
python -m venv tgraph_venv
source tgraph_venv/bin/activate
pip install -r requirements.txt
Run the app with one of the following commands in your terminal:
make
or
streamlit run app/tgraphspot.py --server.maxUploadSize 5000
- The parameter --server.maxUploadSize 5000 is optional; it increases the size limit (in MB) for uploaded input files.
We provide a toy sample dataset in the folder data/. See the file sample_raw_data.csv.
Matrix cross-associations
The code for generating matrix cross-associations originally comes from this GitHub repository.
The work was proposed in this paper:
Deepayan Chakrabarti, S. Papadimitriou, D. Modha, C. Faloutsos. Fully automatic cross-associations. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. 2004. DOI:10.1145/1014052.1014064.
Step-by-step tutorial on how to use TgraphSpot to generate features and visualize the results
Provide the path of a file containing columns corresponding to source, destination, and measure (e.g., call duration). We provide a sample file in the repository as an example; check the option "Use example file" to use it in the application. After loading the file, click on "Run t-graph" and wait until the task is done. The application saves the file with the generated features in the folder "data/".
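To give an idea of what per-node features over a source/destination/measure edge list can look like, here is a minimal sketch in plain Python. It is not the application's implementation: the column names SOURCE and DESTINATION, the function node_features, and the specific features computed are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy input; the column names are assumptions for illustration.
RAW = """SOURCE,DESTINATION,MEASURE
A,B,60
A,C,30
B,C,10
A,B,20
"""


def node_features(rows):
    """Return illustrative per-node count, weight, and degree features."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    feats = defaultdict(lambda: {"out_calls": 0, "in_calls": 0,
                                 "out_measure": 0.0, "in_measure": 0.0})
    for row in rows:
        src, dst = row["SOURCE"], row["DESTINATION"]
        # MEASURE is optional; each edge counts with weight 1 when it is absent.
        weight = float(row.get("MEASURE") or 1.0)
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
        feats[src]["out_calls"] += 1
        feats[src]["out_measure"] += weight
        feats[dst]["in_calls"] += 1
        feats[dst]["in_measure"] += weight
    for node, f in feats.items():
        # Degree = number of distinct neighbors in each direction.
        f["out_degree"] = len(out_nbrs[node])
        f["in_degree"] = len(in_nbrs[node])
    return dict(feats)


features = node_features(csv.DictReader(io.StringIO(RAW)))
```

In this toy input, node A makes three calls to two distinct destinations, so its out_calls and out_degree differ; pairs of features like these are what the later visualization steps plot against each other.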
1.FeatureExtraction.mov
Load the file with the extracted features (from Step 1), and select pairs of features to visualize. The chart is automatically updated. Labels can also be loaded and visualized separately.
2.HexBin.mov
Load the extracted features and the file with phone calls. Then select a pair of features to visualize. The application allows the user to make a lasso selection of points of interest. The selected points are listed below the chart. From the selected nodes, the application generates the corresponding EgoNet and plots the adjacency matrix and the cross-associations found. Generating the cross-associations can take some time. The user can control the maximum size of the EgoNet to generate the corresponding visualization (see the parameter in the left panel). Finally, at the bottom of the page, the application shows a plot with parallel coordinates, allowing the user to visualize many features at once.
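The EgoNet step described above can be sketched as follows, assuming the EgoNet of the selected nodes means the selected nodes, their direct neighbors, and all edges among them. The function egonet, its max_nodes parameter, and the truncation policy (stop adding neighbors once the cap is reached) are hypothetical and are not taken from the application.

```python
from collections import defaultdict


def egonet(edges, seeds, max_nodes=100):
    """Return (nodes, adjacency matrix) of the 1-step EgoNet around `seeds`."""
    nbrs = defaultdict(set)
    for src, dst in edges:
        # Neighborhood ignores direction; the matrix below keeps it.
        nbrs[src].add(dst)
        nbrs[dst].add(src)

    nodes = list(dict.fromkeys(seeds))  # seeds first, de-duplicated
    for seed in list(nodes):
        for nbr in sorted(nbrs[seed]):
            if len(nodes) >= max_nodes:
                break  # assumed policy: drop neighbors beyond the cap
            if nbr not in nodes:
                nodes.append(nbr)

    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        if src in index and dst in index:
            matrix[index[src]][index[dst]] += 1  # row = source, column = destination
    return nodes, matrix


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
nodes, matrix = egonet(edges, ["A"])
small_nodes, small_matrix = egonet(edges, ["A"], max_nodes=2)
```

The resulting matrix is the kind of input a cross-association algorithm reorders into dense and sparse blocks; that reordering is not shown here.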
3.LassoSelectoinParallelCoordinates.mov
The interactive scatter matrix allows the user to visualize many scatter plots simultaneously, combining many features of interest. There are also pre-set feature combinations, defined by experts to assist in finding abnormal behavior in logs of phone calls. As mentioned in Step 3, the user can also select desired points and generate the EgoNet and the matrix visualizations.
4.ScatterMatrix.mov
In the deep dive module, the user can visualize the incoming and outgoing behavior of the nodes from the generated EgoNet over time. In the selected period, the user can further select a node and visualize the total duration of incoming and outgoing calls per hour.
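The per-hour aggregation behind this view can be sketched as below. This is an illustrative reconstruction, not the application's code: the function hourly_duration, the tuple layout (source, destination, duration, timestamp), and the ISO-style timestamp format are all assumptions.

```python
from collections import defaultdict
from datetime import datetime


def hourly_duration(calls, node):
    """Sum incoming and outgoing call duration per hour for one node."""
    totals = defaultdict(lambda: {"incoming": 0.0, "outgoing": 0.0})
    for src, dst, duration, timestamp in calls:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0)
        if src == node:
            totals[hour]["outgoing"] += duration
        if dst == node:
            totals[hour]["incoming"] += duration
    return dict(sorted(totals.items()))


# Hypothetical toy call log.
calls = [
    ("A", "B", 60, "2022-12-17 09:15:00"),
    ("C", "A", 30, "2022-12-17 09:40:00"),
    ("A", "C", 10, "2022-12-17 10:05:00"),
    ("B", "C", 99, "2022-12-17 10:10:00"),
]
per_hour = hourly_duration(calls, "A")
```

Hours in which the node neither makes nor receives a call do not appear in the result; a plotting layer would need to fill those gaps with zeros if a continuous time axis is wanted.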
5.DeepDive.mov
The negative-list can be used to remove numbers (or nodes) that usually receive or make many calls but should be ignored during the analysis. Examples of such cases are emergency and service numbers.
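As a minimal sketch of the idea, assuming the negative-list is a plain collection of node identifiers and that any call touching a listed node is dropped before analysis (the function apply_negative_list and the sample numbers are hypothetical):

```python
def apply_negative_list(edges, negative_list):
    """Drop every edge whose source or destination is on the negative-list."""
    blocked = set(negative_list)
    return [edge for edge in edges
            if edge[0] not in blocked and edge[1] not in blocked]


# Hypothetical toy call log: (source, destination, duration).
edges = [
    ("555-0101", "112", 45),
    ("555-0101", "555-0102", 120),
    ("911", "555-0103", 30),
    ("555-0102", "555-0103", 15),
]
kept = apply_negative_list(edges, ["112", "911"])
```

Filtering before feature extraction keeps high-volume service numbers from dominating degree-based features.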