This repository contains the source code for ONCESpark, our stream-aware serial pattern mining algorithm. It accompanies the paper "Mining the Frequency of Time-constrained Serial Episodes over Massive Data Sequences and Streams", accepted by Future Generation Computer Systems. The source code for the work will be released here once the paper is accepted.
ONCEPSpark.scala and ONCEStreaming.scala run on the Spark platform; ONCEStreaming.scala uses Spark Streaming to process streaming data. The code has been tested with Spark v1.6.0 and Hadoop v2.6.0.
The code takes the following inputs:
a long sequence in the form of: [(s1,time1),(s2,time2),...]
an episode: (s1,s2,...), e.g., (2, 68, 65) or ('A','B','C').
time constraint: an integer number, it equals
The code outputs the frequency of the specified episode in the input data set, counting only occurrences that satisfy the time constraint.
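To make the input and output concrete, here is a minimal plain-Scala sketch (no Spark) of counting a serial episode under a time constraint. It is not the ONCE algorithm from the paper, only a naive illustration. It rests on three assumptions that this README does not confirm: the time constraint is taken to bound the span between the first and last matched event, the frequency is taken to be the number of start positions from which such an occurrence exists, and the sequence is taken to be sorted by time. The names `EpisodeFrequencySketch`, `countOccurrences`, and `maxSpan` are hypothetical and do not appear in the repository.

```scala
object EpisodeFrequencySketch {
  // One timestamped event (symbol, time), as in the input form [(s1,time1),(s2,time2),...].
  type Event = (String, Long)

  /** Counts the start positions from which `episode` occurs in order within `maxSpan`.
    * Assumption: `maxSpan` bounds (time of last matched event - time of first matched event).
    * Assumption: `sequence` is sorted by time in ascending order.
    */
  def countOccurrences(sequence: Vector[Event], episode: Vector[String], maxSpan: Long): Int = {
    if (episode.isEmpty) 0
    else sequence.indices.count { start =>
      val (symbol, startTime) = sequence(start)
      symbol == episode.head && {
        // Greedily match the remaining episode symbols against later events
        // that still fall inside the time window opened at `startTime`.
        var next = 1
        var i = start + 1
        while (next < episode.length && i < sequence.length &&
               sequence(i)._2 - startTime <= maxSpan) {
          if (sequence(i)._1 == episode(next)) next += 1
          i += 1
        }
        next == episode.length
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, for illustration only.
    val sequence = Vector(("A", 1L), ("B", 2L), ("A", 3L), ("C", 4L), ("B", 9L), ("C", 10L))
    println(countOccurrences(sequence, Vector("A", "B", "C"), 5L))
  }
}
```

The scan is quadratic in the worst case and runs on a single machine, so it is only useful for checking small inputs by hand; the Spark code in this repository is what handles massive sequences and streams.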
For more information, please refer to the homepage of the corresponding author, Prof. Hui, at http://lihuixidian.github.io.