Kaggle Microsoft Malware identification challenge: http://www.kaggle.com/c/malware-classification
feature_extraction.py: extracts 16-bit n-grams for each file
feature_reduction.py: gives the list of most common n-grams
load_data.py:read_ngrams: reads the n-grams into a single csv file
my_model.py: runs the final analysis, currently using the 100 most frequent n-grams
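
A minimal sketch of the count-then-reduce idea behind these scripts, not the code in this repository. It assumes that a "16-bit n-gram" means two consecutive bytes, and the function names (count_ngrams, most_common_ngrams, feature_row) are hypothetical, chosen only for illustration.

```python
from collections import Counter


def count_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping run of n bytes in data.

    n=2 gives 16-bit n-grams under the assumption stated above.
    """
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def most_common_ngrams(per_file_counts, k: int = 100):
    """Return the k n-grams with the highest total count across all files."""
    total = Counter()
    for counts in per_file_counts:
        total.update(counts)
    return [ngram for ngram, _ in total.most_common(k)]


def feature_row(counts: Counter, vocabulary):
    """Turn one file's counts into a fixed-length row, one column per n-gram."""
    return [counts.get(ngram, 0) for ngram in vocabulary]


if __name__ == "__main__":
    # Two tiny made-up byte strings standing in for malware files.
    files = [b"\x00\x01\x00\x01", b"\x00\x01\xff"]
    per_file = [count_ngrams(f) for f in files]
    vocab = most_common_ngrams(per_file, k=100)
    for counts in per_file:
        print(feature_row(counts, vocab))
```

Rows built this way all share the same column order, so they can be written to a single csv file and passed to a model.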