Skip to content

aehaynes/reddit_authorship

Repository files navigation

reddit_authorship

An analysis to detect authors with multiple accounts in comments from the /r/Bitcoin subreddit. See andrehaynes.me/portfolio for the full report

Project Report: http://andrehaynes.me/static/AA_Reddit.pdf

Project Database: https://www.dropbox.com/s/sn64yu46kr5z1m9/redditDB.sqlite?dl=0

Project Files:

dependencies.txt -- Project dependencies

keywords.txt -- List of keywords used to search for relevant threads
get_submissions.py -- Module for getting submission id's that match keywords in keywords.txt
submission_id.pkl -- Submission Id's for threads used in the analysis	
main_reddit_data.py -- Run this to scrape comments from the threads in submission_id.pkl. Stores comments in sqlite  database (see dropbox link)

setup_data_index.py -- Modules for preprocessing comments, setting features, setting dataframes and indices
setup_db.py -- Module for DB schema and function calls
stoplist.pkl -- stopwords used in the analysis
main_tunemodels.py -- Run this to tune to optimal number of topics for the topic models (this may be slow)
num_topics.pkl -- Results from main_tunemodels.py used in this analysis	(use this to avoid running main_tunemodels.py)
main_dataframes.py -- Run this to create dataframes for classification

setup_experiments.py -- Functions for constructing dataframes and model building. Not well documented
analysis.py -- functions for training and testing the classifiers, and generating sock rankings. Not documented

About

An analysis to detect authors with multiple accounts in comments from the /r/Bitcoin subreddit. See README for a link to the full report

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages