I am currently working on a small project that uses the Python module Twython to extract tweets from a given time frame that relate to certain keywords or hashtags. I will use this data to build a graph of the relationships between Twitter users who retweet, quote, or mention other Twitter users while using those hashtags or keywords. In this graph, the nodes are Twitter users and the edges are the connections between them through retweets, mentions, or quotes. I will weight each edge by counting how many times that connection occurs between the same pair of users, so that I can measure modularity within the graph (detecting communities among the connections). Twitter's Streaming API only delivers tweets in real time, its standard search API only reaches back about 7 days, and the REST APIs limit how many past tweets can be extracted. To collect more relevant tweets, I had to fork and modify an existing GitHub repo that performs a more exhaustive search of hashtag and user data over arbitrary date ranges.
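As a rough sketch of the edge weighting described above, the snippet below counts how often one user retweets, quotes, or mentions another. It assumes each tweet is a dict shaped like Twitter's v1.1 JSON (`user`, `retweeted_status`, `quoted_status`, `entities.user_mentions`); the output of the forked scraper may look different, and the function names and sample data here are made up for illustration.

```python
from collections import Counter


def extract_edges(tweet):
    """Yield (source, target, kind) tuples for one tweet dict.

    Assumes the Twitter v1.1 JSON layout; adjust the keys if the
    collected data uses a different shape.
    """
    source = tweet["user"]["screen_name"]

    retweeted = tweet.get("retweeted_status")
    if retweeted:
        # A retweet repeats the original tweet's mentions, so only the
        # retweet edge is recorded to avoid double counting.
        yield (source, retweeted["user"]["screen_name"], "retweet")
        return

    quoted = tweet.get("quoted_status")
    if quoted:
        yield (source, quoted["user"]["screen_name"], "quote")

    for mention in tweet.get("entities", {}).get("user_mentions", []):
        yield (source, mention["screen_name"], "mention")


def count_edges(tweets):
    """Return a Counter mapping (source, target) to the edge weight."""
    weights = Counter()
    for tweet in tweets:
        for source, target, _kind in extract_edges(tweet):
            weights[(source, target)] += 1
    return weights


# Hypothetical sample data, not real tweets.
sample_tweets = [
    {
        "user": {"screen_name": "alice"},
        "retweeted_status": {"user": {"screen_name": "bob"}},
        "entities": {"user_mentions": [{"screen_name": "bob"}]},
    },
    {
        "user": {"screen_name": "alice"},
        "entities": {
            "user_mentions": [{"screen_name": "bob"}, {"screen_name": "carol"}]
        },
    },
    {
        "user": {"screen_name": "carol"},
        "quoted_status": {"user": {"screen_name": "alice"}},
        "entities": {"user_mentions": []},
    },
]

edge_weights = count_edges(sample_tweets)
```

Here every interaction type adds 1 to the same (source, target) pair. Keeping separate weights per type would only require adding `kind` to the Counter key.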
So far, I have been able to create the node data set for use with Gephi (an open-source visualization tool). I am currently working on the edge data set, which will show the relationships between Twitter users who use the same hashtags or keywords in their tweets. Essentially, I am trying to create a directed network graph that can display connections between Twitter users for any keywords or hashtags over any arbitrary date range (overcoming the limitations of the Twitter API).
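For the edge data set, one possible approach is to write the weighted edges to a CSV file using the column names Gephi's spreadsheet importer recognizes (`Source`, `Target`, `Type`, `Weight`). This is only a sketch: the `write_gephi_edges` helper and the sample weights are hypothetical, and it assumes the `Source` and `Target` values match the `Id` column of the node table.

```python
import csv
import io


def write_gephi_edges(edge_weights, out_file):
    """Write {(source, target): weight} as a Gephi-style edge table."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    # Sort so the output is stable between runs.
    for (source, target), weight in sorted(edge_weights.items()):
        writer.writerow([source, target, "Directed", weight])


# Hypothetical weights; in practice these would come from the tweet data.
edge_weights = {("alice", "bob"): 2, ("carol", "alice"): 1}

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_gephi_edges(edge_weights, buffer)
edges_csv = buffer.getvalue()
```

To produce an actual file, the same function can be called with a file opened as `open("edges.csv", "w", newline="")` (the filename is just an example).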
This project has taught me a lot about working with APIs, various Python modules, the basics of REST, some discrete math topics, and more, so I have been really happy with how it’s turning out. It has also led me to learn more about different aspects of software as it relates to the web.
Pics and more detailed source code to come! (hopefully soon)
