Inspiration

As we have become more and more reliant on digital data, we found the need to better analyze one of our most important social networks: our Facebook chats. I've sent and received over a million messages during my time on Facebook, yet I have few quantitative assessments of who I've actually talked to, what I've talked about, and how my friendships have changed over the years. Those who we may think to be our closest friends may really be our worst enemies, and those who we consider to be friends may consider us to be the most snekkity of sneks. Perhaps we've let go of certain friends over time without even realizing it. Regardless, we've built FAF to help determine which of your friends are forever, and which are just *fake as ...*

What it does

FAF uses Facebook's digital archive tool to scrape all of your user data from your account, specifically your messages. We start by parsing through the massive amounts of data and beautifying it into something clean and elegant for us to read. From there, we run a few of our intensive algorithms to crunch the numbers and figure out who has (and hasn't) been treating you right!

More specifically, we generate analytics such as the average response time of each participant, the average length of responses, the most frequently used words, the frequency of conversations, and a sentiment score for each chat. Using these statistics, we push the top 4 individuals in each category (Realest Friends, Biggest Snakes, Needs More Love, and Rekindle Your Friendship) into our elegant UI.

How we built it

We started off by _literally_ waiting hours for our Facebook archives to download (turns out some of us have hundreds of megabytes of messages; wonder how many are real friends??). After that, we got to work on crunching that data and sifting through the millions of entries. We wrote all of our algorithms in Python, for ease, while our front end was built on Node.js (utilizing Flask). Here's a breakdown of each of our algorithms:

1) Time Gap Between Conversations: We looked at the distribution of time differences between consecutive messages. From there, we found the optimal gap length at which to split the messages into distinct 'conversations.'

2) Average response time between conversation participants: This builds on the previous function. Using the conversation boundaries that we calculated, we took the average of the time it took each participant to respond to a message within a conversation. Given a large discrepancy between participants' response times, you can tell who's more invested in the relationship.

3) Average length of responses: This was a bit more challenging than we expected, as some people like to type like this, with one thought spread across several short messages. We didn't want to count people's streams of consciousness as multiple messages when in reality they are one single response. So we kept track of the conversation to figure out exactly when a person's train of thought would start and end, and lumped those messages into one clump. Once we refined the message history, we simply iterated through the entire conversation to calculate the average length of each response. Essentially, is your friend dropping you with the "k", or does he/she really value your friendship?

4) Sentiment Analysis: This part of our project was a lot of fun. We used the MonkeyLearn NLP API to send in transcripts of entire conversations to gauge how positive (or negative) an individual was throughout the entirety of their chat history with any given 'friend' (and vice versa). While I have noticed that the API doesn't do the best job of catching sarcasm and playful arguing, what does that really tell you about your friendship?

5) Most Frequently Used Word: Is your conversation filled with love and laughter? Or mundane work-related blabber and small talk? With this functionality of FAF, you can know exactly which words you and your respective friend use the most. So if you see a lot of "LOLOLOLs" or "I love you's" or "the warriors blew a 3-1 lead," you know that you and your friend have a deep connection. However, if your conversation consists mainly of "psets", "solutions", and "boosted," then you know you're in a toxic relationship.
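To make steps 1, 2, 3, and 5 a bit more concrete, here is a rough Python sketch of how they could fit together. It is not our actual code: the `Message` record, the one-hour `max_gap` cutoff, and every function name are assumptions made up for illustration, and the real archive format and the cutoff we picked from the time-gap distribution are not shown here. Sentiment analysis (step 4) is left out because it goes through an external API.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


# Hypothetical record type; the real archive format is not shown in this writeup.
@dataclass
class Message:
    sender: str
    timestamp: float  # seconds since epoch
    text: str


def split_conversations(messages, max_gap=3600.0):
    """Step 1: start a new conversation whenever the gap between
    consecutive messages exceeds max_gap seconds (an assumed cutoff)."""
    conversations = []
    for msg in sorted(messages, key=lambda m: m.timestamp):
        if conversations and msg.timestamp - conversations[-1][-1].timestamp <= max_gap:
            conversations[-1].append(msg)
        else:
            conversations.append([msg])
    return conversations


def merge_turns(conversation):
    """Step 3: lump consecutive messages from one sender into a single response.
    Each turn is (sender, first_timestamp, last_timestamp, combined_text)."""
    turns = []
    for msg in conversation:
        if turns and turns[-1][0] == msg.sender:
            sender, start, _, text = turns[-1]
            turns[-1] = (sender, start, msg.timestamp, text + " " + msg.text)
        else:
            turns.append((msg.sender, msg.timestamp, msg.timestamp, msg.text))
    return turns


def average_response_times(conversations):
    """Step 2: mean seconds each sender takes to reply within a conversation."""
    delays = {}
    for conv in conversations:
        turns = merge_turns(conv)
        for prev, cur in zip(turns, turns[1:]):
            # Delay runs from the end of the previous turn to the start of the reply.
            delays.setdefault(cur[0], []).append(cur[1] - prev[2])
    return {sender: mean(vals) for sender, vals in delays.items()}


def average_response_lengths(conversations):
    """Step 3: mean number of words per merged response, per sender."""
    lengths = {}
    for conv in conversations:
        for sender, _, _, text in merge_turns(conv):
            lengths.setdefault(sender, []).append(len(text.split()))
    return {sender: mean(vals) for sender, vals in lengths.items()}


def most_common_words(messages, n=3):
    """Step 5: the n most frequent lowercase words across all messages."""
    words = Counter()
    for msg in messages:
        words.update(msg.text.lower().split())
    return words.most_common(n)
```

Measuring a reply delay from the last message of the previous turn, rather than the first, keeps a long stream of consciousness from inflating the other person's response time. That is a design choice in this sketch, not necessarily what FAF does.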

We were able to calculate each of the categories of friends by combining scores from each of the algorithms above. For example, a negative sentiment, coupled with a very poor response time and short replies can result in a pretty poorly rated friendship, while one filled with positivity and nonstop messaging may just result in your soul mate. That being said, it is important to take everything we've delivered with a grain of salt. Because maybe your "real friends" are sneks too.
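As a loose illustration of what "combining scores" could look like, the sketch below folds sentiment, response time, and reply length into a single number and ranks friends by it. The equal weighting, the normalization caps, the assumed sentiment range of -1 to 1, and the names `friendship_score` and `rank_friends` are all hypothetical; the actual FAF scoring is not described in enough detail here to reproduce it.

```python
def friendship_score(sentiment, avg_response_seconds, avg_reply_words,
                     max_response_seconds=86400.0, max_reply_words=50.0):
    """Combine three signals into a score in [0, 1]; higher means a 'realer' friend.

    sentiment is assumed to lie in [-1, 1]. The caps and the equal
    weighting are arbitrary choices made for this sketch.
    """
    sentiment_part = (sentiment + 1.0) / 2.0
    # Faster replies score higher; anything slower than the cap scores 0.
    speed_part = 1.0 - min(avg_response_seconds, max_response_seconds) / max_response_seconds
    # Longer replies score higher, up to the cap.
    length_part = min(avg_reply_words, max_reply_words) / max_reply_words
    return (sentiment_part + speed_part + length_part) / 3.0


def rank_friends(stats, top_n=4):
    """stats maps friend name -> (sentiment, avg_response_seconds, avg_reply_words).

    Returns (best, worst): the top_n highest-scoring names, best first,
    and the top_n lowest-scoring names, worst first.
    """
    scored = {name: friendship_score(*vals) for name, vals in stats.items()}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:top_n], ranked[-top_n:][::-1]
```

With a scheme like this, the first list would feed something like Realest Friends and the second something like Biggest Snakes; the other two categories would need additional signals, such as how conversation frequency changes over time.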

Challenges we ran into

Initially, it was really difficult to actually parse through such a large data set. Two members of the team had to cut their data sets nearly in half by getting rid of all group chats, because of the sheer volume of messages.

Additionally, there were a lot of conversations that were extremely short. It's difficult to accurately analyze a friendship if your message trail consists of only 100 messages. The longer the history, the more accurate the analysis of the relationship.

Along with that, it's very difficult to track sarcasm during conversations. For example, my conversation with my best friend comes back with a negative sentiment at a decent probability. To an outsider (or computer), it may seem like we hate each other, but our jokes are all in good humor. Or maybe we do despise each other and we just don't know it yet.

Accomplishments that we're proud of

This is Urmi's first hackathon, and the growth we've seen in her throughout the course of the 24 hours was incredible. Along with that, this was the first time any of us had worked with a data set of such volume. It was definitely a great experience being able to work with all this data that pertains to you, and we were really excited to see all of the insights we generated.

What we learned

The main thing we've learned from this experience is that we need to be much more efficient when we are working with large datasets. It becomes very tedious to parse through millions of messages several times, and we hope to find a better way of approaching this in the future.

What's next for FAF

In the future, we hope to implement more complex NLP to be able to track sarcasm within conversations and generate better results for FAF. We also want to expand our features to show how many messages were received on birthdays or other special occasions and conversation topics popular with particular friends, and incorporate visualizations of relationship networks and our activity on FB Messenger over time. Along with that, we hope to be able to work with larger data sets more efficiently, to be able to point out all the sneks and real friends in your life.
