Inspiration
We were inspired to create visualizations of women's representation in film, as measured by the Bechdel Test, when we found a relatively extensive dataset on the subject while browsing Kaggle. We felt that this subject fit the Social Justice track best because, as times change and women take on an integral role in the film industry, they should also be represented as such. While data on films isn’t the same as data on the material conditions of people, media can give you a deeper look into how a society views and treats certain people. Additionally, the insights found in these visualizations are especially upsetting, as the Bechdel Test is not a measure of great female representation and can be achieved in a few seconds of dialogue in a movie.
What it does
We have 3 visualizations: One Step Forward, Two Steps Back; How Much Hollywood Values Women; and How Much Society Values Women.
One Step Forward, Two Steps Back plots the year the movies were released against the percentage of that year's movies that passed the Bechdel Test. The percentage of movies per year that pass the Bechdel Test goes up and down, with a slight positive linear trend beginning in the mid-1990s. For reference, 71% of movies in 2020 passed the Bechdel Test, which could be an indication that the positive linear trend continues, but we lack the data from 2014 - 2019 to prove that analysis correct.
How Much Hollywood Values Women plots the average budget of a movie each year in USD (adjusted for inflation to the value of a 2020 dollar) against the percent of movies that passed the Bechdel Test that year. There isn't a clear relationship between the average budget for a film and the percent of movies that will pass the Bechdel test. One thing that we noticed was that as the decades progressed, there was a general increase in budgets for films, something that could also be impacted by inflation, a factor that we then included in our calculations. When you hover over each point on the graph, a pop-up will give you the decade the film was released in, the budget (in million USD), and the percent of movies that year that passed the Bechdel test. Additionally, each point is color-coded to represent which decade it was released in.
How Much Society Values Women plots the average domestic gross revenue in USD of movies in a given year (adjusted for inflation to the value of a 2020 dollar) against the percent of movies that year that passed the Bechdel Test. There is a clear left skew on this graph, which is due to the fact that 57 of the 1,700 films' budgets were missing from the dataset, lowering the calculated averages. However, there is also a trend showing that movies with higher gross revenue are less likely to pass the Bechdel Test, which shows how society values women.
How we built it
We used Python and Pandas to clean the data. We created a count of the number of movies in each year and a separate count of the number of movies that passed the test. Additionally, we found the average budget, average domestic gross revenue, and percent of the movies that passed the Bechdel Test each year. We used Plotly and its Flask-based web framework, Dash, to display our graphs in their full interactive capability.
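The yearly aggregation described above could look roughly like the sketch below. This is not our actual cleaning script: the column names (`year`, `passed`, `budget`, `domestic_gross`) and the five rows of data are made up for illustration, and the real dataset's schema may differ.

```python
import pandas as pd

# Toy rows with hypothetical column names and made-up values.
movies = pd.DataFrame({
    "year": [1995, 1995, 1996, 1996, 1996],
    "passed": [True, False, True, True, False],
    "budget": [10_000_000, 30_000_000, 20_000_000, 40_000_000, 60_000_000],
    "domestic_gross": [25_000_000, 50_000_000, 15_000_000, None, 90_000_000],
})

# One row per release year: movie count, pass count, and yearly averages.
yearly = movies.groupby("year").agg(
    n_movies=("passed", "size"),          # every movie released that year
    n_passed=("passed", "sum"),           # True counts as 1
    avg_budget=("budget", "mean"),
    avg_gross=("domestic_gross", "mean"),  # missing values are skipped
)

# Share of that year's movies that passed the Bechdel Test.
yearly["pct_passed"] = 100 * yearly["n_passed"] / yearly["n_movies"]

print(yearly)
```

Note that pandas leaves missing values out of a mean by default. If missing entries were stored as 0 instead, they would pull the yearly averages down.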
Additionally, we calculated inflation for the average yearly budget and revenue using this formula and annual average CPIs. This allowed us to more easily analyze the data as everything was made equivalent to 2020 USD.
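As a rough illustration of this kind of adjustment, the sketch below uses the standard CPI ratio: the nominal amount multiplied by the 2020 CPI and divided by the CPI of the original year. We are assuming this matches the linked formula. The function name and the CPI figures are illustrative placeholders and should be checked against the published annual average series.

```python
def to_2020_dollars(amount, cpi_year, cpi_2020):
    """Convert a nominal dollar amount to 2020 dollars using a CPI ratio."""
    return amount * cpi_2020 / cpi_year


# Illustrative annual average CPI values (placeholders; verify before use).
annual_cpi = {1995: 152.4, 2020: 258.8}

# A hypothetical 1995 average budget of 30 million USD, in 2020 dollars.
adjusted = to_2020_dollars(30_000_000, annual_cpi[1995], annual_cpi[2020])
print(f"{adjusted:,.0f}")
```

Because every year is scaled to the same 2020 baseline, averages from different decades can be compared directly on one axis.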
As you can see here, we also attempted to utilize Machine Learning to predict the budget and revenue of 2020 films based on whether or not they passed the Bechdel Test, but we were unable to get a working script under the vizathon time constraints.
Challenges we ran into
When cleaning the data, we were both challenged because neither of us primarily codes in Python, so it took some time to get used to the syntax differences. Then, when working with Plotly, we had to learn a whole new framework and found it especially difficult to alter the styling. Finally, we struggled when attempting to create the Machine Learning model using MATLAB.
Accomplishments that we're proud of
We are proud of creating 3 unique data visualizations that are easy to read and interactive under the time constraints we were given. Neither of us had used Plotly before, so learning the technology provided some challenges, especially when attempting to make any styling changes. We are proud that we were able to get the revenue and budget adjusted for inflation, as that made our analysis more accurate. We are also proud that we made progress with an ML model despite our limited experience.
What we learned
We learned how to use Plotly, got better at cleaning datasets using Python, learned more about ML models in MATLAB, and improved our analytical skills when making observations on our visualizations.
What's next for The Bechdel Test
Our dataset was a bit incomplete, missing the years from 2014 on. We were able to scrape a portion of the 2020 film data. Due to the time constraints of the vizathon, we were unable to scrape all the data between 2014 and 2020, which would tell us whether or not there was a steady linear increase in the second half of the 2010s. In the future, we would like to scrape this data to complete the dataset. Additionally, our dataset was missing the national gross profit for 57 of the 1,700 films, which skewed one of our plots lower. In the future, we would like to find this data to further complete the dataset.
Finally, we would also like to improve the aesthetics of the web app. We were able to make slight changes to the aesthetics, but it still looks very similar to the default Plotly starter, so we would like to give it a more unique look. We would also like to get the ML model working to see if the data follows enough of a pattern for any predictions to be made.