Inspiration

Are we alone in the universe? It is one of humanity's biggest questions. Many planets outside our solar system, known as exoplanets, have conditions vastly different from those on Earth. With over 6,000 exoplanets discovered to date, determining which of them are habitable is a time-consuming and expensive process. Motivated by this challenge and by curiosity about possible extraterrestrial life, our team set out to create a habitability index that estimates how likely an exoplanet is to be habitable. The end goal was a tool that helps pinpoint which exoplanets to prioritize for research, allowing for better use of resources and time. Our team adopted the name “Goldilocks” in honor of the “Goldilocks Zone,” the region around a star that is neither too hot nor too cold, where conditions could allow life to exist.

What it does

Goldilocks is a machine learning classifier for exoplanet habitability. Using data from the NASA Exoplanet Archive and the Habitable World Catalog, we trained an XGBoost model to produce a habitability index from planetary and stellar parameters such as orbital period, radius, mass, equilibrium temperature, host star temperature, and system distance. Once trained, Goldilocks takes these planetary and stellar features as inputs and predicts whether a given exoplanet is potentially habitable.

How we built it

Our project began with data collection. We gathered two datasets: the provided NASA Exoplanet Archive (over 6,000 confirmed unique exoplanets) and an external Habitable World Catalog (71 potentially habitable planets). We cleaned and merged the datasets, removed duplicate planets, dropped unnecessary parameters, and added a new column, “pl_hab,” to label whether each planet was considered habitable. We then split the data into 80% for training and 20% for testing and trained an XGBoost classifier, while also experimenting with Random Forest. Initially, the model had low confidence, so we reduced the number of parameters to improve its accuracy. Once we had a functional classifier, we deployed it with Streamlit as a user-friendly website where users can interact with our model and get their own results.
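The labeling and splitting steps above can be sketched roughly as follows. This is a minimal standard-library illustration, not our actual pipeline code: the function name `label_and_split`, the toy planet names, and the choice to split each class separately (so the rare habitable planets land in both sets) are all assumptions made for the example. Only the “pl_hab” label and the 80/20 ratio come from the description above; `pl_name` and `pl_rade` are used here as stand-in column names.

```python
import random


def label_and_split(archive_rows, habitable_names, test_fraction=0.2, seed=0):
    """Deduplicate by planet name, add a pl_hab label, and split 80/20 per class."""
    # Keep the first row seen for each planet name (drops duplicates).
    unique = {}
    for row in archive_rows:
        unique.setdefault(row["pl_name"], row)

    # pl_hab is 1 when the planet also appears in the habitable catalog.
    labeled = [dict(row, pl_hab=int(name in habitable_names))
               for name, row in unique.items()]

    # Shuffle and split each class separately so rare positives reach both sets.
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in labeled if r["pl_hab"] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy data with made-up planet names and radii, for illustration only.
archive = [{"pl_name": f"Planet-{i}", "pl_rade": 1.0 + 0.1 * i} for i in range(100)]
archive.append(dict(archive[0]))  # one duplicate row to be removed
habitable = {f"Planet-{i}" for i in range(10)}

train, test = label_and_split(archive, habitable)
print(len(train), len(test))
```

With real data, the same steps would more likely be done with a dataframe library and a ready-made split helper; the point here is only the order of operations: deduplicate, label, then split.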

Challenges we ran into

At first, our model was inaccurate: it reported low confidence of potential habitability even when we input data for habitable exoplanets from our own dataset. To resolve this, we reduced the number of parameters, which raised our accuracy and confidence levels.
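One way to see why a classifier can look accurate and still fail on habitable planets is to compare accuracy with recall on the habitable class. The sketch below is a hypothetical check, not something we necessarily ran: it assumes roughly 71 habitable labels among 6,000 planets (the dataset sizes mentioned earlier; the merged counts may differ) and a degenerate "model" that never predicts habitable.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall on the positive class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    recall = true_pos / positives if positives else 0.0
    return correct / len(y_true), recall


# Assumed class balance: about 71 habitable labels among 6,000 planets.
y_true = [1] * 71 + [0] * (6000 - 71)
# A degenerate "model" that never predicts habitable.
y_pred = [0] * len(y_true)

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
# prints: accuracy=0.988 recall=0.000
```

Under these assumptions, accuracy alone would hide exactly the problem we noticed, so tracking recall on known habitable planets is a useful complement when comparing feature sets.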

Another challenge was that our website initially took more than 3 minutes to load, which was impractical. We saw that a graphic of our decision tree took the longest to display, and our peers said it provided little value to users, so we removed it.

Accomplishments that we're proud of

This was our team’s first ever hackathon! Creating an app that is functional, unique, and has a practical use case was definitely an accomplishment, since we were all new to the data analysis process and to creating machine learning models. We are very proud of our work and are excited to continue exploring this field!

What we learned

The greatest thing we learned was the data analysis process. Through this datathon, we learned how to define a problem, identify relevant external datasets, clean and merge the datasets, train a machine learning model, and then package everything into a user-friendly website. We also learned different methods of troubleshooting our model to maximize its accuracy and practicality.

What's next for Goldilocks: Habitability Classifier

We would like to integrate additional datasets that include other exoplanets to expand the classifier’s scope. We also want to refine our feature set by including other planetary parameters that could increase the accuracy of the habitability index, such as orbital eccentricity, stellar activity, and atmospheric data when available. On the modeling end, we can experiment with other machine learning models such as neural networks. Finally, we want to expand the Streamlit interface into a full-featured application that allows users to filter, search, and visualize planets dynamically. Ultimately, our long-term vision is for Goldilocks to support astronomers by highlighting the most promising exoplanets for further observation.
