Inspiration
One of the main things that inspired us was the news about oil prices. They've been very volatile recently amid tensions in the Middle East, and we wanted to build something that analyzes exactly what's going on with oil shipments. People often analyze price data with complex formulas or regression models to come up with trading strategies. However, what actually makes these prices move IS people, and that's what we aim to analyze. We also hope to offer government agencies valuable insights into the impact of port congestion and shipping on the climate.
What it does
It analyzes live shipping data to track oil tankers traveling into U.S. ports such as Long Beach, CA; Houston and Corpus Christi, TX; and New York, displaying the level of congestion and the number of tankers present at a given date and time. We also display the trades our algorithm is making on the current day, along with a prediction of our potential returns for the next day. Additionally, a carbon-emissions page compares congestion at these different ports with the carbon emissions in those areas. Using Gemini, the website also intelligently provides market research for the current day and further information on ship congestion at key ports.
How we built it
Frontend: We primarily used Next.js/React with Tailwind CSS for our UI. It has four sections: a login page, a historical-backtesting view, a live-trading view, and a page that overlays carbon-emissions data from NOAA's API with tanker-congestion data.
Backend: We primarily used Python, focused on analyzing and testing historical AIS (vessel-tracking) data from January 1 to April 1, 2024. Specifically, our backend used an XGBoost regression model that analyzes multiple factors, such as the number of tankers at each port and the estimated delay before a tanker reaches port (computed from vessel headings and positions). We also analyzed throughput (arrivals minus departures) and overlap between nearby ports such as Houston and Corpus Christi, TX. We then overlaid this with live Polymarket data at the time (we focused on the prompt "Venezuelan crude production hits 1.2 million by end of March," since that depends heavily on shipping patterns and congestion), producing a mean score; a threshold (> 0.3) determined whether we traded or not. Our backtest showed generally positive results.
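The core signal logic can be sketched as below. This is a simplified illustration, not our exact implementation: the port coordinates, helper names, and the great-circle ETA (which ignores heading and routing) are all assumptions for the sake of a runnable example.

```python
import math

# Hypothetical port coordinates (lat, lon); illustrative only.
PORTS = {
    "Houston": (29.73, -95.27),
    "Corpus Christi": (27.82, -97.40),
}

def throughput(arrivals: int, departures: int) -> int:
    """Net throughput at a port: arrivals minus departures."""
    return arrivals - departures

def hours_to_port(lat: float, lon: float, speed_knots: float, port: str) -> float:
    """Rough ETA in hours via haversine distance (a simplification of the
    heading-based delay estimate described above)."""
    plat, plon = PORTS[port]
    phi1, phi2 = math.radians(lat), math.radians(plat)
    dphi = math.radians(plat - lat)
    dlmb = math.radians(plon - lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    nautical_miles = 2 * 3440.065 * math.asin(math.sqrt(a))  # Earth radius ~3440 nm
    return nautical_miles / speed_knots if speed_knots > 0 else float("inf")

def should_trade(model_signal: float, market_prob: float, threshold: float = 0.3) -> bool:
    """Trade when the mean of the model signal and the Polymarket
    probability clears the 0.3 threshold."""
    return (model_signal + market_prob) / 2 > threshold
```

In practice the model signal would come from the XGBoost regressor trained on these features; here it is just a float input.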
We then traded live on Polymarket data (using the prompt "Will Crude Oil (CL) hit __ by end of March?") and tested the strategy over this past month. We saw fairly neutral results, mainly because oil prices have been volatile without directly affecting shipping congestion as much.
Database: We used MongoDB to store our backtested data.
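A minimal sketch of how one day's backtest result might be shaped and stored; the collection name and document fields are assumptions, not our exact schema.

```python
from datetime import datetime, timezone

def backtest_record(port: str, date: str, tanker_count: int,
                    score: float, traded: bool, pnl: float) -> dict:
    """Shape one day's backtest result as a MongoDB document."""
    return {
        "port": port,
        "date": date,
        "tanker_count": tanker_count,
        "score": round(score, 4),
        "traded": traded,
        "pnl": round(pnl, 2),
        "inserted_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    # Requires a running MongoDB instance and the pymongo package.
    from pymongo import MongoClient
    coll = MongoClient("mongodb://localhost:27017")["congestion_desk"]["backtests"]
    coll.insert_one(backtest_record("Houston", "2024-02-15", 18, 0.41, True, 12.5))
```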
Authentication: We used OAuth for authentication.
Challenges we ran into
One of the biggest challenges was not only getting the data for the project but also backtesting our strategy against Polymarket's oil markets. A major issue was the lack of direct oil-price markets on Polymarket prior to early 2025. While we found proxy markets, they weren't perfectly accurate, because U.S. port shipments don't strictly correlate with Venezuelan production, especially since Venezuela draws on its own internal reserves. Additionally, while historical AIS data (via the open-source Marine Cadastre) was easy to get, modern-day data was harder to obtain without paying. However, we found an alternative, the AISStream API, which pulls live data on shipping congestion, letting us test our model on live prompts.
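A hedged sketch of pulling live positions this way: the websocket URL and subscription fields follow AISStream's documented format as we understood it, but the bounding box, key, and field names here are placeholders, and the tanker check relies on the standard AIS ship-type codes (80-89 denote tankers).

```python
import json

# Placeholder bounding box roughly around the Port of Houston; coordinates illustrative.
HOUSTON_BOX = [[29.55, -95.35], [29.85, -94.95]]

def subscription_message(api_key: str, box: list) -> str:
    """Build an AISStream-style subscription payload (format assumed from their docs)."""
    return json.dumps({
        "APIKey": api_key,
        "BoundingBoxes": [box],
        "FilterMessageTypes": ["PositionReport"],
    })

def is_tanker(ship_type: int) -> bool:
    """Standard AIS ship-type codes 80-89 denote tankers."""
    return 80 <= ship_type <= 89

if __name__ == "__main__":
    # Requires the third-party `websockets` package and a real API key.
    import asyncio
    import websockets

    async def stream(api_key: str) -> None:
        async with websockets.connect("wss://stream.aisstream.io/v0/stream") as ws:
            await ws.send(subscription_message(api_key, HOUSTON_BOX))
            async for raw in ws:
                print(json.loads(raw).get("MessageType"))

    asyncio.run(stream("YOUR_API_KEY"))
```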
Accomplishments that we're proud of
We're really proud that we used alternative data to build something that could not only generate alpha from trading but also help the community and the environment. This type of data could be very useful to policymakers for preventing overcrowding or delays at certain ports, and our dashboard also shows the carbon impact at these locations.
What we learned
We learned a lot, since this was our first time deploying a trading strategy, and learning how to use and analyze market data was something we really enjoyed. The big lesson we learned: do NOT try to download 30 GB of shipping data at once. That crashed my MacBook. We also learned that up-to-date shipping and market data is incredibly difficult to acquire for free or at low cost.
What's next for Congestion Desk
Next, we want to use more data in backtesting and see what else we can find. Specifically, we want to backtest periods with major oil or global events that moved oil prices, as well as periods when prices were fairly stable. We currently focus on major U.S. ports, but we wish to integrate global maritime hubs such as Rotterdam and Singapore to create a more holistic perspective on the global oil supply.