SportsVision

Performance Summary that displays the score and the elbow angle at the shooting key frame
System analyzing the user's uploaded video
Key Frame Carousel that shows all the key steps in a proper basketball shot and includes a live conversational AI coach to guide the user

Inspiration

Many of us are athletes and understand the importance of good technique and coaching when it comes to elevating your game. We were excited to see how we could use the latest multimodal LLMs and computer vision to provide specific, detailed feedback for athletes of all levels.

What it does

SportsVision lets a player upload a video of their sports motion. We then analyze the technique and provide a performance rating, key statistics, a summary of what the player did well and how they can improve, a frame-by-frame breakdown of the important parts of the motion, and a conversational AI coach for constructive feedback. When a user uploads a video, our system analyzes it and pulls a model video from our database to compare the user against. For example, if the user uploads a video of themselves shooting a basketball, we use a video of a professional basketball player as the model video.

How we built it

We analyzed each individual frame in the uploaded video (calculating stats such as elbow angle) using OpenPose and OpenCV, and then merged these frames back together to create a video with an informative overlay. We stored the video and key frames in S3 and pulled the relevant model video from our database of professional player videos. We then used GPT-4V to analyze the overlaid video, compare it to the model video, and extract key frames, a performance rating, and specific feedback. This feedback was then fed in as context to the Hume AI coach. We used Flask for the backend and HTML/CSS for the frontend.
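As a rough illustration of the per-frame stats step, here is a minimal sketch of how an elbow angle could be computed from three 2D keypoints (shoulder, elbow, wrist) of the kind a pose estimator such as OpenPose produces. The function name `joint_angle` and the `(x, y)` tuple format are assumptions made for this example, not our actual code.

```python
import math

def joint_angle(a, b, c):
    """Return the angle at point b (in degrees) formed by the segments b->a and b->c.

    Each point is an (x, y) pixel coordinate, e.g. shoulder, elbow, wrist.
    Returns None if either segment has zero length (a missing or duplicate keypoint).
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Hypothetical keypoints for one frame: shoulder, elbow, wrist.
elbow_angle = joint_angle((0, 1), (0, 0), (1, 0))
```

A fully extended arm gives an angle near 180 degrees, and a tightly bent arm gives a small angle, so the value can be drawn on each frame and tracked across the shot.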

Challenges we ran into

We initially had problems integrating Hume. Prompting GPT-4V was also difficult, and we spent a good amount of time precisely instructing the model so that it would capture the right frames and provide good feedback. We also had trouble parsing GPT-4V's output, which was not always a valid JSON object. Lastly, integrating the frontend with the backend posed a few issues, but we eventually worked through them.
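One common way to cope with model replies that are not clean JSON is to try parsing the whole reply first and then fall back to scanning for the first decodable object inside it. The sketch below shows that idea using only the Python standard library. The helper name `extract_json_object` and the field names in the sample reply are hypothetical and are not taken from our codebase.

```python
import json
import re

def extract_json_object(reply):
    """Return the first JSON object found in a model reply, or None if there is none."""
    # Fast path: the reply is already a bare JSON object.
    try:
        parsed = json.loads(reply)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    # Fallback: the object may be wrapped in prose or a code fence,
    # so try decoding from each '{' until one succeeds.
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", reply):
        try:
            obj, _end = decoder.raw_decode(reply, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None

# Hypothetical reply with the object wrapped in extra text.
sample = 'Here is the analysis:\n{"rating": 7, "feedback": "Keep your elbow in."}\nHope this helps!'
result = extract_json_object(sample)
```

When the helper returns None, the caller can retry the request or fall back to a default response instead of crashing.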

Accomplishments that we're proud of

We are proud of building out this fully functional app in just a day and overcoming all the bugs and problems we faced. This was the first hackathon for some of us, so we are glad that we had a fun time building something functional and impactful.

What we learned

We learned a lot about effective prompt engineering and about using computer vision frameworks such as OpenPose. If you give GPT the right context, it can do amazing things.

What's next for SportsVision

We are going to add support for a wider range of sports and create an agent that can analyze longer clips (such as entire games or highlights) and offer feedback accordingly. We also want our AI coach to extract more sport-specific information, such as shot accuracy in basketball or number of saves in soccer. That way we can create a more comprehensive dashboard for the user and collect more data for the coach to use. We also plan to have the agent adapt to the player's style and measure growth in their abilities.