Inspiration
We've all been there: as first-year undergraduate students, it's a pain hopping between university websites to hunt for research opportunities! Many commercial products such as Indeed and Zillow efficiently connect people for mutually beneficial relationships (employer-employee, landlord-tenant, etc.). That's why we built Panda: a tool to alleviate the research-finding anxiety that so many students suffer through.
What it does
Panda is a platform that scrapes the web for research opportunities at UofT, such as ROPs, summer research programs, labs, or any professor looking to hire students. Using semantic embeddings, students can browse, filter, or query research opportunities from the professors, labs, or programs that suit them best. AI-generated descriptions of research interests, contact information, and previous research are easily accessible on each research opportunity's card, and when the user is ready, applying to a research opportunity is just one click away! Using your resume and Amazon Bedrock, Panda also gives you personalized tips on how to reach out to the professor you're interested in, significantly lowering the barrier new students face when cold-emailing professors for the first time.
How we built it
The main website is just an HTML page with some minimal JavaScript glue and CSS animations. The magic is in the backend: we build our dataset by web-scraping with BeautifulSoup, feed a description of each professor, lab, or research program into Bedrock's Titan embeddings service, and store all the embeddings in a file. Then we use RAG-style retrieval (basic cosine similarity) with Titan embeddings again to match search queries against research opportunities. Finally, Amazon Nova foundation models generate the AI descriptions of professors and their research, and DynamoDB stores logs so we can see user trends.
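To make the matching step concrete, here is a minimal sketch of cosine-similarity search over embeddings stored in a file. It is not our actual code: the JSON layout, the function names (`save_index`, `load_index`, `search`), and the toy 3-dimensional vectors are assumptions for illustration. Real embeddings would come from the Titan embeddings service and have far more dimensions.

```python
import json
import math
import os
import tempfile


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def save_index(path, index):
    """Write the list of {title, embedding} entries to a JSON file (assumed format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    """Read the entries back from the JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def search(query_embedding, index, top_k=3):
    """Return the top_k (score, title) pairs, highest similarity first."""
    scored = [
        (cosine_similarity(query_embedding, entry["embedding"]), entry["title"])
        for entry in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (hypothetical data).
toy_index = [
    {"title": "Robotics lab", "embedding": [0.9, 0.1, 0.0]},
    {"title": "NLP group", "embedding": [0.1, 0.9, 0.1]},
    {"title": "Systems summer program", "embedding": [0.0, 0.2, 0.9]},
]

index_path = os.path.join(tempfile.gettempdir(), "panda_toy_index.json")
save_index(index_path, toy_index)

# The query vector would be the embedding of the user's search text.
results = search([0.2, 0.8, 0.0], load_index(index_path), top_k=2)
print(results)
```

A linear scan like this is enough for a dataset limited to a couple of departments; a larger dataset might call for a vector index instead.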
Challenges we ran into
Claude was not very receptive to our attempts to get it to use AWS Bedrock. We persevered and eventually discovered that Amazon Nova is better integrated, so we used that instead. Also, scraping and embedding generation take quite a long time, which is hard to afford under the hackathon's time constraints, so we decided to limit our prototype to just computer science/engineering for now and to generate the AI summaries of research papers on-the-fly using Nova (instead of precomputing them).
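As a rough illustration of the on-the-fly approach, the sketch below generates a summary only when it is first requested. The in-memory cache is our own assumption here, not something the prototype is confirmed to do, and `generate_summary` is a hypothetical stand-in for the real Nova call.

```python
from functools import lru_cache

# Counts how often the (stand-in) model is actually invoked.
calls = {"count": 0}


def generate_summary(paper_title):
    """Hypothetical placeholder for the model call that writes a summary."""
    calls["count"] += 1
    return f"Summary of: {paper_title}"


@lru_cache(maxsize=256)
def summary_for(paper_title):
    """Generate a summary on first request; reuse it on later requests."""
    return generate_summary(paper_title)


first = summary_for("Example paper")
second = summary_for("Example paper")  # served from the cache, no second call
print(first, calls["count"])
```

Compared with precomputing every summary up front, this only pays for the summaries users actually open, at the cost of a slower first view.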
Accomplishments that we're proud of
Thanks to efficient development practices, we finished earlier than we expected. In general, we are pretty proud of accomplishing this much within the time constraint. We spent a lot of time brainstorming ideas, and we are very passionate about this one and happy with the result.
What we learned
We learned about AWS services like DynamoDB and Amazon Bedrock's core offerings, such as Titan embeddings and the in-house Nova foundation models, which were surprisingly helpful in building Panda.
What's next for Panda
First, because we're sort of desperate for research experience ourselves and are fully confident in our product, we plan to use Panda ourselves to find research opportunities before the summer arrives. Then, we will expand our dataset beyond the computer science and engineering departments and run frequent scraping sessions on EC2 to keep our data and embeddings up to date (if we can afford to do so!).
Built With
- beautiful-soup
- bedrock
- dynamodb
- fastapi
- python
- rag