Inspiration
Before we were hackers, we were competitive programmers. Drawn to interesting challenges, we quickly realized that the ecosystem of coding tasks was very much copy-and-paste. Company online assessments mirrored each other. Puzzles from top high-school and collegiate contests could be found elsewhere, verbatim. In a domain constrained by a somewhat narrow set of topics, brilliant minds have similar models of reasoning, and it's easy to reinvent ideas.
What it does
Duplicode seeks to ensure ingenuity by pairing customized OpenAI models with the domain-specific knowledge we have as competitive programmers. It searches massive problem banks for similarity between programming tasks. If it finds a match, it alerts the user, letting ideators focus less on policing plagiarism and more on creating tasks that show others the elegance of programming.
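A minimal sketch of the kind of similarity check described above, assuming the modern `openai` Python SDK. The model name, prompt wording, and `similarity_report` helper are our own illustrative choices, not Duplicode's actual pipeline:

```python
def build_similarity_prompt(task_a, task_b):
    """Frame the comparison so the model scores conceptual overlap, not wording."""
    return (
        "You are a competitive programming judge. On a 0-100 scale, rate how "
        "similar the core ideas of these two tasks are, then justify briefly.\n\n"
        f"Task A:\n{task_a}\n\nTask B:\n{task_b}"
    )

def similarity_report(task_a, task_b, model="gpt-4o-mini"):
    """Ask an OpenAI chat model to compare two problem statements."""
    # Imported lazily so the prompt helper works without the SDK installed.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,  # placeholder model choice
        messages=[
            {"role": "user", "content": build_similarity_prompt(task_a, task_b)}
        ],
    )
    return resp.choices[0].message.content
```

A domain-tuned prompt like this is where competitive-programming knowledge pays off: asking about "core ideas" steers the model toward algorithmic structure rather than surface wording.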
How we built it
We broke the problem down into three parts. First, we needed access to a massive dataset of programming tasks. For that, we turned to Codeforces, the most popular competitive programming website on the internet. It features well-known tasks in its Educational Rounds, while also allowing nuance by sprinkling in more complex ideas. We used the Codeforces API to grab problem names, then fed the results into a customized OpenAI model. To create the model, we trained the AI on the problem set and used the domain-specific knowledge we had as competitors to engineer interesting prompts.
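The problem-name fetch can be sketched against the public `problemset.problems` endpoint of the Codeforces API. The helper names below are illustrative, and the real ingestion code may differ:

```python
import requests

API_URL = "https://codeforces.com/api/problemset.problems"

def extract_problem_names(payload):
    """Pull (contestId, index, name) triples out of a problemset.problems response."""
    if payload.get("status") != "OK":
        raise RuntimeError("Codeforces API returned an error")
    return [
        (p.get("contestId"), p["index"], p["name"])
        for p in payload["result"]["problems"]
    ]

def fetch_problem_names(tag=None):
    """Fetch the full problem list, optionally filtered by a tag like 'dp'."""
    params = {"tags": tag} if tag else {}
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    return extract_problem_names(resp.json())
```

One call to this endpoint returns the entire problemset, so a single request is enough to seed the dataset with thousands of problem names.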
Challenges we ran into
The Codeforces API only let us grab problem names, not the problems themselves. We started with a web-scraping approach, but different questions were formatted in different ways, and we also had to deal with issues like deciphering LaTeX. To overcome this, we used a packet analysis tool to find the backend GET request used to grab problem statements and solutions (weirdly public), then used the Python requests library to emulate it. When creating the AI, a problem we ran into was the vast number of optimizations available. We already had several ideas, from how to partition data to how to engineer prompts, and attending the OpenAI demo only gave us more. Constrained by time, sleep, and bugs, it wasn't feasible to implement them all -- but we had fun trying!
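A rough sketch of what emulating such a request looks like with `requests`. The endpoint path, parameter names, and headers below are placeholders, since the real ones came out of our packet capture and aren't documented anywhere:

```python
import requests

# Placeholder URL -- the real path was recovered by sniffing the site's own
# traffic, so treat it (and the parameter names) as illustrative only.
STATEMENT_URL = "https://codeforces.com/example/problemStatement"

def build_statement_request(contest_id, index):
    """Assemble the URL, query params, and headers for the emulated GET."""
    params = {"contestId": contest_id, "problemIndex": index}
    # A browser-like User-Agent helps the request blend in with normal traffic.
    headers = {"User-Agent": "Mozilla/5.0"}
    return STATEMENT_URL, params, headers

def fetch_statement(contest_id, index):
    """Replay the captured backend request for one problem statement."""
    url, params, headers = build_statement_request(contest_id, index)
    resp = requests.get(url, params=params, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.text
```

Replaying one backend request per problem is what made scraping the full problemset tractable compared to parsing each inconsistently formatted problem page.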
Accomplishments that we're proud of
Overall, we were able to scrape over 8000 problems in under an hour by circumventing the API. We also used OpenAI for the first time (for all three of us), and given how little AI knowledge we had collectively, we're very proud of how seamless it felt. We're also proud of staying collaborative while operating on different (lack of?) sleep schedules.
What we learned
We want to win, and we worked hard on our project, but perhaps the most important lesson we learned was that hackathons aren't only about hacking. We loved attending the OpenAI demo. We found the opening ceremony insightful and hilarious. We enjoyed walking the corridors at night, stumbling into strangers who became friends. We loved the snack raids, the failed attempt at yoga, the long walks around campus, and the people we met (including each other!). It was so much fun.
What's next for Duplicode AI
From an application standpoint, the most obvious use of Duplicode is in industry. Companies writing interview questions would benefit greatly from having unique yet interesting problems. Firms with more difficult assessments stand to benefit especially, since our dataset also includes more nuanced ideas. We could also extend the functionality to let models solve the problems (already somewhat embedded), or even write them. Other potential users include professors writing CS exams, organizations hosting programming contests, and ourselves!
