Inspiration

My inspiration came from a frustration with the research community. As a developer, I’ve always admired the open-source world of GitHub and Hugging Face, where code is freely shared, and I assumed CS research would be the same. But again and again, I'd read an exciting paper only to discover there was no published code that I could build on or use to understand it better. This "reproducibility crisis" means countless hours are wasted rebuilding existing work, which slows down innovation for everyone. I saw forums full of students and researchers hitting the same wall I was, so I built Papers2Code to address this issue.

What it does

Papers2Code is a community-driven platform that identifies research papers lacking open-source code and helps volunteers collaborate to implement them. It's a hub where volunteers can find papers that need implementing, form teams, organize their work, and share the resulting open-source code with the world, making research more accessible, verifiable, and easier to build upon.

How we built it

For the hackathon, we built the core MVP (Minimum Viable Product) of the Papers2Code platform. This included:

  • An automated script to pull data from arXiv and GitHub feeds, filtering for papers that lack code.
  • A searchable and filterable web interface to browse these "unimplemented" papers.
  • A user account system (using GitHub OAuth) for researchers and developers to sign up and organize their work.
  • A project system where a user can "claim" a paper, link a GitHub repository, and track the status of their implementation.
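
The first step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it assumes the input is an Atom feed like the one the arXiv API returns, and it treats a paper as having code if its abstract links to a code host or if it appears in a lookup of known repositories. The names `parse_arxiv_feed`, `lacks_code`, `find_unimplemented`, and `known_repos`, as well as the sample feed and its IDs, are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Atom namespace used by arXiv API responses.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Assumption: a link to one of these hosts in the abstract means code exists.
CODE_LINK = re.compile(
    r"https?://(?:www\.)?(?:github\.com|gitlab\.com|huggingface\.co)/\S+",
    re.IGNORECASE,
)

# Made-up sample feed with two entries; the IDs are placeholders.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00001v1</id>
    <title>A Paper With Code</title>
    <summary>We propose a method. Code: https://github.com/example/repo</summary>
  </entry>
  <entry>
    <id>http://arxiv.org/abs/0000.00002v1</id>
    <title>A Paper
      Without Code</title>
    <summary>We propose another method and report results.</summary>
  </entry>
</feed>"""


def _clean(text):
    """Collapse the line breaks and repeated spaces found in feed text."""
    return " ".join(text.split())


def parse_arxiv_feed(xml_text):
    """Turn an Atom feed into a list of {id, title, summary} dicts."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        papers.append({
            "id": _clean(entry.findtext("atom:id", default="", namespaces=ATOM_NS)),
            "title": _clean(entry.findtext("atom:title", default="", namespaces=ATOM_NS)),
            "summary": _clean(entry.findtext("atom:summary", default="", namespaces=ATOM_NS)),
        })
    return papers


def lacks_code(paper, known_repos):
    """Return True if no code link or known repository is found for the paper."""
    if CODE_LINK.search(paper["summary"]):
        return False
    # known_repos maps paper IDs to repository URLs (hypothetical lookup,
    # for example the result of a separate GitHub cross-reference step).
    return paper["id"] not in known_repos


def find_unimplemented(papers, known_repos):
    """Keep only the papers that appear to have no implementation."""
    return [p for p in papers if lacks_code(p, known_repos)]


if __name__ == "__main__":
    for paper in find_unimplemented(parse_arxiv_feed(SAMPLE_FEED), {}):
        print(paper["id"], paper["title"])
```

A real pipeline would fetch the feed over HTTP and fill `known_repos` from a GitHub search; both are left out here so the sketch runs offline.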

Challenges we ran into

The biggest technical challenge was accurately identifying which papers truly lacked an official code implementation. It was hard to write a script that sifts through arXiv data, cross-references it with GitHub, and filters out GitHub repositories that merely aggregate papers (such as "Awesome X" lists). Scoping this down to a functional system in just a few days was also tough, but we managed to build a core pipeline that successfully identifies and lists target papers, which is the essential first step for the platform to work.
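
One way to filter out aggregator repositories is a simple heuristic like the sketch below. It is an assumption about how such a filter could work, not the rule set the pipeline actually uses: the hint phrases, the `max_papers` threshold, and the function names are all hypothetical.

```python
import re

# Matches new-style arXiv IDs in links such as arxiv.org/abs/0000.00001.
ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})", re.IGNORECASE)

# Assumed phrases that suggest a repository is a list of papers.
# Note: a plain substring check on "awesome" can produce false positives.
AGGREGATOR_HINTS = ("awesome", "curated list", "paper list", "reading list")


def distinct_arxiv_ids(readme):
    """Return the set of distinct arXiv IDs linked from a README."""
    return set(ARXIV_ID.findall(readme))


def looks_like_aggregator(name, description, readme, max_papers=3):
    """Guess whether a repository collects papers instead of implementing one."""
    text = f"{name} {description}".lower()
    if any(hint in text for hint in AGGREGATOR_HINTS):
        return True
    # An implementation usually cites a handful of papers at most, while a
    # list links to many. The threshold is an assumption to tune on real data.
    return len(distinct_arxiv_ids(readme)) > max_papers


if __name__ == "__main__":
    print(looks_like_aggregator("awesome-diffusion", "A curated list", ""))
    print(looks_like_aggregator("my-model", "Implementation of a paper",
                                "See https://arxiv.org/abs/0000.00001"))
```

A heuristic like this trades precision for simplicity, so borderline repositories would still need a manual check or a stricter second pass.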

Accomplishments that we're proud of

I’m proud that users can create a GitHub repository for a paper directly from the website. This saves them time and gives them a template and guidelines to build on. I didn’t expect this idea to work at the start, but it turned out to be buildable.
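
As an illustration of how repository creation from a template could be requested, the sketch below builds a call to GitHub's "create a repository using a template" REST endpoint (`POST /repos/{template_owner}/{template_repo}/generate`). Whether Papers2Code uses this endpoint is an assumption; the helper names, the naming scheme, and the description text are hypothetical. The function only builds the request and does not send it.

```python
import re

GITHUB_API = "https://api.github.com"


def repo_name_from_title(title, max_length=60):
    """Derive a repository name from a paper title (hypothetical scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_length].rstrip("-")


def build_generate_request(template_owner, template_repo, new_owner,
                           paper_title, paper_url, token):
    """Describe the HTTP request that creates a repository from a template."""
    return {
        "method": "POST",
        "url": f"{GITHUB_API}/repos/{template_owner}/{template_repo}/generate",
        "headers": {
            "Accept": "application/vnd.github+json",
            # token is the user's OAuth access token.
            "Authorization": f"Bearer {token}",
        },
        "json": {
            "owner": new_owner,
            "name": repo_name_from_title(paper_title),
            "description": f"Community implementation of: {paper_title} ({paper_url})",
            "private": False,
        },
    }


if __name__ == "__main__":
    request = build_generate_request(
        "example-org", "paper-template", "some-user",
        "An Example Paper Title", "https://arxiv.org/abs/0000.00001", "TOKEN",
    )
    print(request["url"])
    print(request["json"]["name"])
```

The returned dictionary can be passed to any HTTP client; keeping request construction separate from sending makes it easy to test without network access.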

What we learned

I learned how to connect to MongoDB and how to use various APIs, such as the GitHub API and the arXiv API, to scrape data properly. I also learned how to structure my projects better to handle a large codebase.

What's next for Papers2Code

This is just the beginning. I’m going to turn this hackathon project into a full-fledged, long-term platform by building a community, implementing as many papers as possible, and making as much research code open as possible.
