The code is available for use on GitHub.

Inspiration

Grading homework is actually more annoying than doing homework, according to sources. Grading takes a long time, and graders don't learn anything from doing it. This project saves graders' valuable time by letting AI mark the correct responses.

What it does

Provide the question and solution, and Auto-Gradescope will pilot a browser window through all the submissions for that question. Correct responses will be marked correct, and potentially incorrect responses will be left for human review. The app will exit automatically after cycling through all the submissions.
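
The triage flow described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the Submission class, the triage function, and the is_correct and mark_correct callables are hypothetical stand-ins for the LLM check and the browser action.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Submission:
    # Hypothetical record; the real app works on pages in a browser window.
    student: str
    answer_text: str


def triage(
    submissions: Iterable[Submission],
    is_correct: Callable[[Submission], bool],
    mark_correct: Callable[[Submission], None],
) -> List[Submission]:
    """Mark confidently correct submissions; return the rest for human review."""
    needs_review: List[Submission] = []
    for sub in submissions:
        if is_correct(sub):
            # Assumed to be where the browser would click "correct".
            mark_correct(sub)
        else:
            # Never auto-marked wrong: a person looks at these.
            needs_review.append(sub)
    return needs_review
```

The key design choice is that the AI only ever awards marks; anything it is unsure about is left untouched for a human grader.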

How we built it

BAML for interfacing with GPT-4o, Selenium for browser control, and Groq for fast LLM inference.

Challenges we ran into

  1. Reliably parsing handwritten math equations into text was considerably more difficult than parsing normal text.
  2. Groq did not handle images well, and its preview models were not very accurate.
  3. Most off-the-shelf OCR solutions do not work for reading math equations or bad handwriting.

Accomplishments that we're proud of

  1. Accuracy is surprisingly high given how messy some of the handwriting is.

What we learned

  1. Even though I didn't end up implementing any lower-level vision methods such as Tesseract or cv2, it was fun to learn about the underlying Vision Transformer architecture.
  2. Web stuff is horrible to deal with.

What's next for Auto-Gradescope

  1. Speed up grading by bundling asynchronous requests to GPT-4o so that grading results can be collected much faster.
  2. Streamline the user experience and package the code into a Python project.
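
As a rough idea of what the first item could look like, here is a minimal asyncio sketch. It is an assumption about a possible design, not existing project code: grade_all, grade_one, max_in_flight, and fake_grade are hypothetical names, and fake_grade simply sleeps in place of a real GPT-4o request.

```python
import asyncio
from typing import Awaitable, Callable, List


async def grade_all(
    answers: List[str],
    grade_one: Callable[[str], Awaitable[bool]],
    max_in_flight: int = 8,
) -> List[bool]:
    """Grade all answers concurrently, with at most max_in_flight requests at once."""
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(answer: str) -> bool:
        # The semaphore caps concurrency so the API is not flooded.
        async with sem:
            return await grade_one(answer)

    # gather returns results in the same order as the input answers.
    return await asyncio.gather(*(bounded(a) for a in answers))


async def fake_grade(answer: str) -> bool:
    # Stand-in for a network call to the model (hypothetical).
    await asyncio.sleep(0.01)
    return answer == "42"
```

Calling asyncio.run(grade_all(answers, fake_grade)) returns one result per answer, so the browser step could then apply the marks in a single pass instead of waiting on each request in turn.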
