Inspiration

Instructors and teaching staff at large universities spend a disproportionate amount of time managing routine, repetitive course tasks — primarily:

  • Responding to the same student questions over and over
  • Developing and refining grading rubrics from scratch each term
  • Individually grading hundreds of near-identical submissions, writing the same feedback comment dozens of times

## What it does

A hybrid AWS architecture that uses Computer Vision to group submissions with identical logic errors, and Amazon Bedrock to generate high-quality, rubric-aligned feedback for the entire group at once.

> One TA review. Consistent feedback. Applied to 50 students in a single click.

## How we built it
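The grouping step, clustering submissions that share the same logic error so a single review covers all of them, can be sketched with scikit-learn. This is a minimal illustration: the vectorizer choice, cluster count, and the factorial answers below are assumptions for the example, not the exact hackathon configuration (the real pipeline may use sentence embeddings instead of TF-IDF).

```python
# Minimal sketch of the clustering stage: TF-IDF vectors + K-Means.
# Parameters and sample answers are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_submissions(texts, n_clusters, random_state=0):
    """Group OCR'd submission strings; return labels and one representative per cluster."""
    vectors = TfidfVectorizer().fit_transform(texts)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(vectors)
    reps = {}  # cluster label -> index of the first submission seen in that cluster
    for idx, label in enumerate(labels):
        reps.setdefault(int(label), idx)
    return labels, reps

answers = [
    "def fact(n): return n * fact(n - 1)",                    # missing base case
    "def fact(n): return n * fact(n-1)",                      # same logic error
    "def fact(n): return 1 if n == 0 else n * fact(n - 1)",   # correct
]
labels, reps = cluster_submissions(answers, n_clusters=2)
# The two base-case-missing answers land in the same cluster, so the TA
# reviews one representative per cluster instead of every submission.
```

With hundreds of submissions the same idea holds: the number of clusters a TA must look at grows with the number of *distinct* errors, not the number of students.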
```
┌─────────────────────────────────────────────────────────────────┐
│                        ClusterGrade AI                          │
│                                                                 │
│  [Scanned PDFs / Assignment Images]                             │
│           │                                                     │
│           ▼                                                     │
│  ┌─────────────────┐                                            │
│  │ Amazon Textract │  ← OCR: extracts handwritten text & code   │
│  └────────┬────────┘                                            │
│           │  raw text strings per submission                    │
│           ▼                                                     │
│  ┌──────────────────────────────┐                               │
│  │  Python + scikit-learn       │                               │
│  │  • TF-IDF / sentence embeds  │                               │
│  │  • K-Means clustering        │  ← groups identical errors    │
│  └──────────────┬───────────────┘                               │
│                 │  cluster representatives                      │
│                 ▼                                               │
│  ┌─────────────────────────────┐                                │
│  │      Amazon Bedrock         │  ← LLM + professor's rubric    │
│  │  (e.g., Claude / Titan)     │    generates deduction +       │
│  │                             │    personalized feedback       │
│  └──────────────┬──────────────┘                                │
│                 │                                               │
│                 ▼                                               │
│  ┌─────────────────────────────┐                                │
│  │ Human-in-the-Loop Approval  │  ← TA reviews feedback ONCE    │
│  │ (TA Dashboard)              │    → applied to entire cluster │
│  └─────────────────────────────┘                                │
└─────────────────────────────────────────────────────────────────┘
```
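The Bedrock stage in the diagram can be sketched as one rubric-grounded prompt per cluster representative. The prompt wording and model ID below are illustrative assumptions, and the actual `invoke_model` call needs AWS credentials and a Bedrock-enabled region, so only the prompt builder is exercised here.

```python
# Sketch of the Bedrock feedback stage. build_feedback_prompt runs anywhere;
# generate_feedback requires AWS credentials and Bedrock model access.
# The model ID and prompt wording are illustrative assumptions.
import json

def build_feedback_prompt(rubric: str, representative_answer: str) -> str:
    """Compose one rubric-grounded grading prompt for a cluster representative."""
    return (
        "You are a teaching assistant. Grade the answer below using ONLY the rubric.\n\n"
        f"Rubric:\n{rubric}\n\n"
        f"Student answer (representative of its cluster):\n{representative_answer}\n\n"
        "Return the point deduction and a short, constructive feedback comment."
    )

def generate_feedback(prompt: str,
                      model_id: str = "anthropic.claude-3-haiku-20240307-v1:0") -> str:
    """Invoke a Claude model on Bedrock (not executed here)."""
    import boto3
    client = boto3.client("bedrock-runtime")
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    })
    response = client.invoke_model(modelId=model_id, body=body)
    return json.loads(response["body"].read())["content"][0]["text"]

prompt = build_feedback_prompt(
    rubric="-2 pts: recursive solution is missing its base case",
    representative_answer="def fact(n): return n * fact(n - 1)",
)
```

Once the TA approves the generated comment in the dashboard, it is applied to every submission in that cluster, which is what makes the single review scale.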
## Challenges we ran into
Gradescope's rubric builder doesn't eliminate reading: the TA still has to read every answer to decide which rubric item applies; it only makes applying that decision faster. Clustering similar answers first is what lets one reading cover many submissions, and getting that distinction right shaped our design.

## Accomplishments that we're proud of
Getting the full AI pipeline running end to end: Textract OCR, error clustering, and Bedrock feedback generation all working together on scanned submissions.

## What we learned

How to implement OCR on AWS with Amazon Textract: calling its DetectDocumentText API through the boto3 SDK and turning its block-based responses into usable text.
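That lesson boils down to a small amount of code, sketched below under the assumption of a standard boto3 client: Textract returns a list of `Blocks`, and the `LINE` blocks must be joined to recover each submission's text. The live API call needs AWS credentials, so only the parsing helper is exercised, against a hand-made response whose fields mirror the real API.

```python
# Sketch of the Textract integration (illustrative, not our exact code):
# call DetectDocumentText via boto3, then join the LINE blocks.
def lines_from_textract(response: dict) -> str:
    """Join LINE blocks from a Textract DetectDocumentText response."""
    return "\n".join(
        block["Text"]
        for block in response.get("Blocks", [])
        if block["BlockType"] == "LINE"
    )

def ocr_submission(image_bytes: bytes) -> str:
    """OCR one scanned page (not executed here; needs AWS credentials)."""
    import boto3
    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return lines_from_textract(response)

# Hand-made response shaped like the real API, for a quick sanity check.
fake_response = {"Blocks": [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "def fact(n):"},
    {"BlockType": "LINE", "Text": "return n * fact(n - 1)"},
]}
```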

## What's next for ClusterGrade AI
  • Gradescope API integration — programmatically pull student submissions and push approved grades, eliminating the manual PDF upload step entirely
  • Cross-exam analytics — track which misconceptions persist cohort to cohort, giving instructors a dashboard showing "35% of students missed the base case for 3 years running"
  • Multi-course support — user accounts, per-course rubric storage, TA team access

Built With

  • aws-iam (temporary STS credentials)
  • aws-textract (DetectDocumentText REST API via the boto3 SDK)
  • boto3
  • css
  • flask
  • flask-cors
  • html
  • javascript
  • python
  • numpy
  • pdf2image
  • scikit-learn
  • werkzeug
  • macos