GrokHunt

Inspiration

The best programmers aren’t the ones sitting inside polished FAANG pipelines or boosted by LinkedIn’s recommendation graph. The real builders are the ones shipping projects at 3am, hidden in GitHub repos, Slack threads, and forgotten corners of the internet. Traditional recruiting never finds them. GrokHunt started as a challenge to that system: could an autonomous agent identify raw, undiscovered talent using reasoning instead of keywords?

What it does

GrokHunt finds undiscovered builders by scanning multiple Twitter sources, extracting their real work, and evaluating them with Grok’s reasoning engine. Every candidate gets a structured score, a narrative explanation, and a skill breakdown. When someone looks promising, GrokHunt sends them a personalized Twitter DM with a link to an autonomous interview, where the candidate must convince the AI interviewer in real time. Recruiter ratings feed back into the system, updating that job’s scoring weights and improving accuracy with every evaluation.

How we built it

GrokHunt pairs two Python backends: a Flask app (backend/main.py) that handles X (Twitter) OAuth, candidate hunting, and streaming progress updates, and a FastAPI service (backend/rlBackend.py) that exposes reinforcement-learning endpoints for recruiter feedback and calibration.

The Flask side wires up PKCE-based OAuth, stores tokens in memory and in cookies, and uses Grok/X AI clients plus custom helpers (XHeadHunter, XDirectMessaging, XScraper, analyze_profile_for_job) to search, score, and message candidates. Its /hunt and /hunt/stream routes orchestrate keyword generation, parallel user search, tweet fetching, and Grok-based evaluation, with server-sent events for live updates.

The FastAPI service adds CORS and defines typed request/response models for feedback, scoring, policy stats, reward history, and calibration metrics, delegating RL logic to backend/RLloop modules such as rl_feedback and grokScore. Recruiter feedback flows convert star ratings to 0–100 scores, log rewards, update policy weights, and expose policy and error metrics per job.

Both services are JSON-first, front-end friendly, and designed to run locally with .env configuration for secrets and redirect URLs. Data sources include bundled user JSON and SQLite artifacts in data/, while templates and front-end integration rely on CORS and session cookies for cross-site redirects.
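
For readers unfamiliar with the PKCE step in the OAuth flow, here is a minimal sketch of how a code verifier and its S256 challenge are typically derived (per RFC 7636). The function names `make_code_verifier` and `make_code_challenge` are hypothetical and are not taken from backend/main.py; the actual app may generate these values differently.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Return a random URL-safe verifier (43 characters for 32 random bytes)."""
    return secrets.token_urlsafe(n_bytes)


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: base64url(SHA-256(verifier)) with '=' padding removed."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# The verifier stays on the server (for example in the session); only the
# challenge is sent along with the authorization request.
verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
```

In a flow like this, the server would send `challenge` with `code_challenge_method=S256` when redirecting the user to authorize, then present `verifier` when exchanging the returned code for tokens.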

Challenges we ran into

X API costs made large-scale searches expensive
Deep Research was harder than expected (finding secondary links and platforms, then cross-verifying that they actually belonged to the same person)
Sending Twitter DMs to candidates whose DMs are closed
Deciding which signals actually matter when separating real builders from noisy accounts

Accomplishments that we're proud of

Shipping an end-to-end autonomous recruiting system in under 48 hours
Building a pipeline that searches, scores, DMs, and interviews candidates automatically
Making the RL loop actually improve accuracy after just a few recruiter ratings
Proving we can surface real builders that traditional hiring never finds

What we learned

Reasoning-based evaluation outperforms keyword or embedding matching
RL calibration is simple to implement but extremely powerful in practice
Deep Research is the real bottleneck in candidate discovery
Automating outreach and interviews gives signal you can’t get from static text
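
To illustrate why calibration of this kind can be simple to implement, here is a minimal sketch of a feedback loop that converts a star rating to a 0–100 score, logs a reward, and nudges per-job weights. Everything in it is an assumption made for illustration: the linear star mapping, the delta-rule (least-mean-squares) update, and the names `stars_to_score`, `JobPolicy`, and the feature keys are hypothetical and do not come from the backend/RLloop modules.

```python
from dataclasses import dataclass, field


def stars_to_score(stars: int, max_stars: int = 5) -> float:
    """Map a 1..max_stars rating linearly onto 0..100 (assumed mapping)."""
    if not 1 <= stars <= max_stars:
        raise ValueError(f"stars must be between 1 and {max_stars}")
    return 100.0 * (stars - 1) / (max_stars - 1)


@dataclass
class JobPolicy:
    """Per-job feature weights plus a log of rewards (hypothetical structure)."""
    weights: dict[str, float]
    learning_rate: float = 0.05
    reward_history: list[float] = field(default_factory=list)

    def predict(self, features: dict[str, float]) -> float:
        """Weighted sum of feature values (each in 0..1), scaled to 0..100; not clamped."""
        raw = sum(self.weights.get(name, 0.0) * value for name, value in features.items())
        return 100.0 * raw

    def update(self, features: dict[str, float], stars: int) -> float:
        """Apply one delta-rule step toward the recruiter's rating; return the prior error."""
        error = stars_to_score(stars) - self.predict(features)
        # Reward is the negative absolute error, so better predictions earn more.
        self.reward_history.append(-abs(error))
        for name, value in features.items():
            step = self.learning_rate * (error / 100.0) * value
            self.weights[name] = self.weights.get(name, 0.0) + step
        return error


# Example: the model under-scores a candidate the recruiter rates 5 stars.
policy = JobPolicy(weights={"shipping": 0.5, "depth": 0.5})
candidate = {"shipping": 1.0, "depth": 0.2}
error_before = policy.update(candidate, stars=5)
error_after = stars_to_score(5) - policy.predict(candidate)
```

After the update, `error_after` is smaller in magnitude than `error_before`, which is the sense in which a few ratings can move a job's scoring toward the recruiter's judgment. A production loop would likely also clamp scores and guard against noisy ratings.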

What's next for GrokHunt

Expanding beyond Twitter into multiple sourcing platforms.
A deeper research engine that automatically pulls GitHub, LinkedIn, and portfolio data.
Reverse sourcing: define the ideal candidate and let Grok generate who to target next.