AI-powered candidate discovery system using graph databases and Grok for intelligent matching.
Option A: Docker (Recommended)
```bash
docker run \
  --name neo4j-grok \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/your-password \
  -v $(pwd)/neo4j-data:/data \
  neo4j:latest
```

Option B: Homebrew (macOS)
```bash
brew install neo4j
neo4j start
```

Access the Neo4j Browser at http://localhost:7474
```bash
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your Neo4j credentials
```

Note: If this is your first time running Neo4j, you'll need to change the default password:
- Open http://localhost:7474 in your browser
- Log in with username `neo4j` and password `neo4j`
- Set a new password and update it in `.env`
```bash
python -m src.graph.schema
```

Option A: From JSON file
```bash
python -m src.graph.ingestion --persona-file path/to/personas.json
```

Option B: From OpenAlex (build coauthor graph)
```bash
# Basic usage - find the author and build a coauthor graph
python -m src.graph.openalex_ingestion --name "Elon Musk" --institution "xAI"

# With ORCID (more accurate matching)
python -m src.graph.openalex_ingestion --name "Greg Yang" --orcid "0000-0000-0000-0000"

# With enrichment (Semantic Scholar + Firecrawl)
python -m src.graph.openalex_ingestion \
  --name "Ilya Sutskever" \
  --institution "OpenAI" \
  --use-semantic-scholar \
  --use-firecrawl \
  --min-shared-papers 2
```

Optional: API Keys
For enrichment, add to `.env`:

```
SEMANTIC_SCHOLAR_API_KEY=your_api_key_here
FIRECRAWL_API_KEY=your_firecrawl_key
```
- Semantic Scholar: Get a free API key at https://www.semanticscholar.org/product/api
- Firecrawl: Get an API key at https://firecrawl.dev/app (for website/GitHub scraping)
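A minimal sketch of how the ingestion code might read these optional keys at startup (the variable names match `.env.example` above, but the helper function itself is hypothetical):

```python
import os

def load_enrichment_keys() -> dict:
    """Read optional enrichment API keys from the environment.

    Missing keys come back as None so the pipeline can skip that
    enrichment source instead of failing outright.
    """
    return {
        "semantic_scholar": os.environ.get("SEMANTIC_SCHOLAR_API_KEY"),
        "firecrawl": os.environ.get("FIRECRAWL_API_KEY"),
    }

keys = load_enrichment_keys()
if keys["semantic_scholar"] is None:
    print("Semantic Scholar enrichment disabled (no API key)")
```

Treating both keys as optional keeps the basic coauthor-graph build runnable with no accounts at all.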
- Core identity (name, github_username, etc.)
- Research profile (papers, h_index, research_areas)
- Technical profile (GitHub stats, code quality)
- Computed scores (talent_tier, research_score, etc.)
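A rough sketch of the candidate shape these fields imply, using plain dataclasses for illustration (the real `models.py` uses Pydantic, and its exact field names and defaults may differ):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Candidate:
    # Core identity
    id: str
    name: str
    github_username: Optional[str] = None
    # Research profile
    h_index: int = 0
    research_areas: list = field(default_factory=list)
    # Computed scores
    talent_tier: Optional[str] = None
    research_score: float = 0.0

c = Candidate(id="greg-yang", name="Greg Yang", h_index=25)
```

Keeping identity fields required and everything else defaulted lets partially-enriched candidates enter the graph before scores are computed.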
- `grok_research_similarity`: Candidate-to-candidate similarity (static, cached)
- `target_fit_score`: Destination candidate's fit to the current JD (dynamic)
- `edge_weight`: Computed as `(0.2 * similarity) + (0.8 * jd_fit)`
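The `edge_weight` formula can be sketched directly (the function name here is illustrative, not the project's actual code):

```python
def edge_weight(similarity: float, jd_fit: float) -> float:
    """Combine static candidate similarity with dynamic JD fit,
    per the (0.2 * similarity) + (0.8 * jd_fit) weighting above."""
    return 0.2 * similarity + 0.8 * jd_fit

# A pair with only modest similarity but a strong fit to the current
# JD still gets a high edge weight, since jd_fit dominates the blend.
w = edge_weight(similarity=0.5, jd_fit=1.0)
```

The 0.8 weight on `jd_fit` means edges re-rank substantially whenever the active job description changes, while the cached similarity term keeps some stability.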
```python
from src.graph import GraphQueries, GraphIngestion

# Query candidates
queries = GraphQueries()
collaborators = queries.get_collaborators("greg-yang")
graph_data = queries.get_graph_for_visualization(seed_ids=["greg-yang"])

# Ingest new candidates
ingestion = GraphIngestion()
personas = ingestion.load_from_json("personas.json")
ingestion.ingest_personas(personas)
```

```
.
├── src/
│   └── graph/
│       ├── __init__.py
│       ├── schema.py               # Database schema setup
│       ├── models.py               # Pydantic models
│       ├── ingestion.py            # Data ingestion (JSON)
│       ├── openalex_ingestion.py   # OpenAlex coauthor graph builder
│       └── queries.py              # Common queries
├── requirements.txt
├── .env.example
└── README.md
```
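Under the hood, `queries.py`'s `get_collaborators` presumably runs a Cypher query over the `COLLABORATED_WITH` edges; a hedged sketch of what that query might look like (not the actual implementation):

```python
# Hypothetical Cypher that a get_collaborators() helper might execute,
# using the Candidate nodes and COLLABORATED_WITH edges described here.
# $candidate_id is a query parameter supplied by the Python driver.
GET_COLLABORATORS = """
MATCH (c:Candidate {id: $candidate_id})-[r:COLLABORATED_WITH]-(peer:Candidate)
RETURN peer, r.edge_weight AS weight
ORDER BY weight DESC
"""
```

Matching the relationship without a direction arrow treats collaboration as symmetric, and ordering by `edge_weight` surfaces the strongest, most JD-relevant collaborators first.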
The `openalex_ingestion.py` script automatically builds a collaboration graph from OpenAlex:
- Identity Resolution: Finds authors by name + institution (or ORCID)
- Work Fetching: Retrieves all papers for the seed author
- Coauthor Extraction: Extracts all coauthors and their affiliations
- Enrichment (optional):
- Semantic Scholar: h-index, citation counts, research areas
- Firecrawl: Personal websites, GitHub profiles (via web scraping)
- Graph Storage: Creates Candidate nodes and COLLABORATED_WITH edges
Features:
- Rate limiting (respectful API usage)
- Fuzzy name matching
- Normalized institution data (ROR IDs)
- Edge weights based on shared paper count
- Automatic GitHub username extraction from URLs