AI-powered candidate discovery system using graph databases and Grok for intelligent matching.
Option A: Docker (Recommended)
```bash
docker run \
  --name neo4j-grok \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/your-password \
  -v $(pwd)/neo4j-data:/data \
  neo4j:latest
```

Option B: Homebrew (macOS)
```bash
brew install neo4j
neo4j start
```

Access the Neo4j Browser at http://localhost:7474
```bash
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your Neo4j credentials
```

Note: If this is your first time running Neo4j, you'll need to change the default password:
- Open http://localhost:7474 in your browser
- Log in with username `neo4j` and password `neo4j`
- Set a new password and update it in `.env`
```bash
python -m src.graph.schema
```

Option A: From JSON file
```bash
python -m src.graph.ingestion --persona-file path/to/personas.json
```

Option B: From OpenAlex (build coauthor graph)
```bash
# Basic usage - find the author and build a coauthor graph
python -m src.graph.openalex_ingestion --name "Elon Musk" --institution "xAI"

# With ORCID (more accurate matching)
python -m src.graph.openalex_ingestion --name "Greg Yang" --orcid "0000-0000-0000-0000"

# With enrichment (Semantic Scholar + Firecrawl)
python -m src.graph.openalex_ingestion \
  --name "Ilya Sutskever" \
  --institution "OpenAI" \
  --use-semantic-scholar \
  --use-firecrawl \
  --min-shared-papers 2
```

Optional: API Keys
For enrichment, add to `.env`:

```
SEMANTIC_SCHOLAR_API_KEY=your_api_key_here
FIRECRAWL_API_KEY=your_firecrawl_key
```
- Semantic Scholar: Get a free API key at https://www.semanticscholar.org/product/api
- Firecrawl: Get an API key at https://firecrawl.dev/app (for website/GitHub scraping)
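A minimal sketch of how the ingestion code might read these optional keys at startup (the variable names match `.env.example` above, but the helper function itself is hypothetical):

```python
import os

def load_enrichment_keys() -> dict:
    """Read optional enrichment API keys from the environment.

    Missing keys come back as None so the pipeline can skip that
    enrichment source instead of failing outright.
    """
    return {
        "semantic_scholar": os.environ.get("SEMANTIC_SCHOLAR_API_KEY"),
        "firecrawl": os.environ.get("FIRECRAWL_API_KEY"),
    }

keys = load_enrichment_keys()
if keys["semantic_scholar"] is None:
    print("Semantic Scholar enrichment disabled (no API key)")
```

Treating both keys as optional keeps the basic coauthor-graph build runnable with no accounts at all.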
- Core identity (name, github_username, etc.)
- Research profile (papers, h_index, research_areas)
- Technical profile (GitHub stats, code quality)
- Computed scores (talent_tier, research_score, etc.)
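A rough sketch of the candidate shape these fields imply, using plain dataclasses for illustration (the real `models.py` uses Pydantic, and its exact field names and defaults may differ):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Candidate:
    # Core identity
    id: str
    name: str
    github_username: Optional[str] = None
    # Research profile
    h_index: int = 0
    research_areas: list = field(default_factory=list)
    # Computed scores
    talent_tier: Optional[str] = None
    research_score: float = 0.0

c = Candidate(id="greg-yang", name="Greg Yang", h_index=25)
```

Keeping identity fields required and everything else defaulted lets partially-enriched candidates enter the graph before scores are computed.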
- `grok_research_similarity`: Candidate-to-candidate similarity (static, cached)
- `target_fit_score`: Destination candidate's fit to the current JD (dynamic)
- `edge_weight`: Computed as `(0.2 * similarity) + (0.8 * jd_fit)`
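The `edge_weight` formula can be sketched directly (the function name here is illustrative, not the project's actual code):

```python
def edge_weight(similarity: float, jd_fit: float) -> float:
    """Combine static candidate similarity with dynamic JD fit,
    per the (0.2 * similarity) + (0.8 * jd_fit) weighting above."""
    return 0.2 * similarity + 0.8 * jd_fit

# A pair with only modest similarity but a strong fit to the current
# JD still gets a high edge weight, since jd_fit dominates the blend.
w = edge_weight(similarity=0.5, jd_fit=1.0)
```

The 0.8 weight on `jd_fit` means edges re-rank substantially whenever the active job description changes, while the cached similarity term keeps some stability.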
```python
from src.graph import GraphQueries, GraphIngestion

# Query candidates
queries = GraphQueries()
collaborators = queries.get_collaborators("greg-yang")
graph_data = queries.get_graph_for_visualization(seed_ids=["greg-yang"])

# Ingest new candidates
ingestion = GraphIngestion()
personas = ingestion.load_from_json("personas.json")
ingestion.ingest_personas(personas)
```

```
.
├── src/
│   └── graph/
│       ├── __init__.py
│       ├── schema.py               # Database schema setup
│       ├── models.py               # Pydantic models
│       ├── ingestion.py            # Data ingestion (JSON)
│       ├── openalex_ingestion.py   # OpenAlex coauthor graph builder
│       └── queries.py              # Common queries
├── requirements.txt
├── .env.example
└── README.md
```
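Under the hood, `queries.py`'s `get_collaborators` presumably runs a Cypher query over the `COLLABORATED_WITH` edges; a hedged sketch of what that query might look like (not the actual implementation):

```python
# Hypothetical Cypher that a get_collaborators() helper might execute,
# using the Candidate nodes and COLLABORATED_WITH edges described here.
# $candidate_id is a query parameter supplied by the Python driver.
GET_COLLABORATORS = """
MATCH (c:Candidate {id: $candidate_id})-[r:COLLABORATED_WITH]-(peer:Candidate)
RETURN peer, r.edge_weight AS weight
ORDER BY weight DESC
"""
```

Matching the relationship without a direction arrow treats collaboration as symmetric, and ordering by `edge_weight` surfaces the strongest, most JD-relevant collaborators first.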
The `openalex_ingestion.py` script automatically builds a collaboration graph from OpenAlex:
- Identity Resolution: Finds authors by name + institution (or ORCID)
- Work Fetching: Retrieves all papers for the seed author
- Coauthor Extraction: Extracts all coauthors and their affiliations
- Enrichment (optional):
- Semantic Scholar: h-index, citation counts, research areas
- Firecrawl: Personal websites, GitHub profiles (via web scraping)
- Graph Storage: Creates Candidate nodes and COLLABORATED_WITH edges
Features:
- Rate limiting (respectful API usage)
- Fuzzy name matching
- Normalized institution data (ROR IDs)
- Edge weights based on shared paper count
- Automatic GitHub username extraction from URLs