ishandeshpande/xai-hackathon

Grok Recruiter

An AI-powered candidate discovery system that stores candidates and their collaborations in a Neo4j graph database and uses Grok to score matches.

Setup

1. Install Neo4j

Option A: Docker (Recommended)

docker run \
  --name neo4j-grok \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/your-password \
  -v "$(pwd)/neo4j-data:/data" \
  neo4j:latest

Option B: Homebrew (macOS)

brew install neo4j
neo4j start

Access Neo4j Browser at: http://localhost:7474

2. Install Dependencies

pip install -r requirements.txt

3. Configure Environment

cp .env.example .env
# Edit .env with your Neo4j credentials

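Note: If you installed Neo4j without setting NEO4J_AUTH (for example, via Homebrew), you'll need to change the default password on first run:

  • Open http://localhost:7474 in your browser
  • Log in with username neo4j and password neo4j
  • Set a new password and update it in .env

With the Docker command above, the password is already set through NEO4J_AUTH, so use that value in .env.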

4. Initialize Database Schema

python -m src.graph.schema
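
For orientation, here is a minimal sketch of what a schema step for this graph might run. It is not taken from src/graph/schema.py: the constraint and index names, the `id` property, and the `apply_schema` helper are all assumptions made for illustration. Only the `Candidate` label and the `name` property come from the Graph Schema section below.

```python
from typing import Callable, Iterable

# Hypothetical Cypher statements (Neo4j 5 syntax). The `id` property is an
# assumption; `Candidate` and `name` come from the README's schema section.
SCHEMA_STATEMENTS = [
    "CREATE CONSTRAINT candidate_id_unique IF NOT EXISTS "
    "FOR (c:Candidate) REQUIRE c.id IS UNIQUE",
    "CREATE INDEX candidate_name IF NOT EXISTS "
    "FOR (c:Candidate) ON (c.name)",
]


def apply_schema(
    run_query: Callable[[str], object],
    statements: Iterable[str] = SCHEMA_STATEMENTS,
) -> int:
    """Send each statement to `run_query` and return how many were sent.

    `run_query` is any callable that executes one Cypher string, for
    example the `run` method of a session from the official neo4j driver.
    """
    count = 0
    for statement in statements:
        run_query(statement)
        count += 1
    return count


if __name__ == "__main__":
    # Dry run: print the statements instead of sending them to a database.
    apply_schema(print)
```

Both statements use IF NOT EXISTS, so running them repeatedly should be safe.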

5. Ingest Candidate Personas

Option A: From JSON file

python -m src.graph.ingestion --persona-file path/to/personas.json

Option B: From OpenAlex (build coauthor graph)

# Basic usage - find author and build coauthor graph
python -m src.graph.openalex_ingestion --name "Elon Musk" --institution "xAI"

# With ORCID (more accurate matching)
python -m src.graph.openalex_ingestion --name "Greg Yang" --orcid "0000-0000-0000-0000"

# With enrichment (Semantic Scholar + Firecrawl)
python -m src.graph.openalex_ingestion \
  --name "Ilya Sutskever" \
  --institution "OpenAI" \
  --use-semantic-scholar \
  --use-firecrawl \
  --min-shared-papers 2

Optional: API Keys

For enrichment, add the following to .env:

SEMANTIC_SCHOLAR_API_KEY=your_api_key_here
FIRECRAWL_API_KEY=your_firecrawl_key

Graph Schema

Nodes: Candidate

  • Core identity (name, github_username, etc.)
  • Research profile (papers, h_index, research_areas)
  • Technical profile (GitHub stats, code quality)
  • Computed scores (talent_tier, research_score, etc.)

Edges: COLLABORATED_WITH

  • grok_research_similarity: Candidate-to-candidate similarity (static, cached)
  • target_fit_score: Destination candidate's fit to the current job description, or JD (dynamic)
  • edge_weight: Computed as (0.2 * grok_research_similarity) + (0.8 * target_fit_score)
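
The edge weight formula above can be sketched as a small function. The function name and the assumption that both scores lie in the range 0 to 1 are illustrative and not taken from the repository. The 0.2 and 0.8 weights come from the formula itself.

```python
def compute_edge_weight(
    grok_research_similarity: float,
    target_fit_score: float,
    similarity_weight: float = 0.2,
    fit_weight: float = 0.8,
) -> float:
    """Blend the cached similarity with the per-JD fit score.

    Both inputs are assumed to be normalized to [0, 1]; the README does
    not state their range.
    """
    return (
        similarity_weight * grok_research_similarity
        + fit_weight * target_fit_score
    )


if __name__ == "__main__":
    # A moderately similar collaborator who fits the job description well.
    print(compute_edge_weight(grok_research_similarity=0.5, target_fit_score=1.0))
```

Because grok_research_similarity is static and target_fit_score changes with the job description, only the second term needs to be recomputed when the JD changes.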

Usage

from src.graph import GraphQueries, GraphIngestion

# Query candidates
queries = GraphQueries()
collaborators = queries.get_collaborators("greg-yang")
graph_data = queries.get_graph_for_visualization(seed_ids=["greg-yang"])

# Ingest new candidates
ingestion = GraphIngestion()
personas = ingestion.load_from_json("personas.json")
ingestion.ingest_personas(personas)

Project Structure

.
├── src/
│   └── graph/
│       ├── __init__.py
│       ├── schema.py              # Database schema setup
│       ├── models.py               # Pydantic models
│       ├── ingestion.py            # Data ingestion (JSON)
│       ├── openalex_ingestion.py   # OpenAlex coauthor graph builder
│       └── queries.py              # Common queries
├── requirements.txt
├── .env.example
└── README.md

OpenAlex Coauthor Graph Builder

The openalex_ingestion.py script automatically builds a collaboration graph from OpenAlex:

  1. Identity Resolution: Finds authors by name + institution (or ORCID)
  2. Work Fetching: Retrieves all papers for the seed author
  3. Coauthor Extraction: Extracts all coauthors and their affiliations
  4. Enrichment (optional):
    • Semantic Scholar: h-index, citation counts, research areas
    • Firecrawl: Personal websites, GitHub profiles (via web scraping)
  5. Graph Storage: Creates Candidate nodes and COLLABORATED_WITH edges
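
Steps 3 and 5 can be illustrated with a short sketch that counts shared papers per coauthor and applies a threshold like the one set by --min-shared-papers. This is not the code in openalex_ingestion.py. The function names, edge dictionary keys, and sample IDs are hypothetical. The `authorships`, `author`, `id`, and `display_name` fields follow the shape of OpenAlex work objects.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_shared_papers(
    works: List[dict], seed_author_id: str
) -> Tuple[Counter, Dict[str, str]]:
    """Count how many works each coauthor shares with the seed author."""
    shared: Counter = Counter()
    names: Dict[str, str] = {}
    for work in works:
        # Map author id -> display name, which also removes duplicates
        # if the same author is listed twice on one work.
        authors: Dict[str, str] = {}
        for authorship in work.get("authorships", []):
            author = authorship.get("author") or {}
            if author.get("id"):
                authors[author["id"]] = author.get("display_name", "")
        if seed_author_id not in authors:
            continue  # Skip works that do not include the seed author.
        for author_id, name in authors.items():
            if author_id != seed_author_id:
                shared[author_id] += 1
                names[author_id] = name
    return shared, names


def build_edges(
    works: List[dict], seed_author_id: str, min_shared_papers: int = 1
) -> List[dict]:
    """Return one edge per coauthor with enough shared papers."""
    shared, names = count_shared_papers(works, seed_author_id)
    return [
        {
            "source": seed_author_id,
            "target": author_id,
            "name": names[author_id],
            "shared_papers": count,
        }
        for author_id, count in sorted(shared.items())
        if count >= min_shared_papers
    ]


if __name__ == "__main__":
    # Placeholder data; real OpenAlex IDs are URLs, not short strings.
    seed = {"author": {"id": "A1", "display_name": "Seed Author"}}
    ada = {"author": {"id": "A2", "display_name": "Ada"}}
    bob = {"author": {"id": "A3", "display_name": "Bob"}}
    works = [
        {"authorships": [seed, ada]},
        {"authorships": [seed, ada, bob]},
        {"authorships": [ada, bob]},  # Seed author absent, so ignored.
    ]
    print(build_edges(works, "A1", min_shared_papers=2))
```

Each returned dictionary could then be written to the graph as a COLLABORATED_WITH edge between two Candidate nodes.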

Features:

  • Rate limiting (respectful API usage)
  • Fuzzy name matching
  • Normalized institution data (ROR IDs)
  • Edge weights based on shared paper count
  • Automatic GitHub username extraction from URLs
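
Two of these features, GitHub username extraction and fuzzy name matching, can be sketched with the standard library alone. This is an illustration under assumptions, not the project's implementation: the function names, the reserved-path list, and the 0.85 threshold are hypothetical, and `difflib.SequenceMatcher` stands in for whatever matcher the project uses.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# GitHub usernames are alphanumeric with hyphens, at most 39 characters.
_GITHUB_URL = re.compile(
    r"github\.com/([A-Za-z0-9][A-Za-z0-9-]{0,38})(?:[/?#]|$)",
    re.IGNORECASE,
)
# A few first path segments that are site pages rather than users.
# This list is illustrative and not exhaustive.
_RESERVED = {"orgs", "topics", "features", "about", "pricing", "login",
             "settings", "marketplace", "explore", "sponsors"}


def extract_github_username(url: str) -> Optional[str]:
    """Return the first path segment of a github.com URL, or None."""
    match = _GITHUB_URL.search(url.strip())
    if not match:
        return None
    username = match.group(1)
    return None if username.lower() in _RESERVED else username


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Compare normalized names by similarity ratio against a threshold."""
    ratio = SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    print(extract_github_username("https://github.com/octocat/Hello-World"))
    print(names_match("Greg Yang", "greg  yang."))
```

A character-level ratio is a simple baseline. It does not handle initials or reordered names, so a real pipeline would likely combine it with the institution or ORCID checks described above.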
