Transform any GitHub repository into an intelligent, queryable knowledge base
Neocortex uses AST parsing, vector embeddings, and dependency graph analysis to understand codebases deeply, enabling natural language questions like "How does authentication work?" and returning precisely the relevant code context.
```
You: "How does the payment system handle refunds?"

Neocortex: Found 4 relevant functions:
  • processRefund()          - payments/refund.ts
  • validateRefundRequest()  - payments/validation.ts
  • updateOrderStatus()      - orders/status.ts        [via dependency graph]
  • sendRefundNotification() - notifications/email.ts  [via dependency graph]

[Returns actual code context ready for any LLM]
```
| Feature | Description |
|---|---|
| 🔍 Semantic Search | Natural language queries using OpenAI embeddings |
| 🕸️ Dependency Graph | Automatic call graph analysis for context expansion |
| 🎯 Query Classification | Detects query type (simple, architectural, debugging) and optimizes strategy |
| 📊 Token Budgeting | Smart compression to fit LLM context windows |
| 🔒 Security Layer | Rate limiting, audit logging, token sanitization |
| 🖥️ Dual Interface | CLI for power users, Web UI for visual exploration |
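Semantic search of this kind boils down to nearest-neighbor ranking over embedding vectors. A minimal illustrative sketch of that core step, where the `EmbeddedEntity` shape is assumed for the example rather than taken from the real `src/types.ts`:

```typescript
interface EmbeddedEntity {
  name: string;
  embedding: number[];
}

interface ScoredEntity extends EmbeddedEntity {
  score: number;
}

// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank entities by similarity to the query embedding; keep the top k.
function semanticSearch(
  queryEmbedding: number[],
  entities: EmbeddedEntity[],
  topK = 10
): ScoredEntity[] {
  return entities
    .map((e) => ({ ...e, score: cosineSimilarity(queryEmbedding, e.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

In the real pipeline the query is first embedded via the same model as the code entities, so scores are directly comparable.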
- Node.js 20+
- pnpm
- OpenAI API key
```sh
git clone https://github.com/yourusername/neocortex.git
cd neocortex
pnpm install
```

```sh
# Create .env file with your API key
echo "OPENAI_API_KEY=sk-your-key-here" > .env

# Optional: For private repos
echo "GITHUB_TOKEN=ghp_your-token" >> .env
```

```sh
# Query any public repo
pnpm dev context https://github.com/sindresorhus/is "what types are exported"

# Query with more context
pnpm dev context https://github.com/user/repo "how does auth work" --max-tokens 3000
```

```sh
pnpm dev context <repo-url> "<question>" [options]
```
```sh
# Examples
pnpm dev context https://github.com/user/repo "how does login work"
pnpm dev context https://github.com/user/repo "what calls the database" --max-tokens 4000
pnpm dev context https://github.com/user/private-repo "show me the API" -t ghp_token
```

Options:
| Flag | Description | Default |
|---|---|---|
| `--max-tokens <n>` | Token budget for context | `2000` |
| `--top-k <n>` | Number of candidates | `10` |
| `--model <m>` | `openai` or `voyage-code-2` | `openai` |
| `-t, --token` | GitHub token for private repos | – |
```sh
pnpm dev generate https://github.com/user/repo -o ./output/data.jsonl
```

```sh
pnpm dev graph https://github.com/user/repo --expand functionName
```

```sh
# Terminal 1: API Server
pnpm server

# Terminal 2: Frontend
pnpm ui

# Open http://localhost:5173
```

```
┌────────────────────────────────────────────────────────────────────┐
│                         NEOCORTEX PIPELINE                         │
│                                                                    │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐          │
│  │  CLONE  │───▶│  PARSE  │───▶│  GRAPH  │───▶│  EMBED  │          │
│  │ GitHub  │    │   AST   │    │  Deps   │    │ Vectors │          │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘          │
│       │              │              │              │               │
│       ▼              ▼              ▼              ▼               │
│     repo/       CodeEntity[]  calls/calledBy  embeddings[]         │
│                                                                    │
│  ──────────────────────────────────────────────────────────────    │
│                                                                    │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐          │
│  │CLASSIFY │───▶│ SEARCH  │───▶│ EXPAND  │───▶│ SELECT  │          │
│  │  Query  │    │Semantic │    │  Graph  │    │ Budget  │          │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘          │
│       │              │              │              │               │
│       ▼              ▼              ▼              ▼               │
│   QueryType      matches[]   +dependencies   context string        │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
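The EXPAND stage can be read as a bounded breadth-first walk over `calls`/`calledBy` edges from the initial matches. A sketch of that idea, where the `GraphNode` shape is hypothetical rather than the actual `graph.ts` API:

```typescript
interface GraphNode {
  name: string;
  calls: string[];     // entities this function calls
  calledBy: string[];  // entities that call this function
}

// Expand matched entities by following call edges up to `depth` hops.
function expandByGraph(
  matches: string[],
  graph: Map<string, GraphNode>,
  depth: number
): Set<string> {
  const selected = new Set(matches);
  let frontier = [...matches];
  for (let d = 0; d < depth; d++) {
    const next: string[] = [];
    for (const name of frontier) {
      const node = graph.get(name);
      if (!node) continue;
      for (const neighbor of [...node.calls, ...node.calledBy]) {
        if (!selected.has(neighbor)) {
          selected.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return selected;
}
```

This is how a semantic hit on `processRefund()` can pull in `updateOrderStatus()` even though the latter never mentions "refund".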
Neocortex automatically detects your intent and adjusts the search strategy:
| Type | Example Query | Strategy |
|---|---|---|
| Simple | "What does parseFile do?" | Direct match, depth=1 |
| Multi-hop | "What calls auth then writes to DB?" | Trace connections, depth=3 |
| Architectural | "How is the app structured?" | Boost entry points & types |
| Debugging | "Why might login fail?" | Include error handlers |
| Usage | "How do I use the API client?" | Find examples & imports |
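A classifier like the one in the table can be approximated with keyword heuristics. The sketch below is illustrative only and does not mirror the real `retrieval/classifier.ts`:

```typescript
type QueryType = 'simple' | 'multi-hop' | 'architectural' | 'debugging' | 'usage';

// Heuristic, keyword-based query classification.
// Order matters: more specific intents are checked before the fallback.
function classifyQuery(query: string): QueryType {
  const q = query.toLowerCase();
  if (/\bwhy\b|\bfail\b|error|\bbug\b|broken/.test(q)) return 'debugging';
  if (/how do i|example|use the/.test(q)) return 'usage';
  if (/structur|architect|organiz|overview/.test(q)) return 'architectural';
  if (/\bthen\b|chain/.test(q)) return 'multi-hop';
  return 'simple';
}
```

The detected type then drives strategy parameters such as graph depth and which entity kinds get boosted.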
```
src/
├── index.ts           # CLI entry point (Commander.js)
├── server.ts          # Express API server
├── security.ts        # Rate limiting, audit logs, sanitization
│
├── clone.ts           # Git operations (clone, pull, file discovery)
├── parser.ts          # AST parsing with tree-sitter
├── graph.ts           # Dependency graph construction
├── embeddings.ts      # OpenAI/Voyage vector embeddings
│
├── retrieval/
│   ├── classifier.ts  # Query type detection
│   ├── search.ts      # Semantic + keyword + graph search
│   └── budget.ts      # Token budget selection
│
├── generator.ts       # Training data generation
├── templates.ts       # Q&A templates
├── output.ts          # JSONL file handling
├── types.ts           # TypeScript interfaces
│
└── ui/
    ├── App.tsx            # React application
    ├── CodeContextUI.tsx  # Main UI component
    └── api.ts             # Frontend HTTP client
```
Neocortex includes a comprehensive security layer:
- Token Sanitization – API keys are redacted from all error messages
- Rate Limiting – 30 requests/minute per user
- Scope Validation – Warns about excessive GitHub token permissions
- Access Verification – Checks repo access before cloning
- Audit Logging – All actions logged with timestamps
- Session Timeout – Auto-cleanup after 30min inactivity
- Secure Cleanup – Temp files deleted on logout
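Token sanitization, for instance, can be as simple as regex redaction applied before anything is logged or returned in an error. The patterns below are illustrative assumptions, not the actual `security.ts` rules:

```typescript
// Redact OpenAI-style keys and GitHub tokens from a message before it is
// logged or surfaced in an error. Patterns are illustrative only.
function sanitize(message: string): string {
  return message
    .replace(/sk-[A-Za-z0-9_-]{10,}/g, '[REDACTED_OPENAI_KEY]')
    .replace(/gh[pousr]_[A-Za-z0-9]{10,}/g, '[REDACTED_GITHUB_TOKEN]');
}
```

Running every outbound string through one choke point like this is what makes "redacted from all error messages" enforceable.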
```
🎯 Context Retrieval Pipeline

┌───────────────────────────────────────────────────────────────┐
│ QUERY CLASSIFICATION                                          │
├───────────────────────────────────────────────────────────────┤
│ Type: SIMPLE                                                  │
│ Confidence: ████████░░ 80%                                    │
│ Targets: authentication                                       │
└───────────────────────────────────────────────────────────────┘

🔍 Searching: "how does authentication work"...
   Raw matches: 15
   Top scores: Login (0.325), onSignInButtonClick (0.296)

📊 SEARCH RESULTS – Found 5 entities, selected 5 within budget

   1. Login                [keyword]  - 765 tokens - src/pages/Login.tsx
   2. onSignInButtonClick  [keyword]  - 147 tokens - src/pages/Login.tsx
   3. PrivateRoute         [graph]    -  54 tokens - src/components/PrivateRoute.tsx
   4. getUser              [keyword]  -  36 tokens - src/utils/auth.ts
   5. AuthContext          [semantic] -  89 tokens - src/contexts/AuthContext.tsx

   Total tokens: 1,091 / 2,000 budget

───────────────────────────────────────────────────────────────
CONTEXT (ready for LLM)
───────────────────────────────────────────────────────────────

// File: src/pages/Login.tsx
function Login() {
  const [email, setEmail] = useState('');
  const [password, setPassword] = useState('');

  async function onSignInButtonClick() {
    const result = await signIn(email, password);
    if (result.success) {
      history.push('/dashboard');
    }
  }
  // ...
}

// File: src/components/PrivateRoute.tsx
function PrivateRoute({ children }) {
  const { user } = useAuth();
  return user ? children : <Navigate to="/login" />;
}
```
| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | ✅ | OpenAI API key for embeddings |
| `GITHUB_TOKEN` | ❌ | For private repository access |
| `VOYAGE_API_KEY` | ❌ | Alternative: Voyage AI embeddings |
| `PORT` | ❌ | Server port (default: `3001`) |
| Model | Provider | Use Case |
|---|---|---|
| `openai` | OpenAI | General purpose, recommended |
| `voyage-code-2` | Voyage AI | Code-optimized embeddings |
| Metric | Value |
|---|---|
| Parse speed | ~1,000 entities/sec |
| Embedding batch | 20 entities/request |
| Search latency | <500ms |
| Memory per repo | ~50MB (1,000 entities) |
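The "20 entities/request" figure implies chunking entities into fixed-size batches before calling the embeddings API. A small illustrative helper (not the actual `embeddings.ts`):

```typescript
// Split items into fixed-size batches; 20 per request matches the table above.
function toBatches<T>(items: T[], batchSize = 20): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Batching keeps request counts (and rate-limit pressure) roughly 20x lower than embedding one entity at a time.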
```sh
# Install dependencies
pnpm install

# Run CLI in development
pnpm dev <command>

# Build for production
pnpm build

# Start production server
pnpm start
```

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing`)
- Open a Pull Request
MIT License – see LICENSE for details.
- tree-sitter – Fast, incremental AST parsing
- OpenAI – Embedding models
- Voyage AI – Code-specific embeddings
- Express – API server
- React – Web UI
- Vite – Frontend tooling
- Tailwind CSS – Styling
🧠 Neocortex

Intelligent Code Context Retrieval