rusheelsharma/neocortex

🧠 Neocortex

Transform any GitHub repository into an intelligent, queryable knowledge base

Neocortex uses AST parsing, vector embeddings, and dependency graph analysis to understand codebases deeply, enabling natural language questions like "How does authentication work?" and returning precisely the relevant code context.

TypeScript Node.js OpenAI React


🎯 What It Does

You: "How does the payment system handle refunds?"

Neocortex: Found 4 relevant functions:
  • processRefund() - payments/refund.ts
  • validateRefundRequest() - payments/validation.ts
  • updateOrderStatus() - orders/status.ts [via dependency graph]
  • sendRefundNotification() - notifications/email.ts [via dependency graph]

[Returns actual code context ready for any LLM]

✨ Key Features

| Feature | Description |
|---|---|
| 🔍 Semantic Search | Natural language queries using OpenAI embeddings |
| 🕸️ Dependency Graph | Automatic call graph analysis for context expansion |
| 🎯 Query Classification | Detects query type (simple, architectural, debugging) and optimizes strategy |
| 📊 Token Budgeting | Smart compression to fit LLM context windows |
| 🔐 Security Layer | Rate limiting, audit logging, token sanitization |
| 🖥️ Dual Interface | CLI for power users, Web UI for visual exploration |
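As a rough illustration of the semantic search feature, the sketch below ranks entities by cosine similarity between a query vector and precomputed entity vectors. The `EmbeddedEntity` shape and the function names are hypothetical, and the README does not say which similarity measure Neocortex actually uses, so treat this as one plausible approach rather than the project's implementation.

```typescript
// Hypothetical shape: the README does not document the real entity type.
interface EmbeddedEntity {
  name: string;
  file: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so report no similarity.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank entities by similarity to the query vector and keep the best topK.
function rankBySimilarity(
  query: number[],
  entities: EmbeddedEntity[],
  topK = 10,
): { entity: EmbeddedEntity; score: number }[] {
  return entities
    .map((entity) => ({ entity, score: cosineSimilarity(query, entity.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

The default of 10 mirrors the documented `--top-k` default; the query vector would come from the same embedding model as the entity vectors.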

🚀 Quick Start

Prerequisites

  • Node.js 20+
  • pnpm
  • OpenAI API key

Installation

git clone https://github.com/rusheelsharma/neocortex.git
cd neocortex
pnpm install

Set up environment

# Create .env file with your API key
echo "OPENAI_API_KEY=sk-your-key-here" > .env

# Optional: For private repos
echo "GITHUB_TOKEN=ghp_your-token" >> .env

Try it out

# Query any public repo
pnpm dev context https://github.com/sindresorhus/is "what types are exported"

# Query with more context
pnpm dev context https://github.com/user/repo "how does auth work" --max-tokens 3000

📖 Usage

CLI Commands

context: Query a codebase

pnpm dev context <repo-url> "<question>" [options]

# Examples
pnpm dev context https://github.com/user/repo "how does login work"
pnpm dev context https://github.com/user/repo "what calls the database" --max-tokens 4000
pnpm dev context https://github.com/user/private-repo "show me the API" -t ghp_token

Options:

| Flag | Description | Default |
|---|---|---|
| --max-tokens <n> | Token budget for context | 2000 |
| --top-k <n> | Number of candidates | 10 |
| --model <m> | openai or voyage-code-2 | openai |
| -t, --token | GitHub token for private repos | (none) |

generate: Create training data

pnpm dev generate https://github.com/user/repo -o ./output/data.jsonl

graph: Analyze dependencies

pnpm dev graph https://github.com/user/repo --expand functionName
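To give a feel for what expanding a function through the dependency graph could involve, here is a minimal breadth-first sketch. The `GraphNode` shape is an assumption based on the `calls/calledBy` labels in the pipeline diagram, and `expand` is a hypothetical helper, not the function behind `--expand`.

```typescript
// Hypothetical node shape, based on the calls/calledBy labels in the pipeline diagram.
interface GraphNode {
  calls: string[];
  calledBy: string[];
}

type DependencyGraph = Map<string, GraphNode>;

// Breadth-first expansion: collect every function reachable from `start`
// within `maxDepth` hops, following both calls and calledBy edges.
function expand(graph: DependencyGraph, start: string, maxDepth: number): string[] {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const name of frontier) {
      const node = graph.get(name);
      if (!node) continue; // unknown or external function
      for (const neighbor of [...node.calls, ...node.calledBy]) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  visited.delete(start); // report only the neighbors, not the starting function
  return Array.from(visited);
}
```

A depth of 1 returns only direct callers and callees, while larger depths pull in more distant functions at the cost of more tokens.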

Web UI

# Terminal 1: API Server
pnpm server

# Terminal 2: Frontend  
pnpm ui

# Open http://localhost:5173

🔬 How It Works

┌──────────────────────────────────────────────────────────────────────┐
│                          NEOCORTEX PIPELINE                          │
│                                                                      │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐            │
│  │  CLONE  │───▶│  PARSE  │───▶│  GRAPH  │───▶│  EMBED  │            │
│  │ GitHub  │    │   AST   │    │  Deps   │    │ Vectors │            │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘            │
│       │              │              │              │                 │
│       ▼              ▼              ▼              ▼                 │
│     repo/      CodeEntity[]  calls/calledBy  embeddings[]            │
│                                                                      │
│  ══════════════════════════════════════════════════════════════════  │
│                                                                      │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐            │
│  │CLASSIFY │───▶│ SEARCH  │───▶│ EXPAND  │───▶│ SELECT  │            │
│  │  Query  │    │Semantic │    │  Graph  │    │ Budget  │            │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘            │
│       │              │              │              │                 │
│       ▼              ▼              ▼              ▼                 │
│   QueryType      matches[]    +dependencies context string           │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

Query Classification

Neocortex automatically detects your intent and adjusts the search strategy:

| Type | Example Query | Strategy |
|---|---|---|
| Simple | "What does parseFile do?" | Direct match, depth=1 |
| Multi-hop | "What calls auth then writes to DB?" | Trace connections, depth=3 |
| Architectural | "How is the app structured?" | Boost entry points & types |
| Debugging | "Why might login fail?" | Include error handlers |
| Usage | "How do I use the API client?" | Find examples & imports |
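One simple way to picture query classification is a list of keyword rules checked in order. The rules below are purely illustrative: they are chosen so that the example queries in the table land in the listed types, and they are not taken from `classifier.ts`, whose actual signals are not documented here.

```typescript
type QueryType = "simple" | "multi-hop" | "architectural" | "debugging" | "usage";

// Illustrative keyword rules only (an assumption, not the real classifier).
// Rules are checked in order, and the first match wins.
const RULES: { type: QueryType; pattern: RegExp }[] = [
  { type: "debugging", pattern: /\b(why|fail|fails|error|bug|crash)\b/i },
  { type: "multi-hop", pattern: /\b(then|calls|flows?|between)\b/i },
  { type: "architectural", pattern: /\b(structured?|architecture|organized|overview)\b/i },
  { type: "usage", pattern: /\bhow do i\b|\bexample\b/i },
];

function classifyQuery(query: string): QueryType {
  for (const rule of RULES) {
    if (rule.pattern.test(query)) return rule.type;
  }
  return "simple"; // fallback when no rule matches
}
```

With these rules, "Why might login fail?" is classified as debugging and "What does parseFile do?" falls through to simple. A production classifier would likely also report a confidence value, as the example output further down suggests.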

πŸ—οΈ Project Structure

src/
β”œβ”€β”€ index.ts              # CLI entry point (Commander.js)
β”œβ”€β”€ server.ts             # Express API server
β”œβ”€β”€ security.ts           # Rate limiting, audit logs, sanitization
β”‚
β”œβ”€β”€ clone.ts              # Git operations (clone, pull, file discovery)
β”œβ”€β”€ parser.ts             # AST parsing with tree-sitter
β”œβ”€β”€ graph.ts              # Dependency graph construction
β”œβ”€β”€ embeddings.ts         # OpenAI/Voyage vector embeddings
β”‚
β”œβ”€β”€ retrieval/
β”‚   β”œβ”€β”€ classifier.ts     # Query type detection
β”‚   β”œβ”€β”€ search.ts         # Semantic + keyword + graph search
β”‚   └── budget.ts         # Token budget selection
β”‚
β”œβ”€β”€ generator.ts          # Training data generation
β”œβ”€β”€ templates.ts          # Q&A templates
β”œβ”€β”€ output.ts             # JSONL file handling
β”œβ”€β”€ types.ts              # TypeScript interfaces
β”‚
└── ui/
    β”œβ”€β”€ App.tsx           # React application
    β”œβ”€β”€ CodeContextUI.tsx # Main UI component
    └── api.ts            # Frontend HTTP client

🔐 Security

Neocortex includes a security layer with the following safeguards:

  • Token Sanitization: API keys are redacted from all error messages
  • Rate Limiting: 30 requests/minute per user
  • Scope Validation: Warns about excessive GitHub token permissions
  • Access Verification: Checks repo access before cloning
  • Audit Logging: All actions logged with timestamps
  • Session Timeout: Auto-cleanup after 30 minutes of inactivity
  • Secure Cleanup: Temp files deleted on logout
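Two of the safeguards above, token sanitization and rate limiting, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the secret patterns are guessed from the `sk-` and `ghp_` prefixes shown in this README, the sliding-window strategy is one common choice, and `redactSecrets` and `RateLimiter` are hypothetical names rather than the contents of `security.ts`.

```typescript
// Patterns are assumptions based on the key prefixes shown in this README (sk-, ghp_).
const SECRET_PATTERNS: RegExp[] = [/\bsk-[A-Za-z0-9_-]+/g, /\bghp_[A-Za-z0-9]+/g];

// Replace anything that looks like an API key or GitHub token with a placeholder.
function redactSecrets(message: string): string {
  return SECRET_PATTERNS.reduce(
    (text, pattern) => text.replace(pattern, "[REDACTED]"),
    message,
  );
}

// Sliding-window limiter: at most `limit` requests per `windowMs` for each user.
class RateLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 30, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if the user is under the limit;
  // returns false (without recording) otherwise.
  allow(userId: string, now: number = Date.now()): boolean {
    const recent = (this.hits.get(userId) ?? []).filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(userId, recent);
    return allowed;
  }
}
```

The defaults match the documented limit of 30 requests per minute per user. Passing `now` explicitly keeps the limiter easy to test without waiting on a real clock.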

📊 Example Output

🎯 Context Retrieval Pipeline

┌─────────────────────────────────────────────────────────────┐
│ QUERY CLASSIFICATION                                        │
├─────────────────────────────────────────────────────────────┤
│ Type:       SIMPLE                                          │
│ Confidence: ████████░░  80%                                 │
│ Targets:    authentication                                  │
└─────────────────────────────────────────────────────────────┘

🔎 Searching: "how does authentication work"...
   Raw matches: 15
   Top scores: Login (0.325), onSignInButtonClick (0.296)

📊 SEARCH RESULTS - Found 5 entities, selected 5 within budget

  1. Login [keyword] - 765 tokens - src/pages/Login.tsx
  2. onSignInButtonClick [keyword] - 147 tokens - src/pages/Login.tsx
  3. PrivateRoute [graph] - 54 tokens - src/components/PrivateRoute.tsx
  4. getUser [keyword] - 36 tokens - src/utils/auth.ts
  5. AuthContext [semantic] - 89 tokens - src/contexts/AuthContext.tsx

Total tokens: 1,091 / 2,000 budget

═══════════════════════════════════════════════════════════════
CONTEXT (ready for LLM)
═══════════════════════════════════════════════════════════════

// File: src/pages/Login.tsx
function Login() {
  const [email, setEmail] = useState('');
  const [password, setPassword] = useState('');
  
  async function onSignInButtonClick() {
    const result = await signIn(email, password);
    if (result.success) {
      history.push('/dashboard');
    }
  }
  // ...
}

// File: src/components/PrivateRoute.tsx
function PrivateRoute({ children }) {
  const { user } = useAuth();
  return user ? children : <Navigate to="/login" />;
}
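The "selected 5 within budget" step in the output above can be pictured as a greedy pass over scored candidates. The sketch below is an assumption about how such a selection might work, not the logic in `budget.ts`; the `Candidate` shape and `selectWithinBudget` name are hypothetical, and it does no compression.

```typescript
// Hypothetical candidate shape; `tokens` is assumed to be precomputed per entity.
interface Candidate {
  name: string;
  tokens: number;
  score: number;
}

// Greedy selection: walk candidates from highest to lowest score and keep
// each one that still fits in the remaining token budget.
function selectWithinBudget(candidates: Candidate[], maxTokens: number) {
  const selected: Candidate[] = [];
  let used = 0;
  // Copy before sorting so the caller's array is left untouched.
  for (const c of [...candidates].sort((a, b) => b.score - a.score)) {
    if (used + c.tokens <= maxTokens) {
      selected.push(c);
      used += c.tokens;
    }
  }
  return { selected, totalTokens: used };
}
```

In the example output, the five entities add up to 765 + 147 + 54 + 36 + 89 = 1,091 tokens, which is under the 2,000-token budget, so a pass like this would keep all of them.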

⚙️ Configuration

Environment Variables

| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | ✅ | OpenAI API key for embeddings |
| GITHUB_TOKEN | ❌ | For private repository access |
| VOYAGE_API_KEY | ❌ | Alternative: Voyage AI embeddings |
| PORT | ❌ | Server port (default: 3001) |
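For readers wiring these variables into their own tooling, here is a small validation sketch. The variable names and the 3001 default come from the table above; the `NeocortexConfig` shape and `loadConfig` function are hypothetical and only illustrate failing fast when the required key is missing.

```typescript
// Hypothetical config shape for illustration.
interface NeocortexConfig {
  openaiApiKey: string;
  githubToken: string | undefined;
  voyageApiKey: string | undefined;
  port: number;
}

// Build a config object from an environment map such as process.env.
function loadConfig(env: Record<string, string | undefined>): NeocortexConfig {
  const openaiApiKey = env["OPENAI_API_KEY"];
  if (!openaiApiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }
  // Fall back to the documented default port when PORT is unset.
  const port = Number(env["PORT"] ?? 3001);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  return {
    openaiApiKey,
    githubToken: env["GITHUB_TOKEN"],
    voyageApiKey: env["VOYAGE_API_KEY"],
    port,
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` once at startup, so that a missing key surfaces before any repository is cloned.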

Embedding Models

| Model | Provider | Use Case |
|---|---|---|
| openai | OpenAI | General purpose, recommended |
| voyage-code-2 | Voyage AI | Code-optimized embeddings |

📈 Performance

| Metric | Value |
|---|---|
| Parse speed | ~1,000 entities/sec |
| Embedding batch | 20 entities/request |
| Search latency | <500ms |
| Memory per repo | ~50MB (1,000 entities) |
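The embedding batch size in the table implies that entities are sent to the embedding provider in groups of 20. A minimal batching sketch follows; `toBatches` and `embedAll` are hypothetical helpers, and `embedBatch` stands in for whatever function actually calls the provider.

```typescript
// Split items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size = 20): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("Batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed all texts batch by batch. `embedBatch` is a caller-supplied function
// (hypothetical here) that sends one request and returns one vector per input.
async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 20,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of toBatches(texts, batchSize)) {
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```

At 20 entities per request, a repository with 1,000 entities needs 1,000 / 20 = 50 embedding requests. Batches are sent sequentially here for simplicity; a real client might run several in parallel within the provider's rate limits.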

🛠️ Development

# Install dependencies
pnpm install

# Run CLI in development
pnpm dev <command>

# Build for production
pnpm build

# Start production server
pnpm start

🀝 Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing)
  5. Open a Pull Request

📄 License

MIT License. See LICENSE for details.


🧠 Neocortex
Intelligent Code Context Retrieval
