Transform your GitHub profile data and resume into a professional markdown CV automatically!
Generate a comprehensive, professional CV by combining your GitHub repository statistics, project analysis, and resume information using OCR and local LLM processing.
- GitHub Analytics: Automatically analyzes your repositories for stars, commits, lines of code, and language distribution
- OCR Resume Processing: Extracts text from PDF resumes with PyMuPDF and from image resumes with Tesseract OCR
- AI-Powered Parsing: Uses a local Ollama LLM to parse resume content into structured data
- Smart Skills Detection: Combines resume skills with programming languages detected from GitHub projects
- Professional Template: Generates a well-structured markdown CV with multiple sections
- Privacy-First: All processing happens locally - no data sent to external services
- Fast & Efficient: Processes data quickly with minimal dependencies
- Python 3.8 or higher
- Tesseract OCR installed on your system
- Ollama running locally with a language model
- Clone the repository
git clone https://github.com/PriyavKaneria/profile-generator.git
cd profile-generator
- Install Python dependencies
pip install -r requirements.txt
- Install Tesseract OCR (only needed for image-based resumes)
Ubuntu/Debian:
sudo apt-get install tesseract-ocr
macOS:
brew install tesseract
Windows: Download from UB-Mannheim/tesseract
- Set up Ollama
# Install Ollama (visit https://ollama.ai/ for installation instructions)
# Pull a model (choose one)
ollama pull llama2     # General purpose
ollama pull codellama  # Code-focused
ollama pull mistral    # Alternative option
Any other Ollama model you prefer also works.
Basic usage:
python generate_cv_profile.py --github-data github_data.json --resume resume.pdf --output my_cv.md
With custom model:
python generate_cv_profile.py \
--github-data github_profile_data.json \
--resume resume.png \
--output professional_cv.md \
--ollama-model codellama
| Option | Description | Default |
|---|---|---|
| --github-data | Path to GitHub data JSON file | Required |
| --resume | Path to resume file (PDF or image) | Required |
| --output | Output markdown file path | cv.md |
| --ollama-model | Ollama model to use | llama2 |
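As a rough illustration, the options in the table could be declared with `argparse` along the following lines. This is a sketch built only from the documented flags and defaults; `build_parser` is a hypothetical name, and the actual parser in generate_cv_profile.py may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the script's own code)."""
    parser = argparse.ArgumentParser(
        description="Generate a markdown CV from GitHub data and a resume."
    )
    parser.add_argument("--github-data", required=True,
                        help="Path to GitHub data JSON file")
    parser.add_argument("--resume", required=True,
                        help="Path to resume file (PDF or image)")
    parser.add_argument("--output", default="cv.md",
                        help="Output markdown file path")
    parser.add_argument("--ollama-model", default="llama2",
                        help="Ollama model to use")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--github-data", "github_data.json", "--resume", "resume.pdf"]
)
# argparse turns dashes into underscores: --github-data becomes args.github_data
print(args.github_data, args.output, args.ollama_model)
```

Omitting `--output` and `--ollama-model` falls back to the defaults listed in the table, while leaving out either required option makes `argparse` exit with a usage message.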
See https://github.com/PriyavKaneria/CodeStats
Your GitHub data should be in JSON format with the following structure:
{
  "repository_name": {
    "featuredLevel": 3,
    "total_files": 16,
    "total_lines": 1466,
    "total_lines_of_code": 1030,
    "actual_code_lines": 1018,
    "language_distribution": {
      ".py": 1432,
      ".js": 123,
      ".html": 164
    },
    "description": "Project description",
    "stars": 5,
    "topics": ["python", "web"],
    "private": false,
    "contributions": {
      "total_commits": 39,
      "total_lines_changed": {
        "additions": 3173,
        "deletions": 1694
      },
      "first_commit_date": "2024-07-21 15:18:05",
      "last_commit_date": "2024-08-13 15:06:48"
    }
  }
}
The generated CV includes:
- Contact Information - Extracted from resume
- Professional Summary - Parsed from resume using LLM
- GitHub Statistics - Calculated from repository data
  - Total public repositories
  - Total stars received
  - Total commits made
  - Lines of code written
- Technical Skills - Combined from resume and GitHub language analysis
- Featured Projects - Top 5 repositories with details
- Professional Experience - Extracted from resume
- Education - Academic background from resume
- Certifications - Professional certifications listed
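The GitHub Statistics and Featured Projects sections can be derived from the JSON structure shown earlier. The sketch below is one plausible way to do it, not the script's actual logic: the function names are hypothetical, totals are summed over every repository in the file, and ranking featured projects by `featuredLevel` and then `stars` is an assumption.

```python
from typing import Any, Dict, List

Repos = Dict[str, Dict[str, Any]]


def summarize_github(data: Repos) -> Dict[str, int]:
    """Aggregate the headline numbers listed under GitHub Statistics."""
    repos = list(data.values())
    return {
        # Repositories without a "private" key are treated as public (assumption).
        "public_repositories": sum(1 for r in repos if not r.get("private", False)),
        "total_stars": sum(r.get("stars", 0) for r in repos),
        "total_commits": sum(
            r.get("contributions", {}).get("total_commits", 0) for r in repos
        ),
        "lines_of_code": sum(r.get("total_lines_of_code", 0) for r in repos),
    }


def featured_projects(data: Repos, limit: int = 5) -> List[str]:
    """Return up to `limit` repository names, highest featuredLevel first.

    Assumption: ties on featuredLevel are broken by star count.
    """
    ranked = sorted(
        data.items(),
        key=lambda item: (item[1].get("featuredLevel", 0), item[1].get("stars", 0)),
        reverse=True,
    )
    return [name for name, _ in ranked[:limit]]


sample = {
    "repository_name": {
        "featuredLevel": 3,
        "total_lines_of_code": 1030,
        "stars": 5,
        "private": False,
        "contributions": {"total_commits": 39},
    }
}
print(summarize_github(sample))
print(featured_projects(sample))
```

Using `.get()` with defaults keeps the sketch tolerant of repositories that lack some fields, which is convenient when the data file is hand-edited.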
- GitHub Analysis: Processes repository data to extract meaningful statistics and identify primary programming languages
- OCR Processing: Extracts text from PDF resumes with PyMuPDF and from image resumes with Tesseract
- LLM Parsing: Uses a local Ollama model to structure the resume text into organized data
- Smart Categorization: Automatically categorizes skills and projects based on content analysis
- Template Generation: Combines all data into a professional markdown format
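To make the LLM parsing step more concrete, here is a minimal sketch of sending resume text to a local Ollama server through its `/api/generate` endpoint. The prompt wording and the function names are assumptions made for illustration; the project's real prompt and parsing code may look quite different.

```python
import json
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama address


def build_parse_request(resume_text: str, model: str = "llama2") -> Dict[str, Any]:
    """Build the JSON payload for Ollama; the prompt text is illustrative only."""
    prompt = (
        "Extract the contact information, summary, skills, experience, "
        "education, and certifications from the resume below. "
        "Respond with a single JSON object.\n\n" + resume_text
    )
    # stream=False asks for one complete reply; format="json" requests JSON output.
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_resume(resume_text: str, model: str = "llama2",
                 url: str = OLLAMA_URL) -> Dict[str, Any]:
    """Send the resume text to Ollama and decode the structured reply."""
    payload = json.dumps(build_parse_request(resume_text, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # The generated text is in the "response" field; it may not always be valid JSON.
    return json.loads(body["response"])
```

Calling `parse_resume(text, model="mistral")` requires Ollama to be running with that model pulled. Because models do not always return well-formed JSON, a real implementation would likely wrap the final `json.loads` in error handling or retry with a different model.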
- PDF files (.pdf) - Text extraction via PyMuPDF
- Image files (.png, .jpg, .jpeg) - OCR via Tesseract
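The split between PDF and image handling can be sketched as a dispatch on the file extension. The helper names below are hypothetical, and using the `pytesseract` wrapper to call Tesseract is an assumption; the README only names Tesseract, PyMuPDF, and Pillow.

```python
from pathlib import Path

PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def pick_backend(resume_path: str) -> str:
    """Choose an extraction backend from the file extension (case-insensitive)."""
    suffix = Path(resume_path).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pymupdf"
    if suffix in IMAGE_EXTENSIONS:
        return "tesseract"
    raise ValueError(f"Unsupported resume format: {suffix or '(none)'}")


def extract_text(resume_path: str) -> str:
    """Extract raw text from a resume using the backend chosen above."""
    if pick_backend(resume_path) == "pymupdf":
        import fitz  # PyMuPDF, imported lazily so the dispatch works without it

        with fitz.open(resume_path) as document:
            return "\n".join(page.get_text() for page in document)

    # Assumption: Tesseract is driven through the pytesseract wrapper.
    import pytesseract
    from PIL import Image

    with Image.open(resume_path) as image:
        return pytesseract.image_to_string(image)
```

Keeping the backend choice in its own function makes it easy to reject unsupported files early, before any OCR library is loaded.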
- llama2 - General purpose, good balance
- codellama - Optimized for code and technical content
- mistral - Fast and efficient
- llama3 - Latest version with improved capabilities
The script automatically detects programming languages from file extensions:
- Python, JavaScript, TypeScript, Java, C++, C, C#
- PHP, Ruby, Go, Rust, Swift, Kotlin
- HTML, CSS, Svelte, Vue.js, React
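Extension-based detection can be pictured as a lookup over the `language_distribution` field of each repository. The mapping below is an illustrative guess at which extensions correspond to the listed languages (treating `.jsx` and `.tsx` as React is an assumption), and both function names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List

# Illustrative mapping; the script's real table may differ.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python", ".js": "JavaScript", ".ts": "TypeScript", ".java": "Java",
    ".cpp": "C++", ".c": "C", ".cs": "C#", ".php": "PHP", ".rb": "Ruby",
    ".go": "Go", ".rs": "Rust", ".swift": "Swift", ".kt": "Kotlin",
    ".html": "HTML", ".css": "CSS", ".svelte": "Svelte", ".vue": "Vue.js",
    ".jsx": "React", ".tsx": "React",
}


def detect_languages(data: Dict[str, Dict[str, Any]]) -> List[str]:
    """Rank languages by total lines across all repositories, largest first."""
    totals: Counter = Counter()
    for repo in data.values():
        for extension, lines in repo.get("language_distribution", {}).items():
            language = EXTENSION_TO_LANGUAGE.get(extension.lower())
            if language:  # unknown extensions are ignored
                totals[language] += lines
    return [language for language, _ in totals.most_common()]


def merge_skills(resume_skills: List[str], github_languages: List[str]) -> List[str]:
    """Combine both skill sources, dropping case-insensitive duplicates."""
    merged: List[str] = []
    seen = set()
    for skill in [*resume_skills, *github_languages]:
        key = skill.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(skill.strip())
    return merged


sample = {"repository_name": {"language_distribution": {".py": 1432, ".js": 123, ".html": 164}}}
print(detect_languages(sample))  # ['Python', 'HTML', 'JavaScript']
```

Resume skills are listed first in `merge_skills`, so a skill spelled one way in the resume wins over the GitHub-derived spelling of the same language.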
Contributions are welcome! Here are some ways you can help:
- Report bugs and issues
- Suggest new features
- Improve documentation
- Submit pull requests
- Fork the repository
- Create a feature branch:
git checkout -b feature-name
- Make your changes and test thoroughly
- Submit a pull request with a clear description
Tesseract not found:
# Make sure Tesseract is in your PATH
tesseract --version
Ollama connection error:
# Check if Ollama is running
curl http://localhost:11434/api/tags
Poor OCR results:
- Ensure the resume image is high quality (300+ DPI)
- Try preprocessing the image (adjust contrast and brightness)
- Use PDF format when possible
LLM parsing issues:
- Try different Ollama models
- Ensure the resume has a clear structure
- Check if resume text was extracted correctly
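The Tesseract and Ollama checks above can also be scripted before a run. The following is a small sketch using only the standard library; the function names are hypothetical, and the URL is the same local endpoint used in the `curl` command.

```python
import shutil
import urllib.error
import urllib.request


def tesseract_available(which=shutil.which) -> bool:
    """Return True if a `tesseract` executable is found on PATH."""
    return which("tesseract") is not None


def ollama_reachable(url: str = "http://localhost:11434/api/tags",
                     timeout: float = 3.0) -> bool:
    """Return True if the local Ollama server answers the tags request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: treat as not running.
        return False


def report() -> None:
    """Print a short status summary for both prerequisites."""
    print("Tesseract on PATH:", tesseract_available())
    print("Ollama reachable:", ollama_reachable())
```

Calling `report()` prints one line per dependency. The `which` parameter exists only so the PATH lookup can be replaced in tests.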
- Processing Time: ~30-60 seconds for typical resume + GitHub data
- Memory Usage: ~200-500MB depending on Ollama model
- Accuracy: 85-95% for well-structured resumes
- Local Processing: All data processing happens on your machine
- No External APIs: No data sent to third-party services
- Open Source: Full transparency of data handling
This project is licensed under the MIT License - see the LICENSE file for details.
- Tesseract OCR for optical character recognition
- Ollama for local LLM capabilities
- PyMuPDF for PDF processing
- Pillow for image processing
If you encounter any issues or have questions:
- Check the Issues page
- Create a new issue with detailed information
- Include error messages and system information
Made with ❤️ for developers who want to showcase their GitHub portfolio professionally