Skip to content

PriyavKaneria/profile-generator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

GitHub Profile CV Generator

Transform your GitHub profile data and resume into a professional markdown CV automatically!

Python 3.8+ License: MIT OCR LLM

Generate a comprehensive, professional CV by combining your GitHub repository statistics, project analysis, and resume information using OCR and local LLM processing.

✨ Features

  • πŸ“Š GitHub Analytics: Automatically analyzes your repositories for stars, commits, lines of code, and language distribution
  • πŸ“„ OCR Resume Processing: Extracts text from PDF and image resumes using Tesseract OCR
  • πŸ€– AI-Powered Parsing: Uses local Ollama LLM to intelligently parse resume content
  • 🎯 Smart Skills Detection: Combines resume skills with programming languages detected from GitHub projects
  • πŸ“ Professional Template: Generates a well-structured markdown CV with multiple sections
  • πŸ”’ Privacy-First: All processing happens locally - no data sent to external services
  • ⚑ Fast & Efficient: Processes data quickly with minimal dependencies

πŸš€ Quick Start

Prerequisites

  • Python 3.8 or higher
  • Tesseract OCR installed on your system
  • Ollama running locally with a language model

Installation

  1. Clone the repository

    git clone https://github.com/PriyavKaneria/profile-generator.git
    cd profile-generator
  2. Install Python dependencies

    pip install -r requirements.txt
  3. Install Tesseract OCR (only for image based resume)

    Ubuntu/Debian:

    sudo apt-get install tesseract-ocr

    macOS:

    brew install tesseract

    Windows: Download from UB-Mannheim/tesseract

  4. Setup Ollama

    # Install Ollama (visit https://ollama.ai/ for installation instructions)
    
    # Pull a model (choose one)
    ollama pull llama2          # General purpose
    ollama pull codellama       # Code-focused
    ollama pull mistral         # Alternative option or use any which you feel like

Usage

Basic usage:

python generate_cv_profile.py --github-data github_data.json --resume resume.pdf --output my_cv.md

With custom model:

python generate_cv_profile.py \
    --github-data github_profile_data.json \
    --resume resume.png \
    --output professional_cv.md \
    --ollama-model codellama

Command Line Options

Option Description Default
--github-data Path to GitHub data JSON file Required
--resume Path to resume file (PDF or image) Required
--output Output markdown file path cv.md
--ollama-model Ollama model to use llama2

πŸ“‹ GitHub Data Format

Use CodeStats repo for fetching and generating github repo details automatically

See https://github.com/PriyavKaneria/CodeStats

Your GitHub data should be in JSON format with the following structure:

{
  "repository_name": {
    "featuredLevel": 3,
    "total_files": 16,
    "total_lines": 1466,
    "total_lines_of_code": 1030,
    "actual_code_lines": 1018,
    "language_distribution": {
      ".py": 1432,
      ".js": 123,
      ".html": 164
    },
    "description": "Project description",
    "stars": 5,
    "topics": ["python", "web"],
    "private": false,
    "contributions": {
      "total_commits": 39,
      "total_lines_changed": {
        "additions": 3173,
        "deletions": 1694
      },
      "first_commit_date": "2024-07-21 15:18:05",
      "last_commit_date": "2024-08-13 15:06:48"
    }
  }
}

πŸ“„ Sample Output

The generated CV includes:

  • Contact Information - Extracted from resume
  • Professional Summary - Parsed from resume using LLM
  • GitHub Statistics - Calculated from repository data
    • Total public repositories
    • Total stars received
    • Total commits made
    • Lines of code written
  • Technical Skills - Combined from resume and GitHub language analysis
  • Featured Projects - Top 5 repositories with details
  • Professional Experience - Extracted from resume
  • Education - Academic background from resume
  • Certifications - Professional certifications listed

πŸ› οΈ How It Works

  1. GitHub Analysis: Processes repository data to extract meaningful statistics and identify primary programming languages
  2. OCR Processing: Uses Tesseract to extract text from PDF or image resumes
  3. LLM Parsing: Employs local Ollama model to structure resume text into organized data
  4. Smart Categorization: Automatically categorizes skills and projects based on content analysis
  5. Template Generation: Combines all data into a professional markdown format

πŸ”§ Configuration

Supported Resume Formats

  • PDF files (.pdf) - Text extraction via PyMuPDF
  • Image files (.png, .jpg, .jpeg) - OCR via Tesseract

Supported Ollama Models

  • llama2 - General purpose, good balance
  • codellama - Optimized for code and technical content
  • mistral - Fast and efficient
  • llama3 - Latest version with improved capabilities

Language Detection

The script automatically detects programming languages from file extensions:

  • Python, JavaScript, TypeScript, Java, C++, C, C#
  • PHP, Ruby, Go, Rust, Swift, Kotlin
  • HTML, CSS, Svelte, Vue.js, React

🀝 Contributing

Contributions are welcome! Here are some ways you can help:

  • πŸ› Report bugs and issues
  • πŸ’‘ Suggest new features
  • πŸ“ Improve documentation
  • πŸ”§ Submit pull requests

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and test thoroughly
  4. Submit a pull request with a clear description

πŸ“‹ Troubleshooting

Common Issues

Tesseract not found:

# Make sure Tesseract is in your PATH
tesseract --version

Ollama connection error:

# Check if Ollama is running
curl http://localhost:11434/api/tags

Poor OCR results:

  • Ensure resume image is high quality (300+ DPI)
  • Try preprocessing image (contrast, brightness)
  • Use PDF format when possible

LLM parsing issues:

  • Try different Ollama models
  • Ensure resume has clear structure
  • Check if resume text was extracted correctly

πŸ“Š Performance

  • Processing Time: ~30-60 seconds for typical resume + GitHub data
  • Memory Usage: ~200-500MB depending on Ollama model
  • Accuracy: 85-95% for well-structured resumes

πŸ›‘οΈ Privacy & Security

  • Local Processing: All data processing happens on your machine
  • No External APIs: No data sent to third-party services
  • Open Source: Full transparency of data handling

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

πŸ“ž Support

If you encounter any issues or have questions:

  1. Check the Issues page
  2. Create a new issue with detailed information
  3. Include error messages and system information

Made with ❀️ for developers who want to showcase their GitHub portfolio professionally

About

Profile generator meant for introspector but probably useful for more

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors