GitHub Analyzer

A Python web scraping tool that extracts comprehensive statistics from GitHub user profiles using Selenium WebDriver.

Features

📊 Basic Profile Info: Extract name and username from GitHub profiles
🔗 Social Links: Scrape social media links from user profiles
📈 Repository Analysis: Get total repositories, stars, and commits
💻 Language Detection: Find programming languages used across repositories
👥 Social Stats: Follower and following counts, plus the follower-to-following ratio
🤖 Headless Browsing: Automated Chrome browser in headless mode

Requirements

Python 3.6+
Chrome browser installed
ChromeDriver (downloaded automatically by Selenium Manager in Selenium 4.6 and later)

Installation

Install required packages:
```
pip install selenium
```
Ensure Chrome is installed on your system

Usage

Run the script:

python github_scraper.py

Enter the GitHub username when prompted:

PASTE YOUR ACCOUNT'S URL : username
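
The prompt text mentions a URL while the example passes a bare username, so it may help to accept both forms. The following is a minimal sketch of such normalization, not part of the original script: `profile_url` is a hypothetical helper, and the username pattern is a deliberately loose assumption rather than GitHub's full naming rules.

```python
import re
from urllib.parse import urlparse

# Assumption: a loose check (letters, digits, hyphens), not GitHub's full username rules.
_USERNAME = re.compile(r"[A-Za-z0-9-]+")


def profile_url(raw: str) -> str:
    """Turn a bare username or a github.com profile URL into a canonical profile URL."""
    text = raw.strip()
    looks_like_url = "://" in text or text.lower().startswith(
        ("github.com/", "www.github.com/")
    )
    if looks_like_url:
        parsed = urlparse(text if "://" in text else "https://" + text)
        host = parsed.netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        if host != "github.com":
            raise ValueError("not a GitHub URL: {!r}".format(raw))
        # The username is the first non-empty path segment; query strings are ignored.
        segments = [part for part in parsed.path.split("/") if part]
        username = segments[0] if segments else ""
    else:
        username = text.lstrip("@")

    if not _USERNAME.fullmatch(username):
        raise ValueError("not a valid username: {!r}".format(raw))
    return "https://github.com/" + username


print(profile_url("johndoe"))
print(profile_url("https://github.com/johndoe?tab=repositories"))
```

Both calls resolve to the same profile URL, so the rest of the scraper can work from a single canonical form.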

Code Structure

The script contains three main classes:

`BasicInf0`

Handles basic profile information extraction.

Methods:

name() - Extracts full name and username
socials() - Gets social media links from profile
repo() - Collects all repository URLs

`RepoInsider(BasicInf0)`

Inherits from BasicInf0 and analyzes repository details.

Methods:

no_of_stars() - Counts stars for current repository
no_of_commits() - Extracts commit count using regex
no_of_languages() - Identifies programming languages
all_repo_insider() - Iterates through all repos for analysis

`Personal`

Handles follower/following statistics and ratios.

Methods:

followers_and_follwing() - Extracts follower/following counts
no_of_repos() - Displays total repository count
follower_to_follwing_ratio() - Calculates and prints ratio
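
The ratio calculation can be sketched as below. This is an illustration under stated assumptions, not the code in `Personal`: `parse_count` and `follower_ratio` are hypothetical names, and the handling of abbreviated counts such as `1.2k` assumes that is how large numbers appear in the scraped text.

```python
import re
from typing import Optional

_SUFFIXES = {"k": 1_000, "m": 1_000_000}


def parse_count(text: str) -> int:
    """Parse a displayed count such as '15', '3,400' or '1.2k' into an integer."""
    match = re.fullmatch(r"([\d.,]+)\s*([km]?)", text.strip().lower())
    if match is None:
        raise ValueError("unrecognized count: {!r}".format(text))
    number, suffix = match.groups()
    value = float(number.replace(",", ""))
    return int(round(value * _SUFFIXES.get(suffix, 1)))


def follower_ratio(followers: int, following: int) -> Optional[float]:
    """Return followers / following rounded to 2 places, or None when following is 0."""
    if following == 0:
        return None  # avoid ZeroDivisionError for accounts that follow nobody
    return round(followers / following, 2)


print(follower_ratio(parse_count("7"), parse_count("4")))  # prints 1.75
```

Returning `None` for accounts that follow nobody leaves it to the caller to decide how to display an undefined ratio.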

What It Scrapes

| Data Type | CSS Selector Used | Description |
| --- | --- | --- |
| Name | `span.p-name.vcard-fullname.d-block.overflow-hidden` | Full name |
| Username | `span.p-nickname.vcard-username.d-block` | GitHub username |
| Social Links | `li.vcard-detail.pt-1 > a` | Social media URLs |
| Repository Names | `h3.wb-break-all > a` | Repo names and links |
| Repository Descriptions | `div.col-10.col-lg-9.d-inline-block > div:nth-child(2)` | Repo descriptions |
| Stars | `a.Link.Link--muted > strong` | Star count per repo |
| Commits | `span.fgColor-default` | Commit count per repo |
| Languages | `span.color-fg-default.text-bold.mr-1` | Programming languages |
| Social Stats | `div.mb-3 > a > span` | Followers/following |

Sample Output

John Doe @johndoe
Twitter : https://twitter.com/johndoe
LinkedIn : https://linkedin.com/in/johndoe
{'Python', 'JavaScript', 'HTML', 'CSS', 'Java'}
Total stars : 150
Total commits : 1250
5
1.75

Code Flow

Initialize: Creates headless Chrome browser instance
Get URL: Prompts user for GitHub username
Basic Info: Scrapes name, username, and social links
Repository Collection: Navigates to repositories tab and collects all repo URLs
Repository Analysis:
- Visits each repository individually
- Extracts stars, commits, and languages
- Accumulates totals across all repositories
Social Analysis: Calculates follower-to-following ratio
Display Results: Prints all collected statistics
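
The accumulation step in the flow above can be sketched without a browser. The `RepoStats` tuple and `aggregate` function are hypothetical names used only for illustration, and the sample numbers are made up to match the totals in Sample Output.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class RepoStats(NamedTuple):
    """Per-repository values as they would be scraped from one repo page."""
    stars: int
    commits: int
    languages: Set[str]


def aggregate(repos: Iterable[RepoStats]) -> Tuple[int, int, Set[str]]:
    """Sum stars and commits and collect the union of languages across repos."""
    total_stars = 0
    total_commits = 0
    total_lang = set()
    for repo in repos:
        total_stars += repo.stars
        total_commits += repo.commits
        total_lang |= repo.languages  # set union drops duplicate languages
    return total_stars, total_commits, total_lang


sample = [
    RepoStats(stars=100, commits=800, languages={"Python", "HTML"}),
    RepoStats(stars=50, commits=450, languages={"Python", "JavaScript"}),
]
stars, commits, langs = aggregate(sample)
print("Total stars :", stars)      # Total stars : 150
print("Total commits :", commits)  # Total commits : 1250
print(sorted(langs))               # ['HTML', 'JavaScript', 'Python']
```

Keeping the totals separate from the scraping code makes the arithmetic easy to check without visiting any page.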

Key Features of Original Code

Selenium Configuration

op = webdriver.ChromeOptions()
op.add_argument('headless')
driver = webdriver.Chrome(options=op)

Data Collection Pattern

# Collects repository URLs for later analysis
self.to_store_repos.append(ele1.get_attribute('href'))

Regex Usage for Commits

# Extracts numbers from commit text
numbers = re.findall(r'\d+', commits.text)
commit_list = [int(nums) for nums in numbers]
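
One caveat with the `\d+` pattern: if the commit text contains a thousands separator (for example `1,250 Commits`, which is an assumption about how the page renders large counts), `re.findall` splits the number into separate pieces. A possible variant, with `parse_commit_count` as a hypothetical helper name:

```python
import re


def parse_commit_count(text: str) -> int:
    """Extract the first number from text, treating commas as thousands separators."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError("no number found in {!r}".format(text))
    return int(match.group().replace(",", ""))


# The original pattern splits a comma-separated count into two numbers.
print(re.findall(r"\d+", "1,250 Commits"))  # ['1', '250']
print(parse_commit_count("1,250 Commits"))  # 1250
print(parse_commit_count("16 Commits"))     # 16
```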

Set Usage for Languages

# Uses set to avoid duplicate languages
self.total_lang = set()
self.total_lang.add(a.text)

Limitations

No Error Handling: Original code lacks try/except blocks
Fixed Selectors: CSS selectors may break if GitHub updates their layout
No Rate Limiting: Pages are requested back-to-back with no delay, which may trigger GitHub's rate limiting
Input Handling: Limited validation of user input
Resource Management: Browser instance not properly closed
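
The resource management point can be addressed with a small context manager that always calls `quit()`, even when scraping raises. This is a sketch, not the original code: `managed_browser` is a hypothetical helper, and it only assumes that the object returned by `factory` has a `quit()` method, as Selenium WebDriver instances do.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Create a browser with factory() and guarantee quit() is called afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # runs on normal exit and when an exception propagates


# Intended use with Selenium (assumes `webdriver` and `op` are set up as in
# the Selenium Configuration section):
#
#     with managed_browser(lambda: webdriver.Chrome(options=op)) as driver:
#         driver.get("https://github.com/johndoe")
```

Passing a factory rather than a ready-made driver keeps creation and cleanup in one place.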

Potential Issues

Element Not Found: CSS selectors may fail on different profile layouts
Dynamic Content: Some elements may not load immediately
Private Repositories: Cannot access private repo data
Rate Limiting: GitHub may block excessive requests
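
For the dynamic content issue, the usual remedy is to poll until the element appears instead of reading it immediately. Selenium ships `WebDriverWait` and `expected_conditions` for this purpose; the library-free sketch below only illustrates the idea, and `wait_for` is a hypothetical helper.

```python
import time


def wait_for(probe, timeout=10.0, interval=0.5):
    """Call probe() until it returns a truthy value or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(interval)


# Intended use with Selenium (find_elements returns an empty list when
# nothing matches, so the probe is falsy until the element is present):
#
#     stars = wait_for(
#         lambda: driver.find_elements(By.CSS_SELECTOR, "a.Link.Link--muted > strong")
#     )
```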

Usage Notes

The script navigates between different GitHub pages automatically
Uses headless Chrome to avoid opening browser windows
Processes all repositories sequentially, which may take a while for users with many repos
Accumulates statistics across all public repositories

Legal Considerations

Only accesses publicly available GitHub data
Respects GitHub's public profile information
Use responsibly and respect GitHub's terms of service
Consider using GitHub's official API for production use

Future Improvements

Add error handling and exception management
Implement delays between requests
Add input validation
Proper browser cleanup
Export results to files
Progress indicators for long operations
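
Two of these improvements, delays between requests and exporting results, can be sketched with the standard library alone. The names `throttled` and `export_stats`, the one-second default delay, and the JSON layout are all assumptions for illustration.

```python
import json
import time


def throttled(urls, delay=1.0, sleep=time.sleep):
    """Yield each URL, pausing `delay` seconds between consecutive items."""
    for index, url in enumerate(urls):
        if index:  # no pause before the first URL
            sleep(delay)
        yield url


def export_stats(stats, path):
    """Write a stats dict to a JSON file, converting sets to sorted lists."""
    serializable = {
        key: sorted(value) if isinstance(value, (set, frozenset)) else value
        for key, value in stats.items()
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(serializable, handle, indent=2)


# Intended use inside the repository loop (driver and repo_urls assumed):
#
#     for url in throttled(repo_urls, delay=1.0):
#         driver.get(url)
#     export_stats({"stars": 150, "languages": {"Python", "CSS"}}, "stats.json")
```

Sets are converted to sorted lists because `json.dump` cannot serialize a `set` directly.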

Dependencies

selenium
re (built-in)

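Selenium 4.6 and later resolve a matching ChromeDriver automatically. With older Selenium releases, make sure ChromeDriver is installed and compatible with your installed Chrome version.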
