Identify automation patterns in GitHub accounts through behavioral analysis
This is the core logic behind AgentScan, a tool for analyzing GitHub account behavior to detect potential AI agents and automated activity.
It was built in response to increasing reports of AI agents targeting open source projects through automated contributions and cold outreach.
It applies an opinionated scoring system to GitHub activity signals to classify accounts as organic, mixed, or automation. The results are indicators, not verdicts.
```sh
npm install @unveil/identity
```

```js
import { identify } from "@unveil/identity";

// Fetch user data from GitHub API
const username = "github_account_username";
const userRes = await fetch(`https://api.github.com/users/${username}`);
const user = await userRes.json();

// Fetch user's recent events
const eventsRes = await fetch(
  `https://api.github.com/users/${username}/events?per_page=100`
);
const events = await eventsRes.json();

// Analyze the account
const analysis = identify({
  createdAt: user.created_at,
  reposCount: user.public_repos,
  accountName: user.login,
  events,
});

console.log(analysis);
// Output:
// {
//   classification: "organic",
//   score: 100,
//   flags: []
// }
```

The system analyzes GitHub activity across 39 distinct heuristics organized into 9 categories. Each heuristic assigns points that are subtracted from a baseline score of 100 (100 = human, 0 = automation).
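The subtraction model can be illustrated with a minimal sketch. It is not the package's actual implementation: the flag shape (`label`, `points`), the point values, and the classification cut-offs are hypothetical and chosen only to show how penalties reduce the baseline.

```javascript
// Hypothetical sketch: point values and cut-offs are illustrative
// assumptions, not the values AgentScan actually uses.
const BASELINE = 100;

// Assumed thresholds: 70+ is organic, 50-69 is mixed, below 50 is automation.
function classify(score) {
  if (score >= 70) return "organic";
  if (score >= 50) return "mixed";
  return "automation";
}

// Each triggered heuristic contributes a flag carrying a point penalty.
function scoreAccount(flags) {
  const penalty = flags.reduce((sum, flag) => sum + flag.points, 0);
  // Clamp so that many flags cannot push the score below 0.
  const score = Math.max(0, BASELINE - penalty);
  return { score, classification: classify(score), flags };
}

const clean = scoreAccount([]);
const suspicious = scoreAccount([
  { label: "Recently created", points: 20 },
  { label: "Extreme fork automation", points: 40 },
]);

console.log(clean.score, clean.classification); // 100 organic
console.log(suspicious.score, suspicious.classification); // 40 automation
```

Clamping at 0 keeps the score inside the documented 0 to 100 range even when many heuristics fire at once.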
- Recently created - Account < 30 days old
- Young account - Account 30-90 days old
- Only active on other people's repos - 0 personal repos but all activity is external
- Concentrated repository creation - 16+ repos created in 24 hours
- Frequent repository creation - 8-15 repos created in 24 hours
- 24/7 activity pattern - Single day with activity across many hours and < 3 hour sleep window (per-day analysis only)
- Narrow activity focus - Either:
  - ≤3 event types with low entropy (< 0.8) AND no human interactions, OR
  - ≥5 event types with very high entropy (> 0.85) AND no human interactions
- Issue comment spam - 15+ distinct repos in concentrated time window
- High issue comment frequency - 10-14 distinct repos in concentrated time window
- PR comment spam - 12+ distinct PRs in concentrated time window
- High PR comment frequency - 8-11 distinct PRs in concentrated time window
- Automated branch/PR workflow - Near 1:1 ratio with branches consistently followed by PRs within time window
- Multiple forks - 5-7 forks in 24 hours
- Fork spike detected - 8-19 forks in 24 hours
- Severe fork surge - Variable thresholds in 24h
- Extreme fork automation - 20+ forks in 24 hours
- Multi-day fork surge - Concentrated activity over 48 hours
- Severe multi-day fork surge - Rapid burst over 72 hours
- Sustained fork rate - High forks/day over 3+ days
- Extended forking pattern - Forking activity on multiple consecutive days
- Fork scatter pattern - Targeting many different repositories
- Suspicious chained automations - Fork → Branch → PR sequence with temporal ordering
- Extreme commit burst - Many commits in 1 hour
- High commit burst - Moderate commits in 1 hour
- High commit frequency - Tight bursts within seconds
- Very high PR volume - High PRs/day ratio
- High PR volume - Moderate PRs/day ratio
- Extended daily coding - Consecutive marathon days (15+ hours)
- Frequent long coding days - Multiple days with 15+ hours and uniform hourly distribution
- Highly distributed activity - Activity across many external repos
- Distributed activity - Activity across external repos
- High PR volume in the past 24 hours - Burst of PRs to external repos
- High PR volume during last week - Weekly PR surge to external repos
- Primarily external contributions - Many PRs but few/no personal repos
- Mostly external activity - High % of activity on others' repos
- Extreme PR spam (daily) - 30+ PRs in 24 hours
- Extreme PR spam (weekly) - 100+ PRs in 7 days
- Very high PR spam frequency - 50+ PRs in 7 days
- Distributed PR spam pattern - High PR count across many repos with high density OR 30-day window spam
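Two of the signals above lend themselves to a short sketch: counting events in a 24-hour window (used here for the fixed fork tiers of 5-7, 8-19, and 20+) and measuring event-type entropy. The function names are hypothetical, timestamps are assumed to be epoch milliseconds (for example from `Date.parse(event.created_at)`), and the entropy is assumed to be Shannon entropy normalized to the range 0 to 1, since the thresholds listed are below 1. The package may compute these differently.

```javascript
// Largest number of timestamps that fall inside any sliding window
// shorter than windowMs (timestamps are epoch milliseconds).
function maxInWindow(timestamps, windowMs) {
  const sorted = [...timestamps].sort((a, b) => a - b);
  let best = 0;
  let start = 0;
  for (let end = 0; end < sorted.length; end++) {
    // Advance the left edge until the window spans less than windowMs.
    while (sorted[end] - sorted[start] >= windowMs) start++;
    best = Math.max(best, end - start + 1);
  }
  return best;
}

// Map the peak 24-hour fork count onto the fixed tiers listed above.
// The variable-threshold and multi-day fork heuristics are not covered.
function forkTier(forkTimes) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const peak = maxInWindow(forkTimes, DAY_MS);
  if (peak >= 20) return "Extreme fork automation";
  if (peak >= 8) return "Fork spike detected";
  if (peak >= 5) return "Multiple forks";
  return null;
}

// Shannon entropy of the event-type distribution, divided by its maximum
// possible value so the result lies in 0..1 (assumed definition).
function normalizedEntropy(eventTypes) {
  const counts = new Map();
  for (const type of eventTypes) {
    counts.set(type, (counts.get(type) || 0) + 1);
  }
  if (counts.size < 2) return 0;
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / eventTypes.length;
    entropy -= p * Math.log2(p);
  }
  return entropy / Math.log2(counts.size);
}

const HOUR_MS = 60 * 60 * 1000;
// Nine forks spaced one hour apart all fall inside a single 24-hour window.
const forkTimes = Array.from({ length: 9 }, (_, i) => i * HOUR_MS);
console.log(forkTier(forkTimes)); // "Fork spike detected"
console.log(normalizedEntropy(["PushEvent", "PushEvent", "ForkEvent", "ForkEvent"])); // 1
```

A sliding window is used rather than calendar days so that a burst straddling midnight is still counted as one burst.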
Please drop an issue if you find something that doesn't work, or have an idea for something that works better.