
🕷️ SEOcrawl

A lightning-fast web crawler that helps website owners optimize their SEO by analyzing and visualizing internal linking structures.

Why?

Internal linking is crucial for SEO, but manually tracking links across a large website is time-consuming and error-prone. SEOcrawl automates this process by:

  • Discovering all pages within your domain
  • Analyzing link relationships
  • Identifying orphaned pages
  • Generating detailed reports on link distribution
  • Helping prioritize content that needs more internal links

🚀 Quick Start

Install dependencies

npm install

Run the crawler

npm start https://your-website.com

The crawler will analyze your site and generate a report showing how many times each page is linked to.

📖 Usage

Command Line Arguments

The crawler accepts a single argument - the base URL to start crawling from:

npm start <BASE_URL>
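
Under the hood, npm forwards the positional argument to the script, so the URL arrives in process.argv. A minimal sketch of how main.js might validate it (function names, the crawlPage signature, and the error messages are assumptions, not the exact source):

const { crawlPage } = require('./crawl');
const { printReport } = require('./report');

async function main() {
  // process.argv = [node binary, script path, ...user args]
  if (process.argv.length < 3) {
    console.log('error: no base URL provided');
    process.exit(1);
  }
  if (process.argv.length > 3) {
    console.log('error: too many command line arguments');
    process.exit(1);
  }
  const baseURL = process.argv[2];
  // Assumed signature: crawlPage(baseURL, currentURL, pages) returns
  // an object mapping each normalized URL to its inbound link count.
  const pages = await crawlPage(baseURL, baseURL, {});
  printReport(pages);
}

main();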

Example Output

=======
REPORT!!!...
=======
Found 12 links to page: blog.example.com/popular-post
Found 8 links to page: blog.example.com/about
Found 3 links to page: blog.example.com/contact
=======
END REPORT...
=======
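
The report lists pages in descending order of inbound link count. A rough sketch of how report.js could produce this output, assuming pages is a plain object mapping normalized URLs to counts (the actual implementation may differ):

// Turn the { url: count } map into [url, count] pairs, highest count first.
function sortPages(pages) {
  return Object.entries(pages).sort((a, b) => b[1] - a[1]);
}

function printReport(pages) {
  console.log('=======');
  console.log('REPORT!!!...');
  console.log('=======');
  for (const [url, count] of sortPages(pages)) {
    console.log(`Found ${count} links to page: ${url}`);
  }
  console.log('=======');
  console.log('END REPORT...');
  console.log('=======');
}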

Features

  • ✨ Respects domain boundaries (won't crawl external sites)
  • 🔍 Normalizes URLs for accurate counting (see the sketch after this list)
  • 🏃‍♂️ Efficient crawling with async/await
  • 🎯 Handles both relative and absolute URLs
  • 📊 Sorted reports by link frequency
  • 🚫 Error handling for invalid URLs and non-HTML responses
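
URL normalization is what keeps the counts accurate: protocol, capitalization, and trailing-slash variants of the same address collapse to a single key. A minimal sketch of such a helper, in the spirit of normalizeURL() in crawl.js (the exact rules there may differ):

// "https://Blog.Example.com/path/" and "http://blog.example.com/path"
// should both normalize to "blog.example.com/path".
function normalizeURL(urlString) {
  const urlObj = new URL(urlString); // throws on malformed input
  const hostPath = `${urlObj.hostname}${urlObj.pathname}`;
  // Drop a single trailing slash so "/path/" and "/path" count as one page.
  return hostPath.endsWith('/') ? hostPath.slice(0, -1) : hostPath;
}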

🤝 Contributing

Setup Development Environment

  1. Clone the repository
git clone https://github.com/srinivassivaratri/webcrawlerhttp.git
cd webcrawlerhttp
  2. Install dependencies
npm install
  3. Run tests
npm test

Project Structure

  • main.js - Entry point and CLI handling
  • crawl.js - Core crawling logic including URL normalization and HTML parsing
  • report.js - Report generation and sorting
  • crawl.test.js - Tests for URL handling and HTML parsing
  • report.test.js - Tests for report sorting functionality

Core Functions

  • normalizeURL() - Standardizes URLs for consistent counting
  • getURLsFromHTML() - Extracts and validates links from HTML content (see the sketch after this list)
  • crawlPage() - Main crawling logic with recursive link following
  • printReport() - Generates human-readable output
  • sortPages() - Sorts pages by link count
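
Since JSDOM is already a dependency, link extraction likely parses the fetched HTML into a DOM and walks its anchor tags. A hedged sketch of getURLsFromHTML(); the real validation logic in crawl.js may differ:

const { JSDOM } = require('jsdom');

function getURLsFromHTML(htmlBody, baseURL) {
  const urls = [];
  const dom = new JSDOM(htmlBody);
  for (const anchor of dom.window.document.querySelectorAll('a')) {
    const href = anchor.getAttribute('href');
    if (!href) continue;
    try {
      // new URL() resolves relative hrefs ("/about") against baseURL
      // and throws on malformed ones, which get skipped.
      urls.push(new URL(href, baseURL).href);
    } catch (err) {
      console.log(`skipping invalid URL: ${href}`);
    }
  }
  return urls;
}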

Technical Requirements

  • Node.js v21.2.0 or higher
  • Dependencies:
    • Jest (testing)
    • JSDOM (HTML parsing)

For major changes, please open an issue first to discuss what you'd like to change.
