A Node.js web crawler that helps website owners optimize their SEO by analyzing and visualizing the internal linking structure of their sites.
Internal linking is crucial for SEO, but manually tracking links across a large website is time-consuming and error-prone. SEOcrawl automates this process by:
- Discovering all pages within your domain
- Analyzing link relationships
- Identifying orphaned pages
- Generating detailed reports on link distribution
- Helping prioritize content that needs more internal links
npm install
npm start https://your-website.com

The crawler will analyze your site and generate a report showing how many times each page is linked to.
The crawler accepts a single argument, the base URL to start crawling from:

npm start <BASE_URL>

Example output:

=======
REPORT!!!...
=======
Found 12 links to page: blog.example.com/popular-post
Found 8 links to page: blog.example.com/about
Found 3 links to page: blog.example.com/contact
=======
END REPORT...
=======
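The report format above comes down to two steps: sort the page counts in descending order, then print one line per page. The sketch below assumes `pages` is a plain object mapping each normalized URL to its inbound link count. `formatReport()` is a hypothetical helper that returns the text so it can be tested, and the real `report.js` may be organized differently:

```javascript
// Sketch: sort pages by inbound link count, highest first.
// Assumes `pages` is a plain object: normalized URL -> count.
function sortPages(pages) {
  return Object.entries(pages).sort((a, b) => b[1] - a[1]);
}

// Hypothetical helper: build the report text in the format shown above.
function formatReport(pages) {
  const lines = ["=======", "REPORT!!!...", "======="];
  for (const [url, count] of sortPages(pages)) {
    lines.push(`Found ${count} links to page: ${url}`);
  }
  lines.push("=======", "END REPORT...", "=======");
  return lines.join("\n");
}

// Print the report to stdout.
function printReport(pages) {
  console.log(formatReport(pages));
}

// Usage with the counts from the example output (deliberately unsorted).
printReport({
  "blog.example.com/contact": 3,
  "blog.example.com/popular-post": 12,
  "blog.example.com/about": 8,
});
```

Returning entries from `sortPages()`, rather than sorting in place, leaves the original `pages` object untouched. The caller can then reuse it for other output formats.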
- ✨ Respects domain boundaries (won't crawl external sites)
- 🔍 Normalizes URLs for accurate counting
- 🏃‍♂️ Efficient crawling with async/await
- 🎯 Handles both relative and absolute URLs
- 📊 Sorted reports by link frequency
- 🚫 Error handling for invalid URLs and non-HTML responses
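The URL-related features above can be sketched with Node's built-in `URL` class. The normalization rules here are inferred from the report output and are assumptions: drop the scheme, lowercase the host, strip one trailing slash, and ignore the query and fragment. `resolveLink()` and `isSameDomain()` are hypothetical helper names, not functions from `crawl.js`:

```javascript
// Sketch: normalize a URL so variants of the same page are counted once.
// Assumed rules: drop the scheme, keep hostname + pathname (the URL parser
// lowercases the hostname), strip one trailing slash, ignore query/fragment.
function normalizeURL(urlString) {
  const url = new URL(urlString);
  const hostPath = `${url.hostname}${url.pathname}`;
  return hostPath.endsWith("/") ? hostPath.slice(0, -1) : hostPath;
}

// Hypothetical helper: resolve a relative or absolute href against the page
// it was found on. Returns null if the URL parser rejects it.
function resolveLink(href, pageURL) {
  try {
    return new URL(href, pageURL).href;
  } catch {
    return null;
  }
}

// Hypothetical helper: only follow links on the same host as the base URL.
function isSameDomain(baseURL, candidateURL) {
  return new URL(baseURL).hostname === new URL(candidateURL).hostname;
}

console.log(normalizeURL("https://Blog.Example.com/about/"));
console.log(resolveLink("/contact", "https://blog.example.com/about"));
console.log(isSameDomain("https://blog.example.com", "https://other.com/"));
```

For example, `https://Blog.Example.com/about/` and `http://blog.example.com/about` both normalize to `blog.example.com/about`. Links written either way are therefore counted against the same page.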
- Clone the repository
git clone https://github.com/yourusername/seocrawl.git
cd seocrawl
- Install dependencies
npm install
- Run tests
npm test

- main.js: Entry point and CLI handling
- crawl.js: Core crawling logic, including URL normalization and HTML parsing
- report.js: Report generation and sorting
- crawl.test.js: Tests for URL handling and HTML parsing
- report.test.js: Tests for report sorting
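`main.js` handles the CLI. A sketch of that argument validation follows, assuming the script reads `process.argv`, where the base URL is the third element. `parseBaseURL()` is a hypothetical name, and rejecting non-HTTP(S) protocols is an assumption rather than documented behavior:

```javascript
// Sketch of CLI argument handling for `npm start <BASE_URL>`.
// argv[0] is the node binary, argv[1] the script path, argv[2] the base URL.
function parseBaseURL(argv) {
  const args = argv.slice(2);
  if (args.length !== 1) {
    throw new Error("Usage: npm start <BASE_URL>");
  }
  // Validate up front so a typo fails before any network request is made.
  const url = new URL(args[0]); // throws TypeError on an unparseable URL
  // Assumption: only web pages are crawled.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.href;
}

console.log(parseBaseURL(["node", "main.js", "https://blog.example.com"]));
```

In `main.js`, this could be called as `parseBaseURL(process.argv)` inside a `try`/`catch` that prints the error message and calls `process.exit(1)`. Note that `URL.href` adds a trailing slash to a bare origin, so `https://blog.example.com` becomes `https://blog.example.com/`.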
- normalizeURL(): Standardizes URLs for consistent counting
- getURLsFromHTML(): Extracts and validates links from HTML content
- crawlPage(): Main crawling logic with recursive link following
- printReport(): Generates human-readable output
- sortPages(): Sorts pages by link count
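The recursive link following described for `crawlPage()` can be sketched as below. This is not the code in `crawl.js`, and three parts are assumptions:

- The fetcher is injected, so the sketch runs against an in-memory site.
- The `{ status, contentType, body }` response shape and the `fetchHTMLWithNode()` helper are hypothetical.
- A regular expression stands in for the JSDOM-based HTML parsing listed under dependencies.

```javascript
// Sketch of a recursive crawler (assumptions, not taken from crawl.js):
// - fetchHTML(url) is injected and resolves to { status, contentType, body }.
// - A regex stands in for JSDOM when extracting <a href="..."> links.

function normalizeURL(urlString) {
  const url = new URL(urlString);
  const hostPath = `${url.hostname}${url.pathname}`;
  return hostPath.endsWith("/") ? hostPath.slice(0, -1) : hostPath;
}

// Regex stand-in for getURLsFromHTML(); resolves relative hrefs against pageURL.
function getURLsFromHTML(html, pageURL) {
  const urls = [];
  for (const match of html.matchAll(/<a\s[^>]*href="([^"]*)"/gi)) {
    try {
      urls.push(new URL(match[1], pageURL).href);
    } catch {
      // Skip hrefs the URL parser rejects.
    }
  }
  return urls;
}

// pages maps normalized URL -> number of internal links pointing at it.
async function crawlPage(baseURL, currentURL, pages, fetchHTML) {
  // Domain boundary: ignore links that leave the base URL's host.
  if (new URL(baseURL).hostname !== new URL(currentURL).hostname) {
    return pages;
  }
  const key = normalizeURL(currentURL);
  if (key in pages) {
    pages[key]++; // already crawled: record one more inbound link
    return pages;
  }
  // First visit; the starting page has no inbound link yet.
  pages[key] = currentURL === baseURL ? 0 : 1;

  let res;
  try {
    res = await fetchHTML(currentURL);
  } catch (err) {
    console.log(`Error fetching ${currentURL}: ${err.message}`);
    return pages;
  }
  // Error status or non-HTML response: keep the count, but don't parse it.
  if (res.status >= 400 || !res.contentType.includes("text/html")) {
    return pages;
  }
  for (const nextURL of getURLsFromHTML(res.body, currentURL)) {
    pages = await crawlPage(baseURL, nextURL, pages, fetchHTML);
  }
  return pages;
}

// Hypothetical network-backed fetcher using Node's built-in fetch (Node 18+).
// Defined for illustration; the demo below does not call it.
async function fetchHTMLWithNode(url) {
  const res = await fetch(url);
  return {
    status: res.status,
    contentType: res.headers.get("content-type") ?? "",
    body: await res.text(),
  };
}

// Demo against a tiny in-memory site instead of the network.
const demoSite = {
  "https://blog.example.com/": '<a href="/about">About</a> <a href="https://other.com/">Out</a>',
  "https://blog.example.com/about": '<a href="/">Home</a>',
};
const demoFetch = async (url) =>
  url in demoSite
    ? { status: 200, contentType: "text/html", body: demoSite[url] }
    : { status: 404, contentType: "text/html", body: "" };

crawlPage("https://blog.example.com/", "https://blog.example.com/", {}, demoFetch)
  .then((pages) => console.log(pages));
```

Counting on revisits is what turns the crawl into the link tally shown in the report. The first visit records the page, and every later link to the same normalized URL only increments its count. The regex misses single-quoted and unquoted `href` attributes, which is one reason to parse with a real DOM library such as JSDOM instead.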
- Node.js v21.2.0 or higher
- Dependencies:
  - Jest (testing)
  - JSDOM (HTML parsing)
For major changes, please open an issue first to discuss what you'd like to change.