WebPastMachine

📖 Description

WebPastMachine is a powerful tool that lets you explore the history of any website through the Internet Archive's Wayback Machine. It helps you discover all archived URLs for a domain, analyze the types of content that were archived, and export the results for further analysis.

✨ Features

  • 🔍 Search for all archived URLs of any domain
  • 📊 Analyze file types and their distribution
  • 🔎 Filter results by file extension
  • 💾 Export results to a file
  • 🎨 Colored terminal output for better readability
  • ⚡ Fast and efficient processing
  • 🛠️ Easy to use command-line interface

🚀 Installation

  1. Clone this repository:
git clone https://github.com/Shac0x/WebPastMachine
cd WebPastMachine
  2. Install the required dependencies:
pip install -r requirements.txt

Note: The tool works without any additional packages; installing colorama only adds colored terminal output for a better visual experience.
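Because colorama is optional, a script like this typically guards the import and falls back to plain text. The snippet below is a minimal sketch of that pattern, not the tool's actual code; `colorize` and `HAS_COLOR` are hypothetical names.

```python
# Hypothetical sketch: treat colorama as an optional dependency.
try:
    from colorama import Fore, Style, init as colorama_init

    colorama_init()  # enables ANSI color handling on Windows consoles
    HAS_COLOR = True
except ImportError:
    HAS_COLOR = False


def colorize(text: str, color: str) -> str:
    """Wrap text in a colorama color if available; otherwise return it unchanged."""
    if not HAS_COLOR:
        return text
    code = getattr(Fore, color.upper(), "")  # e.g. "green" -> Fore.GREEN
    return f"{code}{text}{Style.RESET_ALL}"


print(colorize("Total unique URLs found: 270", "green"))
```

Without colorama installed, `colorize` returns the text as-is, so the rest of the program does not need to know whether colors are available.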

💻 Usage

Basic Usage

Search for all archived URLs of a domain:

python WebPastMachine.py example.com

Advanced Options

  1. Export results to a file (here combined with summary mode):
python WebPastMachine.py example.com -s -o results.json
  2. Filter by file extension:
python WebPastMachine.py example.com -e pdf
  3. Show only the summary without listing individual URLs:
python WebPastMachine.py example.com -s
  4. Combine filtering and export:
python WebPastMachine.py example.com -e pdf -o pdfs.json
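An extension filter such as `-e pdf` can be implemented by looking at the path component of each archived URL. The sketch below assumes matching is case-insensitive and ignores query strings; the tool's own matching rules are not documented here and may differ. `url_extension` and `filter_by_extension` are hypothetical helpers.

```python
import posixpath
from urllib.parse import urlparse


def url_extension(url: str) -> str:
    """Return the lowercase extension of the URL's path, without the dot ('' if none)."""
    path = urlparse(url).path  # drops the scheme, host, query string and fragment
    return posixpath.splitext(path)[1][1:].lower()


def filter_by_extension(urls, extension):
    """Keep only URLs whose path ends in the given extension ('pdf' or '.pdf')."""
    wanted = extension.lower().lstrip(".")
    return [u for u in urls if url_extension(u) == wanted]


urls = [
    "https://example.com/report.pdf",
    "https://example.com/index.html",
    "https://example.com/docs/Manual.PDF?download=1",
]
print(filter_by_extension(urls, "pdf"))
```

Parsing the URL first means that `?download=1` style query strings do not hide the real file type.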

📋 Command Line Arguments

Argument          Description                                          Example
domain            The domain to search (required)                      example.com
-e, --extension   Filter by file extension                             -e pdf
-o, --output      Output file to save results                          -o results.json
-s, --summary     Show only summary without listing individual URLs   -s
-h, --help        Show help message                                    -h
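The argument table maps directly onto Python's standard `argparse` module. This is a sketch that mirrors the documented interface, assuming the script uses `argparse`; `build_parser` is a hypothetical name, and `-h/--help` comes for free from `argparse`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented WebPastMachine arguments (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="WebPastMachine.py",
        description="List archived URLs for a domain via the Wayback Machine.",
    )
    parser.add_argument("domain", help="The domain to search (e.g. example.com)")
    parser.add_argument("-e", "--extension", help="Filter by file extension (e.g. pdf)")
    parser.add_argument("-o", "--output", help="Output file to save results")
    parser.add_argument(
        "-s", "--summary", action="store_true",
        help="Show only summary without listing individual URLs",
    )
    return parser


args = build_parser().parse_args(["example.com", "-e", "pdf", "-o", "pdfs.json"])
print(args.domain, args.extension, args.output, args.summary)
# example.com pdf pdfs.json False
```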

📝 Output Format

Console Output

╔════════════════════════════════════════════════════════════════╗
║ Searching archived URLs for example.com...                     ║
╚════════════════════════════════════════════════════════════════╝

Processing URLs...
Processed 500/1200 URLs

Analysis of file types found:
--------------------------------------------------
*.html: 150 files
*.php: 45 files
*.jpg: 30 files
*.pdf: 25 files
*.js: 20 files

Total unique URLs found: 270

------------------------------------
URL: https://example.com/page.html
First capture: 2010-01-15 14:25:10
Archive link: http://web.archive.org/web/20100115142510/https://example.com/page.html
------------------------------------

With colorama installed, the output will be nicely colorized, making it easier to read and distinguish between different types of information.
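The sample output reports 1200 processed URLs but only 270 unique ones: the archive usually holds many captures of the same URL, which collapse into one entry with its first capture date. The sketch below shows one way to do that deduplication and the file-type tally, assuming the input is `(timestamp, original)` pairs as returned by the CDX API and that URLs are compared as exact strings (the tool may normalize them differently). `first_captures` and `extension_counts` are hypothetical helpers.

```python
import posixpath
from collections import Counter
from urllib.parse import urlparse


def first_captures(captures):
    """Map each original URL to its earliest capture timestamp."""
    earliest = {}
    for timestamp, url in captures:
        # Assumes fixed-width 14-digit YYYYMMDDhhmmss timestamps,
        # so plain string comparison orders them chronologically.
        if url not in earliest or timestamp < earliest[url]:
            earliest[url] = timestamp
    return earliest


def extension_counts(urls):
    """Count file extensions (e.g. '.html') across URL paths; URLs without one are skipped."""
    counts = Counter()
    for url in urls:
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext:
            counts[ext] += 1
    return counts


captures = [
    ("20120301000000", "https://example.com/page.html"),
    ("20100115142510", "https://example.com/page.html"),
    ("20110101000000", "https://example.com/report.pdf"),
    ("20110102000000", "https://example.com/about.html"),
]
unique = first_captures(captures)
for ext, n in extension_counts(unique).most_common():
    print(f"*{ext}: {n} files")
print(f"Total unique URLs found: {len(unique)}")
```

A dictionary keyed by URL keeps deduplication linear in the number of captures, which matters for domains with many thousands of snapshots.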

File Output

The exported file will contain all URLs with their capture dates and archive links in a clean, readable format.
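Each entry pairs a URL with its first capture date and an archive link built from the 14-digit capture timestamp, as in the console sample above. The sketch below writes those entries as JSON because the examples use `.json` file names; that format, the field names, and the helpers `make_record` and `export_results` are assumptions, and the tool's actual file layout may differ.

```python
import json
from datetime import datetime


def make_record(url: str, timestamp: str) -> dict:
    """Build one result entry from a URL and its 14-digit capture timestamp."""
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return {
        "url": url,
        "first_capture": captured.strftime("%Y-%m-%d %H:%M:%S"),
        # Wayback snapshot links have the form /web/<timestamp>/<original URL>
        "archive_link": f"http://web.archive.org/web/{timestamp}/{url}",
    }


def export_results(first_captures: dict, path: str) -> None:
    """Write {url: timestamp} results to a JSON file, sorted by URL (hypothetical format)."""
    records = [make_record(u, ts) for u, ts in sorted(first_captures.items())]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)


record = make_record("https://example.com/page.html", "20100115142510")
print(record["first_capture"])  # 2010-01-15 14:25:10
print(record["archive_link"])
```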

🎯 Use Cases

  • 📚 Research: Investigate the history of websites
  • 🔒 Security: Find old versions of sensitive pages
  • 🎨 Design: Track website design evolution
  • 📊 Analysis: Study content distribution over time
  • 🔍 Discovery: Find lost or removed content

⚙️ Technical Details

  • Uses the Wayback Machine CDX API
  • Implements efficient URL deduplication
  • Handles rate limiting and timeouts
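The README does not say which CDX query parameters the tool sends, so the sketch below is an assumption built on parameters the CDX server documents: `matchType=domain` includes subdomains, `output=json` returns rows whose first entry is the field-name header, `fl` selects fields, and `collapse=urlkey` drops adjacent captures of the same URL on the server side. The retry helper shows one common way to handle rate limiting (HTTP 429/503) and timeouts with exponential backoff; `build_cdx_url`, `parse_cdx_json` and `fetch_with_retry` are hypothetical names.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain: str) -> str:
    """Build a CDX query listing captures for a domain and its subdomains."""
    params = {
        "url": domain,
        "matchType": "domain",       # include subdomains
        "output": "json",            # first row is the field-name header
        "fl": "timestamp,original",  # only the fields needed downstream
        "collapse": "urlkey",        # drop adjacent captures of the same URL
    }
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"


def parse_cdx_json(body: str):
    """Turn a CDX JSON response into a list of dicts keyed by the header row."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def fetch_with_retry(url: str, retries: int = 3, timeout: float = 30.0,
                     backoff: float = 2.0) -> str:
    """GET a URL, retrying on rate limits and timeouts with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read().decode("utf-8")
        except urllib.error.HTTPError as err:
            # Retry only "slow down" / "temporarily unavailable" responses.
            if err.code not in (429, 503) or attempt == retries:
                raise
        except (urllib.error.URLError, TimeoutError):
            # Connection failures and read timeouts.
            if attempt == retries:
                raise
        time.sleep(backoff ** attempt)  # waits 1s, 2s, 4s, ...
    raise RuntimeError("unreachable: the final attempt returns or raises")
```

Put together, `parse_cdx_json(fetch_with_retry(build_cdx_url("example.com")))` would return one dict per capture with `timestamp` and `original` keys, ready for deduplication and extension analysis.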

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👏 Acknowledgments

  • Internet Archive for providing the Wayback Machine
