WebPastMachine is a powerful tool that lets you explore the history of any website through the Internet Archive's Wayback Machine. It helps you discover all archived URLs for a domain, analyze the types of content that were archived, and export the results for further analysis.
- 🔍 Search for all archived URLs of any domain
- 📊 Analyze file types and their distribution
- 🔎 Filter results by file extension
- 💾 Export results to a file
- 🎨 Colored terminal output for better readability
- ⚡ Fast and efficient processing
- 🛠️ Easy to use command-line interface
- Clone this repository:

  ```bash
  git clone https://github.com/Shac0x/WebPastMachine
  cd WebPastMachine
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

Note: The tool will work without additional packages, but installing colorama provides a better visual experience with colored terminal output.
Search for all archived URLs of a domain:

```bash
python WebPastMachine.py example.com
```

Filter by file extension:

```bash
python WebPastMachine.py example.com -e pdf
```

Show only a summary without listing individual URLs:

```bash
python WebPastMachine.py example.com -s
```

Export to a file (combine summary mode with output):

```bash
python WebPastMachine.py example.com -s -o results.json
```

Combine filtering and export:

```bash
python WebPastMachine.py example.com -e pdf -o pdfs.json
```

| Argument | Description | Example |
|---|---|---|
| domain | The domain to search (required) | example.com |
| -e, --extension | Filter by file extension | -e pdf |
| -o, --output | Output file to save results | -o results.json |
| -s, --summary | Show only summary without listing individual URLs | -s |
| -h, --help | Show help message | -h |
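The project's own argument handling isn't reproduced here, but a minimal sketch of how this interface could be wired up with Python's argparse (a hypothetical reconstruction; the real WebPastMachine.py may differ):

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of the CLI described in the table above;
    # not the project's actual code.
    parser = argparse.ArgumentParser(
        prog="WebPastMachine.py",
        description="Explore archived URLs for a domain via the Wayback Machine.",
    )
    parser.add_argument("domain", help="The domain to search (required), e.g. example.com")
    parser.add_argument("-e", "--extension", help="Filter results by file extension, e.g. pdf")
    parser.add_argument("-o", "--output", help="Output file to save results, e.g. results.json")
    parser.add_argument("-s", "--summary", action="store_true",
                        help="Show only a summary without listing individual URLs")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.domain, args.extension, args.output, args.summary)
```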
Example output:

```text
╔════════════════════════════════════════════════════════════════╗
║          Searching archived URLs for example.com...             ║
╚════════════════════════════════════════════════════════════════╝

Processing URLs...
Processed 500/1200 URLs

Analysis of file types found:
--------------------------------------------------
*.html: 150 files
*.php: 45 files
*.jpg: 30 files
*.pdf: 25 files
*.js: 20 files

Total unique URLs found: 270

------------------------------------
URL: https://example.com/page.html
First capture: 2010-01-15 14:25:10
Archive link: http://web.archive.org/web/20100115142510/https://example.com/page.html
------------------------------------
```
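Note that the archive link embeds the capture time as a 14-digit Wayback Machine timestamp (YYYYMMDDhhmmss), which maps directly to the "First capture" date shown. A quick sketch of the conversion:

```python
from datetime import datetime

# Wayback Machine snapshot URLs embed a 14-digit timestamp (YYYYMMDDhhmmss).
ts = "20100115142510"
captured = datetime.strptime(ts, "%Y%m%d%H%M%S")
print(captured.strftime("%Y-%m-%d %H:%M:%S"))  # -> 2010-01-15 14:25:10
```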
With colorama installed, the output will be nicely colorized, making it easier to read and distinguish between different types of information.
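Since colorama is optional, the tool presumably degrades gracefully when it is missing. A minimal sketch of that pattern, assuming a hypothetical highlight() helper rather than the project's actual code:

```python
# Minimal sketch of optional colorama support, as described above;
# the actual WebPastMachine color scheme may differ.
try:
    from colorama import Fore, Style, init
    init(autoreset=True)  # reset colors after each print, works cross-platform
    HAS_COLOR = True
except ImportError:
    HAS_COLOR = False

def highlight(label, value):
    # Colorize the label when colorama is available; plain text otherwise.
    if HAS_COLOR:
        return f"{Fore.CYAN}{label}:{Style.RESET_ALL} {value}"
    return f"{label}: {value}"

print(highlight("URL", "https://example.com/page.html"))
```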
The exported file will contain all URLs with their capture dates and archive links in a clean, readable format.
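The exact export schema isn't documented here; purely as an illustration, a JSON export might carry one record per URL with fields like these (field names are assumptions, not the tool's documented format):

```python
import json

# Hypothetical record layout for results.json; the field names are
# illustrative, not the tool's documented schema.
results = [
    {
        "url": "https://example.com/page.html",
        "first_capture": "2010-01-15 14:25:10",
        "archive_link": "http://web.archive.org/web/20100115142510/https://example.com/page.html",
    }
]

with open("results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)
```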
- 📚 Research: Investigate the history of websites
- 🔒 Security: Find old versions of sensitive pages
- 🎨 Design: Track website design evolution
- 📊 Analysis: Study content distribution over time
- 🔍 Discovery: Find lost or removed content
- Uses the Wayback Machine CDX API (see the sketch after this list)
- Implements efficient URL deduplication
- Handles rate limiting and timeouts
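For context, here is a minimal sketch of a CDX query with requests, deduplicating by original URL and applying a timeout. This approximates the approach described above rather than reproducing the tool's internals:

```python
import requests

# Minimal sketch of a Wayback Machine CDX API query; it approximates the
# behavior described above, not WebPastMachine's actual implementation.
CDX_API = "http://web.archive.org/cdx/search/cdx"

def fetch_archived_urls(domain, timeout=30):
    params = {
        "url": f"{domain}/*",        # all paths under the domain
        "output": "json",
        "fl": "original,timestamp",  # fields: original URL, capture time
        "collapse": "urlkey",        # server-side dedup by normalized URL
    }
    resp = requests.get(CDX_API, params=params, timeout=timeout)
    resp.raise_for_status()
    rows = resp.json()
    seen, results = set(), []
    for original, timestamp in rows[1:]:  # first row is the field-name header
        if original not in seen:          # client-side dedup as a safety net
            seen.add(original)
            results.append((original, timestamp))
    return results

if __name__ == "__main__":
    for url, ts in fetch_archived_urls("example.com")[:5]:
        print(ts, url)
```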
This project is licensed under the MIT License - see the LICENSE file for details.
- Internet Archive for providing the Wayback Machine
