Ansh5748/web-scraper

Scrape Data

A versatile web scraping tool that extracts structured data from websites through an intuitive web interface.

🚀 Features

  • Web Interface: User-friendly UI for entering URLs and viewing results
  • Smart Content Extraction: Automatically identifies and extracts:
    • Page title and headings
    • Main paragraphs
    • Links with text and URLs
    • Meta information
  • Results Visualization: View extracted data in multiple formats:
    • Summary view
    • Raw JSON data
    • Structured paragraphs and links
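To make the summary and raw JSON views concrete, here is a minimal sketch of how both might be derived from one extracted result. The dictionary keys (`title`, `headings`, `paragraphs`, `links`, `meta`), the sample values, and the `summarize` helper are all hypothetical, chosen to mirror the feature list above; the project's actual field names may differ.

```python
import json

# Hypothetical result; the key names are assumptions, not the tool's schema.
result = {
    "title": "Example Page",
    "headings": ["Welcome", "Details"],
    "paragraphs": ["First paragraph.", "Second paragraph."],
    "links": [{"text": "About", "url": "https://example.com/about"}],
    "meta": {"description": "An illustrative page"},
}


def summarize(data):
    """Build a compact summary: the title plus a count for each collection."""
    return {
        "title": data.get("title", ""),
        "heading_count": len(data.get("headings", [])),
        "paragraph_count": len(data.get("paragraphs", [])),
        "link_count": len(data.get("links", [])),
        "meta_count": len(data.get("meta", {})),
    }


summary = summarize(result)               # summary view
raw_json = json.dumps(result, indent=2)   # raw JSON view

print(summary)
print(raw_json)
```

Keeping the summary as a pure function of the result means every view can be rendered from the same extracted data without scraping the page again.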

📋 Requirements

  • Python 3.6+
  • Flask
  • BeautifulSoup4
  • Requests

🔧 Installation

From PyPI

pip install scrape_dat

From Source

git clone https://github.com/yourusername/scrape_dat.git
cd scrape_dat
pip install -e .

🖥️ Usage

Web Interface

  1. Start the web server:
python web_app.py
  2. Open your browser and navigate to http://127.0.0.1:5000
  3. Enter a URL to scrape and click "Scrape Data"
  4. View the structured results

Command Line (if implemented)

# Basic usage
scrape-dat https://example.com

# For more options
scrape-dat --help
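Since the command line is marked "if implemented", the following is only a sketch of what such an entry point could look like, not the project's actual code. It assumes a single positional URL argument, matching the basic usage shown above; `build_parser` and `main` are hypothetical names. The `--help` option needs no extra code because `argparse` adds it automatically.

```python
import argparse


def build_parser():
    """Parser for a hypothetical `scrape-dat` command: one positional URL."""
    parser = argparse.ArgumentParser(
        prog="scrape-dat",
        description="Extract structured data from a web page.",
    )
    parser.add_argument("url", help="address of the page to scrape")
    return parser


def main(argv=None):
    # Passing argv=None makes argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    # A real entry point would fetch and parse the page here; this sketch
    # only echoes the URL it was given.
    print(f"Would scrape: {args.url}")
    return args.url


# Equivalent to running: scrape-dat https://example.com
main(["https://example.com"])
```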

🧩 How It Works

  1. The application sends a request to the specified URL
  2. It parses the HTML content using BeautifulSoup
  3. Various elements are extracted:
    • Title from the <title> tag
    • Headings from <h1>, <h2>, and <h3> tags
    • Paragraphs from <p> tags
    • Links from <a> tags
    • Meta information from <meta> tags
  4. The extracted data is presented in a structured format
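Steps 2 to 4 can be sketched with BeautifulSoup as follows. This is an illustration under stated assumptions rather than the project's own code: the `extract` function, its `base_url` parameter, and the output keys are hypothetical. The fetch in step 1 is left out so the sketch runs offline; in practice the HTML string could come from something like `requests.get(url, timeout=10).text`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract(html, base_url=""):
    """Pull the elements listed above out of an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("title")
    return {
        "title": title_tag.get_text(strip=True) if title_tag else "",
        "headings": [
            h.get_text(strip=True) for h in soup.find_all(["h1", "h2", "h3"])
        ],
        "paragraphs": [p.get_text(strip=True) for p in soup.find_all("p")],
        # Only anchors with an href; relative URLs are resolved against base_url.
        "links": [
            {"text": a.get_text(strip=True), "url": urljoin(base_url, a["href"])}
            for a in soup.find_all("a", href=True)
        ],
        # Only <meta> tags that carry both a name and a content attribute.
        "meta": {
            m["name"]: m["content"]
            for m in soup.find_all("meta", attrs={"name": True, "content": True})
        },
    }


sample = """
<html><head><title>Demo</title>
<meta name="description" content="A demo page"></head>
<body><h1>Hello</h1><p>First paragraph.</p>
<a href="/about">About</a></body></html>
"""
data = extract(sample, base_url="https://example.com")
print(data)
```

Filtering on `href=True` and on the presence of `name` and `content` avoids `KeyError` on anchors or meta tags that lack those attributes, which are common on real pages.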

Screenshot

[Image: Screenshot 2025-06-20 105252]

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request
