A professional Python tool for extracting "People Also Ask" (PAA) questions from Google search results. This tool provides both a command-line interface and a REST API for easy integration into your workflows.
- **Simple CLI Interface**: easy-to-use command-line tool for quick queries
- **REST API**: Flask-based API with API-key authentication and error handling
- **Batch Processing**: process multiple queries efficiently
- **Configurable**: environment-based configuration management
- **Type-Safe**: full type hints for better code quality
- **Well-Tested**: comprehensive unit tests included
- **Docker Support**: ready-to-deploy Docker configuration
- **Logging**: built-in logging for debugging and monitoring
```bash
# Clone the repository
git clone https://github.com/your-username/paa-scraper.git
cd paa-scraper

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements-new.txt

# Install the package
pip install -e .
```

```bash
# Build the image
docker build -f Dockerfile.new -t paa-scraper .

# Run the container
docker run -p 5000:5000 -e PAA_API_KEY=your-secret-key paa-scraper
```

```bash
# Set your API key in .env file
echo "PAA_API_KEY=your-secret-key" > .env

# Start the service
docker-compose up -d
```
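The repository's `docker-compose.yml` is not reproduced in this README; a minimal compose file consistent with the commands above (port 5000 exposed, `PAA_API_KEY` read from `.env`) might look like the following sketch — see the actual file in the repository for the real configuration:

```yaml
# Hypothetical sketch, not the repository's actual docker-compose.yml
services:
  paa-scraper:
    build:
      context: .
      dockerfile: Dockerfile.new
    ports:
      - "5000:5000"
    env_file:
      - .env
```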
```bash
# Scrape questions for a single query
paa-scraper scrape "what is python programming"

# Specify country and language
paa-scraper scrape "best restaurants" --country uk --language en

# Save results to a file
paa-scraper scrape "machine learning" --output results.json

# Batch process multiple queries from a file
paa-scraper batch queries.txt --output batch_results.json
```
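The format of `queries.txt` is not specified above; presumably it contains one search query per line, for example:

```text
what is python programming
best restaurants
machine learning
```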
```python
from paa_scraper import scrape_related_questions

# Scrape questions
questions = scrape_related_questions(
    query="artificial intelligence",
    country="us",
    language="en",
)

print(f"Found {len(questions)} questions:")
for question in questions:
    print(f"  - {question}")
```
```bash
# Set your API key
export PAA_API_KEY=your-secret-key

# Run the server
paa-api
```

**Health Check**

```bash
curl http://localhost:5000/
```

**Scrape Single Query (GET)**

```bash
curl "http://localhost:5000/api/v1/scrape?query=python&api_key=your-secret-key"
```
**Scrape Single Query (POST)**

```bash
curl -X POST http://localhost:5000/api/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning",
    "country": "us",
    "language": "en",
    "api_key": "your-secret-key"
  }'
```

**Batch Scrape**

```bash
curl -X POST http://localhost:5000/api/v1/batch \
  -H "Content-Type: application/json" \
  -d '{
    "queries": ["python", "javascript", "rust"],
    "country": "us",
    "language": "en",
    "api_key": "your-secret-key"
  }'
```

Configuration is done via environment variables. Copy `.env.example` to `.env` and customize:
```bash
cp .env.example .env
```

| Variable | Description | Default |
|---|---|---|
| `PAA_DEFAULT_COUNTRY` | Default country code | `us` |
| `PAA_DEFAULT_LANGUAGE` | Default language code | `en` |
| `PAA_REQUEST_TIMEOUT` | Request timeout in seconds | `10` |
| `PAA_API_HOST` | API server host | `0.0.0.0` |
| `PAA_API_PORT` | API server port | `5000` |
| `PAA_API_KEY` | API authentication key | None |
| `PAA_LOG_LEVEL` | Logging level | `INFO` |
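The actual mapping from these variables to runtime settings lives in `config.py` (not shown here); a typical environment-based pattern, sketched with the defaults from the table above, looks like:

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Runtime settings resolved from PAA_* environment variables."""
    default_country: str
    default_language: str
    request_timeout: int
    api_host: str
    api_port: int
    api_key: Optional[str]
    log_level: str


def load_settings(env=os.environ) -> Settings:
    """Build Settings from the environment, falling back to the documented defaults."""
    return Settings(
        default_country=env.get("PAA_DEFAULT_COUNTRY", "us"),
        default_language=env.get("PAA_DEFAULT_LANGUAGE", "en"),
        request_timeout=int(env.get("PAA_REQUEST_TIMEOUT", "10")),
        api_host=env.get("PAA_API_HOST", "0.0.0.0"),
        api_port=int(env.get("PAA_API_PORT", "5000")),
        api_key=env.get("PAA_API_KEY"),  # None means authentication is unconfigured
        log_level=env.get("PAA_LOG_LEVEL", "INFO"),
    )
```

Passing `env` explicitly keeps the loader easy to test with a plain dict instead of patching `os.environ`.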
```
paa-scraper/
├── src/
│   └── paa_scraper/
│       ├── __init__.py        # Package initialization
│       ├── scraper.py         # Core scraping logic
│       ├── text_utils.py      # Text processing utilities
│       ├── config.py          # Configuration management
│       ├── api/
│       │   ├── __init__.py
│       │   └── flask_app.py   # Flask REST API
│       └── cli/
│           ├── __init__.py
│           └── main.py        # CLI interface
├── tests/
│   ├── __init__.py
│   ├── test_scraper.py        # Scraper tests
│   └── test_api.py            # API tests
├── .env.example               # Example environment config
├── .gitignore                 # Git ignore rules
├── docker-compose.yml         # Docker Compose config
├── Dockerfile.new             # Docker configuration
├── README.md                  # This file
├── requirements-new.txt       # Production dependencies
├── requirements-dev.txt       # Development dependencies
└── setup.py                   # Package setup
```
```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest

# Run tests with coverage
pytest --cov=src/paa_scraper tests/

# Format code
black src/ tests/

# Lint code
flake8 src/ tests/
pylint src/
```
```bash
# Run all tests
python -m pytest

# Run specific test file
python -m pytest tests/test_scraper.py

# Run with verbose output
python -m pytest -v

# Run with coverage report
python -m pytest --cov=src/paa_scraper --cov-report=html
```
```json
{
  "success": true,
  "data": {
    "query": "python programming",
    "country": "us",
    "language": "en",
    "questions": [
      "What is Python used for?",
      "Is Python easy to learn?",
      "How do I start learning Python?"
    ],
    "count": 3
  }
}
```
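For programmatic access, a small client can wrap the POST endpoint and the response shapes documented in this README. This is a sketch using only the standard library; the endpoint URL and field names follow the curl examples above:

```python
import json
from urllib import request

API_URL = "http://localhost:5000/api/v1/scrape"  # adjust host/port for your deployment


def parse_scrape_response(body: dict) -> list:
    """Return the questions list from a decoded API response; raise on error payloads."""
    if not body.get("success"):
        raise RuntimeError(body.get("message", "scrape failed"))
    return body["data"]["questions"]


def fetch_questions(query: str, api_key: str, country: str = "us",
                    language: str = "en", timeout: float = 15.0) -> list:
    """POST a scrape request and return the PAA questions for the query."""
    payload = json.dumps({
        "query": query,
        "country": country,
        "language": language,
        "api_key": api_key,
    }).encode("utf-8")
    req = request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return parse_scrape_response(json.load(resp))
```

Splitting out `parse_scrape_response` keeps the success/error handling testable without a running server.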
```json
{
  "success": false,
  "error": "Scraping failed",
  "message": "Failed to fetch search results: Network timeout"
}
```

1. **No questions found**
   - Google may not show PAA questions for every query
   - Try different queries, or check whether Google is accessible from your location
2. **Rate limiting**
   - Google may rate-limit requests from the same IP
   - Consider adding delays between requests or using proxies
3. **API authentication errors**
   - Ensure `PAA_API_KEY` is set correctly
   - Check that the API key matches on both server and client
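For the rate-limiting point above, the simplest mitigation is to space out consecutive requests. The helper below is a hypothetical sketch, not part of the package; pass the library's `scrape_related_questions` in as `scrape_fn`:

```python
import time
from typing import Callable, Dict, List


def scrape_with_delay(queries: List[str],
                      scrape_fn: Callable[[str], List[str]],
                      delay_seconds: float = 5.0) -> Dict[str, List[str]]:
    """Call scrape_fn once per query, sleeping between calls to avoid rate limits."""
    results: Dict[str, List[str]] = {}
    for i, query in enumerate(queries):
        if i > 0:
            time.sleep(delay_seconds)  # pause before every request after the first
        results[query] = scrape_fn(query)
    return results
```

For example: `scrape_with_delay(queries, lambda q: scrape_related_questions(query=q, country="us", language="en"))`.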
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow PEP 8 guidelines
- Use type hints for all functions
- Write docstrings for all public functions and classes
- Add tests for new features
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is for educational and research purposes only. Please respect Google's Terms of Service and robots.txt. Use responsibly and consider implementing rate limiting and caching in production environments.
Pushkar Singh
- Built with Beautiful Soup for HTML parsing
- Uses Flask for the REST API
- Inspired by the need for structured PAA data extraction