pushkarsingh32/wordpress-article-uploader

WordPress Article Uploader

Automated tool for uploading DOCX articles to WordPress sites with NLP-based tag generation and intelligent title processing.

Features

  • Automated Upload: Upload DOCX files through the WordPress admin interface, driven by Selenium WebDriver
  • Multi-Site Support: Configure and manage multiple WordPress sites
  • NLP Tag Generation: Automatic tag extraction using spaCy and NLTK
  • Intelligent Title Processing: Clean and format article titles from filenames
  • Error Handling: Error handling with detailed logging to the console and to per-run log files
  • CLI Interface: Command-line options for site selection, upload limits, headless mode, tags, titles, and log level
  • Secure Credentials: Credentials are read from environment variables (.env), never hardcoded
  • Progress Tracking: Uploaded files are renamed with an "UPLOADED" prefix so later runs skip them
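The exact title-cleaning rules live in the package and are not documented here. As a rough illustration only, here is a minimal sketch of deriving a post title from a DOCX filename, assuming the rules are: drop the extension, treat underscores and hyphens as spaces, collapse repeated separators, and capitalize each word. The function name `title_from_filename` is hypothetical.

```python
import re
from pathlib import Path


def title_from_filename(filename: str) -> str:
    """Derive a post title from a DOCX filename (hypothetical rules)."""
    stem = Path(filename).stem  # "my_article.docx" -> "my_article"
    # Treat runs of whitespace, underscores and hyphens as one word separator.
    words = re.split(r"[\s_\-]+", stem)
    # Uppercase only the first letter of each word so acronyms like "DOCX" survive.
    return " ".join(w[:1].upper() + w[1:] for w in words if w)


print(title_from_filename("10_easy-weeknight_dinners.docx"))  # 10 Easy Weeknight Dinners
```

Capitalizing only the first character (rather than calling `str.title()`) avoids turning words such as "DOCX" into "Docx".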

Prerequisites

  • Python 3.8 or higher
  • Chrome browser installed
  • A WordPress site with the Mammoth DOCX Converter plugin installed

Installation

  1. Clone the repository

    git clone <repository-url> uploading_articles
    cd uploading_articles
  2. Create a virtual environment (recommended)

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Download spaCy language model

    python -m spacy download en_core_web_sm
  5. Configure environment variables

    cp .env.example .env

    Edit .env and add your WordPress credentials:

    TWINSTRIPE_URL=https://twinstripe.com/wp-admin
    TWINSTRIPE_USERNAME=your_username
    TWINSTRIPE_PASSWORD=your_password
    
    # Add other sites as needed

Project Structure

uploading_articles/
├── wordpress_uploader/          # Main package
│   ├── core/                    # Core functionality
│   │   ├── uploader.py         # WordPress uploader class
│   │   └── orchestrator.py     # Upload orchestration
│   ├── utils/                   # Utility modules
│   │   ├── nlp_processor.py    # NLP tag generation
│   │   ├── file_manager.py     # File operations
│   │   └── logger.py           # Logging setup
│   └── config/                  # Configuration
│       └── settings.py         # Settings and site configs
├── docx2upload/                 # Directory for DOCX articles
├── logs/                        # Application logs
├── main.py                      # CLI entry point
├── requirements.txt             # Python dependencies
├── .env                         # Environment variables (not in repo)
└── README.md                    # This file

Usage

Basic Usage

Upload all pending articles to a site:

python main.py --site twinstripe

Advanced Options

# Upload a limited number of articles
python main.py --site forkspoon --limit 5

# Run in headless mode (no browser GUI)
python main.py --site heroasian --headless

# Skip NLP tag generation
python main.py --site twinstripe --no-tags

# Skip automatic title setting
python main.py --site twinstripe --no-title

# Adjust logging level
python main.py --site twinstripe --log-level DEBUG

Available Sites

  • twinstripe - TwinStripe
  • forkspoon - Fork and Spoon Kitchen
  • heroasian - Hero Asian Kitchen
  • stancic - Stancic Health and Wellness

CLI Help

python main.py --help
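For orientation, the options documented above could be declared with the standard `argparse` module roughly as follows. This is a sketch that mirrors the flags and site keys listed in this README; it is not the project's actual `main.py`, and the `build_parser` name, help strings, and the `INFO` default for `--log-level` are assumptions.

```python
import argparse

# Site keys from "Available Sites"; a real tool would likely read these from settings.py.
SITES = ["twinstripe", "forkspoon", "heroasian", "stancic"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Upload DOCX articles to a WordPress site."
    )
    parser.add_argument("--site", required=True, choices=SITES,
                        help="key of the site to upload to")
    parser.add_argument("--limit", type=int, default=None,
                        help="maximum number of articles to upload")
    parser.add_argument("--headless", action="store_true",
                        help="run Chrome without a visible window")
    parser.add_argument("--no-tags", action="store_true",
                        help="skip NLP tag generation")
    parser.add_argument("--no-title", action="store_true",
                        help="skip automatic title setting")
    parser.add_argument("--log-level", default="INFO",  # assumed default
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                        help="logging level")
    return parser


# argparse turns "--no-tags" into the attribute "no_tags".
args = build_parser().parse_args(["--site", "forkspoon", "--limit", "5"])
print(args.site, args.limit, args.no_tags)  # forkspoon 5 False
```

Using `choices` makes argparse reject unknown site keys with a usage message, at the cost of having to update the list whenever a site is added.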

Configuration

Environment Variables

Key configuration options in .env:

  • Site Credentials: {SITE}_URL, {SITE}_USERNAME, {SITE}_PASSWORD
  • Articles Directory: ARTICLES_DIR (default: docx2upload)
  • Timeouts: UPLOAD_TIMEOUT, PROCESS_TIMEOUT, IMPLICIT_WAIT
  • Browser Mode: HEADLESS_MODE (true/false)
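As a sketch of how these variables might be read, the snippet below uses only `os.environ`, applies the documented `docx2upload` default for `ARTICLES_DIR`, and parses `HEADLESS_MODE` as a boolean. It assumes the `.env` values are already in the process environment (for example via python-dotenv's `load_dotenv()`, which the project may or may not use). The `Settings` class, `load_settings` function, and the timeout fallbacks (assumed to be seconds) are placeholders, not the project's real names or defaults.

```python
import os
from dataclasses import dataclass


def env_bool(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings ("true", "1", "yes", "on") as True."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"true", "1", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    articles_dir: str
    headless: bool
    upload_timeout: int
    process_timeout: int
    implicit_wait: int


def load_settings() -> Settings:
    return Settings(
        articles_dir=os.environ.get("ARTICLES_DIR", "docx2upload"),  # documented default
        headless=env_bool("HEADLESS_MODE"),
        # Placeholder fallbacks, assumed to be seconds; not the project's values.
        upload_timeout=int(os.environ.get("UPLOAD_TIMEOUT", "60")),
        process_timeout=int(os.environ.get("PROCESS_TIMEOUT", "120")),
        implicit_wait=int(os.environ.get("IMPLICIT_WAIT", "10")),
    )
```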

Adding New Sites

  1. Add credentials to .env:

    NEWSITE_URL=https://example.com/wp-admin
    NEWSITE_USERNAME=admin
    NEWSITE_PASSWORD=password
  2. Add site configuration to wordpress_uploader/config/settings.py:

    "newsite": {
        "name": "New Site Name",
        "url": os.getenv("NEWSITE_URL"),
        "username": os.getenv("NEWSITE_USERNAME"),
        "password": os.getenv("NEWSITE_PASSWORD"),
        # ... XPath selectors
    }
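Because every site follows the same `{SITE}_URL` / `{SITE}_USERNAME` / `{SITE}_PASSWORD` pattern, the credential part of each entry could be produced by a small helper that also fails fast when a variable is missing. `site_from_env` is a hypothetical helper, not part of `settings.py`; the site-specific XPath selectors would still be added by hand.

```python
import os


def site_from_env(key: str, name: str) -> dict:
    """Build the credential fields of a site config from {KEY}_* variables."""
    prefix = key.upper()  # "newsite" -> "NEWSITE"
    fields = {"url": "URL", "username": "USERNAME", "password": "PASSWORD"}
    config = {"name": name}
    missing = []
    for field, suffix in fields.items():
        var = f"{prefix}_{suffix}"
        value = os.getenv(var)
        if not value:
            missing.append(var)
        config[field] = value
    if missing:
        # Report every missing variable at once instead of failing at login time.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return config
```

In a settings dictionary, the result could be merged with the selectors, e.g. `{**site_from_env("newsite", "New Site Name"), ...}` followed by the XPath entries.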

How It Works

  1. File Discovery: Scans docx2upload/ for DOCX files not prefixed with "UPLOADED"
  2. Authentication: Logs into WordPress admin panel
  3. Article Processing:
    • Navigates to new post creation
    • Uploads DOCX file via file input
    • Extracts and sets title from filename
    • Generates relevant tags using NLP
    • Publishes the post
  4. File Marking: Renames each uploaded file with an "UPLOADED" prefix
  5. Logging: Records all activities to log files and console
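Steps 1 and 4 can be sketched with `pathlib`. This assumes the marker is written as `UPLOADED_` followed by the original filename; the README only says the prefix is "UPLOADED", so the separator and the function names `pending_articles` and `mark_uploaded` are assumptions.

```python
from pathlib import Path
from typing import List

PREFIX = "UPLOADED"
SEPARATOR = "_"  # assumption: the tool's actual separator (if any) is not documented


def pending_articles(articles_dir: str = "docx2upload") -> List[Path]:
    """Return unmarked DOCX files in articles_dir, sorted by name."""
    return sorted(
        p
        for p in Path(articles_dir).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".docx"    # accept both .docx and .DOCX
        and not p.name.startswith(PREFIX)  # skip files already uploaded
        and not p.name.startswith("~$")    # skip Word's temporary lock files
    )


def mark_uploaded(path: Path) -> Path:
    """Rename an uploaded file with the prefix so later runs skip it."""
    target = path.with_name(f"{PREFIX}{SEPARATOR}{path.name}")
    path.rename(target)
    return target
```

Renaming only after a successful upload means a failed run leaves the file unmarked, so it is retried next time.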

Logging

Logs are stored in the logs/ directory, and each run creates a new timestamped log file. Log levels differ by destination:

  • Console output: INFO level and above
  • File output: DEBUG level and above

Troubleshooting

ChromeDriver Issues

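The tool uses webdriver-manager to download and manage ChromeDriver automatically. If ChromeDriver fails to start or does not match your installed Chrome version, clear the cached driver so it is downloaded again: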

# Clear ChromeDriver cache
rm -rf ~/.wdm/

spaCy Model Not Found

python -m spacy download en_core_web_sm
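To confirm the download landed in the right place, a quick standard-library check can report whether spaCy, NLTK, and the `en_core_web_sm` package are importable. Run it with the same interpreter (virtual environment) the tool uses. This is a diagnostic sketch, not part of the tool; the function name is hypothetical.

```python
import importlib.util


def nlp_dependencies_status() -> dict:
    """Report whether spaCy, NLTK and the en_core_web_sm model can be imported."""
    modules = ["spacy", "nltk", "en_core_web_sm"]
    # find_spec returns None for a top-level module that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in nlp_dependencies_status().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```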

Login Failures

  • Verify credentials in .env file
  • Check that WordPress site is accessible
  • Ensure two-factor authentication (2FA) is disabled on the WordPress account; the automated login cannot complete a 2FA challenge

Element Not Found Errors

  • Your WordPress version, editor, plugins, or theme may render admin pages differently from the configured XPath selectors
  • Update XPath values in wordpress_uploader/config/settings.py
  • Use browser DevTools to inspect elements and find correct XPaths

Security Notes

  • Never commit .env file to version control
  • Store credentials securely
  • Use application-specific passwords when available
  • Regularly rotate passwords

Development

Running Tests

# Install dev dependencies
pip install pytest pytest-cov

# Run tests
pytest

# With coverage
pytest --cov=wordpress_uploader

Code Style

This project follows PEP 8 style guidelines. Use black for formatting:

pip install black
black wordpress_uploader/

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Changelog

Version 1.0.0 (2025)

  • Initial release
  • Multi-site WordPress article uploader
  • NLP-based tag generation
  • Automated title processing
  • Comprehensive error handling and logging
  • CLI interface with multiple options

Support

For issues, questions, or contributions, please open an issue on GitHub.

Acknowledgments

  • Selenium WebDriver for browser automation
  • spaCy for NLP capabilities
  • NLTK for natural language processing
  • WordPress Mammoth plugin for DOCX conversion

About

Automated WordPress article uploader with NLP-based tag generation. Upload DOCX files to WordPress sites with intelligent title processing and comprehensive logging.
