Automated tool for uploading DOCX articles to WordPress sites with NLP-based tag generation and intelligent title processing.
- Automated Upload: Seamlessly upload DOCX files to WordPress via Selenium WebDriver
- Multi-Site Support: Configure and manage multiple WordPress sites
- NLP Tag Generation: Automatic tag extraction using spaCy and NLTK
- Intelligent Title Processing: Clean and format article titles from filenames
- Error Handling: Robust error handling and comprehensive logging
- CLI: Easy-to-use command-line interface
- Secure Credentials: Environment-based configuration (no hardcoded passwords)
- Progress Tracking: Automatic marking of uploaded articles
- Python 3.8 or higher
- Chrome browser installed
- WordPress site with Mammoth DOCX Converter plugin installed
- Clone the repository:
git clone <repository-url>
cd uploading_articles
- Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Download the spaCy language model:
python -m spacy download en_core_web_sm
- Configure environment variables:
cp .env.example .env
Edit .env and add your WordPress credentials:
TWINSTRIPE_URL=https://twinstripe.com/wp-admin
TWINSTRIPE_USERNAME=your_username
TWINSTRIPE_PASSWORD=your_password
# Add other sites as needed
uploading_articles/
├── wordpress_uploader/ # Main package
│ ├── core/ # Core functionality
│ │ ├── uploader.py # WordPress uploader class
│ │ └── orchestrator.py # Upload orchestration
│ ├── utils/ # Utility modules
│ │ ├── nlp_processor.py # NLP tag generation
│ │ ├── file_manager.py # File operations
│ │ └── logger.py # Logging setup
│ └── config/ # Configuration
│ └── settings.py # Settings and site configs
├── docx2upload/ # Directory for DOCX articles
├── logs/ # Application logs
├── main.py # CLI entry point
├── requirements.txt # Python dependencies
├── .env # Environment variables (not in repo)
└── README.md # This file
Upload all pending articles to a site:
python main.py --site twinstripe
# Upload a limited number of articles
python main.py --site forkspoon --limit 5
# Run in headless mode (no browser GUI)
python main.py --site heroasian --headless
# Skip NLP tag generation
python main.py --site twinstripe --no-tags
# Skip automatic title setting
python main.py --site twinstripe --no-title
# Adjust logging level
python main.py --site twinstripe --log-level DEBUG
Available sites:
- twinstripe: TwinStripe
- forkspoon: Fork and Spoon Kitchen
- heroasian: Hero Asian Kitchen
- stancic: Stancic Health and Wellness
# Show all available options
python main.py --help
Key configuration options in .env:
- Site Credentials: {SITE}_URL, {SITE}_USERNAME, {SITE}_PASSWORD
- Articles Directory: ARTICLES_DIR (default: docx2upload)
- Timeouts: UPLOAD_TIMEOUT, PROCESS_TIMEOUT, IMPLICIT_WAIT
- Browser Mode: HEADLESS_MODE (true/false)
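The command-line flags and .env options listed above could be combined along the following lines. This is a minimal sketch, not the project's actual main.py: the helper names (build_parser, env_flag, resolve_options) are hypothetical, and it assumes that --site is required, that the default log level is INFO, and that the --headless flag can only switch headless mode on in addition to HEADLESS_MODE.

```python
import argparse
import os

# Site keys taken from the list of available sites in this README.
SITES = ("twinstripe", "forkspoon", "heroasian", "stancic")


def build_parser():
    """Build a parser for the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Assumption: a site must always be named explicitly.
    parser.add_argument("--site", required=True, choices=SITES)
    parser.add_argument("--limit", type=int, default=None,
                        help="maximum number of articles to upload")
    parser.add_argument("--headless", action="store_true")
    parser.add_argument("--no-tags", action="store_true")
    parser.add_argument("--no-title", action="store_true")
    # Assumption: INFO is the default level.
    parser.add_argument("--log-level", default="INFO",
                        choices=("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"))
    return parser


def env_flag(env, name, default=False):
    """Interpret a true/false environment value; an unset value gives the default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def resolve_options(argv, env=None):
    """Merge command-line flags with environment-based settings."""
    env = os.environ if env is None else env
    args = build_parser().parse_args(argv)
    return {
        "site": args.site,
        "limit": args.limit,
        # Assumption: either the flag or HEADLESS_MODE=true enables headless mode.
        "headless": args.headless or env_flag(env, "HEADLESS_MODE"),
        "generate_tags": not args.no_tags,
        "set_title": not args.no_title,
        "log_level": args.log_level,
        # Default directory as documented for ARTICLES_DIR.
        "articles_dir": env.get("ARTICLES_DIR", "docx2upload"),
    }
```

Passing env as a plain dictionary keeps the function easy to exercise without touching the real environment, for example resolve_options(["--site", "forkspoon", "--limit", "5"], env={}).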
- Add credentials to .env:
NEWSITE_URL=https://example.com/wp-admin
NEWSITE_USERNAME=admin
NEWSITE_PASSWORD=password
- Add site configuration to wordpress_uploader/config/settings.py:
"newsite": {
    "name": "New Site Name",
    "url": os.getenv("NEWSITE_URL"),
    "username": os.getenv("NEWSITE_USERNAME"),
    "password": os.getenv("NEWSITE_PASSWORD"),
    # ... XPath selectors
}
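Because each site's URL, username, and password come from the environment, a typo in a variable name would otherwise surface only as a failed login. The sketch below shows one way to build a site entry and fail early when a variable is missing. The function build_site_config and the REQUIRED_FIELDS tuple are hypothetical helpers for illustration, not part of settings.py, and the XPath selectors are left out.

```python
import os

# The three per-site values documented as {SITE}_URL, {SITE}_USERNAME, {SITE}_PASSWORD.
REQUIRED_FIELDS = ("url", "username", "password")


def build_site_config(site_key, display_name, env=None):
    """Build a site entry from environment variables, failing early if any is unset."""
    env = os.environ if env is None else env
    prefix = site_key.upper()
    config = {"name": display_name}
    for field in REQUIRED_FIELDS:
        config[field] = env.get(f"{prefix}_{field.upper()}")
    # Collect the names of all missing or empty variables for a single clear error.
    missing = [f"{prefix}_{f.upper()}" for f in REQUIRED_FIELDS if not config[f]]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return config
```

For the example above, build_site_config("newsite", "New Site Name") would read NEWSITE_URL, NEWSITE_USERNAME, and NEWSITE_PASSWORD.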
- File Discovery: Scans docx2upload/ for DOCX files not prefixed with "UPLOADED"
- Authentication: Logs into the WordPress admin panel
- Article Processing:
  - Navigates to new post creation
  - Uploads the DOCX file via the file input
  - Extracts and sets the title from the filename
  - Generates relevant tags using NLP
  - Publishes the post
- File Marking: Renames uploaded files with an "UPLOADED" prefix
- Logging: Records all activities to log files and the console
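The file discovery, title extraction, and file marking steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in file_manager.py: the function names are hypothetical, the underscore between the "UPLOADED" prefix and the original filename is assumed, and the title cleanup rule (replace underscores and hyphens with spaces) is a guess at what "clean and format" involves.

```python
import re
from pathlib import Path

UPLOADED_PREFIX = "UPLOADED"


def find_pending(articles_dir, limit=None):
    """Return DOCX files that do not yet carry the UPLOADED prefix, sorted by name."""
    files = sorted(
        p for p in Path(articles_dir).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".docx"
        and not p.name.startswith(UPLOADED_PREFIX)
    )
    return files if limit is None else files[:limit]


def title_from_filename(path):
    """Derive a title: drop the extension, turn underscores and hyphens into spaces."""
    words = re.sub(r"[_\-]+", " ", Path(path).stem)
    return re.sub(r"\s+", " ", words).strip()


def mark_uploaded(path):
    """Rename the file so later runs skip it, and return the new path."""
    path = Path(path)
    # Assumption: prefix and original name are joined with an underscore.
    target = path.with_name(f"{UPLOADED_PREFIX}_{path.name}")
    path.rename(target)
    return target
```

Marking by rename keeps the progress state in the filesystem itself, so an interrupted run can resume without a separate database or state file.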
Logs are stored in the logs/ directory with timestamps, and each run creates a new log file. Output levels differ by destination:
- Console output: INFO level and above
- File output: DEBUG level and above
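A split like this, with INFO on the console and DEBUG in the file, is typically done by attaching two handlers with different levels to one logger. The following is a minimal sketch using Python's standard logging module. The function name setup_logging, the logger name, the message format, and the timestamped filename pattern are all assumptions and may differ from what logger.py does.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir="logs", console_level=logging.INFO, name="wordpress_uploader"):
    """Create a logger that writes INFO+ to the console and DEBUG+ to a new file."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    # Assumption: one file per run, named with the start time.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_file = Path(log_dir) / f"upload_{stamp}.log"

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # pass everything through; handlers do the filtering
    logger.handlers.clear()         # avoid duplicate handlers on repeated calls

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    console = logging.StreamHandler()
    console.setLevel(console_level)
    console.setFormatter(fmt)

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(fmt)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger, log_file
```

The logger itself is set to DEBUG so that the file handler receives every record, while the console handler filters out anything below its own level.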
The tool uses webdriver-manager to automatically download and manage ChromeDriver. If you encounter ChromeDriver issues:
# Clear ChromeDriver cache
rm -rf ~/.wdm/
If the spaCy model is missing, download it again:
python -m spacy download en_core_web_sm
If login fails:
- Verify the credentials in the .env file
- Check that the WordPress site is accessible
- Ensure no 2FA is enabled on the WordPress account
If page elements are not found:
- The WordPress theme may use different XPath selectors
- Update the XPath values in wordpress_uploader/config/settings.py
- Use browser DevTools to inspect elements and find the correct XPaths
- Never commit the .env file to version control
- Store credentials securely
- Use application-specific passwords when available
- Regularly rotate passwords
# Install dev dependencies
pip install pytest pytest-cov
# Run tests
pytest
# With coverage
pytest --cov=wordpress_uploader
This project follows PEP 8 style guidelines. Use black for formatting:
pip install black
black wordpress_uploader/
This project is licensed under the MIT License.
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- Initial release
- Multi-site WordPress article uploader
- NLP-based tag generation
- Automated title processing
- Comprehensive error handling and logging
- CLI with multiple options
For issues, questions, or contributions, please open an issue on GitHub.
- Selenium WebDriver for browser automation
- spaCy for NLP capabilities
- NLTK for natural language processing
- WordPress Mammoth plugin for DOCX conversion