This Python script analyzes data from Google Search Console and provides insights into potential opportunities for page-level consolidation based on click and impression metrics.
Before running this script, please ensure you have the following:
- Python Dependencies: You must install the required Python dependencies on your system. You can install them using pip:
pip install pandas numpy
- Google Search Console Data: You should have a CSV file named
dataset.csvthat contains the data exported from Google Search Console. The CSV file should have the following columns:
page: The URL of the page.query: The search query.clicks: The number of clicks for the page-query combination.impressions: The number of impressions for the page-query combination.position: The average position of the page-query combination.
Note: You can use this tool to extract this data from the API (https://www.seoworkflows.com/tools/extraction/google-search-console-api). I'd recommend getting the past 3 to 6mo of data.
- Update 'CONFIG': You must update the "FILE_LOCATION" and "BRAND_VARIANTS" at the top of the
main.pyfile
-
Clone or download this repository to your local machine.
-
Navigate to the directory where the script (
main.py) is located. -
Place your
dataset.csvfile with the required columns in a newdatadirectory. -
Open a terminal or command prompt and run the script using the following command:
python main.py
- The script will generate an Excel file (
output.xlsx) with analysis results.
The script will create an Excel file with multiple tabs for different analysis stages:
all_potential_opps: Contains the final analysis output.high_likelihood_opps: Lists queries with 2+ pages marked as 'Potential Opportunity.'risk_qa_data: Includes query-level data for pages in 'all_potential_opps.'
- Queries marked as 'Potential Opportunity' are considered for consolidation.
- Queries marked as 'Risk' suggest caution, as they may risk more valuable queries during consolidation.
Please review the Excel file generated by the script for consolidation opportunities and further instructions.
Note: This README assumes that you have already obtained the necessary data from Google Search Console and that it is stored in the required format in the dataset.csv file.
Important: Only use this process if you have a large amount of rows within the high_likelihood_opps tab of the output dataset from the main.py script. Doing this at scale runs risks, it's tailored for large websites with much similar content, where cannibalization risks are high.
- Create a new file called
data.csvin the root directory containing all the data from thehigh_likelihood_oppstab of the output dataset - Run the script to create an automated
url_from+url_toredirect map, which can be used to write 301 redirects at scale.