Google Search Console - SEO Cannibalization Analysis Script

This Python script analyzes data from Google Search Console and provides insights into potential opportunities for page-level consolidation based on click and impression metrics.

Prerequisites

Before running this script, please ensure you have the following:

Python Dependencies: You must install the required Python dependencies on your system. You can install them using pip:

pip install pandas numpy

Google Search Console Data: You should have a CSV file named dataset.csv that contains the data exported from Google Search Console. The CSV file should have the following columns:

page: The URL of the page.
query: The search query.
clicks: The number of clicks for the page-query combination.
impressions: The number of impressions for the page-query combination.
position: The average position of the page-query combination.

Note: You can use this tool to extract this data from the API (https://www.seoworkflows.com/tools/extraction/google-search-console-api). I'd recommend getting the past 3 to 6mo of data.

Update 'CONFIG': You must update the "FILE_LOCATION" and "BRAND_VARIANTS" at the top of the main.py file

Running main.py

Clone or download this repository to your local machine.
Navigate to the directory where the script (main.py) is located.
Place your dataset.csv file with the required columns in a new data directory.
Open a terminal or command prompt and run the script using the following command:

python main.py

The script will generate an Excel file (output.xlsx) with analysis results.

Output of main.py

The script will create an Excel file with multiple tabs for different analysis stages:

all_potential_opps: Contains the final analysis output.
high_likelihood_opps: Lists queries with 2+ pages marked as 'Potential Opportunity.'
risk_qa_data: Includes query-level data for pages in 'all_potential_opps.'

Interpretation

Queries marked as 'Potential Opportunity' are considered for consolidation.
Queries marked as 'Risk' suggest caution, as they may risk more valuable queries during consolidation.

Please review the Excel file generated by the script for consolidation opportunities and further instructions.

Note: This README assumes that you have already obtained the necessary data from Google Search Console and that it is stored in the required format in the dataset.csv file.

Running build_redirect_map.py

Important: Only use this process if you have a large amount of rows within the high_likelihood_opps tab of the output dataset from the main.py script. Doing this at scale runs risks, it's tailored for large websites with much similar content, where cannibalization risks are high.

Create a new file called data.csv in the root directory containing all the data from the high_likelihood_opps tab of the output dataset
Run the script to create an automated url_from + url_to redirect map, which can be used to write 301 redirects at scale.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
helpers		helpers
.gitignore		.gitignore
README.md		README.md
build_redirect_map.py		build_redirect_map.py
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Google Search Console - SEO Cannibalization Analysis Script

Prerequisites

Running main.py

Output of main.py

Interpretation

Running build_redirect_map.py

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Google Search Console - SEO Cannibalization Analysis Script

Prerequisites

Running main.py

Output of main.py

Interpretation

Running build_redirect_map.py

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages