Skip to content

jmelm93/seo_cannibalization_analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Google Search Console - SEO Cannibalization Analysis Script

This Python script analyzes data from Google Search Console and provides insights into potential opportunities for page-level consolidation based on click and impression metrics.

Prerequisites

Before running this script, please ensure you have the following:

  1. Python Dependencies: You must install the required Python dependencies on your system. You can install them using pip:
  • pip install pandas numpy
  1. Google Search Console Data: You should have a CSV file named dataset.csv that contains the data exported from Google Search Console. The CSV file should have the following columns:
  • page: The URL of the page.
  • query: The search query.
  • clicks: The number of clicks for the page-query combination.
  • impressions: The number of impressions for the page-query combination.
  • position: The average position of the page-query combination.

Note: You can use this tool to extract this data from the API (https://www.seoworkflows.com/tools/extraction/google-search-console-api). I'd recommend getting the past 3 to 6mo of data.

  1. Update 'CONFIG': You must update the "FILE_LOCATION" and "BRAND_VARIANTS" at the top of the main.py file

Running main.py

  1. Clone or download this repository to your local machine.

  2. Navigate to the directory where the script (main.py) is located.

  3. Place your dataset.csv file with the required columns in a new data directory.

  4. Open a terminal or command prompt and run the script using the following command:

  • python main.py
  1. The script will generate an Excel file (output.xlsx) with analysis results.

Output of main.py

The script will create an Excel file with multiple tabs for different analysis stages:

  • all_potential_opps: Contains the final analysis output.
  • high_likelihood_opps: Lists queries with 2+ pages marked as 'Potential Opportunity.'
  • risk_qa_data: Includes query-level data for pages in 'all_potential_opps.'

Interpretation

  • Queries marked as 'Potential Opportunity' are considered for consolidation.
  • Queries marked as 'Risk' suggest caution, as they may risk more valuable queries during consolidation.

Please review the Excel file generated by the script for consolidation opportunities and further instructions.

Note: This README assumes that you have already obtained the necessary data from Google Search Console and that it is stored in the required format in the dataset.csv file.

Running build_redirect_map.py

Important: Only use this process if you have a large amount of rows within the high_likelihood_opps tab of the output dataset from the main.py script. Doing this at scale runs risks, it's tailored for large websites with much similar content, where cannibalization risks are high.

  1. Create a new file called data.csv in the root directory containing all the data from the high_likelihood_opps tab of the output dataset
  2. Run the script to create an automated url_from + url_to redirect map, which can be used to write 301 redirects at scale.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages