Entries by Greg Bernhardt

Evaluate Subreddit Posts in Bulk Using GPT4 Prompting

Reddit is a rich source of user-generated content — people share opinions, ask questions, and discuss a wide range of topics. Analyzing subreddit posts reveals what users care about, emerging trends, and common pain points. Combined with OpenAI’s language models, you can quickly generate summaries, topic ideas, or keyword suggestions tailored to your goals. In […]

Calculate Similarity Between Article Elements Using spaCy

In this Python SEO tutorial, we’ll walk through a script that uses spaCy to calculate similarity metrics between content keywords and an article’s body. This analysis helps SEOs and content creators assess content relevance and keyword alignment. Using natural language processing (NLP), we’ll compute similarity scores to gauge how well keywords match the main content. The […]

Audit URLs for SEO Using ahrefs Backlink API Data

This step-by-step SEO tutorial shows a Python script that retrieves and analyzes domain data using the Ahrefs API. It helps SEOs, webmasters, and data analysts monitor metrics such as broken backlinks, total backlinks, and domain rating. Note: The script requires an Ahrefs API key. For large-scale or high-speed requests, use a paid API plan; free […]

Storing CrUX CWV Data for URLs Using Python for SEOs

The CWV panic days appear to be over, but keeping tabs on the data is still worthwhile. CrUX data is useful for SEOs because it lets you analyze real-user performance through Core Web Vitals metrics like Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS), which are minor ranking factors. By […]

Scraping YouTube Video Page Metadata with Python for SEO

In this tutorial, we explore a Python script that scrapes and analyzes YouTube video metadata for free. This framework provides a solid starting point for tools aimed at SEOs, content creators, and data analysts. Note that paid APIs such as SerpAPI can process requests faster and more reliably but are not free. If you prefer […]

Calculate SERP Rank Readability Scores Using Python

Readability scores are not a verified SEO ranking factor. That said, you should still care: if your content doesn’t match your audience’s reading level, you may see higher bounce rates, lower engagement, and fewer conversions. Your audience expects content written appropriately for the subject matter; when your content meets those expectations, better results follow. […]

Find Interlinking Opps via Entity N-gram Matches Using Python

Any seasoned SEO knows that finding internal links at scale is difficult but important. This is especially true if your content isn’t well organized topically. If your blog is disorganized or full of seemingly random articles and you need to add internal links intelligently, this tutorial is for you. In this Python SEO tutorial, I’ll […]

Collect Domain Security Information with Python

In this tutorial, we will learn how to automate the collection of various domain-related technical information using Python. The script gathers data such as WHOIS details, DNS records, SSL certificates, reverse IP lookup, blacklist status, robots.txt, and more. Using the pandas library, we also show how to store the collected data in a CSV file. […]

Build and Run Python Scripts on the Fly With GPT-3

GPT-3 and its variants have taken the world by storm, and for good reason. It’s an exciting time full of possibilities. The limits are being pushed every day. Python as an SEO skill has always been a bit niche because of its learning curve. Today, we start blowing that learning curve out of the water! […]

Compare Keyword SERP Similarity in Bulk with Python

Studying search engine results pages (SERPs) is one of SEO’s oldest and still best methods for understanding keywords and how Google treats them. When conducting keyword research, we often end up with a large list of candidates. Then we must validate and clean that list. Cleaning matters because items that appear to be unique opportunities […]
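One simple way to put a number on how similar two SERPs are is the share of URLs they have in common. The sketch below uses Jaccard overlap for that. This is an assumed measure chosen for illustration, not necessarily the one the tutorial uses, and the function name and URLs are hypothetical.

```python
def serp_similarity(urls_a, urls_b):
    """Jaccard overlap between two lists of ranking URLs (0.0 to 1.0)."""
    set_a, set_b = set(urls_a), set(urls_b)
    if not set_a and not set_b:
        return 0.0  # two empty SERPs: nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical top results for two candidate keywords.
serp_one = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
serp_two = ["https://example.com/b", "https://example.com/c", "https://example.com/d"]

score = serp_similarity(serp_one, serp_two)
print(f"SERP similarity: {score:.2f}")
```

A high score suggests the two keywords are treated as near-duplicates and may not be separate opportunities. The threshold for "high" is a judgment call.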

Analyze SERP Backlink Profiles in Bulk for SEO Using Python

The importance of backlinks as a quality signal has changed little since the early 2000s. Algorithms have evolved, but backlinks remain a strong ranking factor. Looking up your backlink stats and comparing them to competitors is an SEO staple. What’s new is operationalizing this process and working smarter. Reduce one-off competitor checks — analyze in […]

Detect Generic Anchor Text in Links for SEO using Python

Optimizing anchor text for internal links has long been a core SEO practice. Google documents anchor text in its SEO guidelines. Anchor text gives users and search engines contextual signals about the linked page’s topic. It’s an opportunity to tell both Google and users what the next page is about and why it is relevant. […]

Detect Text in Images in Bulk With Tesseract Using Python for SEO

Imagery in articles can be a wonderful communication device when used correctly. One issue that still plagues SEO content teams is how to properly handle text in images. Historically, text within an image is trapped and the contextual message is lost to search engines that didn’t have the processing power to decode (they still likely […]

Extracting Data from PDFs Using PDFMiner

PDF files are ubiquitous in various industries, but programmatically extracting data from them can be complex. PDFMiner, a powerful Python library, helps parse and extract content from PDFs in formats like plain text, HTML, XML, or tagged text. This tutorial explains how to use a comprehensive PDF extraction script. We’ll explore its structure and functionality […]

Classify Anchor Text N-Grams for Interlinking Insights with Python

In this Python SEO tutorial, I’ll show you a programmatic method to start analyzing your internal anchor text for topical relevance. Internal anchor text remains one of the most powerful topical endorsements you can provide. Anchor texts are explicit contextual signals Google can use to help understand and calculate the linked page’s topical authority. Let’s […]

Webpage Word Sense Disambiguation for SEO Using Python and NLTK

In semantics, ambiguity is partially defined as a word having multiple “senses”. A sense is a meaning or definition. Effective content in SEO should be as free of ambiguity as possible. When your content is ambiguous, you risk machines (that evaluate your content via natural language understanding) not being able to understand your […]

Calculate GSC CTR Stats By Position Using Python for SEO

Last week, SEO Clarity came out with a new SERP CTR study. The numbers were lower than I expected, even as an average for all queries. It got me thinking: what is MY average CTR by position? Turns out, it’s much higher. This is likely due to good SEO by optimizing the title, meta, and […]
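As a rough illustration of "average CTR by position", the sketch below groups rows by rounded position and divides total clicks by total impressions in each group. The row layout (position, clicks, impressions) and the sample numbers are assumptions for illustration, not real GSC data or the tutorial's own code.

```python
from collections import defaultdict


def ctr_by_position(rows):
    """Aggregate CTR (clicks / impressions) per rounded average position."""
    totals = defaultdict(lambda: [0, 0])  # position -> [clicks, impressions]
    for position, clicks, impressions in rows:
        bucket = round(position)  # note: Python rounds exact .5 values to the nearest even integer
        totals[bucket][0] += clicks
        totals[bucket][1] += impressions
    return {
        pos: clicks / imps
        for pos, (clicks, imps) in sorted(totals.items())
        if imps > 0
    }


# Hypothetical query rows: (average position, clicks, impressions).
sample_rows = [
    (1.2, 30, 100),
    (1.4, 20, 100),
    (2.6, 10, 200),
]

ctr = ctr_by_position(sample_rows)
for pos, value in ctr.items():
    print(f"Position {pos}: {value:.1%}")
```

Summing clicks and impressions before dividing weights each query by its traffic. Averaging per-query CTRs instead would give low-impression queries the same weight as high-impression ones.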

Use Python and Google Trends to Forecast Your Top GSC Keywords

Google Search Console already gives SEOs amazing historical data on how the queries you rank for are performing. Google Trends is also a useful platform that can give insights into a query’s historical relative popularity within Google’s system (by geo), plus a little forecasting for the future. What if we could begin to marry these […]

Detect Google SERP Title and Snippet Rewrites with Python

Back in early August of 2021, word began to travel through the industry that titles were being rewritten in the Google SERPs at a frequency and in a manner not seen before. Plenty of SERP analysis has been done to understand the why, how, and what to do about it, but it starts with an analysis of […]

Use Python to Create a GSC to BigQuery Pipeline

Google Search Console is likely the most important source of data for an SEO. However, like most GUI platforms, it suffers from the same large downside. You’re stuck in a GUI that only gives you 16 months of data. You can manually export data to a Google Sheet. Exporting to a Google Sheet is fine, […]

Overlay GSC Data with Google Algo Updates Using Python

Most SEOs’ hearts skip a beat when they hear a Google algorithm update is unfolding, and for a few days they relentlessly check analytics. Then there is a natural lull, the panic or excitement fades, and you get back to your work. Google algorithm updates don’t always result in a dramatic spike one way or another. It […]

Build an N-Gram Text Analyzer for SEO using Python

The days when content SEO was simply copywriting are over. Modern content SEO now employs massive resources for technical analysis of the words you write or manage. In fact, this has been the case for nearly 10 years, since the introduction of machine learning in search engines. The tools are now widely available to SEOs to achieve […]
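The core of an n-gram analyzer is counting runs of n consecutive words. A minimal sketch using only the standard library follows. The tokenizing regex and function name are assumptions for illustration, and a fuller analyzer would likely also handle stop words and punctuation more carefully.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count n-grams (runs of n consecutive words) in a piece of text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Zip n staggered copies of the token list to form the n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


sample_text = "Python SEO tutorial for Python SEO"
bigrams = ngram_counts(sample_text, n=2)
print(bigrams.most_common(3))
```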

Bulk IP Filter for Google Analytics Using Python and RegEx

Not every Python script needs to be complex, long, or a work of art. Sometimes it can help with quick, mundane tasks. One such opportunity presented itself a couple of weeks ago when a client asked for 50+ IPs to be filtered from a Google Analytics view. We could have taken 20 minutes and manually added […]
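One way to handle a long IP list in a single filter is to join the addresses into one regular expression. The sketch below assumes that approach. The function name is hypothetical and the addresses are placeholders from the documentation-reserved ranges, not client data.

```python
import re


def build_ip_filter(ips):
    """Join IPv4 addresses into one anchored alternation pattern."""
    unique = sorted({ip.strip() for ip in ips if ip.strip()})
    # re.escape turns each "." into "\." so it matches a literal dot only.
    escaped = [re.escape(ip) for ip in unique]
    return "^(" + "|".join(escaped) + ")$"


# Placeholder addresses for illustration.
sample_ips = ["192.0.2.10", "192.0.2.11", "203.0.113.7"]
pattern = build_ip_filter(sample_ips)
print(pattern)
```

The `^` and `$` anchors keep `192.0.2.10` from also matching `192.0.2.100`.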

Compare Wikipedia Search Data with Google Trends with Python

There are countless ways to understand trends, which are important in understanding the past, present, and future. I’m sure everyone is familiar with Google Trends. No doubt it’s very powerful, but there are other options as well, one being Wikipedia. Wikipedia is currently the 4th most visited website in the US. If only there were a […]

Measure Causal Impact from GSC Data Using Python

Causal Impact is a Bayesian-like statistical algorithm pioneered by Kay Brodersen at Google that aims to predict the counterfactual after an event. For example, say you make a large SEO change to a website. Sometimes it’s not obvious whether the change was beneficial. You can compare against the past, but the past […]

Competitive SEO URL Analysis with Python

Match your URLs to your competitors’ URLs and find differences in title keywords and ranking keyword counts with this step-by-step Python SEO tutorial. SEO is not an island. You are not simply improving your site or pages in a vacuum. You need to consider your competition, as you are all jockeying for positions in the same SERPs. Some URLs […]

Use Python to Label Query Intent, Entities and Keyword Count

Query analysis is a large topic, but I wanted to focus on intent and entity recognition. Intent and entity recognition are very important concepts to understand in SEO. Google’s use of machine learning has rapidly increased since 2013 when they introduced their Knowledge Graph. For intent, what is important is how Google’s understanding of the […]

Generate a 404 Redirect List for SEO with Polyfuzz Using Python

We’ve all had a client where we pop into their Google Search Console or Ahrefs account and see hundreds or thousands of reported 404s, perhaps from a migration or perhaps from a decade of regular pruning. This tutorial won’t cover evaluating whether they are worth redirecting or not, but rather simply the case if […]
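The general idea of fuzzy-matching broken URLs to live ones can be sketched without PolyFuzz. The example below swaps in the standard library's `difflib` as a stand-in, so it illustrates the matching concept only and is not PolyFuzz's API or the tutorial's method. The URLs, function name, and cutoff value are hypothetical.

```python
import difflib


def suggest_redirects(broken_urls, live_urls, cutoff=0.6):
    """Map each broken URL path to its closest live path, or None if nothing is close."""
    suggestions = {}
    for broken in broken_urls:
        matches = difflib.get_close_matches(broken, live_urls, n=1, cutoff=cutoff)
        suggestions[broken] = matches[0] if matches else None
    return suggestions


# Hypothetical 404 paths and live paths.
broken = ["/blog/python-seo-guid", "/zzz-qqq"]
live = ["/blog/python-seo-guide/", "/contact/"]

redirects = suggest_redirects(broken, live)
for source, target in redirects.items():
    print(f"{source} -> {target}")
```

Paths with no match above the cutoff come back as `None`, so they can be reviewed by hand instead of being redirected blindly.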

Greg Bernhardt Joins Webinar on How to Perform a Content Audit

I had a great time joining this webinar on content audits for SEO. Thanks for the opportunity Authoritas! Was great to see and hear from Laura Monckton and Daniel Heredia Mejias! Sadly no Python in this webinar, but great advice nonetheless! See the outline and video below… When Evaluating Content for SEO, Consider These 7 Core Concepts: Accessibility […]

Analyze Words Using WordsAPI App and Python for SEO

Ask any SEO writer: the words you choose for your copy matter. Sometimes we think we know a word’s attributes, relationships, and the universe it lives in, but often we don’t. It can be a challenge to generate ideas and inspiration from single words. Understanding words can help you explore possibilities for your content that […]

Scraping YouTube Video Pages for SEO with Python

I had a project this week that tasked my team with optimizing YouTube tags for a couple hundred videos. We could do it manually but thought this was a nice chance to use Python. Our idea was we could scrape YouTube video page information, put it into a spreadsheet for easier organization and identification of […]

SEO Data Blending with Python for Beginners

Data is everything for an SEO and it’s all too often scattered across proprietary platforms that do a good job of visualizing and analyzing that data according to how they think you want. Even when these platforms give you export methods you still need to load it into Excel and Google Sheets and perform some […]

Crawl and Optimize All Website Images With Python

Last month I released a tutorial for automating new image optimization over FTP. This time we’re going to crawl an entire website and locally optimize the images we come across, organized by URL. Note that this short but intermediate-level script is not meant for massive sites as is. For one thing, all images are dumped […]

Automate Image Compression with Python over FTP

Image compression isn’t new to the tech SEO world, but with site performance and Core Web Vitals now influencing rankings, it’s time to take action. I’ve done dozens of site audits and find that roughly 80% of performance issues fall into two buckets: images or JavaScript. When images are a major issue, I cheer — […]

Create a Custom Twitter Tweet Alert System with Python

Do you follow hundreds or thousands of Twitter accounts and miss important announcements from Google Search? Were you late to learn about the latest core update? In this tutorial I’ll show you how to build a simple Twitter alert system using Python. We’ll use the advertools module, created by Elias Dabbas, to connect to the […]

Find Search Volume Ceiling for Keyword Categories Using Python

In this tutorial I’ll show how to use Python to generate broad keyword categories from current ranking keyword data and automatically label keywords with those categories. This helps you get an overview of the topics you rank for and the potential search-volume ceilings for each category. The largest opportunities often appear in categories with the […]

Analyze Crawled PDF Text Using Python for SEO

Google has indexed PDFs for many years and treats them like web pages, so it’s logical to analyze your PDFs for optimization opportunities just as you would a web page. This can be more challenging because of file constraints, but we can begin the process using Python. I’ll show how easy it is to convert […]

Use Python and Brightlocal API to Grab Your Keyword Rankings

BrightLocal is a common tool for local SEO, citations, and ranking tracking. It offers an extensive API that lets you retrieve data and automate many tasks. In this tutorial I show how to fetch ranking data from the BrightLocal API using Python. The BrightLocal API is paid, but it includes 1,000 free calls for testing. […]

Automate Google Lighthouse with Python for SEO Reports

Google’s web page scanner Lighthouse has been a fixture among the most important tools for evaluating web pages. At a high level, this scanner measures a page’s performance, SEO, accessibility, and best practices. At a deeper level, it provides granular metrics for each category and displays recommendations. Most SEOs are familiar with running Lighthouse within […]

Detect Website Technologies with Python & BuiltWith API

For SEO audits, one area you may want to detect and store is the different technologies a website uses. Sure, you can spot-check and run a few console commands, but what if an API could do it all for you? The service BuiltWith provides that capability. BuiltWith offers several useful APIs, but this tutorial focuses […]

Compare Web Page Entities with Google NLP in Python

This is part 2 of a two-part series. Please see Getting Started with Google NLP API Using Python first. For search engines and SEO, Natural Language Processing (NLP) has been a revolution. NLP is the methodology by which machines understand human language. This matters because machines perform the bulk of page evaluation. While some knowledge […]

Getting Started with Google NLP API Using Python

Natural Language Processing (NLP) has been a revolution for search engines and SEO. NLP is the set of methods that lets machines understand human language. This matters because machines now perform the bulk of page evaluation, not humans. Although understanding some of the science behind NLP is useful, today’s tools let you apply NLP without […]

Use Python to Scrape Technical Info for Domains

SEOs wear many hats. During a technical audit or troubleshooting, it’s useful to have a domain’s public technical information on hand. Below are some Python tools you can use to fetch that domain information. You can easily loop this over your client list—using a Python list or a database—and automate it to run every morning […]

Website Uptime Monitor With LEDs and LCD Screen Using Python

Earlier, in the tutorial SEO Guide to Creating a Website Uptime Monitor Using Python, I showed you how to create a simple uptime monitor and store that information in MySQL. This next phase is a direct extension of the previous tutorial and assumes you have that all set up. The code in the tutorial will […]

Monitor robots.txt Changes with Python and Difflib

Robots.txt is a useful tool for SEOs to control crawling by spiders. However, it is sensitive: a simple mistake can cause significant damage. When collaborating with an SEO team or a client’s developers, someone might tinker where they shouldn’t, or accidentally make a change. Damage is easier to mitigate if caught early. Use the Python […]
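As a minimal sketch of the comparison step, the standard library's `difflib.unified_diff` can report what changed between a stored copy of robots.txt and the current one. The file labels and sample contents below are hypothetical, and fetching and storing the file is left out.

```python
import difflib


def robots_diff(old_text, new_text):
    """Return unified-diff lines between two robots.txt versions (empty list if unchanged)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="robots_previous.txt",  # labels only, no files are read
            tofile="robots_current.txt",
            lineterm="",
        )
    )


# Hypothetical before/after contents.
previous = "User-agent: *\nDisallow: /admin/"
current = "User-agent: *\nDisallow: /"

changes = robots_diff(previous, current)
print("\n".join(changes) if changes else "No changes detected")
```

An empty result means the file is unchanged. Any `+` or `-` lines are candidates for an alert.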

Retrieve the Google Cache Date for URLs Using Python

Viewing cached links in Google is a common troubleshooting and information-recovery method used by SEOs. Google caches some pages it crawls and creates a snapshot of the page at the time of the crawl. You will often notice missing resources or images, so the cache is rarely a perfect copy, but it is useful for […]

Extract Google Suggestions API Data for SEO Insights with Python

One of the main tenets of SEO is understanding the search climate for the keywords you are targeting: the trends, what people are searching for, and the relative search volume. Those insights help generate targeted keyword ideas. An often-overlooked source is the Google Suggestions API — it taps Google’s search autocomplete feature. You can call […]

Find Keyword Opportunities with Google Trends, Python and Ahrefs

Google Trends has long been a powerful tool for SEOs. Understanding past, present, and emerging trends helps reveal seasonality and major events such as the coronavirus pandemic. Who back in 2019 would have thought that toilet paper would hit 100 on Google Trends in March 2020? The Google Trends web interface is user-friendly and reveals […]

Submit a WordPress Gravity Form via API with Python

Gravity Forms is a popular WordPress form plugin. If you run a leads-based business, you need to know your form is working and that prospects can contact you. We’ve all looked at the entry log and noticed an unusual gap between the last entry and the present — maybe there was a glitch. How would […]

Use Python and Chrome to Take Webpage Screenshots

In 2017, Google released a headless (no GUI) mode for Chrome that can capture a screenshot of a web page from a specified viewport. This is useful for archiving pages for version comparison, monitoring, and client deliverables. Because it runs without a GUI, it is well suited for automation with Python. In just a few lines […]
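A minimal sketch of driving headless Chrome from Python follows, using Chrome's `--headless`, `--screenshot`, and `--window-size` command-line flags. The binary name `google-chrome` is an assumption and varies by operating system and install. The helper names are hypothetical, and the capture call is defined but not run here.

```python
import subprocess


def build_screenshot_command(url, output_path, width=1280, height=800,
                             chrome_binary="google-chrome"):
    """Build the headless Chrome command line for a single screenshot."""
    return [
        chrome_binary,  # assumed binary name; adjust for your system
        "--headless",
        f"--screenshot={output_path}",
        f"--window-size={width},{height}",
        url,
    ]


def take_screenshot(url, output_path, **kwargs):
    """Run Chrome to capture the page; requires Chrome to be installed."""
    subprocess.run(build_screenshot_command(url, output_path, **kwargs),
                   check=True, timeout=60)


command = build_screenshot_command("https://example.com", "example.png")
print(" ".join(command))
```

Keeping command construction separate from execution makes it easy to loop over a list of URLs and to inspect the command before running it.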

How to Get Cached Pages From Wayback Machine API

Archive.org’s Wayback Machine is a staple in the SEO industry for examining cached historical web pages. Each cached page is called a snapshot. It’s useful for tracking progress, troubleshooting issues, or—if you’re lucky—recovering data. The Wayback Machine GUI can be slow or frustrating. The steps below show how to use Python to call the free […]