Getting Started with Google NLP API Using Python
Natural Language Processing (NLP) has been a revolution for search engines and SEO. NLP is the set of methods that lets machines understand human language. This matters because machines now perform the bulk of page evaluation, not humans. Although understanding some of the science behind NLP is useful, today’s tools let you apply NLP without a data-science degree. By understanding how machines interpret our content, we can correct misalignment or ambiguity.
This is a two-part series:
- Process using user-entered text
- Process a comparison between two different web pages
In this intermediate tutorial, I’ll walk you through basic implementations of four of the five Google NLP API features (excluding Syntax). Given a text, we will:
- Identify Entities and generate salience scores
- Calculate sentiment scores
- Calculate sentiment magnitude
- Categorize text
I recommend reading the Google NLP documentation for instructions on setting up Google Cloud Platform, enabling the NLP API, and configuring authentication: Google NLP documentation.
These scripts include modified portions from Google’s samples — no need to reinvent the wheel.
Table of Contents
Requirements and Assumptions
- Python 3 is installed and you understand basic Python syntax
- Access to a Linux installation (I recommend Ubuntu) or Google Colab
- A Google Cloud Platform account
- NLP API enabled
- A service account was created and its JSON key downloaded
Import Modules and Set Authentication
We’ll import several modules. In Google Colab they are preinstalled; otherwise, install the Google NLP client library.
- os – set the environment variable for credentials
- google.cloud – Google’s NLP modules
- numpy – used for a dictionary comparison function
- matplotlib – used for scatter plots
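If you are not on Colab, the client library can be installed with pip. These scripts use the pre-2.0 API surface (language_v1.enums and language.types), which was removed in version 2.0 of the library, so pinning below 2.0 keeps them runnable as written:

```shell
# Install the Google Cloud Natural Language client library plus the
# numpy/matplotlib dependencies used below. The enums/types modules
# used in this tutorial were removed in 2.0, so pin below it.
pip install "google-cloud-language<2.0.0" numpy matplotlib
```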
import os
from google.cloud import language_v1
from google.cloud.language_v1 import enums
from google.cloud import language
from google.cloud.language import types
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
Next, set the environment variable that points to your credentials JSON file. Google uses this environment variable for authentication. The example assumes Google Colab (upload the file). On Linux (I use Ubuntu), add the following line to ~/.profile or ~/.bashrc and replace “path_to_json_credentials_file” as needed: export GOOGLE_APPLICATION_CREDENTIALS="path_to_json_credentials_file". Keep this JSON file secure.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "path_to_json_credentials_file"
Now we’re ready to use the API. The code below is presented as a single block so you can see it in context. The text_content variable holds the text to analyze. I limit it to 1000 characters (one unit) because Google NLP API pricing is unit-based — avoid pasting very long texts to prevent unexpected charges. Next we initialize the NLP client, specify the document type (plain text), and set a language (optional; the API can auto-detect).
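The 1,000-characters-per-unit math above can be sketched as a small helper. The unit size here is taken from the pricing note in this tutorial; verify it against Google's current NLP API pricing page before relying on it:

```python
import math

def billable_units(text, unit_size=1000):
    """Estimate billable units: one unit per 1,000 characters, rounded up.

    unit_size is an assumption based on the pricing note above; check
    Google's current pricing terms before depending on this estimate.
    """
    return max(1, math.ceil(len(text) / unit_size))

print(billable_units("a" * 2500))   # 3 units
print(billable_units("short text")) # 1 unit
```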
After packaging the request, we send it to Google’s NLP. We loop over the returned entities to print name, type, salience score, and metadata. Salience is rounded to three decimal places; adjust as needed.
Identify Entities
text_content = "The key to successful internet marketing is to make decisions that make sense for your business, your company and your customers. We work with you to build a custom strategy that drives both visits and conversions."
text_content = text_content[0:1000]
client = language_v1.LanguageServiceClient()
type_ = enums.Document.Type.PLAIN_TEXT
language_code = "en"
document = {"content": text_content, "type": type_, "language": language_code}
encoding_type = enums.EncodingType.UTF8
response = client.analyze_entities(document, encoding_type=encoding_type)
for entity in response.entities:
    print(u"Entity Name: {}".format(entity.name))
    print(u"Entity type: {}".format(enums.Entity.Type(entity.type).name))
    print(u"Salience score: {}".format(round(entity.salience, 3)))
    for metadata_name, metadata_value in entity.metadata.items():
        print(u"{}: {}".format(metadata_name, metadata_value))
    print('\n')
Below is the example entity output for the text (from Wikipedia): “Summerfest, the largest music festival in the world, is also a large economic engine and cultural attraction for the city. In 2018, Milwaukee was named “The Coolest City in the Midwest” by Vogue magazine.”
The salience score measures an entity’s relative importance within the text. If salience scores don’t align with your content goals, adjust the text to steer the model. “MID” stands for machine ID — an identifier for the entity. Entities with mids indicate strong confidence and often correspond to entries in the Google Knowledge Graph.
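To see at a glance which entities dominate a text, you can rank them by salience. A minimal sketch, using hypothetical (name, salience) pairs in place of the real entity objects returned by client.analyze_entities():

```python
# Hypothetical entity results; in practice each pair would come from an
# entity object (entity.name, entity.salience) in response.entities.
entities = [
    ("decisions", 0.18),
    ("internet marketing", 0.42),
    ("customers", 0.12),
    ("strategy", 0.28),
]

# Sort by salience, highest first, so the most important entities lead.
ranked = sorted(entities, key=lambda pair: pair[1], reverse=True)
for name, salience in ranked:
    print("{}: {}".format(name, round(salience, 3)))
```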

Calculate Sentiment Score
We’ll pass the document to client.analyze_sentiment(), which returns a sentiment score and magnitude. Here we process the score (magnitude is handled next). Scores are rounded to four decimal places. Sentiment score ranges from -1 (most negative) to 1 (most positive). The code maps score ranges to human-readable labels and prints the result. I also visualize the score using a simple scatter plot as a number line, coloring negative scores red and non-negative scores green; adjust as desired.
document = types.Document(
    content=text_content,
    type=enums.Document.Type.PLAIN_TEXT)
sentiment = client.analyze_sentiment(document=document).document_sentiment
sscore = round(sentiment.score,4)
smag = round(sentiment.magnitude,4)
if sscore < -0.5:
    sent_label = "Very Negative"
elif sscore < 0:
    sent_label = "Negative"
elif sscore == 0:
    sent_label = "Neutral"
elif sscore <= 0.5:
    sent_label = "Positive"
else:
    sent_label = "Very Positive"
print('Sentiment Score: {} is {}'.format(sscore,sent_label))
predictedY = [sscore]
if sscore < 0:
    plotcolor = 'red'
else:
    plotcolor = 'green'
plt.scatter(predictedY, np.zeros_like(predictedY),color=plotcolor,s=100)
plt.yticks([])
plt.subplots_adjust(top=0.9,bottom=0.8)
plt.xlim(-1,1)
plt.xlabel('Negative Positive')
plt.title("Sentiment Attitude Analysis")
plt.show()
Below is the sentiment output for the text we used for the entity analysis above. As you can see, it registers as slightly positive.

Calculate Sentiment Magnitude
Next we process and visualize sentiment magnitude. Magnitude quantifies the amount of emotional content in the text. The code labels magnitude: 0–1 as no/little emotion, 1–2 as low emotion, and 2+ as high emotion. Larger documents often have larger magnitudes, so adjust these thresholds as needed. The visualization follows the same approach used for the sentiment score.
if smag < 1:
    sent_m_label = "No Emotion"
elif smag <= 2:
    sent_m_label = "Low Emotion"
else:
    sent_m_label = "High Emotion"
print('Sentiment Magnitude: {} is {}'.format(smag, sent_m_label))
predictedY = [smag]
if smag < 2:
    plotcolor = 'red'
else:
    plotcolor = 'green'
plt.scatter(predictedY, np.zeros_like(predictedY),color=plotcolor,s=100)
plt.yticks([])
plt.subplots_adjust(top=0.9,bottom=0.8)
plt.xlim(0,5)
plt.xlabel('Low Emotion High Emotion')
plt.title("Sentiment Magnitude Analysis")
plt.show()
Below is the sentiment magnitude for the text. Values near zero indicate little or neutral emotion.

Calculate Categorization
Category analysis assigns the text to Google-defined categories when the API has sufficient confidence.
response = client.classify_text(document)
for category in response.categories:
    print(u"Category name: {}".format(category.name))
    print(u"Confidence: {}%".format(int(round(category.confidence, 3) * 100)))
Below is the calculated categorization. If the categories don’t match your intent, adjust the content and try again.
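A practical follow-up is to discard low-confidence categories before acting on them. A sketch with hypothetical results and an assumed 0.5 cutoff; real values would come from client.classify_text(document).categories:

```python
# Hypothetical (category, confidence) pairs; real values come from
# (category.name, category.confidence) in the classify_text response.
categories = [
    ("/Business & Industrial/Advertising & Marketing", 0.87),
    ("/Internet & Telecom/Web Services", 0.42),
]

CONFIDENCE_THRESHOLD = 0.5  # assumed cutoff; tune for your use case
confident = [(name, conf) for name, conf in categories
             if conf >= CONFIDENCE_THRESHOLD]
for name, conf in confident:
    print("{}: {}%".format(name, int(round(conf, 3) * 100)))
```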

Here is the Google Colab notebook.
Now you have tools to identify entities, categorize text, and calculate sentiment and magnitude. In part two, we’ll apply these NLP tools to live web page content rather than pasted text, and then compare two competing pages. Stay tuned!