Natural Language Processing

Capture Information from Unstructured Text Like Sentences and Paragraphs

Get the Meaning of All Your Documents

With Natural Language Processing, Grooper acts as an AI accelerator that helps you quickly find critical, unlabeled information buried in the paragraphs, sentences, and other language in your documents that conveys specific meaning.

Leverage powerful machine understanding that accurately recommends correct values from the body of documents by considering the surrounding flow of human language.

Examples of Natural Language Processing

  • Find paragraphs or sentences “similar” to one or more training examples.
  • Extract data elements that span multiple text lines or pages by understanding the flow of language.
  • Determine whether a date is a Loan Date or a Maturity Date by “reading” the surrounding language for the context needed to make a decision.
  • Distinguish fine details that hugely alter meaning: “SW ¼ of the NW ¼” vs “SW ¼ and the NW ¼”.

Paragraph Detection & Analysis

The Grooper paragraph ranking engine looks at a document’s structure and intelligently groups words into paragraphs.

It then compares them against training samples to find the “best match,” and presents a recommendation list.
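
The comparison step can be pictured with a small sketch. It is not Grooper's ranking engine: it assumes a plain bag-of-words cosine similarity as the matching measure, and the function names (`bag_of_words`, `cosine`, `rank_paragraphs`) are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lowercased word counts; a simple stand-in for richer features."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_paragraphs(paragraphs, training_samples):
    """Score each paragraph by its best match against any training sample.

    Returns (score, paragraph) pairs, best match first.
    """
    samples = [bag_of_words(s) for s in training_samples]
    scored = [
        (max((cosine(bag_of_words(p), s) for s in samples), default=0.0), p)
        for p in paragraphs
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_paragraphs(document_paragraphs, examples)` yields a recommendation list in which the first entry is the closest match under this assumed measure.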

Paragraph Isolation

Indents, double spaces, bullets, key phrases, line length, and other factors are considered to determine where a paragraph starts and stops.

Grooper provides a console to tune paragraph detection settings for each project.
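
As a rough illustration of how layout cues like these can mark paragraph boundaries, here is a minimal sketch. It assumes only four cues (blank lines, bullets, first-line indents, and a short preceding line), and its parameters (`indent_threshold`, `short_line_ratio`) are hypothetical tuning knobs, not Grooper settings.

```python
import re

# A line that opens with a bullet character or a numbered-list marker.
BULLET = re.compile(r"^\s*(?:[•*\-]|\d+[.)])\s+")


def split_paragraphs(lines, indent_threshold=2, short_line_ratio=0.6):
    """Group text lines into paragraphs using simple layout cues."""
    width = max((len(line.rstrip()) for line in lines), default=0)
    paragraphs, current = [], []
    prev_short = False

    def flush():
        # Close the paragraph being built, if any.
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for raw in lines:
        line = raw.rstrip()
        if not line.strip():
            # A blank line always ends the current paragraph.
            flush()
            prev_short = False
            continue
        indent = len(line) - len(line.lstrip())
        starts_new = (
            bool(BULLET.match(line))       # bullet or numbered item
            or indent >= indent_threshold  # first-line indent
            or prev_short                  # previous line ended early
        )
        if starts_new:
            flush()
        current.append(line.strip())
        prev_short = len(line) < width * short_line_ratio
    flush()
    return paragraphs
```

A sketch this small treats every indented line as a new paragraph, so hanging indents under a bullet would be split apart. Per-project tuning exists to handle cases like that.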

Lexical Analysis

Use Grooper data types to collect features in each paragraph. These can be n-grams, entries from a lexicon, or other patterns such as addresses, phone numbers, and names.

The analysis spans lines of text to ensure accurate feature collection.
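
To show why spanning lines matters, the sketch below joins a paragraph's lines into one token stream before building n-grams, so a phrase broken by a line wrap is still found. It is an assumption-laden illustration: the lexicon is a plain Python set and the function names are hypothetical, not Grooper data types.

```python
import re


def tokens_across_lines(lines):
    """Join the lines into one token stream so features can span line breaks."""
    return re.findall(r"[a-z0-9']+", " ".join(lines).lower())


def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def collect_features(lines, lexicon, max_n=3):
    """Return the lexicon entries that occur as n-grams of 1 to max_n words."""
    tokens = tokens_across_lines(lines)
    found = set()
    for n in range(1, max_n + 1):
        found.update(gram for gram in ngrams(tokens, n) if gram in lexicon)
    return found
```

With the lines `"The loan reaches its maturity"` and `"date on the first of June."`, the entry `maturity date` is collected even though it is split across two lines.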

Data Merge

Once the best-matching paragraphs are identified, they’re grouped together as a single paragraph for further NLP analysis or text export.

You can process paragraphs that are not adjacent in the original document or that are spread across multiple pages. Improve data collection from natural language documents to get the data you need.
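
A merge of this kind might look like the sketch below. The `Paragraph` record, its `score` field, and the `min_score` cutoff are all assumptions made for the example; the only idea taken from the text is that matching paragraphs are combined in reading order even when they sit on different pages.

```python
from typing import NamedTuple


class Paragraph(NamedTuple):
    page: int      # page number in the source document
    position: int  # order of the paragraph on its page
    score: float   # hypothetical match score between 0 and 1
    text: str


def merge_matches(paragraphs, min_score=0.8):
    """Join paragraphs scoring at least min_score, in reading order."""
    keep = sorted(
        (p for p in paragraphs if p.score >= min_score),
        key=lambda p: (p.page, p.position),
    )
    return "\n\n".join(p.text for p in keep)
```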

Natural Language Processing Spatial Analysis

When many separate data elements are found on a document, how will the system know how to group portions of data?

Radial spatial analysis ranks each candidate by analyzing nearby words and features. For example, it can separate the data for a borrower from the data for a co-borrower.

Label/Value Pairs

In structured documents, data typically appears as label/value pairs: each value has a corresponding label on the page, generally written above and/or to the left of the value.

Grooper ranks possible values by looking spatially in one or more general directions and provides a confidence level percentage.
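
The idea of looking in a general direction from a label can be sketched as follows. This is a simplified assumption, not Grooper's scoring: words carry only a top-left position, a candidate counts only if it sits to the right of or below the label within a `tolerance`, and confidence falls off linearly with distance up to `max_distance`. All names and parameters are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Word:
    text: str
    x: float  # left edge, in page units
    y: float  # top edge, in page units (grows downward)


def label_confidence(label, candidate, max_distance=300.0, tolerance=10.0):
    """Score from 0 to 100 for a candidate right of or below the label."""
    dx = candidate.x - label.x
    dy = candidate.y - label.y
    right_of = dx > 0 and abs(dy) <= tolerance  # same row, to the right
    below = dy > 0 and abs(dx) <= tolerance     # same column, underneath
    if not (right_of or below):
        return 0.0
    distance = hypot(dx, dy)
    return round(max(0.0, 1.0 - distance / max_distance) * 100, 1)


def rank_values(label, candidates):
    """Return (confidence, text) pairs for the candidates, best first."""
    scored = [(label_confidence(label, c), c.text) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Under these assumptions, a date printed on the same row as a "Loan Date:" label outranks a date far down the page, which receives a confidence of zero.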

Radial Analysis and Geotagging

This technology looks at nearby words and the direction each word is located in relation to the candidate. This leads to better accuracy in identifying field values from documents.

Geotagging improves accuracy further by allowing features to be filtered by direction, a simple way to remove features that are unlikely to define a value.
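
One plausible reading of this approach is sketched below: each nearby word is tagged with its compass direction from the candidate, and the tags are then used as a filter. The eight-way compass, the `radius` cutoff, and every name in the code are assumptions for illustration rather than Grooper's implementation.

```python
from dataclasses import dataclass
from math import atan2, degrees, hypot


@dataclass
class Word:
    text: str
    x: float
    y: float  # page coordinates: y grows downward


# Counter-clockwise from east, in 45-degree sectors.
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def direction_of(candidate, word):
    """Compass direction from candidate to word (N means above on the page)."""
    # Negate the vertical offset because page y grows downward.
    angle = degrees(atan2(-(word.y - candidate.y), word.x - candidate.x)) % 360
    return DIRECTIONS[int((angle + 22.5) // 45) % 8]


def nearby_features(candidate, words, radius=200.0, allowed=None):
    """Return (direction, text, distance) for words within radius.

    If `allowed` is given, keep only words lying in those directions.
    Results are sorted nearest first.
    """
    features = []
    for w in words:
        dist = hypot(w.x - candidate.x, w.y - candidate.y)
        if w is candidate or dist > radius:
            continue
        tag = direction_of(candidate, w)
        if allowed is None or tag in allowed:
            features.append((tag, w.text, round(dist, 1)))
    return sorted(features, key=lambda f: f[2])
```

Passing `allowed={"W", "NW", "N"}` keeps only words to the left of or above a candidate, which matches where field labels are usually found.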

Natural Language Processing is Integrated in Grooper

Working with legal documents like contracts or leases? Grooper’s techniques make it possible to find specific provisions or legal descriptions and then break them down into the data you need. Abstract any data more quickly – find dates, individual tracts of land, legal clauses, and more.

You don’t need custom development or multiple data science tools. Fuel your workflows with production-quality results.

Grooper natively processes text as n-grams and with Porter stemming, and it supports configurations that implement more complex methods such as:

  • Sentiment NLP analysis
  • Part-of-speech tagging
  • Named entity tagging
  • Feature-based tagging


The main difference between Grooper and a standard NLP library is that NLP, along with other ML/AI functionality, is embedded in the product rather than offered as an add-on.
