Inspiration

“Big data” grows bigger every year, but today’s enterprise leaders don’t just need to manage larger volumes of data; they critically need to generate insight from the data they already have. How can we look at this proliferation differently, as both problem and solution, with TigerGraph at the core, to generate much-needed intelligent, competitive insights and significant business value from connected data? We see real potential for knowledge graphs to be used for knowledge generation, not just querying. Today, most knowledge work is siloed and over-specialized, causing researchers and technologists to miss valuable connections between concepts in different fields. Knowledge graphs provide a way to identify promising connections between disparate ideas that wouldn’t normally exist in any one person’s head.

What it does

Tools GPT will become a “bridge” connecting the world of documents and text with the world of enterprise knowledge graphs (EKGs). NLP services built on GPT can cost-effectively ingest millions of documents and return a precisely coded concept graph for each one, linking documents that discuss the same concept via a materialized, queryable edge between their concept graphs.

The first thing that stands out about GPT-3 is its sheer number of trainable parameters, roughly 10x more than any previous model. In general, the more parameters a model has, the more data is required to train it. According to its creators, the OpenAI GPT-3 model was trained on about 45 TB of text drawn from multiple sources:

- The Common Crawl corpus contains petabytes of data collected over 8 years of web crawling: raw web page data, metadata extracts, and text extracts with light filtering.
- WebText2 is the text of web pages from all outbound Reddit links from posts with 3+ upvotes.
- Books1 and Books2 are two internet-based book corpora.
- English-language Wikipedia pages round out the training corpus.

The “weight in training mix” reported for each dataset refers to the fraction of training examples drawn from that dataset. A major issue when training such large models on so much internet data is that they can memorize content and contaminate downstream tasks such as evaluation, since they may have already seen the test data. The creators of GPT-3 took measures to avoid train/test overlap, but a bug in the filtering caused some data to leak.
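The document-linking step above can be sketched as a small inverted index: given the concepts coded for each document (in our pipeline, by a GPT extraction service), materialize one queryable edge per pair of documents that share a concept. This is a minimal, self-contained illustration; the function name and the in-memory edge representation are our own, standing in for the actual graph upserts.

```python
from collections import defaultdict

def link_documents_by_concept(doc_concepts):
    """Given {doc_id: set_of_concepts}, return one edge per pair of
    documents that share at least one concept, labeled with the
    shared concepts."""
    # Invert: concept -> set of documents that mention it.
    concept_to_docs = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        for concept in concepts:
            concept_to_docs[concept].add(doc_id)

    # Materialize an edge for every co-mentioning document pair.
    edges = defaultdict(set)  # (doc_a, doc_b) -> shared concepts
    for concept, docs in concept_to_docs.items():
        for a in docs:
            for b in docs:
                if a < b:  # one undirected edge per pair
                    edges[(a, b)].add(concept)
    return dict(edges)

docs = {
    "paper_1": {"graph database", "nlp"},
    "paper_2": {"nlp", "transformers"},
    "paper_3": {"graph database"},
}
edges = link_documents_by_concept(docs)
```

In the real pipeline, each `(doc_a, doc_b)` pair with its shared-concept label would become a materialized edge in the EKG rather than an in-memory dict.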

How we built it

GPT-3 is not a single model but a family of models, each with a different number of trainable parameters. The OpenAI GPT-3 family is based on the same transformer architecture as GPT-2, including the modified initialization, pre-normalization, and reversible tokenization, with the exception that it uses alternating dense and locally banded sparse attention patterns. The largest version, GPT-3 175B (or simply “GPT-3”), has 175 billion parameters, 96 attention layers, and a 3.2M-token batch size. While language models like BERT use the encoder half to generate embeddings from raw text for use in other machine-learning applications, the GPT family uses the decoder half: it takes in embeddings and produces text.
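The difference between dense and locally banded sparse attention can be illustrated with causal attention masks. This is a minimal sketch: the window size and layout here are illustrative only, not the actual factorized sparse pattern GPT-3 uses.

```python
def dense_causal_mask(n):
    # Dense causal attention: position i may attend to every position j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]

def banded_causal_mask(n, window):
    # Locally banded causal attention: position i may attend only to the
    # most recent `window` positions j <= i.
    return [[i - window < j <= i for j in range(n)] for i in range(n)]
```

In a GPT-3-style stack, layers would alternate between patterns like these, trading full context access in dense layers against cheaper, local computation in sparse ones.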

Our GPT-3 models can understand and generate natural language. We offer four main models with different levels of power suitable for different tasks. Davinci is the most capable model, and Ada is the fastest.
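The capability-versus-speed trade-off between the four base engines (Davinci, Curie, Babbage, Ada) can be captured in a small selection helper. The `choose_engine` function and its default choice are our own illustration, not part of the OpenAI API.

```python
# The four main GPT-3 engines, ordered from fastest/least capable to
# slowest/most capable.
ENGINES = ["ada", "babbage", "curie", "davinci"]

def choose_engine(need_max_quality=False, latency_sensitive=False):
    """Hypothetical helper: prefer davinci for quality, ada for speed."""
    if need_max_quality:
        return "davinci"
    if latency_sensitive:
        return "ada"
    return "curie"  # an illustrative middle-ground default
```

In practice we pass the chosen engine name to the API call, reserving Davinci for concept extraction where precision matters most.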

