Inspiration

I was inspired to create Kenniscentrum by my need for an AI application that can give me real-time information, and by my desire to better understand how Google's Gemini models and LangChain work together.

What it does

In this application, users can search the web, chat with documents, and search Wikipedia using Google's Gemini AI models. I have also implemented soft deletion for threads and files; soft-deleted files and threads can be restored. In addition, I use the safety settings available on the Gemini models to make this application more responsible.

How I built it

For this application I have used the following:

  1. Next.js as the framework for the application
  2. TypeScript as the programming language
  3. Tailwind CSS as the CSS framework
  4. Clerk for authentication
  5. Google Gemini Pro and Gemini 1.5 Pro as the LLMs that answer users' questions
  6. Google's text-embedding-004 embedding model to embed document content into the vector database
  7. Upstash Vector as the vector database
  8. MySQL (hosted on Aiven) as the database
  9. Prisma as the ORM
  10. Shadcn/ui as the UI library
  11. LangChain to work with the language models and process uploaded documents
  12. Brave Search API for real-time Internet search
  13. Cheerio to extract information from a webpage

Challenges I ran into

Working on this application had its ups and downs, but the biggest challenge I ran into was returning the streamed response. At times nothing was sent to the front-end even though the LLM was generating a response (I could see it in the console).

Accomplishments that I am proud of

After finishing this part of the application, I am proud to say that I have a better understanding of how both the Google Gemini APIs and LangChain work, and of the potential they hold together.

At one point while working on the application, I was proud of reaching a deeper understanding of chunkSize and chunkOverlap and of how they can affect the answers provided.
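To make those two parameters concrete, here is a naive character splitter. It is a simplification of LangChain's RecursiveCharacterTextSplitter (which also tries to break on separators), but chunkSize and chunkOverlap interact in the same way:

```typescript
// Naive fixed-size splitter: each chunk is at most chunkSize characters,
// and consecutive chunks share chunkOverlap characters of context.
function splitWithOverlap(
  text: string,
  chunkSize: number,
  chunkOverlap: number
): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // how far the window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

With chunkSize 4 and chunkOverlap 2, "abcdefghij" splits into ["abcd", "cdef", "efgh", "ghij"]: a larger overlap repeats more context between chunks, which changes what the retriever can later match against.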

I had that moment of delight while I was working on the API responsible for indexing the PDF document after it was uploaded.

How does it work?

In this section, I will go over how the four different APIs that I created for this application work.

Web Search API

Gemini API in use: gemini-1.5-pro-latest, text-embedding-004

Location: app/api/realtime/[threadId]/route.ts

When a user submits data from the client-side interface, the following sequence of events is triggered upon reaching the API:

  1. Initialization of the Chat Model: The ChatGoogleGenerativeAI model is initialized with a specific model name (gemini-1.5-pro-latest), a maximum output token limit (2048), and safety settings to prevent harassment.
  2. Initialization of the Embeddings Model: The GoogleGenerativeAIEmbeddings model is initialized with a specific model name (text-embedding-004) and a task type (SEMANTIC_SIMILARITY).
  3. Conversion Function: The convertVercelMessageToLangChainMessage function is defined to convert messages into a format that the LangChain model can understand.
  4. POST Function: This is the main function that handles POST requests. It does the following:
    • Retrieves the current profile and the body of the request.
    • Finds the thread associated with the threadId from the database.
    • It checks if the thread exists and if the profile is authorized to access it.
    • If the thread title is the same as the thread ID and there are messages in the thread, it updates the thread title with the first 20 characters of the first message.
    • Creates a new message in the database with the content of the question, the role of the user, and the thread ID.
    • Filters the last 15 messages in the thread that are either from the user or the system and converts them into a format that the LangChain model can understand.
    • Initializes the agent executor with the chat model, embeddings model, and a set of tools (BraveSearch, Calculator, WebBrowser).
    • Starts a new LangChain stream with the question and the chat history, and sets up handlers for when the stream completes or finalizes.
    • Returns a streaming text response with the stream.
  5. Error Handling: If any error occurs during the execution of the POST function, it returns an internal error response.
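The history-preparation part of step 4 can be sketched roughly like this. The types are simplified stand-ins for the Vercel AI SDK messages and the LangChain message classes, not the actual code in the repo:

```typescript
// Hypothetical minimal message shapes standing in for the Vercel AI SDK
// message type and the LangChain HumanMessage/AIMessage classes.
type VercelMessage = { role: "user" | "system" | "assistant"; content: string };
type LangChainLikeMessage = { type: "human" | "ai"; content: string };

// Keep only user/system messages, take the last 15, and convert them
// into a LangChain-style chat-history format.
function prepareHistory(messages: VercelMessage[]): LangChainLikeMessage[] {
  return messages
    .filter((m) => m.role === "user" || m.role === "system")
    .slice(-15)
    .map((m) => ({
      type: m.role === "user" ? ("human" as const) : ("ai" as const),
      content: m.content,
    }));
}
```

Capping the history at 15 messages keeps the prompt within the model's context budget while still giving the agent recent conversational context.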

Wiki API

Gemini API in use: gemini-pro, gemini-1.5-pro-latest and text-embedding-004

Location: app/api/wiki/[threadId]/route.ts

When a user submits data from the client-side interface, the following sequence of events is triggered upon reaching the API:

  1. Initialization of the Google Generative AI: The GoogleGenerativeAI model is initialized with a specific API key.
  2. POST Function: This is the main function that handles POST requests. It does the following:
    • Retrieves the current profile and the body of the request.
    • Initializes a WikipediaQueryRun tool with specific parameters.
    • Finds the thread associated with the threadId from the database.
    • Checks if the thread exists and if the profile is authorized to access it.
    • If the thread title is the same as the thread ID and there are messages in the thread, it updates the thread title with the first 20 characters of the first message.
    • Creates a new message in the database with the content of the question, the role of the user, and the thread ID.
    • Uses the Google Generative AI to extract the main keywords from the user’s question.
    • Calls the WikipediaQueryRun tool with the extracted keywords to get relevant Wikipedia content.
    • Generates a response based on the user’s question and the Wikipedia content using the Google Generative AI.
    • Creates a new message in the database with the generated response, the role of the system, and the thread id.
    • Updates the thread’s updatedAt field with the current date.
    • Returns a streaming text response with the generated response.
  3. Error Handling: If any error occurs during the execution of the POST function, it returns an internal error response.

In summary, the POST function authenticates the user and the thread, records the question as a user message, uses the Google Generative AI to extract the main keywords from the question, fetches matching Wikipedia content with the WikipediaQueryRun tool, generates an answer from the question and that content, saves the answer as a system message, updates the thread's updatedAt field, and streams the answer back to the client.
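The thread-title rule that this route shares with the other chat routes can be sketched as follows (resolveThreadTitle is an illustrative name, not a helper from the repo):

```typescript
// A freshly created thread uses its own ID as the title; once the first
// message exists, the title becomes that message's first 20 characters.
function resolveThreadTitle(
  threadId: string,
  currentTitle: string,
  firstMessage: string | undefined
): string {
  if (currentTitle === threadId && firstMessage) {
    return firstMessage.slice(0, 20);
  }
  return currentTitle; // a user-visible title is already set, keep it
}
```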

Index API

Gemini API in use: text-embedding-004

Location: app/api/index/route.ts

When a user uploads a document from the client-side interface, the following sequence of events is triggered upon reaching the API:

POST Function: This function handles POST requests. It does the following:

  • Retrieves the form data from the request.
  • Extracts the uploaded file from the form data.
  • Retrieves the current user profile. If no profile is found, it returns a 401 status with a “Profile not found” message.
  • Initializes an Index instance with the URL and token from the environment variables.
  • Checks if the uploaded file is a PDF. If it is, it proceeds with the following steps:
    • Initializes a PDFLoader to load the PDF document.
    • Loads the PDF document and gets the number of pages. If the number of pages is more than 10, it returns a 500 status with a message indicating that it currently supports documents up to 10 pages.
    • Filters the loaded documents to only include those with content.
    • Creates a new file record in the database with the name of the uploaded file, the profile ID, the upload status set to “PROCESSING”, the number of pages, and the file type set to “PDF”.
    • For each document in the loaded documents, it does the following:
      • Gets the page number and the content of the document.
      • Initializes a RecursiveCharacterTextSplitter to split the document content into chunks.
      • Splits the document content into chunks.
      • Initializes a GoogleGenerativeAIEmbeddings instance with a specific model name and task type.
      • Gets the embeddings for the chunks of document content.
      • For each chunk, it creates a vector with an ID composed of the file ID, the page number, and the index of the chunk, the values of the embeddings, and the metadata of the chunk.
      • When the batch of vectors is full or it’s the last item, it upserts the vectors to the index.
    • The code updates the file record in the database to set the uploadStatus to “SUCCESS” and indexDone to true.
    • A new thread is created in the database with a specific prompt, the name of the file as the title, the file ID, the profile ID, and the thread type set to “DOC”.
    • Two new activity records are created in the database. One for the creation of the thread and another for the successful creation and embedding of the file.
    • The function returns a response with the created thread and a status of 200.
    • Error Handling: If any error occurs during the execution of the PDF embedding or parsing, it returns a 500 status with an appropriate error message. If the uploaded file is not a PDF, it returns a 500 status with a “Format Error” message.

In summary, this API handles the uploading and processing of PDF files. It parses the PDF, splits the content into chunks, generates embeddings for each chunk, and stores them in an index. It also creates a new thread for each uploaded file and records the activities in the database. The API is designed to handle errors and return appropriate responses. It only supports PDF files and has a limit of 10 pages per document.
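The vector-ID scheme and batching described above can be sketched like this. buildVectors and toBatches are illustrative names (the real route builds the batches inline and passes each one to Upstash Vector's upsert), and the embeddings here are stand-ins for text-embedding-004 output:

```typescript
// Shape of the vectors built per chunk, per the steps above.
type Vector = {
  id: string;
  values: number[];
  metadata: { text: string; page: number };
};

// One vector per chunk; the ID is composed of the file ID, the page
// number, and the chunk index, so every chunk is uniquely addressable.
function buildVectors(
  fileId: string,
  page: number,
  chunks: string[],
  embeddings: number[][]
): Vector[] {
  return chunks.map((chunk, i) => ({
    id: `${fileId}-${page}-${i}`,
    values: embeddings[i],
    metadata: { text: chunk, page },
  }));
}

// Group vectors into fixed-size batches so each upsert request stays small.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Each batch returned by toBatches would then be passed to a single upsert call against the index.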

Document Chat API

Gemini API in use: gemini-pro and text-embedding-004

Location: app/api/thread/[threadId]/route.ts

When a user submits data from the client-side interface, the following sequence of events is triggered upon reaching the API:

  1. Retrieve Profile and Request Data: The function retrieves the current user profile and the body of the request. It also extracts the user’s question from the request body.
  2. Thread Retrieval: It retrieves a thread from the database using the thread ID provided in the request parameters. If the thread doesn’t exist, it returns a 404 status with a “Not Found” message. If the profile doesn’t exist or the profile ID doesn’t match the profile ID of the thread, it returns a 403 status with an “Unauthorized” message.
  3. Thread Title Update: If the thread title is the same as the thread ID and there are messages in the thread, it updates the thread title with the first 20 characters of the first message.
  4. Message Creation: It creates a new message in the database with the content of the question, the role of the user, and the thread ID.
  5. Embedding Generation: It initializes a GoogleGenerativeAIEmbeddings instance and uses it to generate embeddings for the user’s question.
  6. Content Retrieval: It queries an index to retrieve content that matches the embeddings of the user’s question. It then concatenates the content of the retrieved documents into a single string.
  7. Response Generation: It initializes a GoogleGenerativeAI instance and uses it to generate a response based on the user’s question and the retrieved content.
  8. Stream Creation: It creates a stream with the generated response. When the stream completes, it creates a new message in the database with the generated response, the role of the system, and the thread ID. It also updates the thread’s updatedAt field with the current date.
  9. Response Return: It returns a streaming text response with the stream.
  10. Error Handling: If any error occurs during the execution of the function, it returns a 500 status with an “Internal Error” message.

This function essentially enables a chatbot functionality where a user can ask a question, and the system generates a response based on the content retrieved from an index that matches the embeddings of the user’s question. The response is then sent back to the user as a streaming text response.
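A toy version of the retrieval step (5–6) looks like this. The real route delegates similarity scoring to Upstash Vector's query API rather than computing cosine similarity itself, so this is only a sketch of the idea:

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score stored chunks against the question's embedding, keep the top
// matches, and join them into a single context string for the model.
function buildContext(
  queryEmbedding: number[],
  docs: { text: string; embedding: number[] }[],
  topK: number
): string {
  return docs
    .map((d) => ({ text: d.text, score: cosine(queryEmbedding, d.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((d) => d.text)
    .join("\n\n");
}
```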

What I learned

After finishing this part of the application, I learned the potential the Gemini models have and how beneficial they can be for AI applications. LangChain integrated with the Gemini models is also a really powerful combination. I now have a basic understanding of the safety settings provided by the Gemini models, but I want a deeper understanding of their use cases.

Reflection upon myself: I have realized that I do not have an in-depth knowledge of how streaming works under the hood. I need to work on that.

What's next for Kenniscentrum

I have made the source code public. After the end of the judging period, I am planning to expand my knowledge of function calling with the Gemini API. I also had another implementation of LangChain agents with the tools (BraveSearch, Calculator, WebBrowser), but I was not able to make it work with the front end: the response was not being sent to the front-end even though I could see it in my console. That implementation provided better answers than the current real-time implementation, which was frustrating, since I was so close yet so far away. Going in-depth on that issue is my first plan for Kenniscentrum after the judging period.

Limitation

With this submission, I have provided the GitHub repo link, the Vercel deployment link, and a Railway deployment link.

For best results, run it locally.

While running locally, you can change or comment out the following lines, which can be found inside the app/api/index/route.ts file.

if (pagesAmt > 10) {
  return new NextResponse("Currently supports document upto 10 pages", {
    status: 500,
  });
}

By removing those lines you can upload PDF documents longer than 10 pages. The Vercel deployment causes a timeout if the server takes longer than 10 seconds to respond. The Railway deployment is working, but I might run out of credits on the account.
