LogosDB-TopicCluster-Lite

A lighter version of TopicCluster, built fully with Python and SQLite

Simplified pipeline: TopicCluster -> SumAI -> SumDB.

To set up SumAI, see https://github.com/ball2004244/LogosDB-AI-Models.

Prerequisites

  • Python 3.6 or higher
  • SQLite3
  • Docker

Installation

  1. Clone the repository

  2. Create conda environment

    conda env create -f environment.yml
  3. Install marqo-vectordb

    bash scripts/setup_sumdb.sh

Usage

  1. Place a CSV dataset in the root directory

  2. Modify process_input.py to reformat the dataset into the LogosDB input format

  3. Modify scripts/pipeline.py to suit your needs

  4. Run the pipeline

    python3 -m scripts.pipeline
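
The LogosDB input format is not spelled out here, so the following is only a minimal sketch of what the reformatting in step 2 might look like. The CSV column names (`question`, `subject`), the `documents` table, and both helper functions are hypothetical and do not come from this repository; adapt them to whatever process_input.py actually expects.

```python
import csv
import io
import sqlite3


def reformat_rows(csv_file, text_col="question", topic_col="subject"):
    """Yield (topic, content) pairs from a CSV file object.

    Column names are assumptions for illustration only.
    Rows with empty content are skipped; a missing topic becomes "unknown".
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        content = (row.get(text_col) or "").strip()
        topic = (row.get(topic_col) or "").strip() or "unknown"
        if content:
            yield topic, content


def load_into_sqlite(conn, pairs):
    """Store (topic, content) pairs in a hypothetical `documents` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, topic TEXT NOT NULL, content TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO documents (topic, content) VALUES (?, ?)", pairs
    )
    conn.commit()


# Small in-memory demonstration; a real run would open the CSV in the root directory.
sample = io.StringIO(
    "question,subject\n"
    "What is 2+2?,math\n"
    ",math\n"
    "Who wrote Hamlet?,literature\n"
)
conn = sqlite3.connect(":memory:")
load_into_sqlite(conn, reformat_rows(sample))
rows = conn.execute("SELECT topic, content FROM documents ORDER BY id").fetchall()
```

Keeping the row parsing separate from the SQLite write makes it easy to swap in a different dataset layout without touching the storage step.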

Benchmark

  1. Benchmarking additionally requires Ollama with a trained model (Llama 3, Mixtral, etc.)

  2. Change the required parameters in the benchmark folder. Most parameters are in constants.py

  3. Run scripts/multi_benchmark.py to benchmark LogosDB on MMLU datasets

    python3 -m scripts.multi_benchmark
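
Because the benchmark depends on Ollama having a model available, it can help to check that before starting a long run. The sketch below is not part of this repository: it assumes Ollama is listening on its default address (`http://localhost:11434`) and uses its `GET /api/tags` endpoint, which lists local models. The helper names are hypothetical.

```python
import json
import urllib.request

# Ollama's default address; an assumption, adjust if your setup differs.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload):
    """Extract model names from the JSON payload returned by GET /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload, wanted):
    """Return True if a local model matches `wanted`, ignoring any ':tag' suffix."""
    return any(name.split(":")[0] == wanted for name in model_names(tags_payload))


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Query a running Ollama server for its local models (needs network access)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Offline illustration with a payload shaped like the /api/tags response.
example_payload = {"models": [{"name": "llama3:latest"}]}
llama3_available = has_model(example_payload, "llama3")
```

With a server running, `has_model(fetch_tags(), "llama3")` performs the same check against the live model list.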

Features

  • LogosCluster (Data Storage)
  • SumDB (VectorDB)
  • SumAI (Summarization - Extractive)
  • SumAI (Summarization - Abstractive)
  • SmartQuery (Search for relevant documents)
