A lighter version of TopicCluster, built entirely with Python and SQLite.
Simplified pipeline: TopicCluster -> SumAI -> SumDB.
To set up SumAI, check out https://github.com/ball2004244/LogosDB-AI-Models. Or running:
- Python 3.6 or higher
- SQLite3
- Docker
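
A quick way to confirm the requirements listed above before going further is a small check script. This is only a sketch: `check_requirements` is a hypothetical helper that is not part of the repository, and it only detects Docker by looking for the executable on `PATH` (it does not verify that the Docker daemon is running).

```python
import shutil
import sqlite3
import sys


def check_requirements(min_python=(3, 6)):
    """Return a dict mapping each requirement name to True or False."""
    return {
        # Compare the running interpreter against the minimum version.
        "python": sys.version_info >= min_python,
        # The sqlite3 module ships with CPython; a non-empty version string
        # means the underlying SQLite library is linked and usable.
        "sqlite3": bool(sqlite3.sqlite_version),
        # Assumption: Docker is considered installed if its CLI is on PATH.
        "docker": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```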
- Clone the repository
- Create the conda environment: `conda env create -f environment.yml`
- Install marqo-vectordb: `bash scripts/setup_sumdb.sh`
- Have a CSV dataset in the root directory
- Modify `process_input.py` to reformat the dataset to LogosDB input
- Modify the `scripts/pipeline.py` file to suit your needs
- Run the pipeline: `python3 -m scripts.pipeline`
- Benchmarking has an additional requirement: Ollama with a trained model (Llama 3, Mixtral, etc.)
- Change the required parameters in the benchmark folder. Most parameters are in `constants.py`
- Run `scripts/multi_benchmark.py` to benchmark LogosDB on MMLU datasets: `python3 -m scripts.multi_benchmark`
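
The dataset reformatting step in the list above depends on your CSV, so here is a rough sketch of what such a script could look like. The LogosDB input format is not described in this README, so the output column names `topic` and `content`, the example source columns `subject` and `question`, and the helpers `reformat_rows` and `reformat_csv` are all placeholders for illustration. Check the real `process_input.py` for the format the pipeline expects.

```python
import csv


def reformat_rows(rows, column_map):
    """Rename columns per column_map and skip rows with empty mapped values.

    column_map maps source column names to output column names.
    """
    for row in rows:
        out = {new: (row.get(old) or "").strip() for old, new in column_map.items()}
        if all(out.values()):
            yield out


def reformat_csv(src_path, dst_path, column_map):
    """Read src_path, apply reformat_rows, and write the result to dst_path."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(column_map.values()))
        writer.writeheader()
        writer.writerows(reformat_rows(csv.DictReader(src), column_map))


if __name__ == "__main__":
    # Hypothetical column names; adjust them to match your dataset.
    sample = [
        {"subject": " math ", "question": "2+2?"},
        {"subject": "", "question": "row without a subject is skipped"},
    ]
    for row in reformat_rows(sample, {"subject": "topic", "question": "content"}):
        print(row)
```

Keeping the column mapping as a plain dict means the same function can be reused for different datasets without editing the code.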
- LogosCluster (Data Storage)
- SumDB (VectorDB)
- SumAI (Summarization - Extractive)
- SumAI (Summarization - Abstractive)
- SmartQuery (Search for relevant documents)
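
To give a feel for how the components listed above could fit together, here is a toy sketch of the storage, summarization, and search stages. It is not the project's implementation: all function names are made up, the summarizer is a naive word-frequency picker rather than SumAI, and the search is plain keyword overlap standing in for the vector search that marqo provides in SumDB.

```python
import re
import sqlite3
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def store_documents(conn, docs):
    """Storage stage: save (topic, content) pairs in a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS docs (topic TEXT, content TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)", docs)
    conn.commit()


def extractive_summary(text):
    """Summarization stage: pick the sentence with the highest total word frequency."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    freq = Counter(tokenize(text))
    return max(sentences, key=lambda s: sum(freq[w] for w in tokenize(s)))


def build_summaries(conn):
    """Build one summary per topic from all documents stored under that topic."""
    merged = {}
    for topic, content in conn.execute("SELECT topic, content FROM docs ORDER BY rowid"):
        merged[topic] = (merged.get(topic, "") + " " + content).strip()
    return {topic: extractive_summary(text) for topic, text in merged.items()}


def smart_query(summaries, query):
    """Search stage: rank topics by word overlap between the query and each summary."""
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(s))), t) for t, s in summaries.items()]
    return [t for score, t in sorted(scored, reverse=True) if score > 0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_documents(conn, [
        ("databases", "SQLite stores data in a single file. SQLite is an embedded database."),
        ("containers", "Docker runs applications in containers."),
    ])
    summaries = build_summaries(conn)
    print(smart_query(summaries, "sqlite file"))
```

The point of searching over summaries rather than full documents is that the query only has to match a small index first, and the matching topic can then be used to look up the full documents in SQLite.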