Overview
Create performance benchmark scripts to compare Elasticsearch, Infinity, and OceanBase across document storage, vector retrieval, full-text search, and other scenarios.
Technical Details
Implementation Plan
Create test/benchmark/storage_benchmark.py with the following test scenarios:
- Document Storage Performance
  - Batch write throughput (docs/s)
  - Single-document write latency (p50/p95/p99)
  - Different document sizes (1KB/10KB/100KB/1MB)
- Vector Retrieval Performance
  - Different vector dimensions (128/768/1536)
  - Different top-k values (10/100/1000)
  - Retrieval latency and accuracy
- Full-text Search Performance
  - Keyword query latency
  - Complex boolean query performance
  - Pagination query performance
- Concurrent Performance
  - Throughput at different concurrency levels
  - Resource usage (CPU/memory)
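Several of the scenarios above report latency percentiles (p50/p95/p99) and throughput. A minimal sketch of how those numbers could be collected is shown below. The helper names (`percentile`, `summarize_latencies`, `time_calls`) are hypothetical and not part of RAGFlow, and the nearest-rank percentile method is an assumption; other definitions would give slightly different values.

```python
import math
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_latencies(latencies_s):
    """Return p50/p95/p99 and sequential operations per second."""
    ordered = sorted(latencies_s)
    total = sum(ordered)
    return {
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        # Sequential throughput: calls divided by the time spent inside them.
        "ops_per_s": len(ordered) / total if total > 0 else float("inf"),
    }


def time_calls(operation, inputs):
    """Call operation(item) for each item and return per-call latencies in seconds."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        operation(item)
        latencies.append(time.perf_counter() - start)
    return latencies
```

For example, `summarize_latencies(time_calls(write_one_doc, docs))` would summarize a write run, where `write_one_doc` stands in for whatever single-document write call the engine client exposes. Throughput under concurrency would need wall-clock timing around the whole run instead of the per-call sum used here.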
Test Framework
    class StorageBenchmark:

        def __init__(self, engine_type):
            self.engine = self.create_engine(engine_type)

        def create_engine(self, engine_type):
            """Placeholder: build the engine client (not specified in this issue)"""
            raise NotImplementedError

        def benchmark_write(self, num_docs, doc_size):
            """Test write performance"""
            pass

        def benchmark_vector_search(self, dimension, top_k):
            """Test vector retrieval performance"""
            pass

        def benchmark_fulltext_search(self, query):
            """Test full-text search performance"""
            pass

        def run_all_benchmarks(self):
            """Run all tests and generate report"""
            pass
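The vector retrieval scenario asks for accuracy as well as latency, which `benchmark_vector_search` would need to compute. One common way to do that is recall@k against a brute-force ground truth, sketched below. The function names are hypothetical, and Euclidean (L2) distance is an assumption; an engine configured for cosine or inner-product similarity would need a matching ground-truth metric.

```python
import math
import random


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k nearest vectors by L2 distance."""
    ranked = sorted(vectors, key=lambda vid: math.dist(query, vectors[vid]))
    return ranked[:k]


def recall_at_k(returned_ids, ground_truth_ids):
    """Fraction of the true top-k ids that the engine returned."""
    if not ground_truth_ids:
        return 0.0
    hits = len(set(returned_ids) & set(ground_truth_ids))
    return hits / len(ground_truth_ids)


# Synthetic data with a fixed seed, so the ground truth is the same on every run.
rng = random.Random(42)
vectors = {i: [rng.random() for _ in range(128)] for i in range(200)}
query = [rng.random() for _ in range(128)]
truth = exact_top_k(query, vectors, k=10)
```

In a real run, `returned_ids` would be the ids returned by the engine's approximate search for the same query and `k`, and the recall values would be averaged over many queries.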
Output Report
- Performance comparison report in Markdown format
- Analysis of strengths and weaknesses of each engine
- Selection recommendations
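As a rough illustration of the Markdown comparison report, the sketch below renders a nested dictionary of results as a table with one row per engine. The function name `render_markdown_table` and the result layout (`{engine: {metric: value}}`) are assumptions made for this example, not an existing RAGFlow interface.

```python
def render_markdown_table(results):
    """Render {engine: {metric: value}} as a Markdown table, one row per engine."""
    # Use the union of all metric names so engines with missing metrics still fit.
    metrics = sorted({name for row in results.values() for name in row})
    lines = [
        "| Engine | " + " | ".join(metrics) + " |",
        "| --- | " + " | ".join("---" for _ in metrics) + " |",
    ]
    for engine, row in results.items():
        cells = [f"{row[m]:.2f}" if m in row else "n/a" for m in metrics]
        lines.append(f"| {engine} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Metrics that were not measured for an engine are shown as `n/a` instead of being dropped, so gaps in coverage stay visible in the comparison.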
Estimated Effort
- Code size: ~200 lines
- Difficulty: ⭐⭐⭐⭐ High
- Priority: 🟢 Low
Related Files
- test/benchmark/storage_benchmark.py - New file
- test/benchmark/README.md - Test documentation
- rag/nlp/search.py - Search functionality reference
- rag/app/retriever.py - Retriever reference
Acceptance Criteria
- Support the Elasticsearch (ES), Infinity, and OceanBase engines
- Test scenarios cover core use cases
- Generate readable performance comparison reports
- Reproducible test results
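Reproducibility depends partly on every engine receiving identical input data. One possible approach, sketched here, is to generate the test documents from a fixed random seed. The function name `make_documents` and the document layout (`id` and `content` fields) are hypothetical, and ASCII-only content is assumed so that the character count equals the size in bytes.

```python
import random
import string


def make_documents(num_docs, doc_size_bytes, seed=0):
    """Generate num_docs ASCII documents of exactly doc_size_bytes characters each."""
    rng = random.Random(seed)  # private generator, unaffected by global random state
    alphabet = string.ascii_lowercase + " "
    return [
        {"id": f"doc-{i}", "content": "".join(rng.choices(alphabet, k=doc_size_bytes))}
        for i in range(num_docs)
    ]
```

Calling `make_documents(1000, 10 * 1024, seed=7)` twice yields the same 10KB documents, so each engine can be loaded with the same corpus. Random text is unlikely to resemble real documents for full-text search, so a fixed real-world corpus may be a better fit for that scenario.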
Background
This task is part of the RAGFlow + OceanBase Hackathon. Its goal is to provide objective performance test data that helps users understand the performance characteristics of each storage engine and make informed technical choices.