[Feature Request - RAGFlow+OceanBase] Storage Engine Performance Benchmarking #12773

@yuzhichang

Overview

Create performance benchmark scripts that compare Elasticsearch, Infinity, and OceanBase across document storage, vector retrieval, full-text search, and concurrent-load scenarios.

Technical Details

Implementation Plan

Create test/benchmark/storage_benchmark.py with the following test scenarios:

  1. Document Storage Performance

    • Batch write throughput (docs/s)
    • Single document write latency (p50/p95/p99)
    • Different document sizes (1KB/10KB/100KB/1MB)
  2. Vector Retrieval Performance

    • Different vector dimensions (128/768/1536)
    • Different top-k values (10/100/1000)
    • Retrieval latency and accuracy
  3. Full-text Search Performance

    • Keyword query latency
    • Complex boolean query performance
    • Pagination query performance
  4. Concurrent Performance

    • Throughput at different concurrency levels
    • Resource usage (CPU/memory)
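
The latency percentiles and per-concurrency throughput listed above could be measured with helpers along these lines. This is a minimal sketch using only the standard library; the function names (`measure_latencies`, `summarize`, `measure_throughput`) and the `fake_write` stand-in are hypothetical and not part of RAGFlow.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def measure_latencies(operation, num_calls):
    """Call `operation` num_calls times; return per-call latencies in seconds."""
    latencies = []
    for _ in range(num_calls):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return p50/p95/p99 in milliseconds (needs at least two samples)."""
    # n=100 yields 99 cut points; cut point i (1-based) is the i-th percentile.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
        "p99_ms": cuts[98] * 1000,
    }


def measure_throughput(operation, num_calls, concurrency):
    """Run num_calls operations on `concurrency` threads; return ops per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(operation) for _ in range(num_calls)]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
    elapsed = time.perf_counter() - start
    return num_calls / elapsed


def fake_write():
    """Hypothetical stand-in for a single engine call."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(summarize(measure_latencies(fake_write, 200)))
    for level in (1, 4, 16):
        print(level, round(measure_throughput(fake_write, 200, level)), "ops/s")
```

Threads suit I/O-bound client calls. CPU and memory usage would need a separate sampler, which this sketch does not cover.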

Test Framework

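class StorageBenchmark:
    """Benchmark harness for one storage engine (Elasticsearch, Infinity, or OceanBase)."""

    def __init__(self, engine_type):
        self.engine_type = engine_type
        self.engine = self.create_engine(engine_type)

    def create_engine(self, engine_type):
        """Return a client for engine_type (placeholder; connection setup is not specified here)"""
        raise NotImplementedError(f"no client configured for {engine_type!r}")

    def benchmark_write(self, num_docs, doc_size):
        """Test write performance"""
        pass

    def benchmark_vector_search(self, dimension, top_k):
        """Test vector retrieval performance"""
        pass

    def benchmark_fulltext_search(self, query):
        """Test full-text search performance"""
        pass

    def run_all_benchmarks(self):
        """Run all tests and generate report"""
        pass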
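
For the accuracy half of the vector retrieval scenario, one common approach is recall@k against a brute-force exact search. The sketch below assumes Euclidean distance and pure-Python vectors. The helper names are hypothetical, and the issue does not say which accuracy metric is intended.

```python
import math
import random


def random_vectors(count, dimension, seed):
    """Generate `count` pseudo-random vectors; the fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dimension)] for _ in range(count)]


def exact_top_k(query, vectors, top_k):
    """Brute-force nearest neighbours by Euclidean distance; returns vector indices."""
    ranked = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return ranked[:top_k]


def recall_at_k(returned_ids, exact_ids):
    """Fraction of the exact top-k that the engine also returned."""
    if not exact_ids:
        return 0.0
    return len(set(returned_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    corpus = random_vectors(1000, 128, seed=42)
    query = random_vectors(1, 128, seed=7)[0]
    truth = exact_top_k(query, corpus, top_k=10)
    # Hypothetical engine result: pretend the engine missed two true neighbours.
    engine_ids = truth[:8] + [-1, -2]
    print(recall_at_k(engine_ids, truth))
```

In a real run, `engine_ids` would come from the engine under test, and the ground truth would be computed once per dataset and reused across engines.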

Output Report

  • Performance comparison report in Markdown format
  • Analysis of strengths and weaknesses of each engine
  • Selection recommendations
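
The Markdown comparison report could be produced by a small renderer such as the one below. This is a sketch: the function name, the metric names, and the `engine_a`/`engine_b` values are placeholders for illustration, not measurements.

```python
def render_markdown_report(results, metrics):
    """Render {engine: {metric: value}} as a Markdown table, one row per engine."""
    header = "| Engine | " + " | ".join(metrics) + " |"
    divider = "|" + " --- |" * (len(metrics) + 1)
    rows = []
    for engine, values in results.items():
        cells = []
        for metric in metrics:
            value = values.get(metric)
            # Show missing measurements explicitly instead of failing.
            cells.append("n/a" if value is None else f"{value:.2f}")
        rows.append(f"| {engine} | " + " | ".join(cells) + " |")
    return "\n".join([header, divider, *rows])


if __name__ == "__main__":
    # Placeholder numbers only, chosen to show the layout.
    placeholder_results = {
        "engine_a": {"write_docs_per_s": 1.0, "search_p95_ms": 2.0},
        "engine_b": {"write_docs_per_s": 3.0},
    }
    print(render_markdown_report(placeholder_results, ["write_docs_per_s", "search_p95_ms"]))
```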

Estimated Effort

  • Code size: ~200 lines
  • Difficulty: ⭐⭐⭐⭐ High
  • Priority: 🟢 Low

Related Files

  • test/benchmark/storage_benchmark.py - New file
  • test/benchmark/README.md - Test documentation
  • rag/nlp/search.py - Search functionality reference
  • rag/app/retriever.py - Retriever reference

Acceptance Criteria

  • Support Elasticsearch (ES), Infinity, and OceanBase engines
  • Test scenarios cover core use cases
  • Generate readable performance comparison reports
  • Reproducible test results
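
One way to help reproducibility is to generate test documents from a seeded random generator, so every engine receives identical input on every run. The sketch below is an assumption about how that might look. The `id` and `content` field names and the `generate_docs` helper are hypothetical.

```python
import random
import string

# Document sizes from the storage scenario, in bytes.
DOC_SIZES = {"1KB": 1024, "10KB": 10 * 1024, "100KB": 100 * 1024, "1MB": 1024 * 1024}


def generate_docs(num_docs, size_bytes, seed):
    """Yield documents whose ASCII body is exactly size_bytes bytes long."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    alphabet = string.ascii_lowercase + " "
    for doc_id in range(num_docs):
        body = "".join(rng.choices(alphabet, k=size_bytes))
        yield {"id": f"doc-{doc_id}", "content": body}


if __name__ == "__main__":
    docs = list(generate_docs(3, DOC_SIZES["1KB"], seed=123))
    print(len(docs), len(docs[0]["content"]))
```

Recording the seed, dataset size, and engine versions alongside each report would let others rerun the same workload.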

Background

This issue is part of the RAGFlow + OceanBase Hackathon. Its goal is to provide objective performance data that helps users understand the performance characteristics of each storage engine and make informed technical choices.
