[Feature Request - RAGFlow+OceanBase] Add OceanBase Storage Support for Table Parser

## Overview
RAGFlow's Table Parser is a chunking method that parses Excel/CSV files and stores the structured data for retrieval in chat conversations. Currently it supports storing parsed table data in Elasticsearch and Infinity. This feature request aims to add OceanBase as a storage backend for the Table Parser, enabling users to query Excel data stored in OceanBase through natural language in chat.

## Technical Details

### Location
- `rag/app/table.py` - Table parser implementation (lines 460-495)
- `rag/utils/ob_conn.py:1632` - OceanBase connection with TODO for `sql()` function

### Current Implementation
The Table Parser processes Excel/CSV files and stores data differently based on the document engine:

1. **For Infinity** (`DOC_ENGINE_INFINITY`):
   - Stores data in a JSON column `chunk_data` with original column names as keys
   - Flexible schema handling via JSON
   ```python
   data_json[str(clmns[j])] = row[clmns[j]]
   d["chunk_data"] = data_json
   ```

2. **For Elasticsearch/OpenSearch** (default):
   - Stores data as individual fields with type suffixes
   - Type suffixes: `_tks` (text tokens), `_kwd` (keyword), `_long` (integer), `_flt` (float), `_dt` (datetime), `_bool` (boolean)
   ```python
   fld = clmns_map[j][0]  # e.g., "name_kwd", "age_long"
   d[fld] = row[clmns[j]] if clmn_tys[j] != "text" else rag_tokenizer.tokenize(row[clmns[j]])
   ```

### Implementation Plan

1. **Add OceanBase storage logic in `rag/app/table.py`** (after line 495)
   ```python
   elif settings.DOC_ENGINE_OCEANBASE:
       # Store in OceanBase using JSON column for flexibility
       data_json[str(clmns[j])] = row[clmns[j]]
   d["chunk_data"] = data_json
   ```

2. **Implement `sql()` function in `rag/utils/ob_conn.py`** (line 1632)
   ```python
   def sql(self, sql_text: str, fetch_size: int =1024, format: str = "json"):
       """
       Execute SQL query on OceanBase for table data retrieval
       
       Args:
           sql_text: SQL query string (generated by LLM via text-to-SQL)
           fetch_size: Maximum number of rows to return
           format: Output format ("json" or "markdown")
       
       Returns:
           Query results in specified format
       """
       # 1. Parse and validate SQL
       # 2. Execute query against OceanBase
       # 3. Handle JSON column data (chunk_data)
       # 4. Format results (JSON or Markdown table)
       # 5. Apply fetch_size limit
   ```

3. **Key Technical Challenges**:
   - **Dynamic Schema**: Excel files have different columns, use JSON column for flexibility
   - **Data Type Mapping**: Map Python types to OceanBase types (int, float, text, datetime, bool)
   - **Text-to-SQL Integration**: Generate SQL from natural language queries
   - **Result Formatting**: Support both JSON and Markdown output formats

### Core Functionality
The Table Parser provides:
- **Excel/CSV Parsing**: Multi-sheet support, header detection, data type inference
- **Data Type Detection**: Automatically detects int, float, text, datetime, bool types
- **Tokenization**: Text fields are tokenized for full-text search
- **Field Mapping**: Creates pinyin-based field names for Chinese content
- **Chunk Creation**: Each row becomes a searchable chunk with metadata
- **Image Extraction**: Extracts and describes images within Excel cells

### Estimated Effort
- Code size: ~150 lines (table.py modifications + sql() implementation)
- Difficulty: ⭐⭐⭐ Medium
- Priority: 🟡 Medium

### Related Files
- `rag/app/table.py` - Table parser (modify storage logic)
- `rag/utils/ob_conn.py` - OceanBase connection (implement sql() function)
- `deepdoc/parser/excel_parser.py` - Excel parsing utilities
- `common/settings.py` - DOC_ENGINE configuration

## Acceptance Criteria
Provide screenshot(s) demonstrating:
1. Upload an Excel file to a knowledge base with OceanBase as the document engine
2. Configure the Table Parser as the chunking method
3. Successfully parse and store the Excel data in OceanBase
4. Query the stored table data through chat using natural language
5. Display query results correctly (Markdown table format)

Example test scenario:
```
User uploads: sales_data.xlsx with columns (date, product, quantity, revenue)
Chat query: "Show me the top 5 products by revenue"
System: Generates SQL → Queries OceanBase → Returns results
```

## Background
This is part of the RAGFlow + OceanBase Hackathon. Adding OceanBase storage support to the Table Parser will enable users to leverage OceanBase's powerful SQL capabilities for querying structured Excel data through natural language conversations in RAGFlow.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request - RAGFlow+OceanBase] Add OceanBase Storage Support for Table Parser #12770

Overview

Technical Details

Location

Current Implementation

Implementation Plan

Core Functionality

Estimated Effort

Related Files

Acceptance Criteria

Background

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Feature Request - RAGFlow+OceanBase] Add OceanBase Storage Support for Table Parser #12770

Description

Overview

Technical Details

Location

Current Implementation

Implementation Plan

Core Functionality

Estimated Effort

Related Files

Acceptance Criteria

Background

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions