Skip to content

feat(embedding): combine document embedder and query embedder to avoi…#702

Merged
ZaynJarvis merged 1 commit intovolcengine:mainfrom
zhougit86:feature/combine_document_query_embed
Mar 17, 2026
Merged

feat(embedding): combine document embedder and query embedder to avoi…#702
ZaynJarvis merged 1 commit intovolcengine:mainfrom
zhougit86:feature/combine_document_query_embed

Conversation

@zhougit86
Copy link
Copy Markdown
Contributor

@zhougit86 zhougit86 commented Mar 17, 2026

…d complexity

Description

本次变更主要对 Embedder 模块进行了重构,简化了非对称嵌入(Asymmetric Embedder)的实现逻辑,移除了冗余代码,使代码结构更加清晰。

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  1. 重构 Embedder 实现 : 简化了 OpenAIEmbedder , JinaDenseEmbedder , VolcEngineEmbedder 等类的非对称嵌入处理逻辑,大幅减少了代码量(Deleted ~345 lines)。
  2. 更新配置与测试 : 同步更新了 embedding_config.py 配置处理逻辑以及 test_embedding_input_type.py 等相关测试用例以适配新的接口。

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style (Applied ruff format )
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


xiaogang.zhou seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown
Collaborator

@ZaynJarvis ZaynJarvis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhougit86 zhougit86 force-pushed the feature/combine_document_query_embed branch from 15d60a3 to 84d1043 Compare March 17, 2026 15:33
@zhougit86
Copy link
Copy Markdown
Contributor Author

这个 is query 的设置看起来是不必要的。main 已有的 query embedder / document embedder 的使用场景区分已经可以满足需求。可以检查下是否如此。对于 minimax 集成,看下是否直接基于 main 的实现即可

这个 is query 的设置看起来是不必要的。main 已有的 query embedder / document embedder 的使用场景区分已经可以满足需求。可以检查下是否如此。对于 minimax 集成,看下是否直接基于 main 的实现即可

现在看起来这个context使得整个逻辑有点复杂,其实只需要判断有没有设置query params以及是不是query就可以了,不必实现两套,略显冗余

"query_task",
"document_task",
):
for key in ("query_value", "document_value", "query_task", "document_task"):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里可能要改成 query_param document_param

Copy link
Copy Markdown
Collaborator

@ZaynJarvis ZaynJarvis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@zhougit86 zhougit86 force-pushed the feature/combine_document_query_embed branch from 776fb7f to 27c4971 Compare March 17, 2026 16:17
@ZaynJarvis ZaynJarvis merged commit 9800b40 into volcengine:main Mar 17, 2026
5 of 6 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 17, 2026
ZaynJarvis added a commit to ZaynJarvis/OpenViking that referenced this pull request Mar 17, 2026
Clean implementation on latest main (post PR volcengine#702 merge):
- Add _parse_param_string() method for key=value parsing
- Enhance _build_extra_body() to support multiple parameters
- Maintain backward compatibility with simple string format
- Update docstrings and examples

Usage:
- Simple: query_param='query' → {'input_type': 'query'}
- Enhanced: query_param='input_type=query,task=search' → {'input_type': 'query', 'task': 'search'}

Supports OpenAI-compatible servers with custom parameters while
maintaining clean integration with PR volcengine#702's is_query API.
qin-ctx pushed a commit that referenced this pull request Mar 18, 2026
* feat: add key=value parameter parsing to OpenAI embedder

Clean implementation on latest main (post PR #702 merge):
- Add _parse_param_string() method for key=value parsing
- Enhance _build_extra_body() to support multiple parameters
- Maintain backward compatibility with simple string format
- Update docstrings and examples

Usage:
- Simple: query_param='query' → {'input_type': 'query'}
- Enhanced: query_param='input_type=query,task=search' → {'input_type': 'query', 'task': 'search'}

Supports OpenAI-compatible servers with custom parameters while
maintaining clean integration with PR #702's is_query API.

* fix: lint issues
chethanuk added a commit to chethanuk/OpenViking that referenced this pull request Mar 20, 2026
…ngine#702 pattern

- GeminiDenseEmbedder: accept query_param/document_param, use is_query
  in embed() and embed_batch() to select task_type at call time
- EmbeddingConfig: add Gemini provider, factory, validation, dimension
- No get_query_embedder/get_document_embedder/_get_contextual_embedder
  (removed in volcengine#702; embed(is_query=True/False) is the pattern)
- Tests use embed(text, is_query=True/False) pattern throughout
- Rebased onto current upstream/main
ZaynJarvis pushed a commit that referenced this pull request Mar 20, 2026
* feat(gemini): add GeminiDenseEmbedder with is_query routing per #702 pattern

- GeminiDenseEmbedder: accept query_param/document_param, use is_query
  in embed() and embed_batch() to select task_type at call time
- EmbeddingConfig: add Gemini provider, factory, validation, dimension
- No get_query_embedder/get_document_embedder/_get_contextual_embedder
  (removed in #702; embed(is_query=True/False) is the pattern)
- Tests use embed(text, is_query=True/False) pattern throughout
- Rebased onto current upstream/main

* fix(gemini): remove task_type config field, fix conditional import for CI

- Remove task_type from EmbeddingModelConfig (query_param/document_param suffice)
- Wrap GeminiDenseEmbedder import in try/except (google-genai is optional)
- Update tests for removed field
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants