Skip to content

feat: Add Dimension Mismatch Handling for ChromaDB (#157)#207

Merged
kevin-mindverse merged 3 commits intomindverse:masterfrom
PStarH:master
Apr 22, 2025
Merged

feat: Add Dimension Mismatch Handling for ChromaDB (#157)#207
kevin-mindverse merged 3 commits intomindverse:masterfrom
PStarH:master

Conversation

@PStarH
Copy link
Contributor

@PStarH PStarH commented Apr 11, 2025

Summary

This PR implements the dimension mismatch handling feature for ChromaDB in the Second-Me project, as outlined in Issue #157. This feature enables seamless transitions between embedding models with different output dimensions, enhancing flexibility and reducing manual intervention.


Key Features

1. Automatic Dimension Detection

  • Embedding dimensions are inferred based on the model name using a robust mapping defined in chroma_utils.py.
  • Supported models:
    • OpenAI: text-embedding-ada-002 (1536), text-embedding-3-large (3072), etc.
    • Ollama: e.g., mistral (768), llama2 (1024)
  • Defaults to 1536 dimensions if the model is unrecognized.

2. Dimension Validation

  • Validates consistency between configured model dimensions and existing ChromaDB collection dimensions.
  • Validation logic integrated into embedding_service.py and init_chroma.py.

3. Automatic Reinitialization

  • On detection of a mismatch, collections are safely reinitialized to match the new embedding dimension.
  • Function: reinitialize_chroma_collections
    • Deletes and recreates collections.
    • Verifies new dimensions post-reinit.

4. Robust Error Handling and Logging

  • Detailed logs are generated for:
    • Detected mismatches
    • Successful or failed reinitializations
  • Clear error messages to aid debugging and user understanding.

5. User Documentation

  • Embedding Model Switching.md added:
    • Table of commonly used models and their embedding dimensions.
    • Step-by-step explanation of the automatic handling mechanism.
    • Troubleshooting guide for edge cases.

Impact

This implementation significantly improves user experience by:

  • Removing the need for manual database resets.
  • Ensuring reliability during embedding model transitions.
  • Offering clear diagnostics when issues arise.

Related Issue

#157

PStarH added 2 commits April 11, 2025 12:18
Add chroma_utils.py to manage chromaDB and added docs for explanation
- Enhanced the`reinitialize_chroma_collections` function in`chroma_utils.py` to properly check if collections exist before attempting to delete them, preventing potential errors when collections don't exist.
- Improved error handling in the`_handle_dimension_mismatch` method in`embedding_service.py` by adding more robust exception handling and verification steps after reinitialization.
- Enhanced the collection initialization process in`embedding_service.py` to provide more detailed error messages and better handle cases where collections still have incorrect dimensions after reinitialization.
- Added additional verification steps to ensure that collection dimensions match the expected dimension after creation or retrieval.
- Improved logging throughout the code to provide more context in error messages, making debugging easier.
@CLAassistant
Copy link

CLAassistant commented Apr 11, 2025

CLA assistant check
All committers have signed the CLA.

@yingapple
Copy link
Contributor

please resolve the conflict. Thanks!

@kevin-mindverse
Copy link
Contributor

Hey there, thank you for your contribution, but still there's a few conflicts to solve :)

@PStarH
Copy link
Contributor Author

PStarH commented Apr 17, 2025

I resolved the conflict I think

Copy link
Contributor

@kevin-mindverse kevin-mindverse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Job

@kevin-mindverse kevin-mindverse merged commit 5868f94 into mindverse:master Apr 22, 2025
1 check passed
kevin-mindverse pushed a commit that referenced this pull request Apr 22, 2025
* Fix Issue #157

Add chroma_utils.py to manage chromaDB and added docs for explanation

* Add logging and debugging process

- Enhanced the`reinitialize_chroma_collections` function in`chroma_utils.py` to properly check if collections exist before attempting to delete them, preventing potential errors when collections don't exist.
- Improved error handling in the`_handle_dimension_mismatch` method in`embedding_service.py` by adding more robust exception handling and verification steps after reinitialization.
- Enhanced the collection initialization process in`embedding_service.py` to provide more detailed error messages and better handle cases where collections still have incorrect dimensions after reinitialization.
- Added additional verification steps to ensure that collection dimensions match the expected dimension after creation or retrieval.
- Improved logging throughout the code to provide more context in error messages, making debugging easier.
lazy-ape pushed a commit to lazy-ape/Second-Me that referenced this pull request Apr 22, 2025
…indverse#207)

* Fix Issue mindverse#157

Add chroma_utils.py to manage chromaDB and added docs for explanation

* Add logging and debugging process

- Enhanced the`reinitialize_chroma_collections` function in`chroma_utils.py` to properly check if collections exist before attempting to delete them, preventing potential errors when collections don't exist.
- Improved error handling in the`_handle_dimension_mismatch` method in`embedding_service.py` by adding more robust exception handling and verification steps after reinitialization.
- Enhanced the collection initialization process in`embedding_service.py` to provide more detailed error messages and better handle cases where collections still have incorrect dimensions after reinitialization.
- Added additional verification steps to ensure that collection dimensions match the expected dimension after creation or retrieval.
- Improved logging throughout the code to provide more context in error messages, making debugging easier.
kevin-mindverse added a commit that referenced this pull request Apr 24, 2025
* fix: modify thinking_model loading configuration

* feat: realize thinkModel ui

* feat:store

* feat: add combined_llm_config_dto

* add thinking_model_config & database migration

* directly add thinking model to user_llm_config

* delete thinking model repo dto service

* delete thinkingmodel table migration

* add is_cot config

* feat: allow define  is_cot

* feat: simplify logs info

* feat: add training model

* feat: fix is_cot problem

* fix: fix chat message

* fix: fix progress error

* fix: disable no settings thinking

* feat: add thinking warning

* fix: fix start service error

* feat:fix init trainparams problem

* feat: change playGround prompt

* feat: Add Dimension Mismatch Handling for ChromaDB (#157) (#207)

* Fix Issue #157

Add chroma_utils.py to manage chromaDB and added docs for explanation

* Add logging and debugging process

- Enhanced the`reinitialize_chroma_collections` function in`chroma_utils.py` to properly check if collections exist before attempting to delete them, preventing potential errors when collections don't exist.
- Improved error handling in the`_handle_dimension_mismatch` method in`embedding_service.py` by adding more robust exception handling and verification steps after reinitialization.
- Enhanced the collection initialization process in`embedding_service.py` to provide more detailed error messages and better handle cases where collections still have incorrect dimensions after reinitialization.
- Added additional verification steps to ensure that collection dimensions match the expected dimension after creation or retrieval.
- Improved logging throughout the code to provide more context in error messages, making debugging easier.

* Change topics_generator timeout to 30 (#263)

* quick fix

* fix: shade -> shade_merge_info (#265)

* fix: shade -> shade_merge_info

* add convert array

* quick fix import error

* add log

* add heartbeat

* new strategy

* sse version

* add heartbeat

* zh to en

* optimize code

* quick fix convert function

* Feat/new branch management (#267)

* feat: new branch management

* feat: fix multi-upload

* optimize contribute management

---------

Co-authored-by: Crabboss Mr <1123357821@qq.com>
Co-authored-by: Ye Xiangle <yexiangle@mail.mindverse.ai>
Co-authored-by: Xinghan Pan <sampan090611@gmail.com>
Co-authored-by: doubleBlack2 <108928143+doubleBlack2@users.noreply.github.com>
Co-authored-by: kevin-mindverse <kevin@mindverse.ai>
Co-authored-by: KKKKKKKevin <115385420+kevin-mindverse@users.noreply.github.com>
Heterohabilis pushed a commit to Heterohabilis/Second-Me that referenced this pull request May 29, 2025
…indverse#207)

* Fix Issue mindverse#157

Add chroma_utils.py to manage chromaDB and added docs for explanation

* Add logging and debugging process

- Enhanced the`reinitialize_chroma_collections` function in`chroma_utils.py` to properly check if collections exist before attempting to delete them, preventing potential errors when collections don't exist.
- Improved error handling in the`_handle_dimension_mismatch` method in`embedding_service.py` by adding more robust exception handling and verification steps after reinitialization.
- Enhanced the collection initialization process in`embedding_service.py` to provide more detailed error messages and better handle cases where collections still have incorrect dimensions after reinitialization.
- Added additional verification steps to ensure that collection dimensions match the expected dimension after creation or retrieval.
- Improved logging throughout the code to provide more context in error messages, making debugging easier.
Heterohabilis pushed a commit to Heterohabilis/Second-Me that referenced this pull request May 29, 2025
* fix: modify thinking_model loading configuration

* feat: realize thinkModel ui

* feat:store

* feat: add combined_llm_config_dto

* add thinking_model_config & database migration

* directly add thinking model to user_llm_config

* delete thinking model repo dto service

* delete thinkingmodel table migration

* add is_cot config

* feat: allow define  is_cot

* feat: simplify logs info

* feat: add training model

* feat: fix is_cot problem

* fix: fix chat message

* fix: fix progress error

* fix: disable no settings thinking

* feat: add thinking warning

* fix: fix start service error

* feat:fix init trainparams problem

* feat: change playGround prompt

* feat: Add Dimension Mismatch Handling for ChromaDB (mindverse#157) (mindverse#207)

* Fix Issue mindverse#157

Add chroma_utils.py to manage chromaDB and added docs for explanation

* Add logging and debugging process

- Enhanced the`reinitialize_chroma_collections` function in`chroma_utils.py` to properly check if collections exist before attempting to delete them, preventing potential errors when collections don't exist.
- Improved error handling in the`_handle_dimension_mismatch` method in`embedding_service.py` by adding more robust exception handling and verification steps after reinitialization.
- Enhanced the collection initialization process in`embedding_service.py` to provide more detailed error messages and better handle cases where collections still have incorrect dimensions after reinitialization.
- Added additional verification steps to ensure that collection dimensions match the expected dimension after creation or retrieval.
- Improved logging throughout the code to provide more context in error messages, making debugging easier.

* Change topics_generator timeout to 30 (mindverse#263)

* quick fix

* fix: shade -> shade_merge_info (mindverse#265)

* fix: shade -> shade_merge_info

* add convert array

* quick fix import error

* add log

* add heartbeat

* new strategy

* sse version

* add heartbeat

* zh to en

* optimize code

* quick fix convert function

* Feat/new branch management (mindverse#267)

* feat: new branch management

* feat: fix multi-upload

* optimize contribute management

---------

Co-authored-by: Crabboss Mr <1123357821@qq.com>
Co-authored-by: Ye Xiangle <yexiangle@mail.mindverse.ai>
Co-authored-by: Xinghan Pan <sampan090611@gmail.com>
Co-authored-by: doubleBlack2 <108928143+doubleBlack2@users.noreply.github.com>
Co-authored-by: kevin-mindverse <kevin@mindverse.ai>
Co-authored-by: KKKKKKKevin <115385420+kevin-mindverse@users.noreply.github.com>
EOMZON pushed a commit to EOMZON/Second-Me that referenced this pull request Feb 1, 2026
…indverse#207)

* Fix Issue mindverse#157

Add chroma_utils.py to manage chromaDB and added docs for explanation

* Add logging and debugging process

- Enhanced the`reinitialize_chroma_collections` function in`chroma_utils.py` to properly check if collections exist before attempting to delete them, preventing potential errors when collections don't exist.
- Improved error handling in the`_handle_dimension_mismatch` method in`embedding_service.py` by adding more robust exception handling and verification steps after reinitialization.
- Enhanced the collection initialization process in`embedding_service.py` to provide more detailed error messages and better handle cases where collections still have incorrect dimensions after reinitialization.
- Added additional verification steps to ensure that collection dimensions match the expected dimension after creation or retrieval.
- Improved logging throughout the code to provide more context in error messages, making debugging easier.
EOMZON pushed a commit to EOMZON/Second-Me that referenced this pull request Feb 1, 2026
* fix: modify thinking_model loading configuration

* feat: realize thinkModel ui

* feat:store

* feat: add combined_llm_config_dto

* add thinking_model_config & database migration

* directly add thinking model to user_llm_config

* delete thinking model repo dto service

* delete thinkingmodel table migration

* add is_cot config

* feat: allow define  is_cot

* feat: simplify logs info

* feat: add training model

* feat: fix is_cot problem

* fix: fix chat message

* fix: fix progress error

* fix: disable no settings thinking

* feat: add thinking warning

* fix: fix start service error

* feat:fix init trainparams problem

* feat: change playGround prompt

* feat: Add Dimension Mismatch Handling for ChromaDB (mindverse#157) (mindverse#207)

* Fix Issue mindverse#157

Add chroma_utils.py to manage chromaDB and added docs for explanation

* Add logging and debugging process

- Enhanced the`reinitialize_chroma_collections` function in`chroma_utils.py` to properly check if collections exist before attempting to delete them, preventing potential errors when collections don't exist.
- Improved error handling in the`_handle_dimension_mismatch` method in`embedding_service.py` by adding more robust exception handling and verification steps after reinitialization.
- Enhanced the collection initialization process in`embedding_service.py` to provide more detailed error messages and better handle cases where collections still have incorrect dimensions after reinitialization.
- Added additional verification steps to ensure that collection dimensions match the expected dimension after creation or retrieval.
- Improved logging throughout the code to provide more context in error messages, making debugging easier.

* Change topics_generator timeout to 30 (mindverse#263)

* quick fix

* fix: shade -> shade_merge_info (mindverse#265)

* fix: shade -> shade_merge_info

* add convert array

* quick fix import error

* add log

* add heartbeat

* new strategy

* sse version

* add heartbeat

* zh to en

* optimize code

* quick fix convert function

* Feat/new branch management (mindverse#267)

* feat: new branch management

* feat: fix multi-upload

* optimize contribute management

---------

Co-authored-by: Crabboss Mr <1123357821@qq.com>
Co-authored-by: Ye Xiangle <yexiangle@mail.mindverse.ai>
Co-authored-by: Xinghan Pan <sampan090611@gmail.com>
Co-authored-by: doubleBlack2 <108928143+doubleBlack2@users.noreply.github.com>
Co-authored-by: kevin-mindverse <kevin@mindverse.ai>
Co-authored-by: KKKKKKKevin <115385420+kevin-mindverse@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants