@bevzzz bevzzz commented Jul 7, 2025

Users are more likely to select a vectorizer on collection creation and postpone vector index configuration until later. Our goal, then, is to surface vectorizer configuration and push vector index config a level deeper, so as to avoid information overload.

The new API becomes very expressive:

  collection -> collection
    .vectors(Vectorizers.text2vecWeaviate("custom", t2v -> t2v.vectorizeCollectionName(true)));

with the option to configure a different vector index by passing a .vectorIndex() parameter.
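To make the lambda-style configuration concrete, here is a minimal, self-contained sketch. It assumes the lambda receives a builder pre-filled with defaults (HNSW as the index) and returns it; `Text2VecSketch`, its `Builder`, `VectorIndex.flat()` and the `of(...)` factory are hypothetical stand-ins, not the client's actual classes or signatures.

```java
import java.util.function.Function;

// Hypothetical sketch: names follow the PR description, but these are NOT
// the Weaviate client's actual classes or signatures.
final class Text2VecSketch {

    // Vector index config; only the index type is modeled here.
    record VectorIndex(String type) {
        static VectorIndex hnsw() { return new VectorIndex("hnsw"); }
        static VectorIndex flat() { return new VectorIndex("flat"); }
    }

    // The vectorizer carries its own vector index config.
    record Text2VecWeaviate(boolean vectorizeCollectionName, VectorIndex vectorIndex) {

        static final class Builder {
            private boolean vectorizeCollectionName = false;
            // Assumed default: HNSW unless the caller overrides it.
            private VectorIndex vectorIndex = VectorIndex.hnsw();

            Builder vectorizeCollectionName(boolean enable) {
                this.vectorizeCollectionName = enable;
                return this;
            }

            // Optional override of the default index, e.g. VectorIndex.flat().
            Builder vectorIndex(VectorIndex index) {
                this.vectorIndex = index;
                return this;
            }

            Text2VecWeaviate build() {
                return new Text2VecWeaviate(vectorizeCollectionName, vectorIndex);
            }
        }

        // The lambda receives a builder pre-filled with defaults and returns it.
        static Text2VecWeaviate of(Function<Builder, Builder> configure) {
            return configure.apply(new Builder()).build();
        }
    }
}
```

With this shape, `Text2VecSketch.Text2VecWeaviate.of(t2v -> t2v.vectorizeCollectionName(true))` never mentions the index and gets HNSW, while `t2v.vectorIndex(Text2VecSketch.VectorIndex.flat())` swaps it in a single call.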

Vectorizers is a separate class which holds static constructors for all available vectorizers, similarly to how Iterators helps create Iterator instances.
Vectorizers.none() returns a Map.Entry<String, Vectorizer> to allow creating named vectors without storing the name in the vectorizer object itself. Hopefully that's not too awkward.
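A rough sketch of the static-factory approach, assuming each factory returns a `Map.Entry<String, Vectorizer>` built with `Map.entry` and that the unnamed overload falls back to the name "default". The `VectorizersSketch` wrapper, the `None` record and the `text2vecWeaviate(String, boolean)` signature are illustrative only.

```java
import java.util.Map;

// Hypothetical sketch of a static-factory holder in the spirit of Iterators.
// Method names and return types are assumptions based on the PR description.
final class VectorizersSketch {

    // The vectorizer object itself stores no name.
    interface Vectorizer {}

    record None() implements Vectorizer {}

    record Text2VecWeaviate(boolean vectorizeCollectionName) implements Vectorizer {}

    static final class Vectorizers {
        static final String DEFAULT_NAME = "default";

        private Vectorizers() {} // not instantiable: static factories only

        // Unnamed form: pairs the vectorizer with the default vector name.
        static Map.Entry<String, Vectorizer> none() {
            return none(DEFAULT_NAME);
        }

        // Named form: the name lives in the entry key, not in the vectorizer.
        static Map.Entry<String, Vectorizer> none(String name) {
            return Map.entry(name, new None());
        }

        static Map.Entry<String, Vectorizer> text2vecWeaviate(
                String name, boolean vectorizeCollectionName) {
            return Map.entry(name, new Text2VecWeaviate(vectorizeCollectionName));
        }
    }
}
```

Because the name travels in the entry key, the vectorizer records stay name-free and can be compared or reused independently of where they are registered.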

CollectionConfig.Builder also has a method which accepts a Map<String, Vectorizer> in case a user needs to pass them all at once.
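One way the two `CollectionConfig.Builder` entry points could coexist, sketched under the assumption that the varargs method takes the `Map.Entry` values produced by the factories and the bulk overload takes a prepared map. Both write into the same insertion-ordered map; all names here are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a collection builder that accepts vectorizers either
// entry by entry or as a whole map. Not the Weaviate client's real class.
final class CollectionConfigSketch {

    interface Vectorizer {}

    record Text2VecWeaviate(boolean vectorizeCollectionName) implements Vectorizer {}

    record CollectionConfig(String name, Map<String, Vectorizer> vectors) {

        static final class Builder {
            private final String name;
            // LinkedHashMap keeps vectors in the order they were added.
            private final Map<String, Vectorizer> vectors = new LinkedHashMap<>();

            Builder(String name) {
                this.name = name;
            }

            // Accepts entries such as those returned by Vectorizers factories.
            @SafeVarargs
            final Builder vectors(Map.Entry<String, Vectorizer>... entries) {
                for (Map.Entry<String, Vectorizer> entry : entries) {
                    vectors.put(entry.getKey(), entry.getValue());
                }
                return this;
            }

            // Bulk form for callers that already hold all vectorizers in a map.
            Builder vectors(Map<String, Vectorizer> all) {
                vectors.putAll(all);
                return this;
            }

            CollectionConfig build() {
                return new CollectionConfig(
                    name, Collections.unmodifiableMap(new LinkedHashMap<>(vectors)));
            }
        }
    }
}
```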

Finally, I had to update some test cases in JSONTest to reflect the fact that each vectorizer now inevitably carries the vector index config.

The default vector index type is HNSW and the default name is 'default'.
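To illustrate why the JSON fixtures changed, the sketch below serializes named vectors so that each one always carries an index type, defaulting to HNSW under the name "default". The field names (`vectorizer`, `vectorIndexType`) are loosely modeled on Weaviate's REST `vectorConfig` schema and should be treated as assumptions; the hand-rolled string building stands in for a real JSON library.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: every named vector serializes with an index type.
// Field names are assumptions modeled on Weaviate's "vectorConfig" schema.
final class VectorConfigJsonSketch {

    static final String DEFAULT_NAME = "default";
    static final String DEFAULT_INDEX = "hnsw";

    // A vectorizer module name plus the index type it carries.
    record NamedVector(String module, String vectorIndexType) {
        // Convenience constructor: the index falls back to HNSW.
        NamedVector(String module) {
            this(module, DEFAULT_INDEX);
        }
    }

    // Renders {"<name>":{"vectorizer":{"<module>":{}},"vectorIndexType":"<type>"},...}
    static String toJson(Map<String, NamedVector> vectors) {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        vectors.forEach((name, v) -> joiner.add(
            "\"" + name + "\":{\"vectorizer\":{\"" + v.module() + "\":{}},"
                + "\"vectorIndexType\":\"" + v.vectorIndexType() + "\"}"));
        return joiner.toString();
    }
}
```

Even a vectorizer configured with no index settings ends up with `"vectorIndexType":"hnsw"` in its output, which is the kind of change the updated test fixtures have to account for.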

@bevzzz bevzzz requested a review from g-despot July 7, 2025 12:01
@bevzzz bevzzz self-assigned this Jul 7, 2025

@orca-security-eu bot left a comment

Orca Security Scan Summary

Secrets check: Passed (issues by priority: high 0, medium 0, low 0, info 0)

@bevzzz bevzzz merged commit b19c424 into v6 Jul 7, 2025
2 checks passed
@bevzzz bevzzz deleted the v6-vectorizer-api branch July 7, 2025 16:47