Make DataType and DataFormat top-level enums #143312
Pinging @elastic/search-inference-team (Team:Search - Inference)
Pinging @elastic/es-search-relevance (Team:Search Relevance)
 */
public enum DataType {
    TEXT(DataFormat.TEXT, EnumSet.of(DataFormat.TEXT)),
    IMAGE(DataFormat.BASE64, EnumSet.of(DataFormat.BASE64));
The `jina-clip-v2` API on Jina's side also accepts a link to an image, so the image doesn't need to be encoded as base64. Sending a link does in fact work when the format is specified as `base64`, as I don't think we do any validation/enforcement?
Should we add a new format `url`, or add `text` as a valid format for the `image` data type? cc @DonalEvans
Let's save that work for a follow-up PR. Adding a new format goes beyond the scope of refactoring.
timgrein
left a comment
Adding the `url` data format is probably out of scope for this PR, just wanted to get the discussion started :)
…cations * upstream/main: (60 commits) Use batches for other bulk vector benchmarks (elastic#143167) Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:lookup-join.MvJoinKeyOnTheLookupIndexAfterStats} elastic#143388 Mute org.elasticsearch.snapshots.ConcurrentSnapshotsIT testBackToBackQueuedDeletes elastic#143387 [Inference API] Parse endpoint metadata from persisted endpoints (elastic#143081) Add cluster formation doc to DistributedArchitectureGuide (elastic#143318) Fix flattened root block loader null expectation (elastic#143238) Unmute ValueSourceReaderTypeConversionTests testLoadAll (elastic#143189) ESQL: Add split coalescing for many small files (elastic#143335) Unmute mixed-cluster spatial parse warning test (elastic#143186) Fix zero-size estimate in BytesRefBlock null test (elastic#143258) Make DataType and DataFormat top-level enums (elastic#143312) Add support for steps to change the target index name for later steps (elastic#142955) Set mayContainDuplicates flag to test deduplication (elastic#143375) ESQL: Fix Driver search load millis as nanos bug (elastic#143267) Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:lookup-join.LookupJoinWithMixPushableAndUnpushableFilters} elastic#143378 ESQL: Forbid MV_EXPAND before full text functions (elastic#143249) ESQL: Fix unresolved name pattern (elastic#143210) Implement boxplot queryDSL aggregation for exponential_histograms (elastic#143026) Add prefetching to x64 bulk vector implementations (elastic#142387) Make large segment vector tests resilient to memory constraints (elastic#143366) ...
Refactors `DataType` and `DataFormat` to make them top-level enums. Also store the supported format set in `DataType` so that logic is centralized in the enum.

These enums will be used in the `embedding` query vector builder I am currently working on and it is more appropriate to refer to them as top-level enums.
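
For readers skimming the thread, here is a rough sketch of what "storing the supported format set in `DataType`" could look like. Only the two enum constants and their constructor arguments come from the diff shown in this PR. The field names, the accessors, the `supports` helper, and the reading of the first constructor argument as a default format are assumptions made for illustration, not the merged code. Both enums are package-private here only so the sketch fits in one file.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Sketch only: in the real code base each enum would be public and live in its own file.
enum DataFormat {
    TEXT,
    BASE64
}

enum DataType {
    // Constants and arguments as shown in the PR diff.
    TEXT(DataFormat.TEXT, EnumSet.of(DataFormat.TEXT)),
    IMAGE(DataFormat.BASE64, EnumSet.of(DataFormat.BASE64));

    // Assumption: the first constructor argument is the default format for the type.
    private final DataFormat defaultFormat;
    private final Set<DataFormat> supportedFormats;

    DataType(DataFormat defaultFormat, EnumSet<DataFormat> supportedFormats) {
        this.defaultFormat = defaultFormat;
        // Wrap the set so callers cannot modify the formats of an enum constant.
        this.supportedFormats = Collections.unmodifiableSet(supportedFormats);
    }

    // Hypothetical accessor names.
    DataFormat getDefaultFormat() {
        return defaultFormat;
    }

    Set<DataFormat> getSupportedFormats() {
        return supportedFormats;
    }

    // Hypothetical helper: a single place where a format check could live.
    boolean supports(DataFormat format) {
        return supportedFormats.contains(format);
    }
}
```

Under this sketch, `DataType.IMAGE.supports(DataFormat.TEXT)` is `false`, so allowing links for images would mean either a new `DataFormat` constant or widening the set passed to `IMAGE`, which is the follow-up discussed in the review thread.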