Support tokenizers for CJK languages #1909

@ghost

Description

Is your feature request related to a problem? Please describe.
A tokenizer's principal role is to split documents into words (tokens) so that each document can be indexed by its contained words. It is also needed to split a search query into tokens, which are then looked up in the word index.
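To illustrate why CJK languages need dedicated handling: whitespace splitting, which works for most European languages, produces a single unusable token for unsegmented CJK text. A character-bigram tokenizer (similar in spirit to Lucene's CJKAnalyzer) is one common fallback. This is a minimal illustrative sketch, not the tokenizer proposed in this issue:

```python
def bigram_tokenize(text: str) -> list[str]:
    """Emit overlapping character bigrams, a common fallback for CJK text."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]

# Whitespace splitting yields one giant token for unsegmented Japanese:
print("東京都に住む".split())
# → ['東京都に住む']  (unusable for word-level indexing)

# Bigrams at least allow substring-style matching at index and query time:
print(bigram_tokenize("東京都に住む"))
# → ['東京', '京都', '都に', 'に住', '住む']
```

A morphological analyzer (e.g. MeCab for Japanese, jieba for Chinese) gives better word boundaries than bigrams, at the cost of a dictionary dependency.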

Describe the solution you'd like
meilisearch/meilisearch#624

Describe alternatives you've considered
Use a BERT embedding with vector search. (This may not work well for new phrases.)
