Support tokenizers for CJK languages #1909

@ghost

Description

Is your feature request related to a problem? Please describe.
A tokenizer's principal role is to split documents into words (tokens) so that each document can be indexed by its contained words. It is also needed to split a search query into tokens, which are then looked up in the word index.
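To illustrate why CJK languages need dedicated handling: whitespace splitting, which works for most European languages, produces a single unusable token for unsegmented CJK text. A character-bigram tokenizer (similar in spirit to Lucene's CJKAnalyzer) is one common fallback. This is a minimal illustrative sketch, not the tokenizer proposed in this issue:

```python
def bigram_tokenize(text: str) -> list[str]:
    """Emit overlapping character bigrams, a common fallback for CJK text."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]

# Whitespace splitting yields one giant token for unsegmented Japanese:
print("東京都に住む".split())
# → ['東京都に住む']  (unusable for word-level indexing)

# Bigrams at least allow substring-style matching at index and query time:
print(bigram_tokenize("東京都に住む"))
# → ['東京', '京都', '都に', 'に住', '住む']
```

A morphological analyzer (e.g. MeCab for Japanese, jieba for Chinese) gives better word boundaries than bigrams, at the cost of a dictionary dependency.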

Describe the solution you'd like
meilisearch/meilisearch#624

Describe alternatives you've considered
Use a BERT embedding with vector search. (This may not work well for new phrases.)
