Dictionary based stemming #2062

Merged: kishorenc merged 15 commits into typesense:v28 from krunal1313:normalizing_plurals on Dec 5, 2024
krunal1313 commented Nov 11, 2024

Contributor

Change Summary

  • add an endpoint for importing plurals
  • normalize plurals per field via the stem_dictionary parameter, set to the id of an imported dictionary
  • add tests

adding a dictionary via the endpoint

To add plurals, import a JSONL file to the /stemming/dictionaries/import endpoint via a POST request, as shown below:

curl -k -H 'Content-Type:' -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -X POST --data-binary @plurals.jsonl "http://localhost:8108/stemming/dictionaries/import?id=set1"

Here, id is the name under which the dictionary will be stored.

The JSONL file should contain one plural-to-root mapping per line, in the following format:

{"word": "meetings", "root":"meeting"}
{"word": "people", "root":"person"}
{"word": "attentions", "root":"attention"}
{"word": "leathers", "root":"leather"}
{"word": "qualities", "root":"quality"}

getting a specific dictionary

To fetch a stored dictionary, send a GET request to the /stemming/dictionaries/:id endpoint, as shown below:

curl "http://localhost:8108/stemming/dictionaries/set1" -X GET -H "Content-Type: application/json" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

Here, set1 is the id that was given when the dictionary was imported. The response looks like this:

{"id":"set1","words":[{"root":"attention","word":"attentions"},{"root":"quality","word":"qualities"},{"root":"leather","word":"leathers"},{"root":"person","word":"people"},{"root":"meeting","word":"meetings"}]}

getting all stored dictionaries

To list all dictionaries stored on the Typesense server, send a GET request to the /stemming/dictionaries endpoint, as shown below:

curl "http://localhost:8108/stemming/dictionaries" -X GET -H "Content-Type: application/json" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

The response has the following format:

{"dictionaries":["set1"]}

using the dictionary with collection search

To use a dictionary when searching a collection, specify the dictionary name on the field when creating the collection:

curl "http://localhost:8108/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
         "name": "companies",
         "fields": [
           {"name": "country", "type": "string", "stem_dictionary": "set1" }
       }'

Here, the stem_dictionary parameter specifies which dictionary to use for normalizing words while searching.
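
To illustrate what dictionary-based normalization means conceptually, here is a small Python sketch. It is not Typesense's implementation: the lowercasing, the whitespace tokenization, and the names `normalize` and `SET1` are all assumptions made for illustration. The idea is that each token found in the dictionary is replaced by its root, so that "people" and "person" end up matching the same term.

```python
# Hypothetical in-memory copy of the "set1" dictionary (word -> root).
SET1 = {
    "meetings": "meeting",
    "people": "person",
    "qualities": "quality",
}

def normalize(text, dictionary):
    """Replace each token that has a dictionary entry with its root form."""
    # Assumption: lowercase and split on whitespace; a real engine's
    # tokenizer is more involved.
    tokens = text.lower().split()
    # Tokens without an entry pass through unchanged.
    return [dictionary.get(token, token) for token in tokens]

normalized = normalize("People attend meetings", SET1)
```

Under these assumptions, a document containing "people" and a query for "person" would normalize to the same token and therefore match.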

PR Checklist

kishorenc merged commit adbfa73 into typesense:v28 on Dec 5, 2024
b0g3r commented Jan 22, 2025

Contributor

@krunal1313 @kishorenc thanks, this is a really valuable feature for our use case. I see that the POST endpoint works as an "upsert", but if I need to not only add new words to the list but also remove some, what is the process?

@kishorenc

Member

Is there a reason you can't call DELETE and re-import? Or create a new v2 version of the dictionary via /stemming/dictionaries/import?id=set_v2 and then update the application to use that?
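
The versioned-dictionary suggestion could be sketched in Python roughly as follows. The snippet only builds the request and does not send it. It uses the /stemming/dictionaries/import path and the X-TYPESENSE-API-KEY header from the PR description; the helper name `build_import_request`, the placeholder API key, and the text/plain content type are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_import_request(base_url, api_key, dictionary_id, jsonl_bytes):
    """Build (but do not send) a POST that imports JSONL under a new id."""
    query = urlencode({"id": dictionary_id})
    return Request(
        f"{base_url}/stemming/dictionaries/import?{query}",
        data=jsonl_bytes,
        headers={
            "X-TYPESENSE-API-KEY": api_key,
            # Assumption: the curl example clears Content-Type; urllib would
            # otherwise add a form-encoded default when sending.
            "Content-Type": "text/plain",
        },
        method="POST",
    )

# Re-import the full, corrected word list under a fresh id such as "set_v2".
request = build_import_request(
    "http://localhost:8108",
    "xyz",  # placeholder API key
    "set_v2",
    b'{"word": "people", "root": "person"}\n',
)
```

Passing `request` to `urllib.request.urlopen` would send it. Afterwards, the application would reference `set_v2` in `stem_dictionary` in place of the old id.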

kishorenc changed the title from "Normalize plurals" to "Dictionary based stemming" on Jan 23, 2025
@PavelKoroteev


> Is there a reason you can't call DELETE and re-import? Or create a new v2 version of the dictionary via /stemming/dictionaries/import?id=set_v2 and then update the application to use that?

Hello, @kishorenc.

I couldn't find the DELETE method by testing, and I couldn't find a DELETE method for stemming dictionaries in the OpenAPI spec either. Is it missing, or am I missing something? Thank you in advance.

@kishorenc

Member

Ugh, looks like we implemented this but forgot to hook it up in the routes. I will get you a fixed build soon, once this is addressed.

krunal1313 deleted the normalizing_plurals branch on April 23, 2026 12:04