Thanks for the amazing lib and clear documentation! I'm looking at using Orama to search local chat messages (typically involving a few words up to several sentences).
Using @orama/orama ^1.2.3, I'm getting fast and correct results for exact and prefix matching; however, typos don't seem to work the way I was hoping. I'm probably missing the obvious, but testing the tolerance parameter against an example in the docs returns poor results, so I'm wondering what could be wrong.
I'm looking at the following example from the docs:
https://docs.oramasearch.com/usage/search/introduction#typo-tolerance
If I grab a slightly bigger database:
https://github.com/erik-sytnyk/movies-list/blob/master/db.json
{
term: "Christopher Nolan",
properties: ["director"]
}
// result: OK: matches 1 exact result like expected
{
term: "Cris",
properties: ["director"],
}
// result: OK: matches 1 document "Michael Cristofer" (no tolerance was set, so this is kind of expected)
{
term: 'Cris',
properties: ['director'],
tolerance: 1,
}
// result: "fails": matches 0 documents. In the documentation this query would return all the "Chris" entries. Note that it still fails when bumping the tolerance level.
// one example in the DB: "director": "Pierre Coffin, Chris Renaud",
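To spell out what I expected from tolerance: 1, here is a minimal sketch. It assumes tolerance is a maximum Levenshtein edit distance between the search term and each indexed token, and that the director field is lowercased and split into word tokens. The levenshtein helper and the token list are hypothetical illustrations written for this post, not Orama's API or its actual tokenizer output.

```typescript
// Hypothetical helper, not part of Orama: classic Levenshtein edit distance
// computed row by row (insertions, deletions and substitutions each cost 1).
function levenshtein(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed tokenization of "Pierre Coffin, Chris Renaud" (lowercased, split on non-letters).
const term = "cris";
const tokens = ["pierre", "coffin", "chris", "renaud"];

// Tokens I would expect a tolerance of 1 to accept.
const withinTolerance = tokens.filter((t) => levenshtein(term, t) <= 1);
console.log(withinTolerance); // ["chris"]
```

Under those assumptions "cris" is one insertion away from "chris", which is why I expected the tolerance: 1 query to return that document.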
Here's my playground (all output is in the console):
https://codesandbox.io/p/sandbox/keen-knuth-9wql22?file=/src/main.ts:65,28
I've played with other options, such as the tokenizer, stemming, relevance, and threshold, but without luck. What am I missing?