
Bug: tolerance option not behaving as hoped #480

@ttillberg


Thanks for the amazing lib and clear documentation! I'm looking at using Orama to search local chat messages (typically involving a few words up to several sentences).

Using @orama/orama ^1.2.3, I'm getting fast and correct results for exact and prefix matching; however, typos don't seem to work the way I was hoping. I'm probably missing something obvious, but testing the `tolerance` parameter against an example from the docs returns poor results, so I'm wondering what could be wrong.

Here's the example from the docs I'm looking at:
https://docs.oramasearch.com/usage/search/introduction#typo-tolerance

If I use a slightly bigger dataset, I get the results below:
https://github.com/erik-sytnyk/movies-list/blob/master/db.json

```js
// Query 1
{
  term: "Christopher Nolan",
  properties: ["director"],
}
// Result: OK, matches 1 exact result as expected.

// Query 2
{
  term: "Cris",
  properties: ["director"],
}
// Result: OK, matches 1 document, "Michael Cristofer" (no tolerance was set, so this is expected).

// Query 3
{
  term: "Cris",
  properties: ["director"],
  tolerance: 1,
}
// Result: fails, matches 0 documents. According to the documentation, this query should
// return every director named "Chris"; it still fails even when I raise the tolerance.
// One example in the DB: "director": "Pierre Coffin, Chris Renaud"
```
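For reference, here is why I expected the `tolerance: 1` query to work, assuming `tolerance` is the maximum Levenshtein edit distance between the search term and an indexed word, as the linked docs describe. This is a generic textbook implementation in TypeScript, not Orama's internal code, and `withinTolerance` is a hypothetical helper. It shows that "cris" and "chris" are a single insertion apart.

```typescript
// Classic dynamic-programming Levenshtein distance: insertions, deletions and
// substitutions each cost 1. Generic illustration, NOT Orama's implementation.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]; // distance from a[0..i) to the empty string
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a[i - 1]
        curr[j - 1] + 1, // insert b[j - 1]
        prev[j - 1] + cost, // substitute (or keep a matching char)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: would `term` match `word` under the given tolerance?
// Case-insensitive, assuming the index stores lowercased tokens.
function withinTolerance(term: string, word: string, tolerance: number): boolean {
  return levenshtein(term.toLowerCase(), word.toLowerCase()) <= tolerance;
}

console.log(levenshtein("cris", "chris")); // 1 (insert "h")
console.log(levenshtein("cris", "christopher")); // 7
console.log(withinTolerance("Cris", "Chris", 1)); // true
```

Under this assumption, "Chris" in "Pierre Coffin, Chris Renaud" is within distance 1 of "Cris", so I'd expect that document in the results.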

Here's my playground (all output is in the console):
https://codesandbox.io/p/sandbox/keen-knuth-9wql22?file=/src/main.ts:65,28

I've played with other options, such as the tokenizer, stemming, relevance, and threshold, but without luck. What am I missing?
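Since tokenization is one of the options I tried, here is a second sketch with the same caveats. It uses a simplified, hypothetical tokenizer (lowercase, split on non-alphanumerics; Orama's default tokenizer may differ in details such as accented characters) on the three director values mentioned in this issue. It illustrates why the `Cris` query without tolerance finds "Michael Cristofer" through prefix matching, while "Chris Renaud" can only be reached through tolerance, because "cris" is not a prefix of "chris".

```typescript
// Simplified, hypothetical tokenizer: lowercase, then split on runs of
// characters that are not a-z or 0-9. Not Orama's actual tokenizer.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Sample drawn from the director values quoted in this issue.
const directors: string[] = [
  "Michael Cristofer",
  "Pierre Coffin, Chris Renaud",
  "Christopher Nolan",
];

// Prefix matching: a document matches if any of its tokens starts with the term.
function prefixMatches(term: string, docs: string[]): string[] {
  const t = term.toLowerCase();
  return docs.filter((doc) => tokenize(doc).some((token) => token.startsWith(t)));
}

console.log(tokenize("Pierre Coffin, Chris Renaud")); // [ 'pierre', 'coffin', 'chris', 'renaud' ]
console.log(prefixMatches("Cris", directors)); // [ 'Michael Cristofer' ]
```

In this simplified model "chris" is indexed as its own token, so whether it matches depends entirely on how the tolerance check is applied, not on tokenization.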
