MB-19243: Detect fuzziness automatically based on term length#2060

Merged
CascadingRadium merged 3 commits into master from autoFuzz on Nov 20, 2024

CascadingRadium commented Aug 2, 2024

  • The following queries can now automatically detect fuzziness based on the length of the terms:
    • Match Query
    • Fuzzy Query
    • Match-Phrase Query
    • Multi-Phrase Query
    • Phrase Query
  • In these queries, each term (whether in a multi-term query like Match or Phrase, or in a single-term query like Fuzzy) can have its own edit distance based on its length. The edit distance is calculated as follows:
    • For terms with 1 or 2 characters: edit distance = 0 (exact match)
    • For terms with 3, 4, or 5 characters: edit distance = 1 (fuzzy match)
    • For terms with more than 5 characters: edit distance = 2 (fuzzy match)
  • This feature can be enabled using the <query>.SetAutoFuzziness(<bool>) API.
  • Additionally, we've extended the functionality to query JSON parsing. You can specify fuzziness as either "auto" or a static value in the JSON query. Both formats are valid:
  1. With auto fuzziness:
{
  "match" : "lorem",
  "field" : "bleve",
  "fuzziness" : "auto"
}
  2. With static fuzziness:
{
  "match" : "lorem",
  "field" : "bleve",
  "fuzziness" : 2
}

When unmarshalled, the query will correctly apply the chosen fuzziness method.
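
As a rough illustration of the JSON handling described above, the sketch below accepts a "fuzziness" field that is either the string "auto" or an integer. It uses only the Go standard library; the `fuzzinessSetting` and `matchQuery` types and the `parseMatchQuery` helper are hypothetical names invented for this example and are not bleve's own types or parsing code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fuzzinessSetting is a hypothetical type for this sketch (not part of bleve).
// It records either "auto" mode or a static edit distance.
type fuzzinessSetting struct {
	Auto  bool
	Value int
}

// UnmarshalJSON accepts the string "auto" or a non-negative integer.
func (f *fuzzinessSetting) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		if s != "auto" {
			return fmt.Errorf("unsupported fuzziness string %q", s)
		}
		*f = fuzzinessSetting{Auto: true}
		return nil
	}
	var n int
	if err := json.Unmarshal(data, &n); err != nil {
		return fmt.Errorf("fuzziness must be \"auto\" or an integer: %w", err)
	}
	if n < 0 {
		return fmt.Errorf("fuzziness %d must not be negative", n)
	}
	*f = fuzzinessSetting{Value: n}
	return nil
}

// matchQuery mirrors the JSON shape shown in the description above.
// It is a stand-in for illustration, not bleve's MatchQuery.
type matchQuery struct {
	Match     string           `json:"match"`
	Field     string           `json:"field"`
	Fuzziness fuzzinessSetting `json:"fuzziness"`
}

// parseMatchQuery decodes a JSON query string into the sketch type.
func parseMatchQuery(raw string) (matchQuery, error) {
	var q matchQuery
	err := json.Unmarshal([]byte(raw), &q)
	return q, err
}

func main() {
	inputs := []string{
		`{"match": "lorem", "field": "bleve", "fuzziness": "auto"}`,
		`{"match": "lorem", "field": "bleve", "fuzziness": 2}`,
	}
	for _, in := range inputs {
		q, err := parseMatchQuery(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("match=%q auto=%v static=%d\n", q.Match, q.Fuzziness.Auto, q.Fuzziness.Value)
	}
}
```

Trying the string form first and falling back to an integer keeps both formats valid in the same field, which is the behavior the description asks for.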

  • Fixed a bug where creating a fuzzy searcher with fuzziness = 0 incorrectly returned an error saying that fuzziness exceeds the maximum. A term searcher is now returned in this case.
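
For readers who want the length rule and the fuzziness = 0 behavior in one place, here is a minimal standalone sketch. The function names `autoFuzzinessForTerm` and `chooseSearcher`, and the `maxFuzziness` constant, are assumptions made for this example and are not taken from the bleve source. It also assumes "characters" means Unicode code points rather than bytes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxFuzziness is an assumed upper bound for this sketch.
const maxFuzziness = 2

// autoFuzzinessForTerm (hypothetical helper) maps a term's length to an
// edit distance using the thresholds listed in the description above.
// Assumption: length is counted in runes, not bytes.
func autoFuzzinessForTerm(term string) int {
	switch n := utf8.RuneCountInString(term); {
	case n <= 2:
		return 0 // 1 or 2 characters: exact match
	case n <= 5:
		return 1 // 3, 4, or 5 characters
	default:
		return 2 // more than 5 characters
	}
}

// chooseSearcher (hypothetical helper) reports which kind of searcher a
// given fuzziness would select. A fuzziness of 0 falls back to a plain
// term searcher instead of producing an error.
func chooseSearcher(fuzziness int) (string, error) {
	switch {
	case fuzziness < 0:
		return "", fmt.Errorf("fuzziness %d must not be negative", fuzziness)
	case fuzziness > maxFuzziness:
		return "", fmt.Errorf("fuzziness %d exceeds maximum %d", fuzziness, maxFuzziness)
	case fuzziness == 0:
		return "term", nil
	default:
		return "fuzzy", nil
	}
}

func main() {
	for _, term := range []string{"go", "lorem", "fuzziness"} {
		f := autoFuzzinessForTerm(term)
		kind, err := chooseSearcher(f)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("term=%q fuzziness=%d searcher=%s\n", term, f, kind)
	}
}
```

Because terms of 1 or 2 characters resolve to an edit distance of 0, the auto mode depends on the zero case being handled as an exact term search rather than rejected.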

CascadingRadium added this to the v2.4.3 milestone Aug 2, 2024
CascadingRadium self-assigned this Aug 2, 2024
CascadingRadium changed the title from "MB-19243: Auto Fuzzy support" to "MB-19243: Detect fuzziness automatically based on term length" Aug 2, 2024
abhinavdangeti modified the milestone: v2.4.3 Aug 5, 2024
abhinavdangeti added this to the v2.5.0 milestone Sep 18, 2024
abhinavdangeti removed the request for review from moshaad7 October 17, 2024 17:49
abhinavdangeti previously approved these changes Nov 13, 2024

CascadingRadium commented Nov 14, 2024

force pushed a rebase
please review again
thanks

CascadingRadium merged commit 3a21667 into master Nov 20, 2024
CascadingRadium deleted the autoFuzz branch November 20, 2024 17:43
project-mirrors-bot-tu bot pushed a commit to project-mirrors/forgejo-as-gitea-fork that referenced this pull request Apr 6, 2025
…-gitea#7468)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/blevesearch/bleve/v2](https://github.com/blevesearch/bleve) | require | minor | `v2.4.4` -> `v2.5.0` |

---

### Release Notes

<details>
<summary>blevesearch/bleve (github.com/blevesearch/bleve/v2)</summary>

### [`v2.5.0`](https://github.com/blevesearch/bleve/releases/tag/v2.5.0)

[Compare Source](blevesearch/bleve@v2.4.4...v2.5.0)

##### Bug Fixes

-   Exact hits to score higher than fuzzy hits, with blevesearch/bleve#2056
-   Fix boosting during hybrid search that involves text + nearest neighbor, with blevesearch/bleve#2127
-   Addressed bug in IP field handling while highlighting, with blevesearch/bleve#2142
-   Graceful error handling within registry, with blevesearch/bleve#2151
-   `http/` package (meant for demo purposes) removed from repository to remove vulnerability - [CVE-2022-31022](GHSA-9w9f-6mg8-jp7w), relocated to within https://github.com/blevesearch/bleve-explorer
-   Geo radius queries will now advertise distances (within sort values) in readable format, with blevesearch/bleve#2137

##### Improvements

-   Vector search requires `faiss` dynamic library to be built from [blevesearch/faiss@352484e](https://github.com/blevesearch/faiss/tree/352484e0fc9d1f8f46737841efe5f26e0f383f71) which is a modified version of [v1.10.0](https://github.com/facebookresearch/faiss/releases/tag/v1.10.0)
-   Support for **BM25 scoring**, see: [scoring.md](https://github.com/blevesearch/bleve/blob/v2.5.0/docs/scoring.md#bm25)
-   Support for **synonyms' search**, see: [synonyms.md](https://github.com/blevesearch/bleve/blob/v2.5.0/docs/synonyms.md)
-   **Significant performance improvements in pre-filtered vector search**, with blevesearch/bleve#2169 + dependent changes
-   `auto` fuzziness detection with blevesearch/bleve#2060
-   Ability to affect ingestion/drain rate by tuning persister workers with blevesearch/bleve#2100
-   Additional config in merge policy for improved merger behavior, with blevesearch/bleve#2134
-   Geo improvements: footprint reduction for polygons, better validation and graceful error handling, with blevesearch/bleve#2162 + blevesearch/bleve#2158 + blevesearch/bleve#2165
-   Upgrade to RoaringBitmap/roaring@v2.4.5, etcd.io/bbolt@v1.4.0
-   More metrics

##### Milestone

-   [v2.5.0](https://github.com/blevesearch/bleve/milestone/24)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "* 0-3 * * *" (UTC), Automerge - "* 0-3 * * *" (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---


This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Co-authored-by: Gusted <postmaster@gusted.xyz>
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/7468
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Reviewed-by: Shiny Nematoda <snematoda@noreply.codeberg.org>
Co-authored-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Co-committed-by: Renovate Bot <forgejo-renovate-action@forgejo.org>