Implement enum type by allevo · Pull Request #458 · oramasearch/orama

allevo · 2023-08-02T20:01:13Z

We have collected several issues about insertion performance. In particular, the number type becomes slow when the number of documents is high, and strings can be slow too.

The insertion cost comes from keeping number values ordered. Trees keep the order at a cost of O(log(N)) per insertion. But what happens if we give up the order in exchange for faster insertion?
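A back-of-the-envelope sketch of that trade-off, assuming a balanced tree where inserting into a tree that already holds i - 1 keys costs O(log i), and a hash-map-like structure with expected O(1) insertion (the choice of a hash map is an assumption at this point in the text), for n insertions:

```latex
T_{\text{tree}}(n) = \sum_{i=1}^{n} O(\log i) = O(\log n!) = O(n \log n)
\qquad
T_{\text{map}}(n) = \sum_{i=1}^{n} O(1) = O(n) \quad \text{(expected)}
```

This ignores constant factors such as rebalancing rotations and key comparisons, which also contribute to the measured insertion time.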

Depending on what you need, a value can play different roles:

- strings you want to search for
- numbers you want to compare with operators (>, <, >=, <=, =, !=)
- enums you want to compare with operators (=, in)

This PR introduces the enum type to provide:

- fast insertion
- fast search
- the = and in operators (others?)
- a smaller dump size

What this PR is missing:

- tests: done
- identifying FlatTree properly: done, using a discriminator string
- deciding whether to support enum[] as well: discarded for now
- documentation: done
- collecting suggestions and comments: done
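To illustrate what identifying a tree "using a discriminator string" can look like, here is a minimal sketch. The tree shapes and the `describeTree` helper are hypothetical and are not Orama's internal types; the point is only that a string `type` field lets TypeScript narrow a union without `instanceof`, which also works for plain objects restored from a dump.

```typescript
// Hypothetical tree shapes for illustration; not Orama's internal types.
type EnumValue = string | number

interface AVLTree { type: 'AVL'; root: unknown }
interface RadixTree { type: 'Radix'; root: unknown }
interface FlatTree { type: 'Flat'; entries: Array<[EnumValue, number[]]> }

type IndexTree = AVLTree | RadixTree | FlatTree

// Switching on the string `type` narrows the union without any class or `instanceof` check,
// so it also works on plain objects rebuilt from serialized data.
function describeTree(tree: IndexTree): string {
  switch (tree.type) {
    case 'AVL':
      return 'ordered number index'
    case 'Radix':
      return 'string index with typo tolerance'
    case 'Flat':
      return `unordered enum index with ${tree.entries.length} distinct values`
    default: {
      // Compile-time exhaustiveness check: adding a new tree type without a case fails here.
      const unreachable: never = tree
      return unreachable
    }
  }
}

console.log(describeTree({ type: 'Flat', entries: [[1, [0, 1]], ['5', [4]]] }))
// unordered enum index with 2 distinct values
```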

Usage:

import { create, insertMultiple, search } from '@orama/orama'

const db = await create({
  schema: {
    categoryId: 'enum',
  },
})

// Enum values can be numbers or strings
await insertMultiple(db, [
  { categoryId: 1 },
  { categoryId: 1 },
  { categoryId: 2 },
  { categoryId: 3 },
  { categoryId: '5' },
])

const resultEq = await search(db, {
  term: '',
  where: {
    categoryId: { eq: 1 },
  },
}) // Expected 2 results

const resultIn = await search(db, {
  term: '',
  where: {
    categoryId: { in: [1, 3, '5'] },
  },
}) // Expected 4 results

packages/orama/src/trees/flat.ts

ShogunPanda · 2023-08-03T10:25:09Z

I'm not sure I understand the purpose of this. Can you please clarify?

allevo · 2023-08-08T08:35:35Z

Hi,
sorry for the delay.
I'll try to describe the proposal better.
Sorry for the wall of text.

Context

Currently, Orama uses tree structures to store strings and numbers: radix trees for strings and AVL trees for numbers.

Problem

Insertion time is high when a developer tries to insert more than 100k documents.
Some users opened issues or reported the problem on Slack.
In particular, insertion into AVL trees takes a long time.

We use trees because these structures provide some guarantees, such as:

- finding all the documents mapped to a given value
- finding all documents greater or less than a value
- for the radix tree, finding all documents within a typo tolerance
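To make concrete what keeping order buys (exact matches plus comparator queries), here is a minimal sketch of an ordered numeric index. It swaps in a sorted array with binary search for the AVL tree, so insertion here costs O(n) for the array shift rather than O(log n); `SortedIndex`, `eq` and `gt` are hypothetical names, not Orama's API.

```typescript
// Hypothetical ordered index: sorted distinct keys, each mapped to its document IDs.
// A sorted array stands in for an AVL tree to show what ordering enables; not Orama's implementation.
class SortedIndex {
  private keys: number[] = []
  private ids: number[][] = []

  // Position of the first key >= value (lower-bound binary search).
  private lowerBound(value: number): number {
    let lo = 0
    let hi = this.keys.length
    while (lo < hi) {
      const mid = (lo + hi) >>> 1
      if (this.keys[mid] < value) lo = mid + 1
      else hi = mid
    }
    return lo
  }

  insert(value: number, docId: number): void {
    const i = this.lowerBound(value)
    if (this.keys[i] === value) {
      this.ids[i].push(docId)
    } else {
      // Keeping the order is the expensive part: every later key shifts right.
      this.keys.splice(i, 0, value)
      this.ids.splice(i, 0, [docId])
    }
  }

  // Exact match: all documents mapped to `value`.
  eq(value: number): number[] {
    const i = this.lowerBound(value)
    return this.keys[i] === value ? [...this.ids[i]] : []
  }

  // Comparator query: all documents whose value is strictly greater than `value`.
  gt(value: number): number[] {
    let i = this.lowerBound(value)
    if (this.keys[i] === value) i++
    return this.ids.slice(i).flat()
  }
}

const prices = new SortedIndex()
const rows: Array<[number, number]> = [[120, 0], [80, 1], [200, 2], [80, 3]]
for (const [price, id] of rows) prices.insert(price, id)

console.log(prices.eq(80)) // [ 1, 3 ]
console.log(prices.gt(100)) // [ 0, 2 ]
```

The range query only works because the keys are kept sorted; that ordering is exactly what an enum field does not need.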

These features are great for some kinds of data:

- strings where you want to find a substring
- filtering numbers by equality
- filtering numbers with a comparator

All of these searches (and others) are already supported by Orama.

That said, when I think of search in general, I think of a search bar plus filters. While the search bar relies entirely on the radix tree (for typo tolerance, for instance), filters are typically exposed through two kinds of interfaces:

- the end user specifies a range of values, for example a price or the distance from the city center.
- the end user specifies the presence or absence of a specific value in the document, for example whether a hotel offers parking, whether a book has a Kindle edition, or a t-shirt size.

For the first use case, we can rely on the AVL tree because we need comparators.
For the second use case, we could reuse the same implementation, but it adds overhead: we still keep the values ordered even for fields where neither ordering nor typo tolerance makes sense.

Solution

I'm proposing a different approach that does not keep the order for fields that don't require it. As in the examples above, the user still wants to filter on these fields with an exact match, just without the order being maintained.
This solution implements it with a Map<any, InternalDocumentId[]>.
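A minimal sketch of the idea, assuming numeric internal document IDs and string or number enum values. `FlatIndex` and its methods are hypothetical names, not Orama's actual FlatTree API; the `nin` method mirrors the operator added later in this PR, under the assumption that it returns indexed documents whose value is not in the list.

```typescript
// Hypothetical sketch of an unordered enum index; names are illustrative, not Orama's FlatTree API.
type EnumValue = string | number
type InternalDocumentId = number

class FlatIndex {
  // No ordering is maintained: each distinct value maps straight to its documents.
  private readonly valueToIds = new Map<EnumValue, InternalDocumentId[]>()

  // Expected O(1): one hash lookup plus an array push, with no rebalancing.
  insert(value: EnumValue, id: InternalDocumentId): void {
    const ids = this.valueToIds.get(value)
    if (ids) ids.push(id)
    else this.valueToIds.set(value, [id])
  }

  // `eq` operator: documents whose value is exactly `value` (1 and '1' are different keys).
  eq(value: EnumValue): InternalDocumentId[] {
    return [...(this.valueToIds.get(value) ?? [])]
  }

  // `in` operator: documents whose value is any of `values` (duplicates in the list are ignored).
  in(values: EnumValue[]): InternalDocumentId[] {
    return [...new Set(values)].flatMap((value) => this.valueToIds.get(value) ?? [])
  }

  // `nin` operator: indexed documents whose value is none of `values`.
  nin(values: EnumValue[]): InternalDocumentId[] {
    const excluded = new Set(values)
    const result: InternalDocumentId[] = []
    for (const [value, ids] of this.valueToIds) {
      if (!excluded.has(value)) result.push(...ids)
    }
    return result
  }
}

// Same data as the usage example in the PR description; the array position serves as the document ID.
const categoryIndex = new FlatIndex()
const categoryIds: EnumValue[] = [1, 1, 2, 3, '5']
categoryIds.forEach((categoryId, id) => categoryIndex.insert(categoryId, id))

console.log(categoryIndex.eq(1).length) // 2
console.log(categoryIndex.in([1, 3, '5']).length) // 4
console.log(categoryIndex.nin([1]).length) // 3
```

Because nothing is kept in order, range operators such as `gt` have no cheap implementation here, which is why the type is limited to equality-style operators.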
Benefits:

- a 30% improvement in number insertion
- less memory usage (not yet tested)
- a smaller dump size (not yet tested)

Cons:

- it adds another type that may not be easy to understand.

The implementation is still in progress because I would like to understand if this makes sense before finalizing it.
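The dump-size benefit above is explicitly untested, so no size claim is made here. As a hedged sketch of what making a Map-based index serializable might involve (a later commit in this PR is titled "Make flattree serializable"), assuming the index is a plain Map from enum value to document IDs; the function names and the dump shape are hypothetical:

```typescript
// Hypothetical dump format for a Map-based enum index; not Orama's actual serialization.
type EnumValue = string | number
type DocId = number

interface SerializedFlatIndex {
  type: 'Flat' // discriminator so a restore step can tell this apart from other trees
  entries: Array<[EnumValue, DocId[]]>
}

// JSON.stringify turns a Map into "{}", so store it as an array of [value, ids] pairs.
function serializeFlatIndex(index: Map<EnumValue, DocId[]>): SerializedFlatIndex {
  return { type: 'Flat', entries: Array.from(index.entries()) }
}

function deserializeFlatIndex(raw: SerializedFlatIndex): Map<EnumValue, DocId[]> {
  return new Map(raw.entries)
}

const original = new Map<EnumValue, DocId[]>([
  [1, [0, 1]],
  [2, [2]],
  ['5', [4]],
])
const dump = JSON.stringify(serializeFlatIndex(original))
console.log(dump) // {"type":"Flat","entries":[[1,[0,1]],[2,[2]],["5",[4]]]}

// Unlike a plain-object dump, the entries keep the number/string distinction of the keys.
const restored = deserializeFlatIndex(JSON.parse(dump))
console.log(restored.has(1), restored.has('1')) // true false
```

Storing entries as pairs rather than as object keys is the design choice that matters here: an object would coerce the key 1 to "1" and make it collide with a string enum value.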

ShogunPanda · 2023-08-09T07:27:00Z

I see!
The proposal makes sense to me, and enum search is definitely worth a try.
I'll review the code once it's finalized, but I have a question: do you think you could implement this as a new component (meaning: without using the usual radix+AVL-based component when performing operations) rather than modifying scattered parts of Orama? This way we get a cleaner, "black-box-like" implementation.

allevo · 2023-08-09T12:05:57Z

Hi!
What do you mean by a new component? Which component are you expecting to have?
It is always an index, right?

ShogunPanda · 2023-08-09T12:47:33Z

Exactly, a new Index component that can replace the default one. This use case is an example of the rationale behind Orama's architecture.

allevo · 2023-08-09T13:52:56Z

What do you mean? I want to add this feature to the default index.
If that's not what you mean, could you give me an example of what you have in mind?

ShogunPanda · 2023-08-09T13:54:31Z

I think that, to keep Orama clean, things should be separated. Nothing prevents you from implementing the new feature as a new index and then calling it from within the default index.
But I still think this should be kept as a separate component.
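One way to read this suggestion, as a hedged sketch: the enum logic lives in its own component behind a small interface, and the default index only routes enum fields to it. `FieldIndex`, `EnumIndex` and `DefaultIndex` are hypothetical names; Orama's real index component exposes a different and larger set of hooks.

```typescript
// Hypothetical interfaces for illustration; Orama's real index component has a different, larger surface.
type EnumValue = string | number
type DocId = number
type FieldType = 'string' | 'number' | 'enum'

interface FieldIndex<V> {
  insert(value: V, id: DocId): void
  eq(value: V): DocId[]
}

// Self-contained enum component: it knows nothing about radix or AVL trees.
class EnumIndex implements FieldIndex<EnumValue> {
  private readonly valueToIds = new Map<EnumValue, DocId[]>()

  insert(value: EnumValue, id: DocId): void {
    const ids = this.valueToIds.get(value)
    if (ids) ids.push(id)
    else this.valueToIds.set(value, [id])
  }

  eq(value: EnumValue): DocId[] {
    return [...(this.valueToIds.get(value) ?? [])]
  }
}

// The default index only routes: enum fields are delegated to the separate component.
class DefaultIndex {
  private readonly enumFields = new Map<string, EnumIndex>()

  constructor(schema: Record<string, FieldType>) {
    for (const [field, type] of Object.entries(schema)) {
      if (type === 'enum') this.enumFields.set(field, new EnumIndex())
      // string and number fields would get radix/AVL-backed components here (omitted).
    }
  }

  insert(field: string, value: EnumValue, id: DocId): void {
    const index = this.enumFields.get(field)
    if (!index) throw new Error(`${field} is not an enum field in this sketch`)
    index.insert(value, id)
  }

  eq(field: string, value: EnumValue): DocId[] {
    return this.enumFields.get(field)?.eq(value) ?? []
  }
}

const defaultIndex = new DefaultIndex({ categoryId: 'enum' })
defaultIndex.insert('categoryId', 1, 0)
defaultIndex.insert('categoryId', 2, 1)
defaultIndex.insert('categoryId', 1, 2)
console.log(defaultIndex.eq('categoryId', 1)) // [ 0, 2 ]
```

With this split, the enum component can be tested and replaced on its own, while the default index stays unaware of how enum values are stored.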

micheleriva

Looks good to me, just a couple of minor changes. Will approve once we also have documentation ready. Well done!

packages/orama/src/components/index.ts

packages/orama/src/trees/flat.ts

micheleriva

I think it mostly looks good, except for a couple of small comments. @allevo please add tests for Enums for the plugin-data-persistence package as well.

Good to go as soon as these comments are addressed.

Terrific job 🔥

packages/docs/pages/usage/create.mdx

packages/orama/src/components/index.ts

micheleriva

LGTM

allevo requested review from ShogunPanda and micheleriva August 2, 2023 20:01

ShogunPanda reviewed Aug 3, 2023

packages/orama/src/trees/flat.ts (outdated, resolved)

allevo force-pushed the feat/implement-enum branch from 4789851 to 8fa6e3a September 5, 2023 08:54

allevo force-pushed the feat/implement-enum branch from 8fa6e3a to 6289b81 September 5, 2023 09:31

allevo force-pushed the feat/implement-enum branch from 6289b81 to 70b86ed September 5, 2023 09:53

allevo force-pushed the feat/implement-enum branch from 70b86ed to 0ce6534 September 5, 2023 10:27

allevo force-pushed the feat/implement-enum branch from 0ce6534 to cfc7381 September 5, 2023 10:29

micheleriva reviewed Sep 5, 2023

allevo requested review from ShogunPanda and micheleriva September 7, 2023 09:09

allevo marked this pull request as ready for review September 7, 2023 09:11

Implement enum type

08dbe4d

allevo added 4 commits September 7, 2023 11:19

Make flattree serializable

6216680

Use enum

c7e6752

Use Nullable over |null

74a8585

Implement nin operator. Add doc

0c949b2

allevo force-pushed the feat/implement-enum branch from 6cc8db0 to 0c949b2 September 7, 2023 09:20

micheleriva reviewed Sep 8, 2023

packages/docs/pages/usage/create.mdx (outdated, resolved)

packages/orama/src/components/index.ts (outdated, resolved)

allevo added 2 commits September 8, 2023 19:35

Add data-persist test

0d25c2c

Address suggestions

81cfde5

Fix test

0aae939

allevo requested a review from micheleriva September 9, 2023 08:29

micheleriva approved these changes Sep 14, 2023

micheleriva merged commit c35ca0e into main Sep 14, 2023

micheleriva deleted the feat/implement-enum branch September 14, 2023 21:53

allevo mentioned this pull request Sep 18, 2023

Implements enum[] #482

Merged

5 tasks
