Skip to content

perf: lazy operations on sorter#434

Merged
micheleriva merged 6 commits intooramasearch:mainfrom
h4ad-forks:perf/faster-insert
Jul 11, 2023
Merged

perf: lazy operations on sorter#434
micheleriva merged 6 commits intooramasearch:mainfrom
h4ad-forks:perf/faster-insert

Conversation

@H4ad
Copy link
Copy Markdown
Contributor

@H4ad H4ad commented Jul 3, 2023

Well, instead of sorting every time you insert an item, now it only sort when you need to search, this speedup up the insert by A LOT.

To compare, the following code went from 1m 7s to 1s with this change:

import { create, insert, remove, search } from "./dist/index.js";

(async () => {
  const db = await create({
    schema: {
      name: "string"
    }
  });

  let now = performance.now();
  for (let i = 0; i < 3e4; i++) {
    await insert(db, {
      id: i.toString(),
      name: Math.random().toString(16).substring(8)
    });
  }
  console.log(`insert: ${performance.now() - now}ms`);
  console.log(`memory: ${process.memoryUsage().heapUsed / 1024 / 1024}MB`);
  now = performance.now();

  await search(db, {
    term: 'lll'
  });
  console.log(`search: ${performance.now() - now}ms`);
  now = performance.now();

  for (let i = 0; i < 1e4; i++) {
    await remove(db, i.toString());
  }
  console.log(`remove: ${performance.now() - now}ms`);
  console.log(`memory: ${process.memoryUsage().heapUsed / 1024 / 1024}MB`);
  now = performance.now();

  await search(db, {
    term: 'lll'
  });
  console.log(`search: ${performance.now() - now}ms`);
})();

Before this change:

insert: 49947.061446000356ms
memory: 44.56006622314453MB
search: 1.312158000189811ms
remove: 5246.693398999982ms
memory: 45.88232421875MB
search: 0.10874100029468536ms

Now:

insert: 694.5329680000432ms
memory: 54.184051513671875MB
search: 1.2686470001935959ms
remove: 92.8359210002236ms
memory: 58.10900115966797MB
search: 0.10343399969860911ms

Also, this new change also fix the #348, instead of hours, it took just a minute to run.

/claim #434

@vercel
Copy link
Copy Markdown

vercel bot commented Jul 3, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
orama-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 10, 2023 11:31am

@micheleriva
Copy link
Copy Markdown
Contributor

@H4ad how does this affect search time?

@H4ad
Copy link
Copy Markdown
Contributor Author

H4ad commented Jul 3, 2023

image

Just add 50~60ms on the first search.

You can also call ensureIsSorted after insert to not affect the search.

Now it probably consumes 99% less time.

@micheleriva
Copy link
Copy Markdown
Contributor

You can also call ensureIsSorted after insert to not affect the search.

Could you please add this as a last function call in insertMultiple?


@allevo maybe we should expose a simple sort function to ensure sorting? Something like:

import { create, insertMultiple, ensureSort } from '@orama/orama'

const db = await create({ ... })

await insertMultiple(db, [...])

await ensureSort(db)

^ super bad DX, written it down not to forget about it

@H4ad
Copy link
Copy Markdown
Contributor Author

H4ad commented Jul 3, 2023

@micheleriva Only worth adding if you know the users never call insertMultiple twice, do you still want to add it?

About the ensureSort, you can add a option during the insert: lazySort, and let it default to true.

This was referenced Jul 3, 2023
@ShogunPanda
Copy link
Copy Markdown
Contributor

ShogunPanda commented Jul 4, 2023

The PR looks fine to me in general, even though I don't really like the lazy approach to solve the performance problem.
I'm conflicted on where to put ensureSort.

As @H4ad pointed out, doing it within insertMultiple might make not a lot of sense if is not called iteratively.

If we move the .sorted check from ensurePropertyIsSorted to ensureIsSorted it should probably be fine to call within search.

@micheleriva
Copy link
Copy Markdown
Contributor

What about this:

  1. Add a isSorted boolean property to the Orama instance
  2. Running a function such as ensureSort will set it to true
  3. If isSorted is set to false, the first search operation will call ensureSort
  4. After every new insertion/update, isSorted should route back to false

what do you think?

@ShogunPanda
Copy link
Copy Markdown
Contributor

So we handle this outside of the sorter? Ok, it makes sense to me.

@H4ad Can you please implement the aforementioned changes?

@micheleriva
Copy link
Copy Markdown
Contributor

@H4ad we didn't plan this activity, but we'll reward a small bounty anyway as soon as it lands. Your work is immensely appreciated

@micheleriva
Copy link
Copy Markdown
Contributor

/tip $70

@algora-pbc
Copy link
Copy Markdown

algora-pbc bot commented Jul 4, 2023

👉 @micheleriva: Click here to proceed

@H4ad
Copy link
Copy Markdown
Contributor Author

H4ad commented Jul 4, 2023

As soon as I have time I will implement these changes, probably today.

@algora-pbc
Copy link
Copy Markdown

algora-pbc bot commented Jul 4, 2023

💡 @H4ad submitted a pull request that claims the bounty. You can visit your org dashboard to reward.
👉 @H4ad: To receive payouts, sign up on Algora, link your Github account and connect with Stripe on your dashboard.

@H4ad
Copy link
Copy Markdown
Contributor Author

H4ad commented Jul 4, 2023

@micheleriva I read the suggestions but it doesn't make sense to have the isSorted inside the orama instance.

If I define the isSorted on orama instance, I don't know which property is not sorted since we keep the orderedDocs inside an Record of properties.

Or, I keep the current changes and I also add a property called isSorted, so when I call insert I mark the PropertySort and also the orama instance as isSorted=false, what do you think?

@ShogunPanda
Copy link
Copy Markdown
Contributor

ShogunPanda commented Jul 5, 2023

@H4ad The idea is that you don't really remember which properties are sorted and which are not.

It's easier to just assume that when a document is inserted/updated/removed all properties are unsorted. The rationale behind this is that unliked other system Orama is optimized to be used in a two phases: "insert all documents" and then "search" (and the DB in this phase should not be modified anymore).
All heavy duties are done in the inserting phase while the searching phase must be the most lightweight possible.

@H4ad
Copy link
Copy Markdown
Contributor Author

H4ad commented Jul 5, 2023

@ShogunPanda I agree with your design of Orama and this change will not affect that.

Calling ensureSort was just used by the tests, it will not be used by the users because they don't need it.

Doing this approach will just make the search slower since we need to traverse NItems*NProperties, instead of traversing the properties that were modified.

On the optimization side, adding it to orama does not make sense, but if you still want to do that, I will push a commit later this day.

@ShogunPanda
Copy link
Copy Markdown
Contributor

ShogunPanda commented Jul 5, 2023

If you don't want to add to Orama, you can add it to the sorter, eventually.

@H4ad
Copy link
Copy Markdown
Contributor Author

H4ad commented Jul 5, 2023

@ShogunPanda Added isSorter to sorter and also I added the properties to be serialized, so we can restore the sorter without needing to sort again.

I also removed the export of ensureSort method, that method is a implementation detail and it should not be required.

@H4ad H4ad changed the title perf: lazy sort on insert perf: lazy operations on sorter Jul 5, 2023
@H4ad
Copy link
Copy Markdown
Contributor Author

H4ad commented Jul 5, 2023

I also transform the remove operation into lazy one, before:

insert: 49947.061446000356ms
memory: 44.56006622314453MB
search: 1.312158000189811ms
remove: 5246.693398999982ms
memory: 45.88232421875MB
search: 0.10874100029468536ms

Now:

insert: 694.5329680000432ms
memory: 54.184051513671875MB
search: 1.2686470001935959ms
remove: 92.8359210002236ms
memory: 58.10900115966797MB
search: 0.10343399969860911ms

@H4ad H4ad force-pushed the perf/faster-insert branch from 63c011b to 698be5f Compare July 5, 2023 23:38
sorter.sortablePropertiesWithTypes[path] = type
sorter.sorts[path] = {
docs: {},
docs: new Map(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear we can't use maps here as they're not directly serializable to JSON. @allevo, @ShogunPanda please correct me if I'm wrong

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes correct

 % node
Welcome to Node.js v20.2.0.
Type ".help" for more information.
> JSON.stringify({ a: new Map() })
'{"a":{}}'
> a = new Map()
Map(0) {}
> a.set('a', 0)
Map(1) { 'a' => 0 }
> JSON.stringify({ a })
'{"a":{}}'
> a
Map(1) { 'a' => 0 }
> 

A custom toJSON function is required here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quite strange that our tests couldn't identify this. We need to fix that sooner or later

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShogunPanda
Copy link
Copy Markdown
Contributor

Other than things that @micheleriva outlined, looks good to me.

@micheleriva
Copy link
Copy Markdown
Contributor

Looks good to me. @allevo / @ShogunPanda would you be so kind to test this locally with serialization/deserialization before merging?

@H4ad H4ad mentioned this pull request Jul 9, 2023
5 tasks
@ShogunPanda
Copy link
Copy Markdown
Contributor

@micheleriva I'm currently full on my list. @allevo can probably test this quicker.

Copy link
Copy Markdown
Contributor

@allevo allevo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI! Thanks for your contribution.
I'm checking the code.
Does this change also affect the documentation?

return
}

sorter.language = language
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to support a language change during the insertion: the language remains the same as the DB creation.

Copy link
Copy Markdown
Contributor Author

@H4ad H4ad Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what Orama does already: https://github.com/oramasearch/orama/blob/main/packages/orama/src/components/sorter.ts#L111

But I change the default to get from tokenizer.

@allevo
Copy link
Copy Markdown
Contributor

allevo commented Jul 10, 2023

Another tip: could you ensure after the insertMany / removeManythe sorting?

@H4ad H4ad force-pushed the perf/faster-insert branch from 6d4611c to 8bed2bb Compare July 10, 2023 11:31
@H4ad
Copy link
Copy Markdown
Contributor Author

H4ad commented Jul 10, 2023

@allevo We can do that, but doesn't make sense on the optimization side: #434 (comment)

And this change doesn't change the documentation.

@allevo
Copy link
Copy Markdown
Contributor

allevo commented Jul 11, 2023

LGTM !
And thanks a lot!

Copy link
Copy Markdown
Contributor

@allevo allevo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@micheleriva micheleriva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Amazing job @H4ad

@micheleriva micheleriva merged commit a4b4694 into oramasearch:main Jul 11, 2023
@algora-pbc
Copy link
Copy Markdown

algora-pbc bot commented Jul 11, 2023

@H4ad: You just got a $70 tip! 👉 Complete your Algora onboarding to collect your payment.

@H4ad H4ad deleted the perf/faster-insert branch July 11, 2023 15:18
@algora-pbc
Copy link
Copy Markdown

algora-pbc bot commented Jul 11, 2023

🎉🎈 @H4ad has been awarded $70! 🎈🎊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants