Introduce Orama internal ID for documents #426

# Introduction to the problem

As per the Orama documentation ([link](https://docs.oramasearch.com/usage/insert#custom-document-ids)):

> Orama automatically uses the id field of the document, if found.
>
> That means that given the following document and schema:
> ```js
> import { create, insert } from '@orama/orama'
> 
> const db = await create({
>   schema: {
>     id: 'string',
>     author: 'string',
>     quote: 'string',
>   },
> })
>
> await insert(db, {
>   id: '73cbcc79-2203-49b8-bb52-60d8e9a66c5f',
>   author: 'Fernando Pessoa',
>   quote: "I wasn't meant for reality, but life came and found me",
> })
> ```
>
> The document will be indexed with the following `id`: `73cbcc79-2203-49b8-bb52-60d8e9a66c5f`.
>
> If the `id` field is not found, Orama will generate a random `id` for the document.

This gives users a great opportunity to use their own custom IDs, but at a cost: Orama uses several data structures (AVL Trees, Radix Trees, Inverted Indexes) where the ID of a document gets duplicated multiple times.

For example, if we insert the following document:

```js
{
  id: '37fd6bb7-5ac2-4a37-adcd-485045b54bdc',
  author: 'Michele',
  quote: 'Hello, world!'
}
```

The id `37fd6bb7-5ac2-4a37-adcd-485045b54bdc` will get duplicated at least three times:

1. `'Michele'` will be stored in the radix tree, where the last node of the token will hold a reference to every document containing it
2. `'Hello'` will be stored in the radix tree, where the last node of the token will hold a reference to every document containing it
3. `'world'` will be stored in the radix tree, where the last node of the token will hold a reference to every document containing it

There might be other places where this ID can get duplicated depending on certain conditions.
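
To make the duplication concrete, here is a minimal sketch using a toy token-to-ID map. It is not Orama's actual radix tree: `tokenize` and `buildToyIndex` are hypothetical helpers written only for this illustration, and the character count is just a rough proxy for what a serialized copy of these references would contain.

```js
// Toy index: maps each token to the list of document IDs containing it.
// This stands in for the radix tree's leaf nodes; it is NOT Orama's real structure.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)
}

function buildToyIndex(docs) {
  const index = new Map()
  for (const doc of docs) {
    for (const field of ['author', 'quote']) {
      for (const token of tokenize(doc[field])) {
        if (!index.has(token)) index.set(token, [])
        index.get(token).push(doc.id) // the full ID is referenced once per token
      }
    }
  }
  return index
}

const toyIndex = buildToyIndex([
  {
    id: '37fd6bb7-5ac2-4a37-adcd-485045b54bdc',
    author: 'Michele',
    quote: 'Hello, world!',
  },
])

// Count how many ID references the index holds and how many characters they add up to.
let references = 0
let idCharacters = 0
for (const ids of toyIndex.values()) {
  references += ids.length
  idCharacters += ids.reduce((sum, id) => sum + id.length, 0)
}

console.log(references, idCharacters) // 3 references, 3 * 36 = 108 characters
```

Even for this three-token document, the 36-character UUID accounts for 108 characters of references, and the figure grows with every additional token.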

# How to solve

If users index their content with `UUID`s as document IDs, the index size grows drastically, because a fairly long string (36 characters) is duplicated many times.

Therefore, we should keep letting users index every document with their own custom IDs (without putting them into a Radix Tree, which makes the `id` property not searchable), but store document references in our data structures using an internal, shorter ID, possibly generated with the `syncUniqueId` function exported [here](https://github.com/oramasearch/orama/blob/main/packages/orama/src/utils.ts#L92).

Users should be able to retrieve their docs using the `getById` function ([link](https://docs.oramasearch.com/usage/utilities#getbyid)), but internally, Orama should always use a short, optimized ID.
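
One possible shape for this, as a rough sketch rather than a proposed implementation: keep a two-way mapping between the user's ID and a short internal one, and store only the internal ID in the index. Everything below is hypothetical (`createInternalIdStore`, `getInternalId`, `insertDoc`, `getDocByExternalId` are made-up names), and it assumes an incrementing integer as the internal ID instead of `syncUniqueId`.

```js
// Two-way mapping between user-provided (external) IDs and short internal IDs.
function createInternalIdStore() {
  return {
    externalToInternal: new Map(), // external ID -> internal ID
    internalToExternal: [], // index (internal ID - 1) -> external ID
  }
}

// Returns the internal ID for an external ID, allocating a new one on first use.
function getInternalId(store, externalId) {
  const existing = store.externalToInternal.get(externalId)
  if (existing !== undefined) return existing
  store.internalToExternal.push(externalId)
  const internalId = store.internalToExternal.length // 1, 2, 3, ...
  store.externalToInternal.set(externalId, internalId)
  return internalId
}

function getExternalId(store, internalId) {
  return store.internalToExternal[internalId - 1]
}

// Toy database: documents and token references are keyed by internal ID only.
const idStore = createInternalIdStore()
const docsByInternalId = new Map()
const tokenIndex = new Map() // token -> array of internal IDs

function insertDoc(doc) {
  const internalId = getInternalId(idStore, doc.id)
  docsByInternalId.set(internalId, doc)
  const text = `${doc.author} ${doc.quote}`.toLowerCase()
  for (const token of text.split(/[^a-z0-9]+/).filter(Boolean)) {
    if (!tokenIndex.has(token)) tokenIndex.set(token, [])
    tokenIndex.get(token).push(internalId) // short ID, not the UUID
  }
}

// Lookup by the user's own ID still works; the UUID is stored once, in the mapping.
function getDocByExternalId(externalId) {
  const internalId = idStore.externalToInternal.get(externalId)
  return internalId === undefined ? undefined : docsByInternalId.get(internalId)
}

insertDoc({
  id: '37fd6bb7-5ac2-4a37-adcd-485045b54bdc',
  author: 'Michele',
  quote: 'Hello, world!',
})

console.log(tokenIndex.get('hello')) // holds the internal ID 1, not the UUID
console.log(getDocByExternalId('37fd6bb7-5ac2-4a37-adcd-485045b54bdc').author)
```

With this layout the long ID appears once in the mapping, while each token reference costs only a small integer. Search results would need to translate internal IDs back (here via `getExternalId`) before being returned to the user.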

# Bounty Program

This issue is subject to our **Open Source Bounty Program**: we'll reward whoever creates a PR for it that gets merged with $800.
