Support for records bigger than a page

### Discussed in https://github.com/ArcadeData/arcadedb/discussions/320

<div type='discussions-op-text'>

<sup>Originally posted by **lvca** February  7, 2022</sup>
Currently (version <= 22.02.1) ArcadeDB does not support storing records larger than a page. This limitation must be overcome to:
- store large records, especially binary records (blobs)
- avoid worrying too much about importing the database if the business requirements change and larger records are needed.

First, a quick introduction to how records are stored on a page. The first part of the page contains the header, which holds slots with pointers to where each record's content is located on the page. A page can hold at most 2048 records (`Bucket.DEF_MAX_RECORDS_IN_PAGE`), one per slot. The record content is prefixed by its size, stored as a varint (variable-length integer). A positive size is the length of the record; the other values have a special meaning:
- 0 = deleted record
- -1 = placeholder pointer that points to another record on another page
- <-1 = placeholder content pointed to by a record on another page
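
The rules above can be sketched as a small classifier. This is only an illustration of the size convention described in this post: the class, enum, and method names are hypothetical and are not ArcadeDB's actual API.

```java
// Illustrative sketch only: these names are hypothetical, not ArcadeDB's API.
final class RecordSizeMarker {
    enum Kind { REGULAR, DELETED, PLACEHOLDER_POINTER, PLACEHOLDER_CONTENT }

    /** Classifies a decoded record size prefix using the rules listed above. */
    static Kind classify(long size) {
        if (size == 0) return Kind.DELETED;
        if (size == -1) return Kind.PLACEHOLDER_POINTER;
        if (size < -1) return Kind.PLACEHOLDER_CONTENT;
        return Kind.REGULAR; // positive: the value is the record length in bytes
    }
}
```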

The minimum size of a record stored on a page is 5 bytes. If the record is smaller than 5 bytes, it is padded with blanks.
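
For readers unfamiliar with varints, here is a minimal sketch of one common way to store a signed value in a variable number of bytes: ZigZag mapping followed by base-128 encoding. This post does not say which varint scheme ArcadeDB uses, so treat the encoding and the `VarintSketch` name as assumptions. It does show why small negative markers such as -1 or -2 can fit in a single byte.

```java
import java.io.ByteArrayOutputStream;

// Assumption: ZigZag + base-128 varint. The scheme used by ArcadeDB may differ.
final class VarintSketch {
    /** Encodes a signed value so that small magnitudes use few bytes. */
    static byte[] encode(long value) {
        long zigzag = (value << 1) ^ (value >> 63); // 0,-1,1,-2 -> 0,1,2,3
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // low 7 bits, more to come
            zigzag >>>= 7;
        }
        out.write((int) zigzag); // final byte has the high bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(). */
    static long decode(byte[] bytes) {
        long zigzag = 0;
        int shift = 0;
        for (byte b : bytes) {
            zigzag |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break; // last byte of this varint
            shift += 7;
        }
        return (zigzag >>> 1) ^ -(zigzag & 1); // undo the ZigZag mapping
    }
}
```

With this scheme, `encode(-2)` yields one byte and `encode(300)` yields two.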

To store large records, we must split each record into chunks and save them as a linked list. To let the bucket know that an entry is a chunk of a larger record, a new record size value, `-2`, must be used.

_NOTE: This doesn't interfere with the current content. Negative sizes are considered placeholder content, but records cannot be smaller than 5 bytes, so it's not possible to encounter a placeholder content record with record size = -2. The closest value to zero would be -5._

So a record with size -2 will contain the first chunk of the record. The record content will have the following information before the actual chunk of data:
- the chunk size in bytes as varint
- the location of the next chunk, stored as a placeholder pointer containing the record slot in the same bucket

In order to read the entire record, the record is built in memory chunk after chunk, jumping between pages until the pointer to the next chunk is 0 (zero).
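
The write and read paths described above can be sketched with a toy in-memory model. Everything here is hypothetical (`ChunkedRecordSketch`, `Chunk`, `store`, `load`): a plain list stands in for the bucket's slots, and a pointer `p` refers to `slots.get(p - 1)` so that 0 stays free to mean "no next chunk". The real on-page layout is not modeled.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// Toy model only: a List stands in for the bucket, pointers are 1-based indexes.
final class ChunkedRecordSketch {
    static final int END_OF_CHAIN = 0; // next pointer of the last chunk

    static final class Chunk {
        final byte[] data;     // chunk payload; its length is the chunk size
        final int nextPointer; // location of the next chunk, or END_OF_CHAIN
        Chunk(byte[] data, int nextPointer) {
            this.data = data;
            this.nextPointer = nextPointer;
        }
    }

    /** Splits the record into chunks, appends them, and returns the first pointer. */
    static int store(List<Chunk> slots, byte[] record, int maxChunkSize) {
        if (maxChunkSize <= 0) throw new IllegalArgumentException("maxChunkSize must be > 0");
        int chunkCount = Math.max(1, (record.length + maxChunkSize - 1) / maxChunkSize);
        int firstPointer = slots.size() + 1;
        for (int i = 0; i < chunkCount; i++) {
            int from = i * maxChunkSize;
            int to = Math.min(record.length, from + maxChunkSize);
            // This chunk lands at pointer size()+1, so the next one is size()+2.
            int next = (i == chunkCount - 1) ? END_OF_CHAIN : slots.size() + 2;
            slots.add(new Chunk(Arrays.copyOfRange(record, from, to), next));
        }
        return firstPointer;
    }

    /** Rebuilds the record in memory by following next pointers until 0. */
    static byte[] load(List<Chunk> slots, int firstPointer) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pointer = firstPointer;
        while (pointer != END_OF_CHAIN) {
            Chunk chunk = slots.get(pointer - 1);
            out.write(chunk.data, 0, chunk.data.length);
            pointer = chunk.nextPointer;
        }
        return out.toByteArray();
    }
}
```

In this model, storing a 10-byte record with a maximum chunk size of 4 produces three chunks, and `load` returns the original 10 bytes.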

The first chunk has a record size = -2, while all subsequent chunks have a record size = -3. This allows the `scan()` and `count()` methods to skip the subsequent chunks when they are encountered, so that each large record is visited only once.</div>
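
As a rough illustration of the counting rule, the sketch below walks a list of decoded size values and counts each logical record once. The `ChunkAwareCount` class is hypothetical. It also assumes that a placeholder pointer (-1) counts as a record while placeholder content (-5 or lower) is skipped; that assumption is not stated in this post.

```java
// Hypothetical sketch of a chunk-aware count(); not ArcadeDB's implementation.
final class ChunkAwareCount {
    static final long PLACEHOLDER_POINTER = -1;
    static final long FIRST_CHUNK = -2;
    static final long NEXT_CHUNK = -3;

    /** Counts logical records given the size prefix found in each slot. */
    static int count(long[] sizes) {
        int total = 0;
        for (long size : sizes) {
            if (size > 0 || size == PLACEHOLDER_POINTER || size == FIRST_CHUNK) {
                total++; // regular record, relocated record, or head of a chunked record
            }
            // Skipped: 0 (deleted), NEXT_CHUNK (already counted via its first chunk),
            // and placeholder content (assumed to be counted via its pointer).
        }
        return total;
    }
}
```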

Support for records bigger than a page #332
