Promises: bigger-than-RAM #5247

# Goals

Support some forms of “bigger-than-RAM” recordings as soon as possible.

# Background

## Small-index vs Big-index

Table index: row ids and time points.

Does the table index fit in RAM?

Hypothesis: most “bigger-than-RAM” problems have smallish indices.

### Big index

Example: 100GB of scalar plots

We need a hierarchical index file on disk, with seeking, plus store subscribers that are aware of it, etc. Difficult!

### Small index

Example: thousands of uncompressed 4k images, or big point clouds, meshes, …

We “just” need to figure out how to load blobs from disk on demand. Easier!

## Promises: a solution to small-index

We replace large blobs with promises that refer to external data.

A promise could be a file path with optional byte offset, a URL, …

When a query results in a Promise, we (try to) resolve it.

Example: we go through a huge MCAP file and log it to Rerun, but replace big blobs by a Promise referring to a byte-offset in the MCAP.
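
As a rough illustration of that example, the sketch below walks a list of blob locations in a source file and builds a small index: small payloads are copied inline, large ones are replaced by a promise. Everything in it is an assumption made for illustration: the `Promise` class, the `build_index` helper, the `inline_limit` threshold, and the `?bytes=<start>-<end>` query format are hypothetical, not an existing Rerun API.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Promise:
    """Stands in for a blob; holds a single URI string (hypothetical format)."""
    uri: str


def build_index(source, records, inline_limit=1024):
    """Build a small index over `source`.

    `records` is a list of (entity_path, byte_offset, byte_length) tuples.
    Blobs larger than `inline_limit` bytes become promises that point back
    into `source`; smaller blobs are read and stored inline.
    """
    source = Path(source).resolve()
    index = []
    with source.open("rb") as f:
        for entity_path, offset, length in records:
            if length > inline_limit:
                # Assumed URI format: file://<path>?bytes=<start>-<end>
                uri = f"{source.as_uri()}?bytes={offset}-{offset + length}"
                index.append((entity_path, Promise(uri)))
            else:
                f.seek(offset)
                index.append((entity_path, f.read(length)))
    return index
```

The index stays small because each large blob costs only the length of one URI string, regardless of the blob's size.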

# User stories

- Logging a file reference
    - `rr.log("image", rr.Image(data=rr.Promise.file_path("foo.jpg")))`
- VRS
    - `file://recording.vrs?stream=video&time=42`
- Log a video file
    ```python
    for i, frame in enumerate(video):
        rr.set_time_point("frame", i)
        rr.log("video", rr.Image(data=rr.Promise.file_path(f"foo.mp4?frame={i}")))
    ```

# Design
A `Promise` is a datatype, which can be used for any component.
So a `component.Point3D` can be represented by a `datatype.Promise`.
A promise contains a single URI string.

A promise resolves to some IPC Arrow data (or an error, or _pending_).

The promise is resolved late, after primary caches, close to the UI/visualizer.

``` rs
/// The data of a component. Is `ComponentResult` a better name?
enum ComponentResult<'data, T> {
    /// The entity doesn't have this component
    None,
    
    /// Wait for it - it is being loaded in the background
    Pending,
    
    /// Failed to load.
    Error(String),
    
    /// The data is decoded and ready.
    /// A slice into the secondary promise cache (if it was a promise)
    Data(&'data [T]),
}

impl<'data, T> ComponentResult<'data, T> {
    fn map(…) -> …
}
```
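
One plausible reading of `map` is "transform the data if it is ready, otherwise pass the state through unchanged". The sketch below models that in Python for brevity. It is an assumption about the intended semantics, not the actual signature: the class names mirror the enum variants, `Absent` stands in for `None` (which is reserved in Python), and `map_result` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Absent:
    """The entity doesn't have this component (`None` in the Rust enum)."""


@dataclass(frozen=True)
class Pending:
    """The data is still being loaded in the background."""


@dataclass(frozen=True)
class Error:
    """Loading failed; `message` says why."""
    message: str


@dataclass(frozen=True)
class Data(Generic[T]):
    """The data is decoded and ready."""
    values: Sequence[T]


ComponentResult = Union[Absent, Pending, Error, Data[T]]


def map_result(result, f):
    """Apply `f` to the decoded values; pass every other state through unchanged."""
    if isinstance(result, Data):
        return Data(f(result.values))
    return result
```

With this shape a visualizer can transform data without handling `Pending` or `Error` at every step, and only match on the state once, at the point where it draws.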

# MVP
- log huge files, index them after, then open the small index
- Shortcomings:
    - Some stalling when time-scrubbing
    - No web support
    - Local files only

## Steps
- Add a `PromiseCache` returning `ComponentResult<'a, T>`
- `entity_iterator` should either
    - return a `MaybePromise<T>` for each component (leaving it to the user to resolve)
    - or a `ComponentResult<'a, T>` for each component
- Put datatype-name in the meta-data of each `DataCell`
- Built-in resolver for `file://…?bytes=…`
    - Immediate, fseek
    - IPC Arrow data at a byte offset, or ArrowMsg at offset + index in it
- `rerun index huge.rrd > indexed.rrd`
    - creates an “indexed” version of the rrd, which replaces components with promises and puts the raw blobs elsewhere in the file
    - two files as alternative, but single file preferred
    - “self” uri, for referring to the same file
- gc `PromiseCache`
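
Two of the steps above, the immediate `file://` resolver and a garbage-collected `PromiseCache`, can be sketched together. This is a minimal sketch under stated assumptions: the `bytes=<start>-<end>` query format, the byte-budget LRU eviction policy, and the names `resolve_file_promise`, `max_bytes`, `get`, and `gc` are all hypothetical. It returns raw bytes and leaves decoding the IPC Arrow data out of scope.

```python
from collections import OrderedDict
from urllib.parse import parse_qs, urlsplit
from urllib.request import url2pathname


def resolve_file_promise(uri):
    """Resolve `file://<path>?bytes=<start>-<end>` (assumed format) with a seek + read."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    start, end = (int(n) for n in parse_qs(parts.query)["bytes"][0].split("-"))
    with open(url2pathname(parts.path), "rb") as f:
        f.seek(start)
        return f.read(end - start)


class PromiseCache:
    """Caches resolved blobs and evicts the least recently used past a byte budget."""

    def __init__(self, max_bytes, resolver=resolve_file_promise):
        self.max_bytes = max_bytes
        self.resolver = resolver
        self._entries = OrderedDict()  # uri -> bytes, oldest first
        self._size = 0

    def get(self, uri):
        if uri in self._entries:
            self._entries.move_to_end(uri)  # mark as most recently used
            return self._entries[uri]
        data = self.resolver(uri)
        self._entries[uri] = data
        self._size += len(data)
        self.gc()
        return data

    def gc(self):
        """Evict oldest entries until within budget, always keeping the newest one."""
        while self._size > self.max_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self._size -= len(evicted)
```

Because resolution here is synchronous, a cache miss blocks the caller for the duration of the read, which matches the "some stalling when time-scrubbing" shortcoming listed for the MVP.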

# Post-MVP
**Latency-aware**

- Start using `ComponentResult` in visualizers
- make resolver async
- Some latency resolver strategy
    - experiment with simulated latency etc.
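
To experiment with simulated latency, something like the following could serve as a starting point. It is only a sketch: `AsyncPromiseCache`, `poll`, `slow_resolver`, and the `PENDING` sentinel are hypothetical names, and `asyncio` stands in for whatever async runtime the viewer would actually use. The first poll starts a background load and reports pending; later polls return the data once the load has finished.

```python
import asyncio

PENDING = object()  # sentinel for "still loading"


class AsyncPromiseCache:
    """Non-blocking cache: `poll` never waits on the resolver."""

    def __init__(self, resolver):
        self._resolver = resolver
        self._tasks = {}  # uri -> asyncio.Task

    def poll(self, uri):
        """Start resolving on first request; return PENDING until the load is done.

        Must be called while an event loop is running. If the resolver raised,
        the exception is re-raised here (it would map to the `Error` state).
        """
        task = self._tasks.get(uri)
        if task is None:
            self._tasks[uri] = asyncio.create_task(self._resolver(uri))
            return PENDING
        if not task.done():
            return PENDING
        return task.result()


async def slow_resolver(uri, latency=0.05):
    """Fake resolver that simulates I/O latency, then returns placeholder bytes."""
    await asyncio.sleep(latency)
    return f"data for {uri}".encode()


async def demo():
    cache = AsyncPromiseCache(slow_resolver)
    first = cache.poll("file://a")   # kicks off the load
    await asyncio.sleep(0.2)         # e.g. a few frames pass
    second = cache.poll("file://a")  # load has finished by now
    return first is PENDING, second
```

Varying `latency` gives a cheap way to see how a visualizer behaves while data is pending, before any real remote resolver exists.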

**Promise resolvers**

- Custom HTTP(S) resolver
- VRS resolver

**SDK-aware**

Each of these adds additional abilities:

- Auto-promisify sink in the SDK
- log promise components directly `rr.log("mypoints", rr.Promise(Position3D.name, uri))`
- Support promises for all archetypes
    - Rust: replace `Option<Vec<Position3D>>` with `MaybePromise<Vec<Position3D>>`
    - Python: `isinstance`
    - C++: enhance or wrap `Collection` type
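
On the Python side, the `isinstance` approach could look roughly like this: an archetype field accepts either concrete data or a promise, and the SDK branches on the type when building the column to log. The `Promise` class here only mirrors the `rr.Promise(Position3D.name, uri)` call shown above; it and the `to_column` helper are hypothetical, not an existing SDK API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promise:
    """Hypothetical SDK type: a component name plus the URI holding its data."""
    component_name: str
    uri: str


def to_column(component_name, value):
    """Turn an archetype field into a loggable column: inline data or a promise."""
    if isinstance(value, Promise):
        if value.component_name != component_name:
            raise TypeError(
                f"expected a promise of {component_name}, got {value.component_name}"
            )
        return {"component": component_name, "promise_uri": value.uri}
    return {"component": component_name, "data": list(value)}
```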
