Goals
Support some forms of “bigger-than-RAM” recordings, as soon as possible
Background
Small-index vs Big-index
Table index: row ids and time points.
Does the table index fit in RAM?
Hypothesis: most “bigger-than-RAM” problems have smallish indices.
Big index
Example: 100GB of scalar plots
We need a hierarchical index file on disk, with seeking, and have store-subscribers that are aware of this, etc. Difficult!
Small index
Example: thousands of uncompressed 4k images, or big point clouds, meshes, …
We “just” need to figure out how to load blobs from disk on-demand. Easier!
Promises: a solution to small-index
We replace large blobs with promises that refer to the external data.
A promise could be a file path with optional byte offset, a URL, …
When a query results in a Promise, we try to resolve it.
Example: we go through a huge MCAP file and log it to Rerun, but replace big blobs by a Promise referring to a byte-offset in the MCAP.
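A minimal Python sketch of the idea, assuming a promise is nothing more than a URI string and that a byte range is written as `?bytes=<offset>,<length>`. The `Promise` class, `resolve_file_promise`, and the range encoding are hypothetical names and formats chosen for illustration; none of this is the Rerun API.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import parse_qs, urlsplit
from urllib.request import url2pathname


@dataclass(frozen=True)
class Promise:
    """Hypothetical stand-in for a blob: a single URI pointing at external data."""

    uri: str


def resolve_file_promise(promise: Promise) -> bytes:
    """Resolve a `file://` promise by seeking to the requested byte range.

    ASSUMPTION: the range is encoded as `?bytes=<offset>,<length>`.
    Without a `bytes` parameter the whole file is treated as the blob.
    """
    parts = urlsplit(promise.uri)
    if parts.scheme != "file":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    query = parse_qs(parts.query)
    with open(url2pathname(parts.path), "rb") as f:
        if "bytes" not in query:
            return f.read()
        offset, length = (int(x) for x in query["bytes"][0].split(","))
        f.seek(offset)
        return f.read(length)


# Usage: write a fake "recording" and refer to one blob inside it.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"HEADER" + b"big-blob-payload" + b"FOOTER")
    path = tmp.name

promise = Promise(uri=f"{Path(path).as_uri()}?bytes=6,16")
blob = resolve_file_promise(promise)
os.remove(path)
```

Only the 16 payload bytes are read; the rest of the file is never loaded.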
User stories
- Logging a file reference
rr.log("image", rr.Image(data=rr.Promise.file_path("foo.jpg")))
- VRS
file://recording.vrs?stream=video&time=42
- Log a video file
for i, frame in enumerate(video):
    rr.set_time_point("frame", i)
    rr.log("video", rr.Image(data=rr.Promise.file_path(f"foo.mp4?frame={i}")))
Design
A Promise is a datatype, which can be used for any component.
So a component.Point3D can be represented by datatype.Promise
A promise contains a single URI string.
A promise resolves to some IPC Arrow data (or an error, or pending).
The promise is resolved late, after primary caches, close to the UI/visualizer.
/// The data of a component. Is `ComponentResult` a better name than `PromiseResult`?
enum ComponentResult<'data, T> {
    /// The entity doesn't have this component
    None,
    /// Wait for it - it is being loaded in the background
    Pending,
    /// Failed to load.
    Error(String),
    /// The data is decoded and ready.
    /// A slice into the secondary promise cache (if it was a promise)
    Data(&'data [T]),
}
impl<'data, T> ComponentResult<'data, T> {
    fn map(…) -> …
}
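The same four states, mirrored in Python purely for illustration. The body of `map` is elided above, so the behavior sketched here (transform `Data`, pass every other state through unchanged) is an assumption, as are all the Python names.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Sequence, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class NoComponent:
    """The entity doesn't have this component."""


@dataclass(frozen=True)
class Pending:
    """The data is still being loaded in the background."""


@dataclass(frozen=True)
class Error:
    message: str


@dataclass(frozen=True)
class Data(Generic[T]):
    values: Sequence[T]


ComponentResult = Union[NoComponent, Pending, Error, Data[T]]


def map_result(result, f: Callable):
    """ASSUMED semantics of `map`: apply `f` to the data, keep other states as-is."""
    if isinstance(result, Data):
        return Data(f(result.values))
    return result


def describe(result) -> str:
    """What a visualizer might show for each state."""
    if isinstance(result, Data):
        return f"{len(result.values)} values"
    if isinstance(result, Pending):
        return "loading"
    if isinstance(result, Error):
        return f"error: {result.message}"
    return "no component"


doubled = map_result(Data([1, 2, 3]), lambda vs: [v * 2 for v in vs])
still_pending = map_result(Pending(), lambda vs: [v * 2 for v in vs])
```

A visualizer can then draw a placeholder for `Pending` and surface `Error` in the UI instead of stalling on the load.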
MVP
- log huge files, index them after, then open the small index
- Shortcomings:
- Some stalling when time-scrubbing
- No web support
- Local files only
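The “index them after” step could look roughly like the sketch below: walk the rows, move every large payload into a separate blob stream, and leave a promise with its byte range in the row. The row shape, the `min_blob_size` threshold, the `blob_uri` argument, and the `?bytes=<offset>,<length>` encoding are all assumptions made for the example.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, List, Tuple, Union


@dataclass(frozen=True)
class Promise:
    """Hypothetical promise: a single URI string."""

    uri: str


# A row is (time point, payload); the payload is raw bytes or a promise.
Row = Tuple[int, Union[bytes, Promise]]


def index_rows(
    rows: List[Tuple[int, bytes]],
    blob_out: BinaryIO,
    blob_uri: str,
    min_blob_size: int,
) -> List[Row]:
    """Replace every payload of at least `min_blob_size` bytes with a promise.

    The raw bytes are appended to `blob_out`; small payloads stay inline so
    the resulting index remains small enough to keep in RAM.
    """
    indexed: List[Row] = []
    for time, payload in rows:
        if len(payload) >= min_blob_size:
            offset = blob_out.tell()
            blob_out.write(payload)
            uri = f"{blob_uri}?bytes={offset},{len(payload)}"
            indexed.append((time, Promise(uri)))
        else:
            indexed.append((time, payload))
    return indexed


# Usage: two big blobs and one small value.
rows = [(0, b"x" * 1000), (1, b"tiny"), (2, b"y" * 2000)]
blobs = io.BytesIO()
indexed = index_rows(rows, blobs, "file://blobs.bin", min_blob_size=100)
```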
Steps
- Add a PromiseCache returning ComponentResult<'a, T>
- entity_iterator should either
- return a MaybePromise<T> for each component (leaving it to the user to resolve)
- or a ComponentResult<'a, T> for each component
- Put datatype-name in the meta-data of each DataCell
- Built-in resolver for file://…?bytes=…
- Immediate, fseek
- IPC Arrow data at a byte offset, or ArrowMsg at offset + index in it
- rerun index huge.rrd > indexed.rrd
- creates an “indexed” version of the rrd, which replaces components with promises and puts the raw blobs elsewhere in the file
- two files as alternative, but single file preferred
- “self” uri, for referring to the same file
- gc PromiseCache
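One possible shape for the cache and its gc step, as a Python sketch. The steps above only say that the cache exists and gets garbage-collected; the byte budget, the least-recently-used eviction policy, and every name here are assumptions.

```python
from collections import OrderedDict
from typing import Callable, List


class PromiseCache:
    """Hypothetical cache of resolved promises, keyed by URI."""

    def __init__(self, resolver: Callable[[str], bytes], max_bytes: int) -> None:
        self._resolver = resolver
        self._max_bytes = max_bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.used_bytes = 0

    def get(self, uri: str) -> bytes:
        if uri in self._entries:
            self._entries.move_to_end(uri)  # mark as most recently used
            return self._entries[uri]
        data = self._resolver(uri)
        self._entries[uri] = data
        self.used_bytes += len(data)
        self.gc()
        return data

    def gc(self) -> None:
        """Evict least-recently-used entries until we are back under budget.

        The newest entry is always kept, even if it alone exceeds the budget.
        """
        while self.used_bytes > self._max_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)


# Usage with a fake resolver that records which URIs actually hit "disk".
calls: List[str] = []


def fake_resolver(uri: str) -> bytes:
    calls.append(uri)
    return b"\0" * 40


cache = PromiseCache(fake_resolver, max_bytes=100)
cache.get("a")
cache.get("b")
cache.get("a")  # cache hit, no resolver call
cache.get("c")  # over budget: evicts "b", the least recently used
cache.get("b")  # has to be resolved again
```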
Post-MVP
Latency-aware
- Start using ComponentResult in visualizers
- make resolver async
- Some latency resolver strategy
- experiment with simulated latency etc.
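A small `asyncio` sketch of what an async resolver with simulated latency could look like from the caller's side: the first poll starts the load and reports pending, and a later poll returns the data or the error. The tuple-based result, the class name, and the 50 ms delay are all made up for the experiment.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

# ("pending", None), ("error", message) or ("data", payload)
PollResult = Tuple[str, Optional[object]]


class AsyncPromiseCache:
    """Hypothetical non-blocking cache: `poll` never waits for the resolver."""

    def __init__(self, resolver: Callable[[str], Awaitable[bytes]]) -> None:
        self._resolver = resolver
        self._tasks = {}

    def poll(self, uri: str) -> PollResult:
        """Must be called from inside a running event loop."""
        task = self._tasks.get(uri)
        if task is None:
            task = asyncio.create_task(self._resolver(uri))
            self._tasks[uri] = task
        if not task.done():
            return ("pending", None)
        error = task.exception()
        if error is not None:
            return ("error", str(error))
        return ("data", task.result())


async def slow_resolver(uri: str) -> bytes:
    await asyncio.sleep(0.05)  # simulated disk or network latency
    if uri.endswith("missing"):
        raise FileNotFoundError(uri)
    return f"data for {uri}".encode()


async def demo():
    cache = AsyncPromiseCache(slow_resolver)
    first = cache.poll("file://a")  # starts the load
    cache.poll("file://missing")  # starts a load that will fail
    await asyncio.sleep(0.2)  # e.g. a few frames later
    second = cache.poll("file://a")
    failed = cache.poll("file://missing")
    return first, second, failed


first, second, failed = asyncio.run(demo())
```

Raising the simulated delay is a cheap way to see how the UI behaves while results are still pending.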
Promise resolvers
- Custom HTTP(S) resolver
- VRS resolver
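Pluggable resolvers could be as simple as a list of (matcher, resolver) pairs that is tried in order. The sketch below assumes that design; the registry, the matchers, and both fake resolvers are hypothetical and do no real HTTP or VRS I/O. Matching on more than the scheme is deliberate, since the VRS example in the user stories is a `file://` URI.

```python
from typing import Callable, List, Tuple
from urllib.parse import parse_qs, urlsplit

Matcher = Callable[[str], bool]
Resolver = Callable[[str], bytes]


class ResolverRegistry:
    """Hypothetical registry: the first resolver whose matcher accepts the URI wins."""

    def __init__(self) -> None:
        self._resolvers: List[Tuple[Matcher, Resolver]] = []

    def register(self, matches: Matcher, resolver: Resolver) -> None:
        self._resolvers.append((matches, resolver))

    def resolve(self, uri: str) -> bytes:
        for matches, resolver in self._resolvers:
            if matches(uri):
                return resolver(uri)
        raise LookupError(f"no resolver for {uri!r}")


def is_vrs(uri: str) -> bool:
    parts = urlsplit(uri)
    # `file://recording.vrs?...` parses with the file name in `netloc`.
    return parts.scheme == "file" and (parts.netloc + parts.path).endswith(".vrs")


def fake_vrs_resolver(uri: str) -> bytes:
    query = parse_qs(urlsplit(uri).query)
    return f"vrs stream={query['stream'][0]} time={query['time'][0]}".encode()


def is_http(uri: str) -> bool:
    return urlsplit(uri).scheme in ("http", "https")


def fake_http_resolver(uri: str) -> bytes:
    return b"body of " + uri.encode()


registry = ResolverRegistry()
registry.register(is_vrs, fake_vrs_resolver)
registry.register(is_http, fake_http_resolver)

frame = registry.resolve("file://recording.vrs?stream=video&time=42")
```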
SDK-aware
Each of these adds additional abilities:
- Auto-promisify sink in the SDK
- log promise components directly
rr.log("mypoints", rr.Promise(Position3D.name, uri))
- Support promises for all archetypes
- Rust: replace Option<Vec<Position3D>> with MaybePromise<Vec<Position3D>>
- Python: isinstance
- C++: enhance or wrap Collection type
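For the Python side, the `isinstance` approach might look like the sketch below: an archetype field accepts either real data or a promise and branches once at serialization time. `Promise`, `MaybePromise`, and `serialize_positions` are illustrative names, and the dictionary output stands in for whatever the SDK would actually emit.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass(frozen=True)
class Promise:
    """Hypothetical promise: the datatype it resolves to, plus a URI."""

    datatype_name: str
    uri: str


Position = Tuple[float, float, float]
MaybePromise = Union[Promise, Sequence[Position]]


def serialize_positions(positions: MaybePromise) -> dict:
    """Accept either concrete positions or a promise for them."""
    if isinstance(positions, Promise):
        # Nothing is loaded here: only the reference is logged.
        return {
            "kind": "promise",
            "datatype": positions.datatype_name,
            "uri": positions.uri,
        }
    values = [tuple(float(c) for c in p) for p in positions]
    return {"kind": "data", "values": values}


inline = serialize_positions([(1, 2, 3), (4, 5, 6)])
deferred = serialize_positions(Promise("Position3D", "file://points.bin"))
```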