# Raises if number of recordings != 1
recording : Recording = rr.data.load_recording("foo.rrd")
archive = rr.data.load_archive("foo.rrd")
recording = archive.recordings[0]
contents = {
"world/points/**": [rr.Image],
"metrics/**": [rr.Scalar]
}
contents = "world/**"
class Recording:
def view(self, index: str, contents: ContentLike) -> RecordingView: ...
class RecordingView:
def filter_range(self, min: float, max: float) -> RecordingView: ...
def filter_times(self, times: ArrayLike) -> RecordingView: ...
def filter_events_for(self, AnyColumn(...)) -> RecordingView: ...
def using_index_values(self, times: ArrayLike) -> RecordingView: ...
def latest_at(self) -> RecordingView: ...
def select(self, *columns: AnyColumn) -> Iterator[pa.RecordBatch]: ...
# Do we need this?
def select_all(): ...
# Alternative
expr = QueryExpression(...)
recording = load_recording(...)
recording.query(expr).select(...)
Details
Notes from exploration:
- EntityPathFilter expression is redundant with column-selection
- Different types of column filtering is very useful for exploration
- Select operators feel natural for incrementally filtering a dataset independently of the latest-at / range queries
- Would be nice to just chain these in, or re-use a selection for multiple queries.
- Implicit empty columns can be surprising. Some filter operations should probably return errors if the columns are not present.
- Ultimately need a way to access deterministic/named columns in the context of the resulting Table
- Allowing a user-set name in the
ComponentSelector
- Would be nice to group together timeline + range like we do some other places in rust
- POV component is confusing
- Really need a sampled multi-latest-at
- Could be more clearly expressed as something like:
let samples = dataset.logged_at(timeline, columns)
dataframe = dataset.latest_at(samples, columns)
This has the dual benefit that it allows users to provide their own samples, or to choose sample-times for multiple columns.
Proposals
Start with python refinement, and then back-propagate into rust if we like it.
Selections
The python Dataset object will internally track a set of columns that will be used for all queries along with an Arc<ChunkStore>.
Introduce new select_ variant APIs on the Dataset:
dataset.select_entities(expr: str) -> Dataset
- NOTE: only mutates component-columns; no-op on control/time columns
dataset.select_components(components: Sequence[ComponentLike]) -> Dataset
- NOTE: only mutates component-columns; no-op on control/time columns
dataset.select_columns(column_selectors: : Sequence[ColumnSelector]) -> Dataset
Each of these has the potential to strictly filter/mutate the active set of descriptors relative to the previous step. I.e. first selection is from the complete set, each incremental selection only selects from the remaining set.
LatestAtQuery and RangeQuery
Our TimeType ambiguity continues to torment us.
The most ergonomic is clearly an API that looks like:
LatestAtQuery(timeline: str, at: int | float)
RangeQuery(timeline: str, min: int | float, max: int | float)
The big challenge here is that sane-looking APIs are ambiguous without knowledge of the timeline.
Concretely:
LatestAtQuery(timeline, 2.0) needs to map to the TimeInt 2 if timeline is a Sequence, 2000000000 if timeline is Temporal and the user is thinking in seconds, and 2 if the timeline is temporal and the user is thinking in nanos.
TODO: Still not sure what the right answer is here.
If we follow precedent from TimeRangeBoundary this ends up looking something:
Choice A
latest_at = rr.LatestAt("frame", rr.TimeValue(seq=42))
latest_at = rr.LatestAt("time", rr.TimeValue(seconds=42.5))
range = rr.Range("frame, min=rr.TimeValue(seq=10), max=rr.TimeValue(seq=17))
Choice B, with some parameter-exploding could be simplified down to:
latest_at = rr.LatestAt("frame", seq=42)
latest_at = rr.LatestAt("time", seconds=42.5)
range = rr.Range("frame", min_seq=10, max_seq=17)
# Not sure if we let the users do stupid things
range = rr.Range("time", min_seconds=10.0, max_nanos=17000000000)
Choice C, diverging from what we do in TimeRangeBoundary:
latest_at = rr.dataframe.LatestAt.sequence("frame", 42)
latest_at = rr.dataframe.LatestAt.sequence("time", 42.5)
range = rr.dataframe.Range.sequence("frame", min=10, max=17)
range = rr.dataframe.Range.seconds("time", min=10.0, max=17)
Queries
Since the selection is now carried with the Dataset, you can now execute a query directly without providing columns.
dataset.latest_at_query(latest_at: LatestAt)
dataset.range_query(range: Range, pov: ComponentSelector)
This means you can write a query like:
opf = rr.dataframe.load_recording("datasets/open_photogrammetry_format.rrd")
range = rr.dataframe.Range.sequence("image", min=10, max=17)
# This part is still annoying
pov = rr.dataframe.ColumnSelector("world/cameras/rgb", rr.components.Blob)
df = opf.select_entities("world/cameras/**").range_query(range, pov)
Column Naming
Selectors/Descriptors will be given a name.
This name will default to one of:
- Control:RowId
- Timeline:<timeline_name>
- <entity_path>:<component_short_name>
When specifying a component selector, users have the option to call .rename_as() to change the name of the component.
These names are also valid INPUT to a ColumnSelector.
For example:
image = rr.dataframe.ColumnSelector("/world/camera/rgb:Blob").rename_as("image")
df = opf.select_columns(image).latest_at_query(query)
df = opf.select_columns(image).range_query(query, pov=image)
Updated Proposal:
Improved concept definitions
.filter(TimeRange(start=..., end=...))or maybe.filter_range(start=..., end=...))view.using_index_values(self, values: ArrayLike).select(Timeline(), "Translation3D")Python APIs
Original Proposal (archive):
Details
Notes from exploration:
ComponentSelectorProposals
Start with python refinement, and then back-propagate into rust if we like it.
Selections
The python
Datasetobject will internally track a set of columns that will be used for all queries along with anArc<ChunkStore>.Introduce new
select_variant APIs on theDataset:dataset.select_entities(expr: str) -> Datasetdataset.select_components(components: Sequence[ComponentLike]) -> Datasetdataset.select_columns(column_selectors: : Sequence[ColumnSelector]) -> DatasetEach of these has the potential to strictly filter/mutate the active set of descriptors relative to the previous step. I.e. first selection is from the complete set, each incremental selection only selects from the remaining set.
LatestAtQuery and RangeQuery
Our TimeType ambiguity continues to torment us.
The most ergonomic is clearly an API that looks like:
LatestAtQuery(timeline: str, at: int | float)RangeQuery(timeline: str, min: int | float, max: int | float)The big challenge here is that sane-looking APIs are ambiguous without knowledge of the timeline.
Concretely:
LatestAtQuery(timeline, 2.0)needs to map to the TimeInt 2 if timeline is a Sequence, 2000000000 if timeline is Temporal and the user is thinking in seconds, and 2 if the timeline is temporal and the user is thinking in nanos.TODO: Still not sure what the right answer is here.
If we follow precedent from
TimeRangeBoundarythis ends up looking something:Choice A
Choice B, with some parameter-exploding could be simplified down to:
Choice C, diverging from what we do in TimeRangeBoundary:
Queries
Since the selection is now carried with the
Dataset, you can now execute a query directly without providing columns.dataset.latest_at_query(latest_at: LatestAt)dataset.range_query(range: Range, pov: ComponentSelector)This means you can write a query like:
Column Naming
Selectors/Descriptors will be given a name.
This name will default to one of:
When specifying a component selector, users have the option to call
.rename_as()to change the name of the component.These names are also valid INPUT to a ColumnSelector.
For example: