
Tracking issue: end-to-end batching #1619

@teh-cmc

  • Will create individual issues as the need arises.
  • Most likely an evolving document.

RFC


  • Move DataStore sanity checks and formatting tools to separate files
    store.rs is supposed to be the place where one can get an overview of all the data structures involved in the store, but it has slowly become a mess over time and is now pretty much unreadable.

  • Implement all the needed tests & benchmarks
    We need to be able to check for regressions at every step, so make sure we have all the tests and benchmarks we need for that.
    We should already be 95% of the way there at this point.

  • Replace MsgBundle & ComponentBundle with the new types (DataCell, DataRow, DataTable, EventId, BatchId...)
    No actual batching features nor any kind of behavior changes of any sort: just define the new types and use them everywhere.

  • Pass entity path as a column rather than as metadata
    Replace the current entity_path that is passed in the metadata map with an actual column instead. This will also require us to make EntityPath a proper arrow datatype (a datatype, not a component!!).

  • Make sure implicit instance counts have been wiped everywhere #1892
    Issue created; not blocking for batching.

  • Eliminate legacy splats #1893
    Issue created; not blocking for batching.

  • Get rid of component buckets altogether
    Update the store implementation to remove component tables, remove the get APIs, introduce slicing on the write path, etc. Still no batching in sight!

  • SDK-side log batching #1880

  • Implement the coalescing/accumulation logic in the SDK
    Add the required logic/thread/timers/whatever-else in the SDKs to accumulate data and just send it all as many LogMsgs (i.e. no batching yet).

  • Implement full-on batching
    End-to-end: transport, storage, the whole shebang.

  • Sort the batch before sending (by (event_id, entity_path))
    Keep that in its own PR to keep track of the benchmarks.

  • Implement new GC
    The complete implementation; should close all existing GC issues.

  • Dump directly from the store into an rrd file
    No rebatching yet, just dump every event in its own LogMsg.

  • Remove LogMsgs from LogDb
    We shouldn't need to keep track of events outside the store past this point: clean it all up.
    Reminder: the timeline widget keeps track of timepoints directly, not events.

  • Rebatch aggressively while dumping the store to a stream of LogMsg #1894
    Issue created; not blocking for batching.

  • Make log_time column implicit and potentially introduce ingest_time #1891
    Issue created; not blocking for batching.

  • A Component's DataType should embed its metadata #1696
    Issue created; not blocking for batching.

  • re_datastore: replace anyhow::Error usage with a thiserror derived Error type #527
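
For illustration, the "Implement the coalescing/accumulation logic in the SDK" and "Sort the batch before sending" steps could be sketched roughly as below. This is a minimal sketch under assumptions, not the actual SDK code: `Row`, `Accumulator`, `max_rows` and `max_age` are hypothetical names, `Row` is only a stand-in for whatever the new row type ends up carrying, and a real implementation would most likely also flush from a background thread or timer rather than only on `push`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a logged row; the real row type is not specified here.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Row {
    event_id: u64,
    entity_path: String,
}

/// Hypothetical accumulator: buffers rows and hands them back as one sorted
/// batch once a size or age threshold is reached.
struct Accumulator {
    rows: Vec<Row>,
    max_rows: usize,
    max_age: Duration,
    started: Instant,
}

impl Accumulator {
    fn new(max_rows: usize, max_age: Duration) -> Self {
        Self {
            rows: Vec::new(),
            max_rows,
            max_age,
            started: Instant::now(),
        }
    }

    /// Buffers a row; returns a sorted batch if either threshold was reached.
    fn push(&mut self, row: Row) -> Option<Vec<Row>> {
        if self.rows.is_empty() {
            // The age of a batch is measured from its first row.
            self.started = Instant::now();
        }
        self.rows.push(row);

        let full = self.rows.len() >= self.max_rows;
        let stale = self.started.elapsed() >= self.max_age;
        if full || stale {
            Some(self.flush())
        } else {
            None
        }
    }

    /// Drains the buffer and sorts it by `(event_id, entity_path)`.
    fn flush(&mut self) -> Vec<Row> {
        let mut batch = std::mem::take(&mut self.rows);
        batch.sort_by(|a, b| {
            (a.event_id, &a.entity_path).cmp(&(b.event_id, &b.entity_path))
        });
        batch
    }
}
```

Keeping the sort inside `flush` means it can be moved or removed in a single place, which fits the plan of landing the sort in its own PR to track its impact on the benchmarks.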
