
colexec: add a way to propagate metadata in a streaming fashion #55758

@yuzefovich

While working on #55713, I realized that the current way of propagating metadata in the vectorized engine is incompatible with some things we would like to do. As a reminder, we currently buffer all metadata in MetadataSources and return it only when draining the whole flow, but there are cases when we would like to propagate metadata as soon as it is created:

  1. We emit metadata about the number of rows read from rowexec.TableReaders. This is combined with the estimated number of rows to be scanned (obtained during planning) to provide the progress estimate for a query (shown in the phase column of SHOW QUERIES).
  2. Table statistics collection emits progress metadata as well. Although we don't have vectorized table statistics collection yet, it'd be nice to support progress updates when we do.
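To make the problem concrete, here is a minimal Go sketch of the current buffer-until-drain model applied to use case 1. Everything in it is hypothetical: ProducerMetadata is a simplified stand-in for execinfrapb.ProducerMetadata that models only a row count, bufferingScan stands in for a metadata source, and progressEstimate assumes progress is simply rows read divided by the planner's estimate (the real computation may differ).

```go
package main

import "fmt"

// ProducerMetadata is a hypothetical, simplified stand-in for
// execinfrapb.ProducerMetadata that models only a row count.
type ProducerMetadata struct {
	RowsRead int64
}

// bufferingScan is a hypothetical operator following the current model:
// metadata is buffered and only handed out by DrainMeta at flow shutdown.
type bufferingScan struct {
	buffered []ProducerMetadata
}

// Next simulates reading one batch of n rows.
func (s *bufferingScan) Next(n int64) {
	s.buffered = append(s.buffered, ProducerMetadata{RowsRead: n})
}

// DrainMeta returns and clears everything buffered so far.
func (s *bufferingScan) DrainMeta() []ProducerMetadata {
	meta := s.buffered
	s.buffered = nil
	return meta
}

// progressEstimate is a hypothetical helper that divides the rows reported
// via metadata by the planner's row estimate, clamped to [0, 1].
func progressEstimate(meta []ProducerMetadata, estimatedRows int64) float64 {
	if estimatedRows <= 0 {
		return 0 // no estimate available
	}
	var total int64
	for _, m := range meta {
		total += m.RowsRead
	}
	p := float64(total) / float64(estimatedRows)
	if p > 1 {
		p = 1 // the planner's estimate may be too low
	}
	return p
}

func main() {
	s := &bufferingScan{}
	var seen []ProducerMetadata // metadata the consumer has received
	for i := 0; i < 4; i++ {
		s.Next(250)
		// Nothing reaches the consumer mid-query, so progress stays at 0.
		fmt.Printf("after batch %d: %.2f\n", i+1, progressEstimate(seen, 1000))
	}
	seen = append(seen, s.DrainMeta()...)
	fmt.Printf("after drain: %.2f\n", progressEstimate(seen, 1000)) // 1.00
}
```

The progress estimate jumps from 0 straight to 1 at drain time, which is exactly what streaming the metadata is meant to avoid.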

The current thinking is that we probably want to implement a pull-based model in which a source can return the metadata buffered since the previous call (something like DrainBufferedMeta, which, unlike DrainMeta, doesn't assume it is called when the flow is shut down), and then have the root materializer periodically poll its metadata sources.
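A minimal Go sketch of the proposed pull-based model, under stated assumptions. Only the names DrainBufferedMeta and DrainMeta come from the proposal; StreamingMetadataSource, scan, materializer, onBatch, pollEvery and ProducerMetadata are hypothetical. The polling cadence (every pollEvery output batches) is also an assumption; a time-based interval would fit "periodically" just as well.

```go
package main

import (
	"fmt"
	"sync"
)

// ProducerMetadata is a hypothetical stand-in for execinfrapb.ProducerMetadata.
type ProducerMetadata struct {
	RowsRead int64
}

// StreamingMetadataSource is a hypothetical extension of MetadataSource.
// DrainBufferedMeta may be called at any point during execution and returns
// only the metadata buffered since the previous call; DrainMeta keeps its
// shutdown-time semantics.
type StreamingMetadataSource interface {
	DrainBufferedMeta() []ProducerMetadata
	DrainMeta() []ProducerMetadata
}

// scan is a hypothetical operator that buffers metadata as batches are read.
// The mutex guards the buffer in case it is polled from another goroutine.
type scan struct {
	mu       sync.Mutex
	buffered []ProducerMetadata
}

func (s *scan) Next(n int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.buffered = append(s.buffered, ProducerMetadata{RowsRead: n})
}

func (s *scan) DrainBufferedMeta() []ProducerMetadata {
	s.mu.Lock()
	defer s.mu.Unlock()
	meta := s.buffered
	s.buffered = nil
	return meta
}

// DrainMeta returns whatever is left; with no shutdown-only metadata modeled,
// it behaves like a final DrainBufferedMeta.
func (s *scan) DrainMeta() []ProducerMetadata {
	return s.DrainBufferedMeta()
}

// materializer is a hypothetical root materializer that polls its metadata
// sources every pollEvery batches and forwards what it finds.
type materializer struct {
	sources   []StreamingMetadataSource
	pollEvery int
	batches   int
	forwarded []ProducerMetadata
}

// onBatch is called after each output batch and polls on a fixed cadence.
func (m *materializer) onBatch() {
	m.batches++
	if m.batches%m.pollEvery != 0 {
		return
	}
	for _, src := range m.sources {
		m.forwarded = append(m.forwarded, src.DrainBufferedMeta()...)
	}
}

// close performs the usual final drain at flow shutdown.
func (m *materializer) close() {
	for _, src := range m.sources {
		m.forwarded = append(m.forwarded, src.DrainMeta()...)
	}
}

func main() {
	s := &scan{}
	m := &materializer{sources: []StreamingMetadataSource{s}, pollEvery: 2}
	for i := 0; i < 3; i++ {
		s.Next(100)
		m.onBatch()
		fmt.Println("batch", i+1, "forwarded so far:", len(m.forwarded))
	}
	m.close()
	fmt.Println("after close:", len(m.forwarded))
}
```

With pollEvery set to 2, the metadata from the first two batches is forwarded right after the second batch, and the third batch's metadata arrives at close. Keeping DrainMeta separate from DrainBufferedMeta lets a source still flush metadata that only makes sense at shutdown.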

Jira issue: CRDB-3634

Metadata

Labels

A-sql-execution: Relating to SQL execution.
A-sql-vec: SQL vectorized engine
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support: Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
P-3: Issues/test failures with no fix SLA
T-sql-queries: SQL Queries Team
target-release-26.2.0
v26.2.0-prerelease

Status

Done
