Conversation

@alamb
Contributor

@alamb alamb commented Nov 30, 2022

Closes apache/datafusion#3464

This series explains various aspects of Parquet decoding and filter pushdown as implemented in arrow-rs and DataFusion, though the techniques apply much more generally.

NOTE

This was first published on the InfluxData blog, for various reasons: https://www.influxdata.com/blog/querying-parquet-millisecond-latency/

See the mailing list discussion about also posting it on the Arrow blog: https://lists.apache.org/thread/l377q5f20kyltb37m345p287kpo22qb6

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@alamb alamb marked this pull request as draft November 30, 2022 21:37
To give a sense of what is possible, the following picture shows a timeline of fetching the footer metadata from remote files, using that metadata to determine what DataPages to read, and then fetching data and decoding simultaneously. This process often must be done for more than one file at a time in order to match network latency, bandwidth and available CPU.


[ASCII timeline diagram not included in this excerpt]
Contributor Author

Note to myself that I hope to clean up this diagram

@alamb alamb marked this pull request as ready for review December 18, 2022 11:46
@alamb
Contributor Author

alamb commented Dec 26, 2022

Per the mailing list discussion (https://lists.apache.org/thread/l377q5f20kyltb37m345p287kpo22qb6), I plan to publish this later today or tomorrow.

@alamb alamb merged commit e64c7b4 into apache:master Dec 26, 2022
@alamb alamb deleted the alamb/parquet_filter_pushdown branch December 26, 2022 21:19

Development

Successfully merging this pull request may close these issues.

Write a blog about parquet predicate pushdown
