[WEBSITE]: Querying Parquet with Millisecond Latency #280
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename the pull request title in the following format? See also:
To give a sense of what is possible, the following picture shows a timeline of fetching the footer metadata from remote files, using that metadata to determine what DataPages to read, and then fetching data and decoding simultaneously. This process often must be done for more than one file at a time in order to match network latency, bandwidth and available CPU.
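As a rough illustration of the sequence described above, here is a minimal Python sketch. It is not the arrow-rs or DataFusion implementation: the in-memory `FILES` table, the helper names (`fetch_footer`, `fetch_page`, `decode`, `scan_file`, `scan`) and the `max_files` limit are all hypothetical, and network round trips are simulated with `asyncio.sleep`. It only shows the shape of the pipeline: read the footer, use per-page min/max statistics to pick pages, then decode each page as it arrives while other fetches are still in flight, for several files at once.

```python
import asyncio

# Hypothetical stand-in for remote Parquet files: each "file" has a footer
# holding per-page (min, max) statistics, plus the page payloads themselves.
FILES = {
    "a.parquet": {
        "footer": [(0, 9), (10, 19), (20, 29)],
        "pages": [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))],
    },
    "b.parquet": {
        "footer": [(30, 39), (40, 49)],
        "pages": [list(range(30, 40)), list(range(40, 50))],
    },
}


async def fetch_footer(name):
    await asyncio.sleep(0.01)  # simulated network round trip
    return FILES[name]["footer"]


async def fetch_page(name, idx):
    await asyncio.sleep(0.01)  # simulated network round trip
    return FILES[name]["pages"][idx]


def decode(page, lo, hi):
    # Stand-in for real decoding: keep only the values inside [lo, hi].
    return [v for v in page if lo <= v <= hi]


async def scan_file(name, lo, hi, limit):
    async with limit:  # cap how many files are processed at once
        footer = await fetch_footer(name)
        # Use the footer statistics to skip pages that cannot match.
        wanted = [i for i, (pmin, pmax) in enumerate(footer)
                  if pmax >= lo and pmin <= hi]
        fetches = [asyncio.create_task(fetch_page(name, i)) for i in wanted]
        rows = []
        # Decode each page as soon as it arrives; other fetches keep running.
        for done in asyncio.as_completed(fetches):
            rows.extend(decode(await done, lo, hi))
        return rows


async def scan(names, lo, hi, max_files=2):
    limit = asyncio.Semaphore(max_files)
    per_file = await asyncio.gather(
        *(scan_file(n, lo, hi, limit) for n in names))
    return sorted(v for rows in per_file for v in rows)


result = asyncio.run(scan(["a.parquet", "b.parquet"], 15, 34))
```

In this sketch `max_files` plays the role of the tuning knob mentioned above: raising it keeps more requests in flight to hide latency, at the cost of more memory and CPU contention during decoding.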
Note to myself that I hope to clean up this diagram
Per the mailing list discussion (https://lists.apache.org/thread/l377q5f20kyltb37m345p287kpo22qb6), I plan to publish this later today or tomorrow.
Closes apache/datafusion#3464
This series explains various aspects of Parquet decoding and filter pushdown as implemented in arrow-rs and DataFusion, though the techniques apply much more generally than that.
NOTE
This was first published on the InfluxData blog, for various reasons: https://www.influxdata.com/blog/querying-parquet-millisecond-latency/
See the mailing list discussion about posting it on the Arrow blog as well: https://lists.apache.org/thread/l377q5f20kyltb37m345p287kpo22qb6