[WEBSITE]: Querying Parquet with Millisecond Latency #280

alamb · 2022-11-30T21:37:00Z

This series explains various aspects of Parquet decoding and filter pushdown as implemented in arrow-rs and DataFusion, though the techniques are much more general than those two projects.

NOTE

This was initially published on the InfluxData blog, for various reasons: https://www.influxdata.com/blog/querying-parquet-millisecond-latency/

See the mailing list discussion about also posting it on the Arrow blog: https://lists.apache.org/thread/l377q5f20kyltb37m345p287kpo22qb6

github-actions · 2022-11-30T21:37:13Z

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

alamb · 2022-11-30T21:54:57Z

_posts/2022-11-30-querying-parquet-with-millisecond-latency-part-3.md

+To give a sense of what is possible, the following picture shows a timeline of fetching the footer metadata from remote files, using that metadata to determine what DataPages to read, and then fetching data and decoding simultaneously. This process often must be done for more than one file at a time in order to match network latency, bandwidth and available CPU.
+
+
+```


Note to myself that I hope to clean up this diagram
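
As a rough illustration of the timeline the quoted paragraph describes (fetch the footer, use it to pick pages, then overlap fetching and decoding across several files), here is a minimal Python sketch. It is not the arrow-rs or DataFusion implementation: the in-memory `FILES` table, the per-page `max` statistic, and the helpers `fetch_footer`, `fetch_page`, `decode`, and `run_scan` are all hypothetical stand-ins, and network latency is simulated with `asyncio.sleep`.

```python
import asyncio

# Hypothetical stand-in for remote Parquet files: each "file" has footer
# metadata (a max statistic per page) and the page payloads themselves.
FILES = {
    "a.parquet": {
        "footer": [{"page": 0, "max": 5}, {"page": 1, "max": 50}],
        "pages": {0: [1, 5], 1: [20, 50]},
    },
    "b.parquet": {
        "footer": [{"page": 0, "max": 9}, {"page": 1, "max": 99}],
        "pages": {0: [3, 9], 1: [42, 99]},
    },
}


async def fetch_footer(name):
    """Simulate one network round trip to read the footer metadata."""
    await asyncio.sleep(0.01)
    return FILES[name]["footer"]


async def fetch_page(name, page):
    """Simulate one network round trip to read a single page."""
    await asyncio.sleep(0.01)
    return FILES[name]["pages"][page]


def decode(raw, threshold):
    """Stand-in for decoding a page and applying the filter `value > threshold`."""
    return [value for value in raw if value > threshold]


async def scan_file(name, threshold):
    footer = await fetch_footer(name)
    # Use the footer statistics to skip pages that cannot contain matches.
    wanted = [meta["page"] for meta in footer if meta["max"] > threshold]
    # Start every needed fetch up front, so later fetches are already in
    # flight while earlier pages are being decoded.
    tasks = [asyncio.create_task(fetch_page(name, page)) for page in wanted]
    rows = []
    for task in tasks:
        rows.extend(decode(await task, threshold))
    return rows


async def scan(names, threshold):
    # Process several files concurrently to hide per-request latency.
    results = await asyncio.gather(*(scan_file(n, threshold) for n in names))
    return dict(zip(names, results))


def run_scan(threshold):
    return asyncio.run(scan(sorted(FILES), threshold))
```

Calling `run_scan(10)` on this toy data fetches only the second page of each file, since the first pages have a maximum at or below the threshold. A real reader would also bound the number of in-flight requests to match the available bandwidth and CPU.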

alamb · 2022-12-26T18:07:19Z

Per the mailing list discussion (https://lists.apache.org/thread/l377q5f20kyltb37m345p287kpo22qb6), I plan to publish this later today or tomorrow.

alamb added 2 commits November 30, 2022 16:22

[WEBSITE]: Querying Parquet with Millisecond Latency

6a41ca9

Initial content

353f5bf

alamb marked this pull request as draft November 30, 2022 21:37

alamb added 2 commits November 30, 2022 16:49

Add transition paragraphs between parts

6d8db18

nits

c844e10

alamb mentioned this pull request Nov 30, 2022

Write a blog about parquet predicate pushdown apache/datafusion#3464

Closed

alamb commented Nov 30, 2022

alamb added 4 commits November 30, 2022 16:58

formatting tweaks

3234e8d

Update to single page, as published on influx blog

9707130

Merge remote-tracking branch 'origin/master' into alamb/parquet_filte…

f53c5f2

…r_pushdown

copy edits / formatting

c4089f4

alamb marked this pull request as ready for review December 18, 2022 11:46

alamb added 3 commits December 26, 2022 11:54

Merge remote-tracking branch 'origin/master' into alamb/parquet_filte…

c9da097

…r_pushdown

Update date

9a14620

whitespace engineering

f822400

alamb merged commit e64c7b4 into apache:master Dec 26, 2022

alamb deleted the alamb/parquet_filter_pushdown branch December 26, 2022 21:19
