ARROW-2066: [Python] Document using pyarrow with Azure Blob Store #1544

rjrussell77 · 2018-02-01T18:20:06Z

Improvement story:

https://issues.apache.org/jira/browse/ARROW-2066

rjrussell77

@wesm Can you review this? (I had difficulty getting the formatting right. Adding a sub-bullet list item caused the parent bullet to become italicized, so I eventually omitted the sub-bullet.)

rjrussell77 · 2018-02-01T18:29:44Z

@xhochy Uwe - can you review?

xhochy · 2018-02-01T18:31:44Z

python/doc/source/parquet.rst

+
+   block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
+   with tempfile.TemporaryFile() as fp:
+      block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=fp)


This should actually work without a temporary file, see the implementation in simplekv: https://github.com/mbr/simplekv/blob/master/simplekv/net/azurestore.py#L74

Ok - I'll take a look. Thanks.

rjrussell77 · 2018-02-22T07:22:39Z

python/doc/source/parquet.rst

+   byte_stream = io.BytesIO()
+   block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream)
+   pd = pq.read_table(source=byte_stream).to_pandas()
+   pd.head(10)


@xhochy Good feedback - I replaced the temp file buffer with BytesIO stream instead.

rjrussell77 · 2018-02-22T07:46:29Z

python/doc/source/parquet.rst

+      print("Error: {0}".format(err))
+   finally:
+      byte_stream.close()
+


Added try/except/finally block to ensure closure of the stream

xhochy · 2018-02-22T15:41:06Z

python/doc/source/parquet.rst

+   block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
+   try:
+      block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream)
+      pd = pq.read_table(source=byte_stream).to_pandas()


The result is typically written into a variable called df whereas pd is the abbreviation you use when you import pandas (import pandas as pd)

xhochy · 2018-02-22T15:41:28Z

python/doc/source/parquet.rst

+   try:
+      block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream)
+      pd = pq.read_table(source=byte_stream).to_pandas()
+      pd.head(10)


Better replace this with # Do work on DF …

How about, # Do work on df (lower case)?

xhochy · 2018-02-22T15:42:16Z

python/doc/source/parquet.rst

+      block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream)
+      pd = pq.read_table(source=byte_stream).to_pandas()
+      pd.head(10)
+   except Exception as err:


Please don't catch exceptions like this, just let it throw.
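To illustrate the point about not catching exceptions, here is a minimal stdlib-only sketch. `FailingBlobService` is a hypothetical stand-in, not part of the Azure SDK; it only mimics the `get_blob_to_stream` call from the diff, and the container and blob names are placeholders. The sketch contrasts the print-and-continue pattern with a plain try/finally that still closes the stream but lets the error reach the caller.

```python
import io


class FailingBlobService:
    """Hypothetical stand-in for the blob client; every download fails."""

    def get_blob_to_stream(self, container_name, blob_name, stream):
        raise IOError("blob {0}/{1} not found".format(container_name, blob_name))


def download_swallowing(service):
    """Pattern from the diff: the error is printed and then lost."""
    byte_stream = io.BytesIO()
    try:
        service.get_blob_to_stream(
            container_name="container", blob_name="data.parquet", stream=byte_stream
        )
        return byte_stream.getvalue()
    except Exception as err:
        print("Error: {0}".format(err))
    finally:
        byte_stream.close()
    # Falls through and returns None, so the caller cannot tell what failed.


def download_propagating(service):
    """Suggested pattern: no except clause, the stream is still closed."""
    byte_stream = io.BytesIO()
    try:
        service.get_blob_to_stream(
            container_name="container", blob_name="data.parquet", stream=byte_stream
        )
        return byte_stream.getvalue()
    finally:
        byte_stream.close()


# The swallowing version hands back None instead of data.
swallowed = download_swallowing(FailingBlobService())

# The propagating version surfaces the original error to the caller.
try:
    download_propagating(FailingBlobService())
    propagated = None
except IOError as err:
    propagated = err
```

In documentation code the second form is usually preferable, since a reader who copies it sees the real traceback rather than a printed message followed by a confusing failure further down.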

xhochy · 2018-02-22T15:42:40Z

python/doc/source/parquet.rst

+      print("Error: {0}".format(err))
+   finally:
+      byte_stream.close()
+


Can you add this comment to the code? That will also be helpful for the reader later.

rjrussell77 · 2018-02-22T17:06:45Z

python/doc/source/parquet.rst

+   finally:
+      # Add finally block to ensure closure of the stream
+      byte_stream.close()
+


@xhochy Ok, I've responded to your last set of feedback. How are we looking now?
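For readers following the thread, the shape the example converged on can be sketched roughly as below. This is an illustration assembled from the review comments, not the merged documentation text. The helper name `read_parquet_blob` and its `read_table` parameter are assumptions added so the sketch can run without an Azure account or pyarrow installed; by default it falls back to `pyarrow.parquet.read_table`, the call used in the diff. `FakeBlobService` and `fake_read_table` are hypothetical stand-ins.

```python
import io


def read_parquet_blob(blob_service, container_name, blob_name, read_table=None):
    """Download a Parquet blob into memory and parse it (sketch).

    `read_table` is a hypothetical injection point for this example only.
    """
    if read_table is None:
        # Imported lazily so the sketch can be exercised without pyarrow.
        import pyarrow.parquet as pq
        read_table = pq.read_table

    byte_stream = io.BytesIO()
    try:
        blob_service.get_blob_to_stream(
            container_name=container_name, blob_name=blob_name, stream=byte_stream
        )
        return read_table(source=byte_stream)
    finally:
        # Add finally block to ensure closure of the stream
        byte_stream.close()


class FakeBlobService:
    """Hypothetical stand-in that writes fixed bytes instead of calling Azure."""

    def get_blob_to_stream(self, container_name, blob_name, stream):
        stream.write(b"PAR1-not-a-real-parquet-file")


seen = {}


def fake_read_table(source):
    """Hypothetical reader that records the stream and returns its raw bytes."""
    seen["stream"] = source
    return source.getvalue()


result = read_parquet_blob(
    FakeBlobService(), "container", "data.parquet", read_table=fake_read_table
)
```

With the real client, usage would presumably look like `df = read_parquet_blob(block_blob_service, container_name, parquet_file).to_pandas()`, followed by `# Do work on df`, where `block_blob_service` is the `BlockBlobService` instance from the diff.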

xhochy

+1, thank you for writing this up. This will be very helpful for new users.

xhochy · 2018-02-23T12:39:51Z

@rjrussell77 do you have an Apache JIRA id so I could assign https://issues.apache.org/jira/browse/ARROW-2066 to you?

rjrussell77 · 2018-02-26T18:34:48Z

@xhochy Regarding your question about my Apache JIRA id - try this: rob_PTL

wesm · 2018-02-26T22:16:14Z

Thanks @rjrussell77 -- I added you as a contributor and assigned the issue to you

rjrussell77 added 20 commits January 31, 2018 14:16

ARROW-2066 Add documentation for Arrow/Azure/Parquet solution

eb643e4

Polish the formatting

6841116

Add helpful notes about Azure properties

5365a9c

Add a note about keys and add polish

5fbea89

Fix formatting

26a53e4

Refine indented bullet and fix title underline

7bab640

Fix unintended italics

718bd94

Change wording a bit

f130e04

Try to fix italics

83a38c4

remove inline edits

6fd9f70

Fix formatting

34c5a16

Fix formatting

599e04f

fix formatting

1815816

Fix formatting

a015deb

Fix formatting

051b91d

Use asterisks for list

803cbca

Try moving the bullet to remove italics

4c75824

fix

4770de1

fix

5d450fc

Add back original Notes bullets

654a6f9

rjrussell77 commented Feb 1, 2018

xhochy requested changes Feb 1, 2018

wesm changed the title from "Arrow 2066 docs azure parquet" to "ARROW-2066: [Python] Document using pyarrow with Azure Blob Store" Feb 5, 2018

Replace usage of tempfile buffer with BytesIO stream

36f7378

rjrussell77 commented Feb 22, 2018

rjrussell77 added 2 commits February 21, 2018 23:43

Add try/except/finally blocks to ensure closure of the byte stream

1fe9866

Clean up white space

f056888

rjrussell77 commented Feb 22, 2018

xhochy reviewed Feb 22, 2018

use more common 'df' instead of 'pd' for pandas dataframe variable, remove head() call and instead use comment to indicate generic fill-in code, add comment re: stream closure in finally block

a5addb0

rjrussell77 commented Feb 22, 2018

Add missing byte_stream declaration/assignment

0d3972c

xhochy approved these changes Feb 23, 2018

xhochy closed this in 3e3f7c2 Feb 23, 2018
