@rjrussell77
Contributor

Original question:

#1510

Improvement story:

https://issues.apache.org/jira/browse/ARROW-2066

Contributor Author

@rjrussell77 left a comment

@wesm Can you review this? (I had difficulty getting the formatting right: adding a sub-bullet list item caused the parent bullet to become italicized, so I eventually omitted the sub-bullet.)

@rjrussell77
Contributor Author

@xhochy Uwe - can you review?

block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
with tempfile.TemporaryFile() as fp:
    block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=fp)
Member

This should actually work without a temporary file; see the implementation in simplekv: https://github.com/mbr/simplekv/blob/master/simplekv/net/azurestore.py#L74

Contributor Author

Ok - I'll take a look. Thanks.

@wesm changed the title from "Arrow 2066 docs azure parquet" to "ARROW-2066: [Python] Document using pyarrow with Azure Blob Store" on Feb 5, 2018
byte_stream = io.BytesIO()
block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream)
pd = pq.read_table(source=byte_stream).to_pandas()
pd.head(10)
Contributor Author

@xhochy Good feedback. I replaced the temporary file with an io.BytesIO stream.

    print("Error: {0}".format(err))
finally:
    byte_stream.close()
Contributor Author

Added a try/except/finally block to ensure the stream is closed.

Member

Can you add this comment to the code? That will also be helpful for the reader later.

block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
try:
    block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream)
    pd = pq.read_table(source=byte_stream).to_pandas()
Member

The result is typically assigned to a variable called df, whereas pd is the abbreviation used when importing pandas (import pandas as pd).
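
To illustrate the naming concern, here is a minimal sketch of what happens when the result is bound to pd. It assumes pandas is installed, and the sample data is made up for the example; it is not part of the pull request.

```python
import pandas as pd

# Rebinding `pd` to a DataFrame hides the pandas module under that name.
pd = pd.DataFrame({"a": [1, 2, 3]})  # made-up sample data

try:
    pd.DataFrame({"b": [4]})  # `pd` is now a DataFrame, not the module
    shadow_error = None
except AttributeError as err:
    shadow_error = err

# Conventional naming keeps the module alias usable.
import pandas as pd  # rebind the alias to the module

df = pd.DataFrame({"a": [1, 2, 3]})
other = pd.DataFrame({"b": [4]})  # works: `pd` is still the module
```

With the result named df, later calls such as pd.DataFrame or pd.concat keep working in the same scope.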

try:
    block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream)
    pd = pq.read_table(source=byte_stream).to_pandas()
    pd.head(10)
Member

Better replace this with # Do work on DF …

Contributor Author

How about, # Do work on df (lower case)?

    block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream)
    pd = pq.read_table(source=byte_stream).to_pandas()
    pd.head(10)
except Exception as err:
Member

Please don't catch exceptions like this; just let them propagate.
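
As a rough sketch of this suggestion, a try/finally with no except clause lets the error reach the caller while still closing the stream. The helpers failing_download and load_bytes are hypothetical stand-ins written for this example (the first simulates a failing get_blob_to_stream call); only the standard library is used.

```python
import io


def failing_download(stream):
    """Hypothetical stand-in for get_blob_to_stream that fails mid-download."""
    stream.write(b"partial")
    raise IOError("simulated download failure")


def load_bytes(download, stream):
    """Fill `stream` via `download` and return its contents.

    There is no `except` clause: any error propagates to the caller,
    while `finally` still closes the stream.
    """
    try:
        download(stream)
        return stream.getvalue()
    finally:
        stream.close()


byte_stream = io.BytesIO()
caught = None
try:
    load_bytes(failing_download, byte_stream)
except IOError as err:
    caught = str(err)  # the caller decides how to handle the failure

# Successful path: the contents are returned before the stream is closed.
ok = load_bytes(lambda s: s.write(b"data"), io.BytesIO())
```

Printing the error and continuing, as in the reviewed snippet, would hide the failure from the caller; here the caller sees the original exception and the stream is closed either way.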

…emove head() call and instead use comment to indicate generic fill-in code, add comment re: stream closure in finally block
finally:
    # Add finally block to ensure closure of the stream
    byte_stream.close()
Contributor Author

@xhochy Ok, I've responded to your last set of feedback. How are we looking now?

Member

@xhochy left a comment

+1, thank you for writing this up. This will be very helpful for new users.

@xhochy closed this in 3e3f7c2 on Feb 23, 2018
@xhochy
Member

xhochy commented Feb 23, 2018

@rjrussell77 do you have an Apache JIRA id so I could assign https://issues.apache.org/jira/browse/ARROW-2066 to you?

@rjrussell77
Contributor Author

@xhochy Regarding your question about my Apache JIRA id - try this: rob_PTL

@wesm
Member

wesm commented Feb 26, 2018

Thanks @rjrussell77 -- I added you as a contributor and assigned the issue to you
