Skip to content

Conversation

@8ggmaker
Copy link
Contributor

@8ggmaker 8ggmaker commented Aug 20, 2019

Changes proposed in this pull request

  • Avoid extra memory allocation when doing chunked upload
    ChunkedUploadProvider trys to help with large file uploading, it divides the large stream to chunk-size stream and upload each one by one. The current implementation allocate maxChunksize of byte[] buffer each time:
  var readBuffer = new byte[this.maxChunkSize]; // extra allocation

and an extra stream copy:

    //extra copy 
    await this.uploadStream.ReadAsync(readBuffer, 0, request.RangeLength).ConfigureAwait(false);

       while (true)
       {
             using (var requestBodyStream = new MemoryStream(request.RangeLength))
             {
                 await requestBodyStream.WriteAsync(readBuffer, 0, request.RangeLength).ConfigureAwait(false);
                 requestBodyStream.Seek(0, SeekOrigin.Begin);
                 ...
             }
       }

If there is a total 500MiB size file, open a FileStream with a 10MiB max chunk size, it will allocate a 10 * 1024 * 1024 size buffer 50 times (extra 500MiB memory allocation), and do 100 times buffer operations (50 read and 50 write).

Because the upload stream supports seek and read operation, this pull request provide a ReadOnlySubStream to represent a chunk of a upload stream to save memory when dealing with large file and remove the extra copy operations

@8ggmaker 8ggmaker changed the title avoid extra memory allocation when do chunked upload Avoid extra memory allocation when do chunked upload Aug 20, 2019
@8ggmaker 8ggmaker changed the title Avoid extra memory allocation when do chunked upload Avoid extra memory allocation when doing chunked upload for large file Aug 20, 2019
@8ggmaker
Copy link
Contributor Author

@MIchaelMainer @peombwa

@8ggmaker
Copy link
Contributor Author

@MIchaelMainer @darrelmiller anyone can help review this?

@darrelmiller
Copy link
Contributor

Hey @zsybupt, sorry for the delay. Unfortunately we are all tied up with other priorities at the moment. I did take a quick review of the PR and there are a number of things I am unsure about. I do acknowledge there is room for improvement in this code, however I'm not immediately seeing where that readbuffer is getting reallocated for each chunk. Also, creating an entire new Stream subclass for this scenario seems like a lot of risk unless that class is copied from some existing project. My experience has been getting stream subclasses right is hard.

@8ggmaker
Copy link
Contributor Author

8ggmaker commented Aug 28, 2019

@darrelmiller Thanks for your quick reply.

  1. I am sorry that it is not the buffer, it is the memorystream, please check this line:MemoryStream Allocated, a new MemoryStream is allocated for each chunk request.
  2. This SubStream class is port from System.IO.Compression, they have a very similar scenario of dealing with ZipAchrive:System.IO.Compression, and I am using this in my production now.

For a 170MiB file

Before

defaultmemorystream

After

readonlysubstream

@darrelmiller
Copy link
Contributor

@zsybupt Ahh, that makes more sense and makes me feel much more comfortable about the new stream class. I will try and review this later today. /cc @MIchaelMainer

@8ggmaker
Copy link
Contributor Author

Sorry that my PC language is Chinese, the original ChunkedUploadProvider consume about 113.4 MiB memory, with the ReadOnlySubStream the memory reduce to 13.9 MiB

@8ggmaker
Copy link
Contributor Author

Any updates of this pr ?

@andrueastman
Copy link
Contributor

@zsybupt
Thanks a lot for making this PR. It does indeed improve things by avoiding extra memory allocation on upload.

If it is okay, I will be merging this PR into the branch task/uploadTask so as to enable us to do more tests and build upon this work to improve the UploadProvider so as to provide more usability such as giving progress callbacks during upload.

@8ggmaker
Copy link
Contributor Author

8ggmaker commented Nov 5, 2019

@andrueastman ok,Thanks

@andrueastman andrueastman changed the base branch from dev to task/UploadTask November 6, 2019 04:23
@andrueastman andrueastman merged commit abc3ee8 into microsoftgraph:task/UploadTask Nov 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants