Skip to content
This repository was archived by the owner on Apr 4, 2026. It is now read-only.

Asynchron upload for servlet 3.1#235

Merged
dlecan merged 2 commits into
play2war:developfrom
yanns:servlet31
Mar 20, 2014
Merged

Asynchron upload for servlet 3.1#235
dlecan merged 2 commits into
play2war:developfrom
yanns:servlet31

Conversation

@yanns

@yanns yanns commented Mar 20, 2014

Copy link
Copy Markdown
Member

Add infrastructure to test war plugin with servlet 3.1
Implement the asynchron upload.

Add infrastructure to test war plugin with servlet 3.1
Implement the asynchron upload.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a must-have? (or the Client can send data until the server OOMEs)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sadly the current situation. I just factor this code out the general servlet to be able to overwrite it with async IO for the servlet 3.1.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is no way of just creating an Enumerator that enumerates a chunk at a time?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an proposal for this in the past: #169
@dlecan can you comment on this?

My personal idea would be to cut manually the input stream into chunks, pushing them one by one to the body parser (it means we must block until the body parse reach its next state before pushing the next chunk). There would involve blocking, but would still be better than the current situation IMHO.
The blocking could be mitigated by using a dedicated thread pool (or the one from the servlet?)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why Enumerator.fromStream isn't used?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For your information, James Roper saw some other problems and opened a ticket for it: #223

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer to the issue, sure looks like you got things covered!
Perhaps refer to the issue in the todo?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember why I made this change a year ago. I created #141 for that, but explanations are definitely too short :)
Read #169 also. Another user tried to provide a better upload.

After this change, no one has reported any upload hangs. Neither OOME. I think that was a good change. But for an old version of Play framework now.
Maybe Enumerator#fromStream(...) could perform better with Play 2.2.
But I have to test case to reproduce upload hangs on slow connections and so I'm not able to evaluation the quality of a such modification.

if the reads block (not using fromStream) then it's the Servlet
Thread which is blocked, and with fromStream it's a thread on the supplied
ExecutionContext?

Yes, you're right.

dlecan added a commit that referenced this pull request Mar 20, 2014
Asynchronous upload for servlet 3.1
@dlecan dlecan merged commit 8dc66bc into play2war:develop Mar 20, 2014

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An improvement could be made here: this method has the potential to cause out of memory errors. If you read up to a certain size, you could feed this into the iteratee, and then read more when the iteratee is available, and so on.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess one of the concerns raised was that then blocking would occur on a non-servlet thread.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only the method definition.
The method call is actually in the pureFlatFold:
https://github.com/dlecan/play2-war-plugin/pull/235/files#diff-6631fbdeb8bfe668903b928d15152696R100

@viktorklang The chunk is actually read with the play internal Thread pool. I did that on purpose, because:

@jroper If I understand you well, the problem is that the servlet container could in theory deliver a very big chunk, leading to OOME. You'd suggest to control the size of the chunk and limit their size to a max?
I checked this PR with tomcat 8.0.3 and jetty 9.1, with big files (a few GB). These 2 containers made the job to cut the input into chunks. Also I do see any issue in practice, but it is true that we rely on the container implementation to limit the size of the chunks.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanns If the reading is done on a Play thread pool I'd just read one chunk and feed it into the Iteratee, and then read the next chunk when the iteratee is ready for more input.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viktorklang I think this is what I am doing.

val chunk = consumeBody(servletInputStream, buffer)

reads one chunk

case Step.Cont(k) => k(El(chunk))

feeds the chunk into the Iteratee

iteratee = iteratee.pureFlatFold

is called only when the Iteratee is ready for more input

Please correct me if I'm wrong.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'm relying on the servlet container to cut the stream into chunks
buffer is really just an intern variable for consumeBody.
I put it extern to avoid creating a new Array[Byte] in consumeBody.
My tests by uploading big files show less GC.

But it is just a micro-optimization. I can make buffer local to consumeBody.
consumeBody reads for the input stream as long as data is available body.isReady.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I'm thinking: either you can reuse the same buffer-instance between chunks (low GC overhead), or you need to create new byte-arrays for all reads, in which case it doesn't make that much of a difference if you create 10 8kb byte arrays or 1 80kb. Thoughts?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viktorklang I think I understand what you mean
Instead of

val buffer = new Array[Byte](1024 * 8)
val output = new java.io.ByteArrayOutputStream()
var continue = body.isReady
while (continue) {
  val length = body.read(buffer)
  if (length == -1) {
    continue = false
  } else {
    output.write(buffer, 0, length)
    continue = body.isReady
  }
}
output.toByteArray

you think of reading the whole body.read in a bigger buffer, which size == the size of the current chunk?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanns perhaps just set a configurable (external config) chunk size?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viktorklang Yes definitely a good idea.

@dlecan dlecan added this to the v1.2 milestone Jun 11, 2014
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants