Asynchron upload for servlet 3.1#235
Conversation
Add infrastructure to test war plugin with servlet 3.1 Implement the asynchron upload.
There was a problem hiding this comment.
Isn't this a must-have? (or the Client can send data until the server OOMEs)
There was a problem hiding this comment.
This is sadly the current situation. I just factor this code out the general servlet to be able to overwrite it with async IO for the servlet 3.1.
There was a problem hiding this comment.
So there is no way of just creating an Enumerator that enumerates a chunk at a time?
There was a problem hiding this comment.
There was an proposal for this in the past: #169
@dlecan can you comment on this?
My personal idea would be to cut manually the input stream into chunks, pushing them one by one to the body parser (it means we must block until the body parse reach its next state before pushing the next chunk). There would involve blocking, but would still be better than the current situation IMHO.
The blocking could be mitigated by using a dedicated thread pool (or the one from the servlet?)
There was a problem hiding this comment.
Is there a reason why Enumerator.fromStream isn't used?
There was a problem hiding this comment.
For your information, James Roper saw some other problems and opened a ticket for it: #223
There was a problem hiding this comment.
Thanks for the pointer to the issue, sure looks like you got things covered!
Perhaps refer to the issue in the todo?
There was a problem hiding this comment.
I don't remember why I made this change a year ago. I created #141 for that, but explanations are definitely too short :)
Read #169 also. Another user tried to provide a better upload.
After this change, no one has reported any upload hangs. Neither OOME. I think that was a good change. But for an old version of Play framework now.
Maybe Enumerator#fromStream(...) could perform better with Play 2.2.
But I have to test case to reproduce upload hangs on slow connections and so I'm not able to evaluation the quality of a such modification.
if the reads block (not using fromStream) then it's the Servlet
Thread which is blocked, and with fromStream it's a thread on the supplied
ExecutionContext?
Yes, you're right.
Asynchronous upload for servlet 3.1
There was a problem hiding this comment.
An improvement could be made here: this method has the potential to cause out of memory errors. If you read up to a certain size, you could feed this into the iteratee, and then read more when the iteratee is available, and so on.
There was a problem hiding this comment.
I guess one of the concerns raised was that then blocking would occur on a non-servlet thread.
There was a problem hiding this comment.
This is only the method definition.
The method call is actually in the pureFlatFold:
https://github.com/dlecan/play2-war-plugin/pull/235/files#diff-6631fbdeb8bfe668903b928d15152696R100
@viktorklang The chunk is actually read with the play internal Thread pool. I did that on purpose, because:
- the servlet 3.1 async IO is non-blocking. With servlet <= 3.0, it would be an error (like described by @jroper in Don't write to servlet output stream in a trampoline context #223)
- a slow consumer can slow down the browser (back pressure)
If I read the chunk with the servlet thread, I would accumulate the pureFlatFold function, leading to a stack overflow.
@jroper If I understand you well, the problem is that the servlet container could in theory deliver a very big chunk, leading to OOME. You'd suggest to control the size of the chunk and limit their size to a max?
I checked this PR with tomcat 8.0.3 and jetty 9.1, with big files (a few GB). These 2 containers made the job to cut the input into chunks. Also I do see any issue in practice, but it is true that we rely on the container implementation to limit the size of the chunks.
There was a problem hiding this comment.
@yanns If the reading is done on a Play thread pool I'd just read one chunk and feed it into the Iteratee, and then read the next chunk when the iteratee is ready for more input.
There was a problem hiding this comment.
@viktorklang I think this is what I am doing.
val chunk = consumeBody(servletInputStream, buffer)reads one chunk
case Step.Cont(k) => k(El(chunk))feeds the chunk into the Iteratee
iteratee = iteratee.pureFlatFoldis called only when the Iteratee is ready for more input
Please correct me if I'm wrong.
There was a problem hiding this comment.
yes, I'm relying on the servlet container to cut the stream into chunks
buffer is really just an intern variable for consumeBody.
I put it extern to avoid creating a new Array[Byte] in consumeBody.
My tests by uploading big files show less GC.
But it is just a micro-optimization. I can make buffer local to consumeBody.
consumeBody reads for the input stream as long as data is available body.isReady.
There was a problem hiding this comment.
Here's what I'm thinking: either you can reuse the same buffer-instance between chunks (low GC overhead), or you need to create new byte-arrays for all reads, in which case it doesn't make that much of a difference if you create 10 8kb byte arrays or 1 80kb. Thoughts?
There was a problem hiding this comment.
@viktorklang I think I understand what you mean
Instead of
val buffer = new Array[Byte](1024 * 8)
val output = new java.io.ByteArrayOutputStream()
var continue = body.isReady
while (continue) {
val length = body.read(buffer)
if (length == -1) {
continue = false
} else {
output.write(buffer, 0, length)
continue = body.isReady
}
}
output.toByteArrayyou think of reading the whole body.read in a bigger buffer, which size == the size of the current chunk?
There was a problem hiding this comment.
@yanns perhaps just set a configurable (external config) chunk size?
Add infrastructure to test war plugin with servlet 3.1
Implement the asynchron upload.