Parallel.retrieve ensures that results come back in input order when Parallel.__call__ is called on an iterable. As an alternative to returning a list of results in the same order as the input, I propose a generator-based version of Parallel.__call__ that yields each result as soon as it is ready, without preserving order.
>>> workers = Parallel(n_jobs=8)
>>> workers(delayed(square)(i) for i in range(10)) # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> workers.async(delayed(square)(i) for i in range(10)) # <generator object _____ at 0x______>
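To make the intended semantics concrete, here is a rough sketch using only the standard library's concurrent.futures, not joblib internals. The helper name `unordered_results` and its `n_jobs` argument are hypothetical and only there for illustration; a real implementation inside Parallel would likely look quite different.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    return x * x


def unordered_results(func, inputs, n_jobs=8):
    """Yield func(x) for each x in inputs as soon as each result is ready.

    Hypothetical helper, not part of joblib. Output order is not guaranteed
    to match input order.
    """
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Submit every task up front, then hand results back as they finish.
        futures = [pool.submit(func, x) for x in inputs]
        for future in as_completed(futures):
            yield future.result()


# The call returns a generator; sorting is only needed if order matters.
results = unordered_results(square, range(10))
print(sorted(results))
```

The caller gets a generator back immediately and can start consuming results while the remaining tasks are still running, which is the behaviour I would want from the proposed method.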
Thoughts? Issues?