Description
I've filed several issues here and I saw that 1.31.0 is the first release that contained all the fixes I needed. We finally managed to upgrade from 1.27.0 to 1.31.0 (Apache v2) and get rid of all deprecated usage.
However, after releasing our product, we suddenly got numerous reports of slow network operations (slow writes, at least), often slow to the point of hanging or taking a couple of hours. At first I thought these were isolated issues in the users' environments, but we soon realized the problem is universal and consistent. The throughput drop is already significant on the fast network at our office (for example, 9s vs 67s). The symptom gets dramatically worse on a slower network or when an operation takes longer to complete (I will explain why later). For example, in my own testing at home, the same task takes ~45s with 1.27.0, but with 1.31.0 it hangs and eventually times out after 17 minutes.
Our team members individually verified the performance drop. We tracked the regression down to the library upgrade: downgrading to 1.27.0 resolves the issue, so for now we have gone back to 1.27.0. We don't know which component is causing the issue or at which level.
However, we cannot hold off upgrading the library forever, so this issue must eventually be resolved for us to move forward.
Here's one interesting observation from my own experiment, which I hope will shed some light:
When uploading ~40MB of data in a fresh single HTTP request, the throughput is steadily >1MB/s with 1.27.0, and the whole operation takes ~30 seconds or more. With 1.31.0, the throughput starts lower, say ~800KB/s, and gradually wanes to 600KB/s, 400KB/s, ..., down to absolute 0, with no activity in the end. After 15 minutes or so, it throws java.net.SocketException: Connection timed out (Write failed).
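
For anyone who wants to reproduce the throughput curve, here is a minimal sketch of how the per-second numbers could be collected. It is not the code from our product: `ThroughputCountingStream` and `kilobytesPerSecond` are hypothetical names, it uses only the JDK, and the in-memory sink in `main` is a stand-in for the real HTTP request body stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical helper: counts bytes written so throughput can be logged per interval. */
class ThroughputCountingStream extends FilterOutputStream {
    private long bytesWritten;

    ThroughputCountingStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytesWritten++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Delegate the whole chunk; FilterOutputStream would otherwise write byte by byte.
        out.write(buf, off, len);
        bytesWritten += len;
    }

    long getBytesWritten() {
        return bytesWritten;
    }

    /** Throughput in KB/s for a byte count over an elapsed time in nanoseconds. */
    static double kilobytesPerSecond(long bytes, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return (bytes / 1024.0) / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) throws IOException {
        byte[] chunk = new byte[64 * 1024];
        long total = 4L * 1024 * 1024; // assumption: small demo payload, not the ~40MB upload
        // Assumption: replace the in-memory sink with the stream that feeds the HTTP request.
        try (ThroughputCountingStream out =
                new ThroughputCountingStream(new ByteArrayOutputStream())) {
            long lastReport = System.nanoTime();
            long lastBytes = 0;
            while (out.getBytesWritten() < total) {
                out.write(chunk, 0, chunk.length);
                long now = System.nanoTime();
                if (now - lastReport >= 1_000_000_000L) {
                    // Report throughput for the interval since the previous report.
                    long delta = out.getBytesWritten() - lastBytes;
                    System.out.printf("%.0f KB/s%n", kilobytesPerSecond(delta, now - lastReport));
                    lastReport = now;
                    lastBytes = out.getBytesWritten();
                }
            }
            System.out.println("total bytes: " + out.getBytesWritten());
        }
    }
}
```

Wrapping the upload stream this way and printing one line per second should be enough to see whether throughput is steady (as with 1.27.0) or decays toward 0 (as with 1.31.0).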