Skip to content

[Colossus] fix: upload aborted error#5094

Merged
mnaamani merged 3 commits intoJoystream:masterfrom
zeeshanakram3:colossus_fix_upload_aborted_error
Mar 11, 2024
Merged

[Colossus] fix: upload aborted error#5094
mnaamani merged 3 commits intoJoystream:masterfrom
zeeshanakram3:colossus_fix_upload_aborted_error

Conversation

@zeeshanakram3
Copy link
Copy Markdown
Contributor

addresses #5081

  • Updates @types/node dependency version to 18.6.0
  • Fix colossus upload aborted error

Context

There is a server.requestTimeout configuration option in the http.Server instance (remember express is just a wrapper around nodejs native HTTP Server), from the nodejs official documentation what this option means:

Sets the timeout value in milliseconds for receiving the entire request from the client.

If the timeout expires, the server responds with status 408 without forwarding the request to the request listener and then closes the connection.

The default value for the option is 300s for node versions >=18.x, while for the previous version the default value was 0 (no timeout)

So, the uploads failing problem started to occur when we updated the Nodejs from v14 to v18.6.0 in #4778

This timeout value of 300s also validates Ignazio's investigation that Colossus aborts errors after ~5 minutes.

Copy link
Copy Markdown
Member

@mnaamani mnaamani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great that you found this settings. I suppose we should have looked more closely at changes between v14 and v18 that would have impacted our applications. Let keep that in mind when we shift to v20+

If we want a quick deployment of this fix because we are seeing many failed uploads this is good to go. Just need to bump colossus version.


// INFO: https://nodejs.org/dist/latest-v18.x/docs/api/http.html#serverrequesttimeout
// Set the server request timeout to 0 to disable it. This was default behaviour pre Node.js 18.x
server.requestTimeout = 0
Copy link
Copy Markdown
Member

@mnaamani mnaamani Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to have the timeout disabled, provided the node is always behind a reverse proxy and not exposed on a public interface.

Alternatively we can have a non zero timeout, but larger value of say 20min, better than default of 5min?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or disabled timeout but make sure docs and our default deployment configs do not expose public interface.. ?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial idea was to set a env var as a parameter named MAX_REQUEST_TIMEOUT = 20 * 60 * 1000 (20 mins in milliseconds) so that it can be set by the user.

@zeeshanakram3 zeeshanakram3 requested a review from mnaamani March 11, 2024 09:52
@dobertRowneySr
Copy link
Copy Markdown
Collaborator

The fix works in the sense that the above observed error is gone and the request time is in my case 6.4 mins > 5mins

Copy link
Copy Markdown
Collaborator

@dobertRowneySr dobertRowneySr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I have also tested the fix locally

@mnaamani mnaamani merged commit 50953d1 into Joystream:master Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants