Skip to content

Improve ActiveStorage asset linking performance#12576

Merged
alecslupu merged 50 commits intodevelopfrom
fix/storage-service-uris
Jul 23, 2024
Merged

Improve ActiveStorage asset linking performance#12576
alecslupu merged 50 commits intodevelopfrom
fix/storage-service-uris

Conversation

@ahukkanen
Copy link
Copy Markdown
Contributor

@ahukkanen ahukkanen commented Mar 7, 2024

🎩 What? Why?

Use direct links to the assets living at storage services in order to push most of the application load to the storage service instead of the website itself.

Currently all uploaded assets in Decidim are served either through the ActiveStorage redirect URLs or representation URLs. This would make sense in some situations (e.g. if the files are protected) but currently all assets served by Decidim should be considered public.

Therefore, we can serve these images directly through the storage service because otherwise there will be an extra request to the service for each and every image to be served. If the page has a lot of different images, such as typical front page or listing of proposals, this adds excessive load on the servers because in worst case situations they are doing hundreds of requests to the Rails service just to serve one page. Under heavy load this becomes a real problem, e.g. if you have 100 users loading the front page at the same time containing 100 images, you would generate 10 000 requests for the Rails app in a very short timespan (note that 100 people is not that many).

To solve this issue, we can serve the images directly through the storage service bypassing all these requests coming to the Rails app. Note that some requests still need to be loaded through the app, such as the variant representation URLs during their first load (when the variant is created). But after the processing of the image variant, they can be served directly through the storage service.

📌 Related Issues

Testing

  • Create a live instance of Decidim with a storage service configured, such as S3, Azure, Google, etc.
    • Alternatively, ask someone for permission to use their instance for load testing
  • Upload a bunch of images through ActiveStorage that are displayed on the front page
  • Install a benchmarking tool (e.g. using siege as pointed out by Andrés or similar tools)
  • Open top at the machine running Decidim
  • Run the benchmark, e.g. create 10k requests with a concurrency of 100 (depending on your server size)
  • After the test finishes, store the results
  • Apply this fix and repeat the test
  • See highly decreased amount of requests and CPU usage

Use direct links to the assets living at storage services in order
to push most of the application load to the storage service
instead of the website itself.
github-actions[bot]
github-actions bot previously approved these changes Mar 7, 2024
The image variants need to be processed through the local
representation URLs when they are served the first time because
this is when these assets are processed for the first time.

These changes detect if the variant needs processing and serves
it through the local representation URL before the image has been
processed. If the image is already processed it is served directly
through the storage service itself.
- Fix the storage asset specs to match the current implementation
- Add more tests to test the router under different configurations
  and options
- Add an RSpec support utility to define the storage host when the
  storage URLs are tested outside of a request context
github-actions[bot]
github-actions bot previously approved these changes Mar 19, 2024
github-actions[bot]
github-actions bot previously approved these changes Mar 19, 2024
github-actions[bot]
github-actions bot previously approved these changes Mar 20, 2024
@andreslucena
Copy link
Copy Markdown
Member

Install a benchmarking tool (first I thought of ab but it won't work in this case because it doesn't request the images)

siege does that exactly. It even has an option to disable it. From its man page:

   --no-parser
       Turn off the HTML parser. **When siege downloads a page, it parses it for additional page elements such as style-sheets, javascript and images**. It will make additional
       requests for any elements it finds. With this option enabled, siege will stop after it pulls down the main page.

(Asterisks are mine)

- Add more logic for the disk service to determine the
  availability of the current host before generating the URL
- Define the current host based on the record if the record has
  details about its organization, only if the current host hasn't
  been set already
Copy link
Copy Markdown
Contributor

@alecslupu alecslupu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahukkanen , i have spent my day trying to fully understand the nature of your fix and also to see what are the end results of siege.

I can say that while running on develop, the homepage executed **~ 39 transactions** ( siege -c 1 -r 1 --verbose http://localhost:3000/ )
While running the code on this branch, the same homepage executed **~ 33 transactions**.

Regarding the load, i could not observe an improvement on the tests conducted, but I am happy that now we can use directly the S3 host.

PS. There are some conflicts & failing specs . It would be great if you can allocate some time to fix the issues , so we could merge this.

@andreslucena
Copy link
Copy Markdown
Member

Hi @ahukkanen!
Have you seen the last message from Alex? Can you check it out please? We want to clean the backlog of pending PRs, so if you don't have the bandwith to solve the pending comments and conflict, let us know and we can tackle this. Thanks!

github-actions[bot]
github-actions bot previously approved these changes Jul 22, 2024
Co-authored-by: Alexandru Emil Lupu <contact@alecslupu.ro>
github-actions[bot]
github-actions bot previously approved these changes Jul 22, 2024
Co-authored-by: Alexandru Emil Lupu <contact@alecslupu.ro>
github-actions[bot]
github-actions bot previously approved these changes Jul 22, 2024
github-actions[bot]
github-actions bot previously approved these changes Jul 22, 2024
@ahukkanen
Copy link
Copy Markdown
Contributor Author

@alecslupu @andreslucena Conflicts are now resolved and the failing specs are fixed.

The last failing set of specs is because Capybara has failed to start the server itself due to a reserved port. I do not completely understand why but I have a guess that I made another PR about (see #13194).

There might be more flaky ones introduced by this because the specs that I fixed were not waiting for the pending requests to finish before continuing their operations. This happened e.g. when inviting user as a parivate participant and then logging out user after that. The previously pending request was not finished before proceeding with the next request where the assumption was a logged out state. These are mainly issues with the individual specs.

@alecslupu Instead of looking at the number of database transactions, I would suggest looking at the amount of HTTP requests served by Decidim before and after this feature. The problem is apparent when there is a high load and Decidim has to serve several simultaneous requests. It causes a bottleneck where as with this feature, most of that load is passed to the file server bypassing Decidim completely.

Of course, it depends on how many images you are serving from the home page or individual pages you are testing. When there are no to few images on the page, there is basically no difference between the two approaches.

Copy link
Copy Markdown
Contributor

@alecslupu alecslupu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

No open projects
Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants