[4.0] Optimizing Smart Search for larger content#30008
[4.0] Optimizing Smart Search for larger content#30008wilsonge merged 3 commits intojoomla:4.0-devfrom
Conversation
This comment was marked as abuse.
This comment was marked as abuse.
|
Do you have more than that one article in the installation? I'm asking because indexing several articles does require more memory than a single one. You could test if it works when you lower the number of items to index in one pass. |
This comment was marked as abuse.
This comment was marked as abuse.
|
This looks to be the problem that #16621 was intended to fix. |
|
I've reviewed the code and the cache actually doesn't really makes sense. We are storing the tokenised result of the whole string, which either doesn't help a lot, since the string is very short (category names, etc.) or it never has a cache hit, since we would have to index the exactly same text twice. This new cache now caches the tokens and looks those up instead. This actually reduces the necessary memory by several megabytes. I've had results that went as far as reducing from 30+MB to just 14MB. I also tested the cache size and 1024 is better than 2048. The later meant a significant increase in memory consumption, but no real reduction in execution time. I would assume that this would also solve the issue around #16621... @PhilETaylor can you test again if this change fixes your memory issue? |
This comment was marked as abuse.
This comment was marked as abuse.
This comment was marked as abuse.
This comment was marked as abuse.
|
PHP 7.3.5 and MariaDB 10.1.39 here. I've now gone and used the "worst" settings for indexing possible. Enabled tuple search, enabled stemming, had TinyMCE as editor and copied the whole bee movie script in there and it worked fine. My memory had a peak usage of 18MB. Can you debug where the huge memory usage is coming from? allocating 33MB at that point at once seems strange. |
This comment was marked as abuse.
This comment was marked as abuse.
|
Regarding the bee movie: I did try the re-indexing as well and that worked fine, too. Regarding the bible: We have a general issue with very large texts. It does not fail gracefully here and I don't know why (yet). I had similar issues where it loaded an empty form after saving for an eternity... This PR is hopefully a step towards handling larger documents better, but we should do larger, more systematic tests with large documents, too... |
This comment was marked as abuse.
This comment was marked as abuse.
This comment was marked as abuse.
This comment was marked as abuse.
|
I have tested this item ✅ successfully on e5ae33c This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/30008. |
|
RTC This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/30008. |
|
Thanks! |
…outs * '4.0-dev' of github.com:joomla/joomla-cms: (612 commits) [4.0] Smart Search: Fixing ordering, order direction and disabled button (joomla#29474) [4.0] Generate routed Modal links for iframes when not on the root (joomla#30007) [4.0] Get menu directly in com_tags menu route helper (joomla#30039) Remove collapse when resizing from mobile to desktop (joomla#30132) [4.0] Wrap component output in `main` element to make Cassiopeia more accessible (joomla#29870) [4.0] Webauthn gmp warning (joomla#29731) [4.0] Refactor to return early, remove if depths and throw NotAllowed (joomla#29694) [4.0] CLI help text (joomla#29811) Feature/draggable typo fixes (joomla#29987) [4.0] Removing unnecessary workaround in finder indexer (joomla#30037) [4.0] Optimizing Smart Search for larger content (joomla#30008) [4.0] Fix js ajax for pre update checker (joomla#29980) [4.0] Cassiopea: Fixing modals custom-select fields display (joomla#30097) [4.0][com_fields] Fix draggable sorting (joomla#30094) [4.0] Correct incorrect @return documentation (joomla#30092) [4.0] Menu items modal: adding missing filters (joomla#30087) short to long php open tags with echo (joomla#30089) Use new Toolbar (joomla#30085) [4.0] Center status/date created headers (joomla#29249) [4.0] Fix Cassiopea searchtools alignment in modals (joomla#30077) ... # Conflicts: # administrator/components/com_templates/src/View/Template/HtmlView.php # installation/sql/postgresql/base.sql # libraries/src/Application/AdministratorApplication.php # libraries/src/Application/SiteApplication.php
Summary of Changes
Smart Search has issues with larger content and to be honest, this already triggers rather early. It fails for content larger than 200kb.
The first issue is, that the description column in the #__finder_links table is not large enough. The solution here is to truncate the description to a reasonable size. That reasonable size is 32000 characters and I pulled that number out of thin air. I hope that it is low enough to allow for a string with multi-byte chars still to be saved.
The second issue is, that we are trying to store all tokens at once in the DB. That means if we have a text with 50k words, we are creating a query that tries to insert 50k rows at once. That fails spectacularly. So now we do this in chunks of 128 rows at a time.
Third issue is the tokens table is running full rather easily. The number of tokens storable in the tokens table is depending on the max_heap_table_size, which on my system is set to 16MB and which in that case fails at just 22k entries. To get around this issue, we have the toggleTables() method, but that has only been run so far between the different properties to index. If you text has 2MB in size, it tries to insert all those 2MB into the DB and it fails right away. Thus we are checking the limit also when storing in the additional code of issue 2. I also reduced the number of entries to 10k instead of 30k.
This fixes #27807.
Testing Instructions
Find a large text, for example the bible and create an article with that as content. Save and previously see that it takes a very long time and fails at some point. Apply this patch and try again. See now that it doesn't take as long and actually works successfully now.