
Reorder tables with single SQL query (all DBs compatible), greatly improving reorder performance #11184

Closed
ggppdk wants to merge 7 commits into joomla:staging from ggppdk:patch-13

Conversation

@ggppdk
Contributor

@ggppdk ggppdk commented Jul 18, 2016

Performance fix for JTable::reorder().

It should work on all database servers, since it uses the CASE/WHEN/THEN/ELSE/END expression (SQL-92).

Summary of Changes

  • Replace multiple reordering SQL queries with a single query
  • This updates up to 10,000 records per query during reordering
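The single-query approach can be sketched roughly as follows (illustrative Python rather than the patch's actual PHP; the table name, column names, and helper function are assumptions for the example):

```python
# Sketch: build one UPDATE that rewrites the ordering of many rows at once
# via a CASE expression, instead of issuing one UPDATE per row.
# Table/column names ("#__content", "id", "ordering") are illustrative.

def build_reorder_query(table, pairs):
    """pairs: list of (primary_key, new_ordering) tuples."""
    whens = " ".join(f"WHEN {pk} THEN {ordering}" for pk, ordering in pairs)
    ids = ", ".join(str(pk) for pk, _ in pairs)
    return (
        f"UPDATE {table} SET ordering = CASE id {whens} ELSE ordering END "
        f"WHERE id IN ({ids})"
    )

query = build_reorder_query("#__content", [(12, 1), (15, 2), (9, 3)])
```

A real implementation would also escape/quote values through the database driver; this only shows the shape of the generated statement.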

Testing Instructions

1 - Apply the PR to the latest staging
2 - It helps to make the per-row ORDER inputs visible; to do this, add CSS to your backend template file: /templates/isis/css/template.css

.text-area-order, .sortable-handler + input {
  display: inline !important;
  width: 40px;
  text-align: right;
}

You may want to add the following just before the final return of JTable::reorder():
https://github.com/joomla/joomla-cms/pull/11184/files#diff-8254e2c441a41d3fa26f0f83fbbb3043R1431

JFactory::getApplication()->enqueueMessage('<b>reorder()</b>:<br/>'.$query, 'message');
  1. Then create a new article
  2. Click "Save and close" (not "Save", since we want to see the enqueued message and also list the orderings in the article manager); see the example in the picture

[Screenshot: table_reorder_single_query]

$this->_db->setQuery($query);
$rows = $this->_db->loadObjectList();
// Reorder configuration
$uses_single_key = count($this->_tbl_keys == 1);

Shouldn't this be:
$uses_single_key = count($this->_tbl_keys) == 1;

@csthomas
Contributor

If Joomla users want to stay with the old ordering, then
what about using a transaction: start it before the loop and commit after it?
This also speeds up the queries.
Joomla uses InnoDB, PostgreSQL and SQL Azure, which all support database transactions.

Your patch will do the same, but it is more complicated.

@ggppdk
Contributor Author

ggppdk commented Jul 18, 2016

What is complicated about it?

It is actually very simple: collect all the updates into one query.

Also, a transaction with multiple queries should always be slower, or a lot slower;
or do you mean a transaction with one query?

@ghost

ghost commented Jul 19, 2016

SORRY AGAIN! This test is NOT RELEVANT! Forgot to select "successful".

I have tested this item 🔴 unsuccessfully on c43254e

@ghost

ghost commented Jul 19, 2016

I have tested this item ✅ successfully on c43254e

I would prefer this solution over #11139 because the ordering numbers are more transparent, starting at 1 or so instead of 2147483647.

(But the more I think about this whole reordering issue, I don't understand why we have to reorder the table while saving a new article. Ordering is an unsigned field. Why not just calculate min_ordering once and set the ordering of new article(s) to (--min_ordering)? If that is not possible because of the field size, only then do a reorder starting at 1.)


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/11184.

@ggppdk
Contributor Author

ggppdk commented Jul 19, 2016

Anyone testing:

  • PLEASE USE the latest staging; making the ordering inputs visible in J3.5.?-J3.6.2 breaks drag-and-drop ordering (does the JS selector need the input elements to be hidden?)

The FIX for visible selectors was merged into the latest staging (for J3.6.3+) here: #11572

@ghost

ghost commented Jul 19, 2016

if you want to test drag and drop ordering, please be warned to remove the CSS change

I didn't, and had no issues. The fields don't update without a page reload; that's all.

Off topic:
A nice thing (just for my own private site) is that one can enter a number (e.g. -25). After drag & drop, -25 is stored correctly in the database. You have to reload the page to see this ordering change in the backend.

@csthomas
Contributor

Please take a look at Review of solutions #11139 (comment)

@ggppdk ggppdk changed the title Reorder tables with single SQL query (all DBs compatible), allows to fix B/C break with new article ordering, and performance / race-condition issues when re-ordering in large sites Reorder tables with single SQL query (all DBs compatible), greatly improving reorder performance Jul 20, 2016
@csthomas
Contributor

I'm back here because this is more related.
To have fewer race conditions you would only need to add one transaction.

BEGIN

-- loop over the batches:
--    UPDATE max 10,000 rows per query

COMMIT
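The transaction idea above can be sketched as runnable code (using Python's sqlite3 as a stand-in database; the table schema, batch size, and ordering values are assumptions for the illustration):

```python
import sqlite3

# One transaction wrapped around all the batched UPDATEs, committed once.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, ordering INTEGER)")
conn.executemany("INSERT INTO content (id, ordering) VALUES (?, 0)",
                 [(i,) for i in range(1, 31)])

new_order = {i: 31 - i for i in range(1, 31)}  # e.g. reverse the ordering
batch_size = 10                                # stands in for the PR's 10,000
ids = sorted(new_order)

conn.execute("BEGIN")                          # start one transaction
for start in range(0, len(ids), batch_size):
    batch = ids[start:start + batch_size]
    conn.executemany("UPDATE content SET ordering = ? WHERE id = ?",
                     [(new_order[i], i) for i in batch])
conn.execute("COMMIT")                         # commit once, after all batches
```

Grouping the per-row UPDATEs into one transaction avoids a disk flush per statement, which is where most of the speedup of this approach comes from.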

@ggppdk
Contributor Author

ggppdk commented Aug 10, 2016

This PR fixes the performance issue and makes the race issue very rare,

but transactions are not directly related to this PR.

The Joomla database API needs to add them first?

  • checking if the current DB supports them
  • checking if the current SQL user is privileged to use them
  • adding some API calls for them

right?

@csthomas
Contributor

@ggppdk
Contributor Author

ggppdk commented Aug 10, 2016

So it is safe to use regardless of environment?

I will make use of it,
thanks; a small change in the code of this PR.

[EDIT]
I need to remove the changes in the articles model, and rebase it.

@csthomas
Contributor

csthomas commented Aug 10, 2016

I have tested transactionStart only on my own component.

You can see an example in the FOF library:
https://github.com/joomla/joomla-cms/blob/staging/libraries/fof/table/nested.php#L181

@ggppdk
Contributor Author

ggppdk commented Aug 13, 2016

@alikon

I have managed to pull it locally and rebase it (originally I made it from the GitHub GUI),
so it is good to test when you have finished your other work (as you said)

  • especially on non-MySQL DBs, to see if some modification is needed for them

The CASE/WHEN/THEN/ELSE/END expression was added by SQL-92, so it should be supported by all DBs?

@alikon
Contributor

alikon commented Aug 13, 2016

Added to my to-do list 👍
Not so sure if I will be able to test on MSSQL too.

P.S.
Seems Travis is not happy.

@ggppdk
Contributor Author

ggppdk commented Aug 13, 2016

@alikon
Done, fixed the code styling.

Also updated the description of the first post to be up to date, shorter and clearer.
Any other testers are welcome, especially if they can also test the non-MySQL DBs.

@mbabker
Contributor

mbabker commented Aug 13, 2016

I'd suggest checking your limits with the SQL Server docs. For an INSERT query, it only allows 1000 values at a time (see https://github.com/joomla/joomla-cms/blob/staging/administrator/components/com_finder/helpers/indexer/driver/sqlsrv.php#L543 for Smart Search where it hit this issue).

@ggppdk
Contributor Author

ggppdk commented Aug 13, 2016

That code tries to do the same thing for the case of inserts

  • I see 1 query per 1,000 inserts

To be honest, in this case I don't know what limitation applies to the query size in SQL Server,

but in our case the updates are really small strings

  • e.g. for MySQL the max_allowed_packet default value is 1 MByte (some servers raise it to much larger)

Let's say that the fragment below is 30 characters:
WHEN 'NNNNN' THEN OOOOOO

you will get
1,000,000 / 30 = 33,000 updates (if you count some 10% overhead this may be smaller)
to fit in the default max packet size of MySQL.

In the code of this PR, I use 10,000 updates per query.

I do not know if SQL Server has smaller limits
(in size and in update record count; I need to check the docs ...)

We can probably change it to 1,000 with little or no performance drop.

If the values were mostly variable length,
we could also count the characters (considering the multibyte nature of UTF-8 too);
that is what I do in my queries to avoid hitting the limit.
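The arithmetic above can be checked quickly (a sketch using the comment's own assumptions: 30-byte fragments, a 1 MByte default packet, and roughly 10% overhead reserved for the rest of the query):

```python
# Back-of-the-envelope check: how many 30-character "WHEN ... THEN ..."
# fragments fit in MySQL's default max_allowed_packet of 1 MByte,
# reserving ~10% for the UPDATE skeleton and the IN () list.
max_allowed_packet = 1_000_000   # bytes (MySQL default)
bytes_per_update = 30            # one CASE branch, as estimated above
overhead = 0.10                  # query skeleton, IN () list, etc.

usable = max_allowed_packet * (1 - overhead)
max_updates = int(usable // bytes_per_update)
# max_updates is well above the 10,000-record batch the PR uses
```

This is an estimate only; the real limit depends on key/ordering value lengths and the server's configured packet size.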

@ggppdk
Contributor Author

ggppdk commented Aug 13, 2016

I will check the docs, but we also need a website with a category containing 30,000 articles to test this properly.

@brianteeman
Contributor

brianteeman commented Aug 13, 2016 via email

@ggppdk
Contributor Author

ggppdk commented Aug 13, 2016

Thanks Brian, I have one for MySQL:

I had made a patch for com_overload to make it run faster, and created a website with 500,000 articles,

but I have not set up any Joomla websites on other DBs.

@mbabker
Contributor

mbabker commented Aug 13, 2016

I mainly just want to make sure UPDATE queries in SQL Server don't have the
same data limitations as INSERT queries, or others that are more
restrictive than the other engines. We've got a couple dozen users to keep
happy ya know 😆


@ggppdk
Contributor Author

ggppdk commented Aug 13, 2016

I hope this PR proves good enough to be accepted for the other DBs too;
it will make many 3rd-party extensions happy as well.

@alikon
Contributor

alikon commented Sep 4, 2016

which will make a query of 10,000 x 30 characters (WHEN ... THEN ...), thus less than 300 KBytes,

plus, for the IN():

10,000 x 10 characters [ IN (nnnnnnnn1, nnnnnnnn2, ..., nnnnnnnn) ], less than 100 KBytes,

so still under the default max_allowed_packet of 1 MB (should be the same on Postgres).

Calling @waader for a test on MSSQL.
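These size figures can be reproduced directly (a sketch using the estimates above as assumptions: 30 bytes per CASE branch, 10 bytes per IN() entry):

```python
# Reproducing the estimate above: 10,000 CASE branches plus the IN () list.
branches = 10_000
case_bytes = branches * 30     # "WHEN ... THEN ..." fragments
in_bytes = branches * 10       # id entries in IN (...)
total_bytes = case_bytes + in_bytes

default_max_allowed_packet = 1_000_000  # MySQL default, in bytes
fits = total_bytes < default_max_allowed_packet  # ~400 KB vs 1 MB
```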

@waader
Contributor

waader commented Sep 5, 2016

Reordering works with the testing data installed, but because of #11932 I was not able to create new articles, so I could not follow the test instructions.

@ggppdk
Contributor Author

ggppdk commented Sep 7, 2016

Also, during the time I was making this PR, I tested with a limit of 5 or 50 values to confirm that breaking the update into more than one UPDATE query works:

https://github.com/joomla/joomla-cms/pull/11184/files#diff-8254e2c441a41d3fa26f0f83fbbb3043R1374

so you may want to test by adding a 5 or 50 limit to the above (depending on how many articles / records your category has);

thus, for updating 500 records there will be 10 UPDATE queries.

  • Also, since the other "performance gain" PR to get the last order number was reverted, and since no problems have been found with this PR, it would be good to get a milestone of J3.6.3?

@ggppdk
Contributor Author

ggppdk commented Sep 24, 2016

The change of using the last order number on new record creation was reverted,
so J3.6.3 will now again have the same slowdown when saving new records on sites with many records per category.

This PR does not have a milestone.

@DavorMtc

After updating my Joomla to 3.6.3 (from 3.6.2) I had problems with a "504 Gateway" error when saving new articles.

Found this PR, did everything as written, and now it is working fine.

@ggppdk
Contributor Author

ggppdk commented Oct 24, 2016

Since PR #11581, Revert "Performance gain - new featured article",
has fixed #11103, New articles are created with last possible "ordering",

this PR was supposed to also be considered for acceptance and merging, to address the performance issue
(see my previous comment), but eventually this did not happen,

so J3.6.3 has the known JTable::reorder() performance issue (and on big sites timeouts can occur too).

[EDIT]
Updating multiple values in a single SQL query is very important in terms of performance when you need to update many values, and that is exactly why this feature exists in standard SQL!

@mbabker
Contributor

mbabker commented Nov 7, 2016

Reading through this conversation and patch again, I'm still not 100% comfortable with this. It still seems as though it has been coded against default MySQL configurations. I've seen a couple tests for PostgreSQL, where frankly I'm not as concerned about it working since it usually behaves closer to MySQL anyway. I haven't seen anything in here at all for SQL Server (and yes it's still supported right now so that needs to be validated).

Two things I think need to happen:

  1. SQL Server testing; this CANNOT be merged without testing on that platform as long as we make a claim to supporting it
  2. How can this be improved to be better aware of environmental limits with the different platforms? This is hardcoded against MySQL's default configurations, what happens in the case where smaller limits are used or one of the platforms doesn't match the expectations added here?

I'm really not trying to be difficult, even if others think my arguments are BS. But we are making changes at the base level of the database integration layer and the changes need to be properly validated and flexible enough to deal with lower end configurations. So far, I think we're OK for default MySQL and PostgreSQL configurations.

@csthomas
Contributor

csthomas commented Nov 7, 2016

There is a very old article that describes a similar solution to this PR at http://www.databasejournal.com/features/mssql/article.php/1460001/Complex-Updates-Using-the-Case-Statement.htm

@ggppdk
Contributor Author

ggppdk commented Nov 7, 2016

  2. How can this be improved to be better aware of environmental limits with the different platforms? This is hardcoded against MySQL's default configurations; what happens in the case where smaller limits are used or one of the platforms doesn't match the expectations added here?

The query starts breaking into multiple queries at 10,000 records, which is about 350 KBytes,
because 10,000 * 30 (or much fewer) characters + some overhead <= 350 KBytes.

Max query string size for MySQL: the default is 1 MByte
-- would anyone reconfiguring this to a lower limit also have a website with 10,000+ articles in the same category?! (I have not seen such environments)

Max query string size for PostgreSQL: 1 GByte

@mbabker

So this is well below the default limits, but even if we change the limit to e.g. 5,000 or 3,000 records per query, the performance difference will be small / very small.

  • Tell me and I will just change it now; there is no real performance impact in lowering the default from 10,000 records to 3,000 records.

@csthomas
Contributor

csthomas commented Nov 7, 2016

The other idea could be to use ROW_NUMBER, described at http://stackoverflow.com/questions/3047789/how-to-enumerate-returned-rows-in-sql, which is supported on MSSQL, and on PostgreSQL from 8.4. For MySQL there could be a workaround with @ variables. This is only an idea.
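The ROW_NUMBER() idea could be sketched like this (an illustration using Python's sqlite3, whose window-function support since SQLite 3.25 mirrors MSSQL and PostgreSQL; table and column names are assumed, not Joomla's actual schema, and a real implementation could do the renumbering in a single UPDATE with a join):

```python
import sqlite3

# Rank the rows by their current ordering with ROW_NUMBER(), then renumber
# them compactly 1..N. Schema and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY, ordering INTEGER);
    INSERT INTO content (id, ordering) VALUES (1, 50), (2, 10), (3, 30);
""")

ranked = conn.execute("""
    SELECT id, ROW_NUMBER() OVER (ORDER BY ordering) AS rn
    FROM content
""").fetchall()
conn.executemany("UPDATE content SET ordering = ? WHERE id = ?",
                 [(rn, rid) for rid, rn in ranked])
conn.commit()
```

The advantage of this style is that the compact 1..N numbering is computed by the database itself, so no CASE list has to be built client-side.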

@ggppdk
Contributor Author

ggppdk commented Nov 7, 2016

@csthomas

You mean this PR (scroll to the bottom to see the query)
https://github.com/joomla/joomla-cms/pull/8563/files

@mbabker
Contributor

mbabker commented Nov 7, 2016

I don't know what can be done, but all I know is this is coded against MySQL's defaults, and based on your comment I guess PostgreSQL has a bigger default. Are we sure SQL Server has a similar configuration? If so then it should be fine, but if not then this needs to be addressed.

Also, as I pointed out above, SQL Server has a stricter configuration with the number of records/values that can be set with queries. Maybe this isn't an issue with UPDATE queries as being issued here, but when I wrote the SQL Server adapter for Smart Search it was a very real issue with INSERT queries and the 10,000 records limit you have here exceeds the 1,000 record limitation I had to work around. So even if the configurations are OK for the query size (in terms of bytes), we may still have an issue with this aspect of it. Unless I scrolled past something, that was another thing I don't remember seeing an answer for.

Again, so far it seems like we're doing fine for MySQL and PostgreSQL for the most part. But there's still one environment with no testing that we claim to support. At least we're dropping it at 4.0 but we should still make sure it's working as long as 3.x is supported.

@ggppdk
Contributor Author

ggppdk commented Nov 7, 2016

Hmm,

  • I will test against PostgreSQL (@alikon had tested)
  • about SQL Server, I will try to install a lite SQL Server edition (or something) and test with it too

Also, even a very low limit of 500 records will give about 90%-95% of the performance benefit that the 10,000-record limit per query gives,

so we can just use 1,000 as the record limit and thus be sure about SQL Server too.

@csthomas
Contributor

csthomas commented Nov 8, 2016

@ggppdk I thought about something more cross platform like #12839.

@ggppdk
Contributor Author

ggppdk commented Nov 9, 2016

I'll find time to test.

@mbabker
Contributor

mbabker commented May 21, 2017

With the other PRs that have been merged addressing reordering performance, is this still valid?

@ggppdk
Contributor Author

ggppdk commented May 21, 2017

This PR is no longer needed

@ggppdk ggppdk closed this May 21, 2017