Skip to content

Sql optimise new index merged branch#4115

Closed
nadeeshaan wants to merge 19 commits intojoomla:stagingfrom
nadeeshaan:SqlOptimise_new_index_merged_branch
Closed

Sql optimise new index merged branch#4115
nadeeshaan wants to merge 19 commits intojoomla:stagingfrom
nadeeshaan:SqlOptimise_new_index_merged_branch

Conversation

@nadeeshaan
Copy link
Copy Markdown
Contributor

@nadeeshaan nadeeshaan commented Aug 14, 2014

Testing Instructions:

This PR consists all the PRs merged in to this one regarding the index additions. Two approaches can be followed in order to test this PR. One is in an already installed site and one is for fresh distributions.

For already installed sites

For already installed sites first apply the PR and then add follow the bellow commands in order to add the indexes manually to the database

  • Index added to #_languages table (You can see this query execution when you access the site)

ALTER TABLE #__languages ADD INDEX 'idx_published_ordering' ('published','ordering'); (#11)

Related Query Before changes

  SELECT *
  FROM #__languages
  WHERE published=1
  ORDER BY ordering ASC

Related Query Before changes

  SELECT lang_id,lang_code,title,title_native,sef,image,description,
                 metakey,metadesc,sitename,published,access,ordering
  FROM #__languages
  WHERE published=1
  ORDER BY ordering ASC

  • Index added to #_session table (Query executes every time a user access a page)

ALTER TABLE #__session DROP PRIMARY KEY, ADD PRIMARY KEY(session_id(32)) ( #8 )
ALTER TABLE #__session DROP INDEX time, ADD INDEX time(time(10))

Index Description
This Index added in order to speed up the session update query allowing the sub string index usages


  • Index added to #_template_styles table (You can see this query execution when you access the site)

ALTER TABLE #__template_styles ADD INDEX idx_client_id('client_id') ( #13 )

Index Description
Without applying this index the below query cannot use any index to retrieve the rows from the database. After applying this index it clearly shows in the explanation section that the query uses the index to retrieve the rows

Related Query

    SELECT id, home, template, s.params 
    FROM #__template_styles as s 
    LEFT JOIN #__extensions as e 
    ON e.element=s.template 
    AND e.type='template' 
    AND e.client_id=s.client_id 
    WHERE s.client_id = 0 AND e.enabled = 1

  • Index added to #__contentitem_tag_map table (You can see this query execution when you access the site)

ALTER TABLE #__contentitem_tag_map ADD INDEX idx_alias_item_id ('type_alias','content_item_id') ( #14 )

Index Description
Without applying this index the below query cannot use any index to retrieve the rows from the database. After applying this index it clearly shows in the explanation section that the query uses the index to retrieve the rows. Also one another change did relevant to this index changes. Which is removing the t.* from the relevant query and adding the table fields instead. This change can be found (#14) and please refer to that PR also.

Related Query Before changes

    SELECT m.tag_id,t.*
    FROM #__contentitem_tag_map AS m
    INNER JOIN #__tags AS t ON m.tag_id = t.id
    WHERE m.type_alias = 'com_content.article' AND m.content_item_id = 5196 AND t.published = 1
    AND t.access IN (1,1,5)

Related Query After changes

    SELECT m.tag_id,'id',t.parent_id,t.lft,t.rgt,t.level,t.path,t.title,t.alias,t.note,t.description,t.published,
    t.checked_out,t.checked_out_time,t.access,t.params,t.metadesc,t.metakey,t.metadata,
    t.created_user_id,t.created_time,t.created_by_alias,t.modified_user_id,t.modified_time,
    t.images,t.urls,t.hits,t.language,t.version,t.publish_up,t.publish_down
    FROM #__contentitem_tag_map AS m
    INNER JOIN #__tags AS t ON m.tag_id = t.id
    WHERE m.type_alias = 'com_content.article' AND m.content_item_id = 5196 AND t.published = 1
    AND t.access IN (1,1,5)

  • Index added to #_extensions table (You can see this query execution when you access the site)

ALTER TABLE #__extensions ADD INDEX 'idx_type_ordering' ('type','ordering') ( #16 )

Index Description
Without applying this index the below query uses filesort to retrieve the rows from the database, which is an expensive retrieval operation. After applying this index it clearly shows in the explanation section that the query uses the index to retrieve the rows except the filesort.

Related Query

    SELECT folder AS type, element AS name, params
    FROM #_extensions
    WHERE enabled >= 1
    AND type ='plugin'
    AND state >= 0
    AND access IN (1,1,5)
    ORDER BY ordering

  • Index added to #_menu table (You can see this query execution when you access the site)

ALTER TABLE #__menu ADD INDEX 'idx_client_id_published_lft' ('client_id','published','lft') ( #17 )

Index Description
Without applying this index the below query uses filesort to retrieve the rows from the database, which is an expensive retrieval operation. After applying this index it clearly shows in the explanation section that the query uses the index to retrieve the rows except the filesort.

Related Query

    FROM #_menu AS m
    LEFT JOIN #_extensions AS e
    ON m.component_id = e.extension_id
    WHERE m.published = 1
    AND m.parent_id > 0
    AND m.client_id = 0
    ORDER BY m.lft

  • Index added to #__viewlevels table
    (query Executes when clicking on the add new category in the admin console)

Add New Category

ALTER TABLE #__viewlevels ADD INDEXidx_ordering_title(ordering,title)( #21 )

Index Description
Without applying this index the below query uses filesort to retrieve the rows from the database, which is an expensive retrieval operation. After applying this index it clearly shows in the explanation section that the query uses the index to retrieve the rows except the filesort. Also changed the JROOT/libraries/cms/html/access.php file's query which is shown below by removing the GROUP BY clause since the PRIMARY KEY includes in the selected fields. And this reduce the execution time

Related Query Before Change

    SELECT a.id AS value, a.title AS text
    FROM ltzvy_viewlevels AS a
    GROUP BY a.id, a.title, a.ordering
    ORDER BY a.ordering ASC,`title` ASC

Related Query After Change

    SELECT a.id AS value, a.title AS text
    FROM ltzvy_viewlevels AS a
    ORDER BY a.ordering ASC,`title` ASC

@nadeeshaan
Copy link
Copy Markdown
Contributor Author

Related Tracker - Tracker!

@810
Copy link
Copy Markdown
Contributor

810 commented Aug 15, 2014

@test

Test results: viewlevels

Before patch: 0.97 ms
After patch: 1.50 ms

  • Failed

@infograf768
Copy link
Copy Markdown
Member

Please regroup in one file the db alterations to ease testing and as anyway it would be necessary:
Examples:
https://github.com/joomla/joomla-cms/tree/staging/administrator/components/com_admin/sql/updates/mysql

@beat
Copy link
Copy Markdown
Contributor

beat commented Aug 15, 2014

@810 Test is not necessarily failed: If your test is with default small dataset, query time can increase a bit with indexes with small sets, but with big sets dramatically decrease.

@nadeeshaan Great work!

It is easy to review the great work that you did from a database optimization perspective, as it is well presented here. So i'll give my database optimization perspective comments one by one:

ALTER TABLE #__languages ADD INDEX 'idx_published_ordering' ('published','ordering'); (#11)

agree.

ALTER TABLE #__session DROP PRIMARY KEY, ADD PRIMARY KEY(session_id(32)) ( #8 )

agree

ALTER TABLE #__template_styles ADD INDEX idx_client_id('client_id') ( #13 )

shouldn't that be ?:

ALTER TABLE #__template_styles ADD INDEX idx_client_id('client_id','template') ( #13 )

and the related query be changed to:

SELECT id, home, template, s.params 
    FROM #__template_styles as s 
    LEFT JOIN #__extensions as e 
    ON e.element=s.template 
    AND e.type='template' 
    AND e.client_id = 0
    AND e.enabled = 1
    WHERE s.client_id = 0

?

ALTER TABLE #__contentitem_tag_map ADD INDEX idx_alias_item_id ('type_alias','content_item_id') ( #14 )

agree.

ALTER TABLE #__extensions ADD INDEX 'idx_type_ordering' ('type','ordering') ( #16 )

agree

ALTER TABLE #__menu ADD INDEX 'idx_client_id_published_lft' ('client_id','published','lft') ( #17 )

agree

ALTER TABLE #__viewlevels ADD INDEXidx_ordering_title(ordering,title)( #21 )

agree.

(my above comments are from a MySQL/MariaDB perspective only, which is anyway 99+% of Joomla's audience).

@nadeeshaan
Copy link
Copy Markdown
Contributor Author

Thank you very much for helping to verify the indexing @beat. I will
definitely look at the #13 and correct the issue.
:)

On Sat, Aug 16, 2014 at 2:08 AM, beat notifications@github.com wrote:

@810 https://github.com/810 Test is not necessarily failed: If your
test is with default small dataset, query time can increase a bit with
indexes with small sets, but with big sets dramatically decrease.

@nadeeshaan https://github.com/nadeeshaan Great work!

It is easy to review the great work that you did from a database
optimization perspective, as it is well presented here. So i'll give my
database optimization perspective comments one by one:

ALTER TABLE #__languages ADD INDEX 'idx_published_ordering'
('published','ordering'); (#11
#11)

agree.

ALTER TABLE #__session DROP PRIMARY KEY, ADD PRIMARY KEY(session_id(32)) (
#8 #8 )

agree

ALTER TABLE #__template_styles ADD INDEX idx_client_id('client_id') ( #13
#13 )

shouldn't that be ?:

ALTER TABLE #__template_styles ADD INDEX idx_client_id('client_id','template') ( #13 )

and the related query be changed to:

SELECT id, home, template, s.params
FROM #__template_styles as s
LEFT JOIN #__extensions as e
ON e.element=s.template
AND e.type='template'
AND e.client_id = 0
AND e.enabled = 1
WHERE s.client_id = 0

?

ALTER TABLE #__contentitem_tag_map ADD INDEX idx_alias_item_id
('type_alias','content_item_id') ( #14
#14 )

agree.

ALTER TABLE #__extensions ADD INDEX 'idx_type_ordering'
('type','ordering') ( #16 #16 )

agree

ALTER TABLE #__menu ADD INDEX 'idx_client_id_published_lft'
('client_id','published','lft') ( #17
#17 )

agree

ALTER TABLE #__viewlevels ADD INDEXidx_ordering_title(ordering,title)( #21
#21 )

agree.

(my above comments are from a MySQL/MariaDB perspective only, which is
anyway 99+% of Joomla's audience).


Reply to this email directly or view it on GitHub
#4115 (comment).

Nadeeshaan Gunasinghe
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka

@brianteeman
Copy link
Copy Markdown
Contributor

@test I'm finding it much harder to test this one as it isnt clear exactly where each of the indexes should be tested

@nadeeshaan
Copy link
Copy Markdown
Contributor Author

@brianteeman In the above testing Instructions comment I have made include the details about the index and the related query as well as where to find the query. If you have any issue of figuring out an index to be test please mention and I will definitely update the instruction
:)

@alikon
Copy link
Copy Markdown
Contributor

alikon commented Aug 18, 2014

@brianteeman you can use the explain on "the query" before apply the patch and after applying the patch

SELECT lang_id,lang_code,title,title_native,sef,image,description,
                 metakey,metadesc,sitename,published,access,ordering
  FROM #__languages
  WHERE published=1
  ORDER BY ordering ASC

before apply the pr
explain_before

after apply the pr
explain_after

you should notice the use of the new index

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change it to ->where('a.title != "" ') instead of above.
and if you can make it like this, it will be awesome.

->where($db->qn('a.title') . ' != "" ')

Thanks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be changed from <> to != ?

iirc, in SQL the prefered notation for inequality is <>.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand for String to String comparison it is better to use !=

@gunjanpatel
Copy link
Copy Markdown
Contributor

@infograf768 added update file for ALTER Queries.
Thanks.

@alikon
Copy link
Copy Markdown
Contributor

alikon commented Aug 18, 2014

@test the @beat tip

the query before

 SELECT id, home, template, s.params
 FROM old_template_styles as s LEFT JOIN old_extensions as e 
ON e.element=s.template AND e.type='template' AND e.client_id=s.client_id 
WHERE s.client_id = 0 
AND e.enabled = 1;

and the relative explain
explain_before_beat

the query after

SELECT id, home, template, s.params 
FROM vixuv_template_styles as s LEFT JOIN vixuv_extensions as e 
ON e.element=s.template 
AND e.type='template' 
AND e.client_id = 0 
AND e.enabled = 1 
WHERE s.client_id = 0;

and the relative explain
explain_after_beat

@wilsonge
Copy link
Copy Markdown
Contributor

I'm really not a fan of this changing of a * to the entire set of fields from the table. How much of an effect does this have on performance vs just the key changes?

I'm just worried that readability is severely affected. Plus if any db changes ever happen to these tables it means more places where these things need changes and where bugs can be introduced

@beat
Copy link
Copy Markdown
Contributor

beat commented Aug 20, 2014

@wilsonge Good points! 👍

You're right: indexes have MUCH bigger performance gains than rewriting * to fields lists. And specially if it's full fields-list, I doubt about any performance gains, and if there are any, that could be automated in code and not made manually for maintainability in case of added fields.

In MySQL, sometimes/often * is slightly faster than the full list. E.g. COUNT(*) can be faster than e.g. COUNT(id). In Postgress, it could be different in older versions.

@gunjanpatel
Copy link
Copy Markdown
Contributor

@wilsonge It's really a good point. In future if there will be any additional field or need any field somewhere have to change it several places. We can check performance difference with and without a.*.
What do you think about it @alikon ?

@alikon
Copy link
Copy Markdown
Contributor

alikon commented Aug 20, 2014

sorry is my mental sql syntax check 1.0 alpha
that hate select * and replace it with the field list

the right thing to do is to select the only "app" needed fields
despite in several db it may perform different this is not the "point"

i think is more easy to understand what that query "ask"
and even if it perform worst i'd prefer cause the code is more readable

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there might be a change from spaces to a tab here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Sophist-UK I will fix this issues.

@waader
Copy link
Copy Markdown
Contributor

waader commented Jan 14, 2015

Can you add the indexes for postresql and mssql too?

@zero-24
Copy link
Copy Markdown
Contributor

zero-24 commented Jan 14, 2015

@nadeeshaan

Github say that
This pull request contains merge conflicts that must be resolved.

Can you have a look into it?

@gunjanpatel gunjanpatel force-pushed the SqlOptimise_new_index_merged_branch branch from 1e219eb to 7e0ed08 Compare January 25, 2015 06:32
@gunjanpatel
Copy link
Copy Markdown
Contributor

This PR is conflict is now solved and ready to review. Thanks everyone for review and tests.

@wnnz34
Copy link
Copy Markdown
Contributor

wnnz34 commented Aug 21, 2015

the Query
SELECT m.core_content_id,m.content_item_id,m.type_alias,COUNT( tag_id) AS count,ct.router,cc.core_title,cc.core_alias,cc.core_catid,cc.core_language,cc.core_params

FROM sbgcn_contentitem_tag_map AS m

INNER JOIN sbgcn_tags AS t
ON m.tag_id = t.id

INNER JOIN sbgcn_ucm_content AS cc
ON m.core_content_id = cc.core_content_id

INNER JOIN sbgcn_content_types AS ct
ON m.type_alias = ct.type_alias

WHERE m.tag_id IN (2,3,4,5)
AND t.access IN (1,1)
AND (cc.core_access IN (1,1) OR cc.core_access = 0)
AND (m.content_item_id <> 24 OR m.type_alias <> 'com_content.article')
AND cc.core_state = 1
AND (cc.core_publish_up='0000-00-00 00:00:00' OR cc.core_publish_up<='2015-08-21 11:02:01')
AND (cc.core_publish_down='0000-00-00 00:00:00' OR cc.core_publish_down>='2015-08-21 11:02:01')

GROUP BY m.core_content_id,m.content_item_id,m.type_alias,ct.router,cc.core_title,cc.core_alias,cc.core_catid,cc.core_language,cc.core_params

ORDER BY count DESC
LIMIT 0, 5
Tested has a Query Time of 15.00 ms. and it is slower than some MSQL Queries.

SELECT folder AS type, element AS name, params
FROM #_extensions
WHERE enabled >= 1
AND type ='plugin'
AND state >= 0
AND access IN (1,1,5)
ORDER BY ordering
tested ! and all the others MYSQL QUERIES.


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/4115.

@euismod2336
Copy link
Copy Markdown
Contributor

I have tested this item 🔴 unsuccessfully on 7e0ed08

Confirmed for all but #17, see screenshot. I assumed that also here the filesort shouldn't be used anymore
Screenshot


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/4115.

@beat
Copy link
Copy Markdown
Contributor

beat commented Apr 15, 2016

@euismod2336 That's not necessarily an unsuccessful test! Specially if master has no better result... Many other improvements, worthwhile ones were made here. :-)

sure an adequate index seems to exist: (client_id, published, lft) , and would be most probably used by MySQL depending on the size of the result. The issue here for not being able to use straight that index is the m.parent > 0 in the where statement. You should try again with maybe 1000 or 10000 rows to see if MySQL uses that index.

@wilsonge
Copy link
Copy Markdown
Contributor

No but this PR does conflict because it predates all the utf8mb4 index changes - there's no point in testing this unless the branch is updated again

@wilsonge
Copy link
Copy Markdown
Contributor

For when @brianteeman hits this in a few weeks at the sprint tho - this is one that needs redoing - not just closing. It's going to give big improvements for "free"

@euismod2336
Copy link
Copy Markdown
Contributor

@beat i agree that the improvements are considerable, but not all issues were "fixed" (as far as i could see), i'll have another go at it with more records if it might change it. all the other queries showed direct results.

@wilsonge not sure about the utf8mb4 changes, but this was PR was tested on the staging branch.


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/4115.

@euismod2336
Copy link
Copy Markdown
Contributor

I have tested this item ✅ successfully on 7e0ed08

After adding 10.000 records the query is done properly on the index.

Thank you @beat for pointing this out.

Image


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/4115.

@zero-24
Copy link
Copy Markdown
Contributor

zero-24 commented May 7, 2016

Set to "closed" on behalf of @zero-24 by The JTracker Application at issues.joomla.org/joomla-cms/4115

@zero-24
Copy link
Copy Markdown
Contributor

zero-24 commented May 7, 2016

Closing as we have a new updated PR at #10284 by @alikon


This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/4115.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.