Description
I discussed this with @shlomi-noach on Monday:
--critical-load is currently a copy of a flag from pt-online-schema-change: when that threshold is hit (e.g. high threads_running), gh-ost panics and dies.
This behaviour makes sense in pt-online-schema-change because it has triggers on the master/host that copy writes from the old to new table, so on things going badly, you want those triggers removed since they impede performance.
With gh-ost, however, if its copy thread is paused, there's no extra load on the master (the gh-ost tables are there, but no write activity is going to them). So pausing gh-ost for a period of time looks the same to the master as gh-ost not running at all.
On that note, I suggest a new feature in which --critical-load is treated as a more severe --max-load: gh-ost pauses for, say, an hour (or some other default value). The idea is that whatever bad thing just happened is temporary and/or fixable, and we'd rather not lose hours/days of table copying because of it. And if things were fixed before that hour is up, gh-ost could be unpaused.
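To make the proposal concrete, here is a rough sketch of the decision logic I have in mind. It is written in Python purely for illustration and is not gh-ost code: `LoadPolicy`, `Action`, `critical_pause_seconds`, and `unpause` are all hypothetical names, and comparing with "strictly greater than" is my assumption about how a threshold counts as exceeded.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    COPY = "copy"              # keep copying rows
    THROTTLE = "throttle"      # ordinary --max-load style throttling
    LONG_PAUSE = "long_pause"  # proposed --critical-load behaviour (instead of dying)


@dataclass
class LoadPolicy:
    """Hypothetical policy object; not part of gh-ost."""
    max_load: int
    critical_load: int
    critical_pause_seconds: float = 3600.0  # "say, an hour"
    paused_until: Optional[float] = None

    def decide(self, threads_running: int, now: float) -> Action:
        # Critical threshold exceeded: start (or extend) the long pause
        # rather than panicking and losing the copy progress.
        if threads_running > self.critical_load:
            self.paused_until = now + self.critical_pause_seconds
            return Action.LONG_PAUSE
        # Still inside a previously started long pause.
        if self.paused_until is not None and now < self.paused_until:
            return Action.LONG_PAUSE
        # Pause expired (or never started): fall back to normal throttling.
        self.paused_until = None
        if threads_running > self.max_load:
            return Action.THROTTLE
        return Action.COPY

    def unpause(self) -> None:
        # Operator fixed the problem early and wants copying to resume.
        self.paused_until = None


if __name__ == "__main__":
    policy = LoadPolicy(max_load=25, critical_load=100)
    for t, threads in [(0.0, 10), (1.0, 30), (2.0, 150), (60.0, 10)]:
        print(t, threads, policy.decide(threads, now=t).value)
```

One design choice worth debating: in this sketch a repeated critical reading extends the pause window, so gh-ost would only resume once load has stayed below the critical threshold for the full pause period or an operator unpauses it.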