Once upon a time a client asked me to look into a problem. I found a bug and was ready to fix it with a simple low-risk change, but they said they preferred the known bug behavior over the correct behavior because they had never known correct behavior. 😂
We had a client do the same. A team member was dedicated to investigating the behaviour, and finding a fix. She spent the better part of eight months hunting down the extremely obscure cause, and finding a fix that didn't create cascade errors in other systems. She spend six weeks testing it.
When she was finally ready to deploy it to production, the customer informed us that the operators had decided two weeks after submitting the initial bug report that they would rather not change their workflow, and told their managers not to bother fixing it.
This, of course, was followed with "didn't anyone tell you?".
But you'll still need a crisis committee, because (assuming the server is actually used by the public), the effects of the outage may linger for a while, and you'll want to release a notice about what happened.
For example, the recent Cloudflare outage that crippled a lot of the Internet for a day.
Once upon a time a client asked me to look into a problem. I found a bug and was ready to fix it with a simple low-risk change, but they said they preferred the known bug behavior over the correct behavior because they had never known correct behavior. 😂
We had a client do the same. A team member was dedicated to investigating the behaviour, and finding a fix. She spent the better part of eight months hunting down the extremely obscure cause, and finding a fix that didn't create cascade errors in other systems. She spend six weeks testing it.
When she was finally ready to deploy it to production, the customer informed us that the operators had decided two weeks after submitting the initial bug report that they would rather not change their workflow, and told their managers not to bother fixing it.
This, of course, was followed with "didn't anyone tell you?".
Uncomfortably real...
I did. I gaved myselfs permission. 😈
OMG this is so perfect!!!
But you'll still need a crisis committee, because (assuming the server is actually used by the public), the effects of the outage may linger for a while, and you'll want to release a notice about what happened.
For example, the recent Cloudflare outage that crippled a lot of the Internet for a day.
He did QA the change lol. So true happens everyday.
I'll post this and tag my friend who works in DevOps😂😂