What's Wrong with Facebook 2019
By
Ega Wahyudi
—
Monday, September 2, 2019
—
What's Wrong With Facebook
The key flaw that made this outage so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the persistent store itself is invalid.
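The read path described above can be sketched roughly as follows. This is a minimal illustration, not the actual system: `is_valid`, `CountingStore`, `read_config`, and the `feature_flag` key are hypothetical names, and the validity rule (a non-empty string) is an assumption made only so the example runs.

```python
from typing import Dict, Optional


def is_valid(value: Optional[str]) -> bool:
    # Hypothetical validity rule: any non-empty string counts as valid.
    return bool(value)


class CountingStore:
    """Stand-in for the persistent store; counts how often it is queried."""

    def __init__(self, value: str) -> None:
        self.value = value
        self.queries = 0

    def get(self, key: str) -> str:
        self.queries += 1
        return self.value


def read_config(key: str, cache: Dict[str, str], store: CountingStore) -> str:
    cached = cache.get(key)
    if is_valid(cached):
        return cached
    # The cached value looks invalid: replace it with the stored value.
    fresh = store.get(key)
    cache[key] = fresh
    return fresh


def count_store_queries(store_value: str, reads: int) -> int:
    cache = {"feature_flag": ""}  # start with an invalid cache entry
    store = CountingStore(store_value)
    for _ in range(reads):
        read_config("feature_flag", cache, store)
    return store.queries


print(count_store_queries("on", 5))  # valid store: one repair, then cache hits
print(count_store_queries("", 5))    # invalid store: every read hits the store
```

With a valid stored value, the first read repairs the cache and later reads never touch the store. With an invalid stored value, the repair never succeeds, so every read turns into a store query.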
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every client saw the invalid value and tried to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by many thousands of queries a second.
To make matters worse, every time a client got an error while trying to query one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
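A toy model can make the feedback loop concrete. The sketch below is an assumption-laden simplification rather than a reconstruction of the real system: `simulate` is a hypothetical function, time advances in discrete ticks, every client checks a shared cache at the start of a tick, the database can serve only `db_capacity` queries per tick, and any deletion is assumed to land after any refill within the same tick. The client and capacity numbers are made up.

```python
from typing import List


def simulate(clients: int, db_capacity: int, ticks: int,
             delete_on_error: bool) -> List[int]:
    """Return the number of database queries issued in each tick."""
    cache_has_key = False  # the key starts out missing from the shared cache
    queries_per_tick = []
    for _ in range(ticks):
        # On a cache miss, every client queries the database in this tick.
        queries = 0 if cache_has_key else clients
        queries_per_tick.append(queries)
        if queries == 0:
            continue
        succeeded = min(queries, db_capacity)
        failed = queries - succeeded
        if succeeded > 0:
            cache_has_key = True   # a successful query refills the cache
        if failed > 0 and delete_on_error:
            cache_has_key = False  # an error is misread as an invalid value
    return queries_per_tick


# Errors delete the cache key: the overload sustains itself.
print(simulate(clients=1000, db_capacity=100, ticks=4, delete_on_error=True))
# Errors leave the cache alone: the first refill ends the stampede.
print(simulate(clients=1000, db_capacity=100, ticks=4, delete_on_error=False))
```

In this model, the only difference between a one-tick spike and a permanent overload is whether a failed query is allowed to delete the cache key.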
The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We are exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
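The post does not say which design patterns are meant. One widely used technique for keeping retries from amplifying an overload is capped exponential backoff with jitter, sketched here purely as an illustration. `TransientError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the default delays are arbitrary.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for a failed database query."""


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 5.0) -> List[float]:
    # Upper bound on the wait before each retry: base, 2*base, 4*base, ... capped.
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_backoff(fn: Callable[[], T], attempts: int = 5,
                      base: float = 0.1, cap: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Call fn, retrying on TransientError with capped, jittered delays."""
    for i, ceiling in enumerate(backoff_delays(attempts, base, cap)):
        try:
            return fn()
        except TransientError:
            if i == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait a random fraction of the ceiling so that clients do not
            # all retry at the same moment.
            sleep(rng() * ceiling)
    raise ValueError("attempts must be at least 1")
```

The `sleep` and `rng` parameters are injectable so the behavior can be checked without real waiting. A caller that hits an error backs off instead of immediately issuing another query, which gives an overloaded database room to recover.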
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.