What's Wrong with Facebook 2019
By
Ega Wahyudi
—
Friday, May 22, 2020
—
The key flaw that made this outage so severe was the poor handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the value in the persistent store is itself invalid.
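The repair logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: `CountingStore`, `is_valid`, `get_config`, the `timeout` key, and the rule that a valid value is a non-empty string are all hypothetical and do not come from Facebook's code.

```python
# Illustrative sketch only; every name here is hypothetical.

class CountingStore:
    """Stand-in for the persistent store that counts how often it is queried."""

    def __init__(self, data):
        self.data = dict(data)
        self.queries = 0

    def read(self, key):
        self.queries += 1
        return self.data[key]


def is_valid(value):
    """Assumed validity rule: a configuration value must be a non-empty string."""
    return isinstance(value, str) and value != ""


def get_config(key, cache, store):
    """Return the value for key, repairing the cached copy if it looks invalid."""
    value = cache.get(key)
    if is_valid(value):
        return value
    fresh = store.read(key)  # the repair costs one query to the store
    cache[key] = fresh
    return fresh


# Case 1: transient cache problem, healthy store. One query repairs the cache.
good_store = CountingStore({"timeout": "30s"})
cache = {"timeout": ""}
for _ in range(1000):
    get_config("timeout", cache, good_store)
queries_when_store_is_good = good_store.queries

# Case 2: the persistent copy itself is invalid. The repair never sticks,
# so every single read turns into a query against the store.
bad_store = CountingStore({"timeout": ""})
cache = {"timeout": ""}
for _ in range(1000):
    get_config("timeout", cache, bad_store)
queries_when_store_is_bad = bad_store.queries
```

In the first case the store is queried once and the cache absorbs the remaining reads. In the second, every read falls through to the store, because the "repaired" value is just as invalid as the one it replaced.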
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error while querying one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
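A toy simulation makes the feedback loop concrete. It is a sketch under simplifying assumptions, not a model of the real system: one shared cache key, a database that serves at most `capacity` queries per tick and rejects the rest, and errors that arrive after successes, so a delete triggered by an error wipes out a value that another client just cached. All names (`Database`, `OverloadedError`, `simulate`, `delete_on_error`) are hypothetical.

```python
# Toy simulation; names and the overload model are assumptions for illustration.

class OverloadedError(Exception):
    """Raised by the toy database when it gets more queries than it can serve."""


class Database:
    def __init__(self, capacity, value):
        self.capacity = capacity  # queries it can serve per tick
        self.value = value        # the (now valid) persistent value
        self.seen_this_tick = 0

    def new_tick(self):
        self.seen_this_tick = 0

    def query(self):
        self.seen_this_tick += 1
        if self.seen_this_tick > self.capacity:
            raise OverloadedError()
        return self.value


def simulate(clients, ticks, db, delete_on_error):
    """Return how many queries hit the database on each tick."""
    cache = {}
    queries_per_tick = []
    for _ in range(ticks):
        db.new_tick()
        if "config" in cache:
            queries_per_tick.append(0)  # cache hit for everyone
            continue
        value, successes, errors = None, 0, 0
        for _ in range(clients):        # all clients miss at the same time
            try:
                value = db.query()
                successes += 1
            except OverloadedError:
                errors += 1
        if successes:
            cache["config"] = value
        if errors and delete_on_error:
            # The flaw: an error is treated as "value invalid", so the key
            # another client just filled is deleted again.
            cache.pop("config", None)
        queries_per_tick.append(clients)
    return queries_per_tick


flawed = simulate(100, 5, Database(capacity=10, value="30s"), delete_on_error=True)
fixed = simulate(100, 5, Database(capacity=10, value="30s"), delete_on_error=False)
```

With `delete_on_error=True` the load stays at 100 queries on every tick even though the stored value is valid. With `delete_on_error=False` the first successful read fills the cache and the load drops to zero.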
The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We are exploring new designs for this configuration system, following the design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
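The post does not say which design patterns are meant. One widely used pattern for breaking this kind of loop is a circuit breaker, which stops sending requests to a backend after repeated failures and retries only after a cooldown. The sketch below is a generic illustration of that pattern, not Facebook's design; `CircuitBreaker`, `CircuitOpenError`, the thresholds, and the fake clock are all assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call to protect the backend."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                raise CircuitOpenError("backend is cooling down")
            # Cooldown elapsed: let one trial call through (half-open).
            # A single further failure re-opens the breaker.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


# Demo with a fake clock so the example does not need to sleep.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=3, cooldown_seconds=5.0,
                         clock=lambda: now[0])
stats = {"backend_calls": 0}


def failing_query():
    stats["backend_calls"] += 1
    raise TimeoutError("database overloaded")


rejected = 0
for _ in range(10):
    try:
        breaker.call(failing_query)
    except CircuitOpenError:
        rejected += 1  # the database never saw this request
    except TimeoutError:
        pass

now[0] = 6.0                          # cooldown has passed
result = breaker.call(lambda: "30s")  # trial call succeeds, breaker closes
```

Of the ten attempts, only three reach the overloaded backend. The other seven are rejected locally, which gives the database room to recover instead of feeding it more failing requests.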
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.