Facebook, You're Doing It Wrong 2019
By
Ega Wahyudi
—
Thursday, December 12, 2019
—
What's Wrong With Facebook
The key flaw that made this outage so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the value in the persistent store is itself invalid.
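The check-and-repair behavior described above can be sketched roughly as follows. This is only an illustration, not Facebook's code: `ConfigClient`, `is_valid`, the validity rule, and the use of plain dictionaries for the cache and the persistent store are all assumptions made for the example.

```python
def is_valid(value):
    # Hypothetical validity rule: a config value must be a non-empty string.
    return isinstance(value, str) and value != ""


class ConfigClient:
    """Illustrative client that repairs invalid cached config values."""

    def __init__(self, cache, store):
        self.cache = cache        # dict standing in for the cache tier
        self.store = store        # dict standing in for the persistent store
        self.store_queries = 0    # how many times the store was queried

    def get(self, key):
        value = self.cache.get(key)
        if is_valid(value):
            return value
        # Cached value is missing or invalid: refetch from the persistent store.
        self.store_queries += 1
        value = self.store.get(key)
        self.cache[key] = value
        return value


# Transient cache problem: one store query repairs the cache.
client = ConfigClient(cache={"feature": ""}, store={"feature": "on"})
first = client.get("feature")
second = client.get("feature")
transient_queries = client.store_queries

# Invalid persistent value: every read looks broken and queries the store again.
bad = ConfigClient(cache={}, store={"feature": ""})
for _ in range(5):
    bad.get("feature")
invalid_store_queries = bad.store_queries
```

In the first case the repair costs a single query. In the second, the repair can never succeed, so every read turns into a query against the persistent store.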
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error while attempting to query one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
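A toy model makes the loop easier to see. The sketch below is an assumption-heavy simplification: one shared cache key, a fixed database capacity per time step, and a hypothetical `simulate` function. The client counts and capacity are made-up numbers, not figures from the incident.

```python
def simulate(clients_per_tick, db_capacity):
    """Return the number of database queries issued on each tick.

    Assumptions of this toy model: all clients share one cache key, the
    persistent value is already valid again, and the database can serve
    at most `db_capacity` queries per tick.
    """
    cache_has_key = False  # the key was deleted during the incident
    queries_per_tick = []
    for clients in clients_per_tick:
        # With the key missing, every client falls through to the database.
        queries = 0 if cache_has_key else clients
        failures = max(0, queries - db_capacity)
        if failures > 0:
            # A failed query is treated as an invalid value: the key is deleted.
            cache_has_key = False
        elif queries > 0:
            # Every query succeeded, so the repaired value stays cached.
            cache_has_key = True
        queries_per_tick.append(queries)
    return queries_per_tick


# Full traffic: the database never catches up, so the loop never ends.
stuck = simulate([1000] * 5, db_capacity=100)

# Traffic is cut below capacity on the third tick, then restored.
recovered = simulate([1000, 1000, 50, 1000, 1000], db_capacity=100)
```

In this model the overload persists at full traffic even though the stored value is valid, and it only clears once traffic drops below what the database can serve.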
The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We are exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
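The post does not say which designs are being considered. As one commonly used pattern for this kind of problem, and purely as an assumption about what "more graceful" could look like, a client can treat a failed query as a failure rather than as an invalid value, keep serving its last known value, and wait exponentially longer between retries. `StoreUnavailable`, `BackoffConfigClient`, and the delay parameters below are hypothetical names chosen for the sketch.

```python
class StoreUnavailable(Exception):
    """Raised by the hypothetical store when a query fails."""


class BackoffConfigClient:
    """Illustrative client that backs off instead of retrying immediately."""

    def __init__(self, fetch, base_delay=1.0, max_delay=60.0):
        self.fetch = fetch              # callable that queries the store
        self.base_delay = base_delay    # delay after the first failure
        self.max_delay = max_delay      # upper bound on the delay
        self.cached = None              # last known value, kept on errors
        self.failures = 0
        self.next_attempt_at = 0.0

    def refresh(self, now):
        if now < self.next_attempt_at:
            return self.cached          # still backing off: no query is sent
        try:
            self.cached = self.fetch()
            self.failures = 0
        except StoreUnavailable:
            # An error is not an invalid value: keep the cached value
            # and double the wait before the next attempt.
            self.failures += 1
            delay = min(self.max_delay,
                        self.base_delay * 2 ** (self.failures - 1))
            self.next_attempt_at = now + delay
        return self.cached


calls = []

def failing_fetch():
    # Stand-in for an overloaded database that rejects every query.
    calls.append(1)
    raise StoreUnavailable()

client = BackoffConfigClient(failing_fetch)
client.cached = "last-known-good"
values = [client.refresh(now=t) for t in range(8)]
attempts = len(calls)
```

Over eight time steps this client queries the failing store four times instead of eight, and the gap between attempts keeps growing. A production version would normally add random jitter to the delay so that many clients do not retry at the same moment.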
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.