Earlier this week, Microsoft's online services were hit by another global outage that affected many consumer offerings, including Outlook.com, OneDrive, Skype and Xbox Live. All affected services were inaccessible for more than an hour, and yesterday we learned that Microsoft technicians were working hard to find the cause of the massive sign-in issues.
It now appears that the outage may have been caused by human error: specifically, a bad certificate that caused the Microsoft account (MSA) infrastructure to crash. Brad Sams, a reporter who covers Microsoft, explained today on his weekly podcast that an insider gave him the following explanation (starting at 18:26):
An insider at the company actually told me what the root cause was, at least initially, now granted when these things happen sometimes you have a failure at point A, and that failure replicates across, and that causes more failures in different areas and creates a snowball rolling down a hill effect, but I believe the initial thing that kicked this off is that someone pushed out a bad authentication certificate to the MSA infrastructure or whatever it’s called, and that’s what really screwed this up…
Microsoft is not the only big tech company to be affected by online outages, but this was the second time in a month that its services went down for an hour or more. These kinds of events always make headlines, but the company usually has a pretty good track record of explaining what went wrong. We'll keep you informed if Microsoft shares an official explanation for the sign-in issues.