Recent outage blamed on caching service failure, Microsoft offers apology


Earlier this week, Microsoft’s email service suffered an outage that left users unable to read their email or retrieve files stored on Microsoft’s SkyDrive service. Today, Microsoft offered an apology and an explanation of why the outage happened.

According to Microsoft, the outage resulted from the failure of a caching service that interfaces with devices using Exchange ActiveSync. The failure caused these devices to receive an error, which in turn led to the service being hammered with continuous connection attempts. The services could not handle the resulting flood of traffic, leaving some users unable to access their email or share files through SkyDrive.

“In order to stabilize the overall email service, we temporarily blocked access via Exchange ActiveSync. This allowed us to restore access to [the service] via the web and restore the sharing features of SkyDrive. These parts of the service were fully stabilized within a few hours of the initial incident. A significant backlog of Exchange ActiveSync requests accumulated as we worked to stabilize access. To avoid another flood of traffic, we needed to restore access to Exchange ActiveSync slowly, which meant that some customers remained impacted for a longer period of time,” Microsoft explained.

Microsoft apologized for the outage and said it has learned two lessons that should help prevent a repeat. The company will increase network bandwidth in the affected part of the system and change the way errors are handled for devices using Exchange ActiveSync.
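
Microsoft did not say what the error-handling change involves. Purely as an illustration of one common way clients avoid flooding a recovering service, the Python sketch below retries with capped exponential backoff and random jitter. The function name `sync_with_backoff`, its parameters, and the `connect` callable are all hypothetical and are not part of Exchange ActiveSync or any Microsoft API.

```python
import random
import time


def sync_with_backoff(connect, max_attempts=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=None):
    """Call connect() until it succeeds or max_attempts is reached.

    Hypothetical helper for illustration only. After each failure the
    client waits a random time between 0 and min(cap, base * 2**attempt)
    seconds, so many clients that failed together do not retry together.
    """
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(cap, base * 2 ** attempt)
            sleep(rng.uniform(0, ceiling))
```

With these defaults, a device that keeps failing waits up to 1, 2, 4, and then 8 seconds between its five attempts rather than reconnecting immediately, which spreads retry traffic out over time.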

“We are now fully through the backlog and have restored service so all customers should have normal access from all of their devices. We want to apologize to everyone who was affected by the outage, and we appreciate the patience you have shown us as we worked through the issues,” Microsoft stated.

Thanks for the tip, Markus!