Last month’s Azure downtime in Europe was caused by a faulty fire alarm
Last month, some Azure users in Northern Europe experienced issues connecting to and managing resources hosted in the region. As noted by Fortune.com, the issue turned out to be linked to a faulty fire alarm in a data center.
Microsoft explains the root cause of the problem in a post on the Azure status history page, saying, “during a routine periodic fire suppression system maintenance, an unexpected release of inert fire suppression agent occurred.”
Of course, once the fire suppression agent was released, the systems in the data center did what they were designed to do. A shutdown of the Air Handler Units (AHUs) was initiated to contain the release and keep the data center safe, which meant the system eventually had to be restarted.
During the restart process, however, ambient temperatures rose too high, forcing some systems in the zone to shut down or restart automatically to prevent a bigger meltdown and center-wide overheating. Microsoft explains:
While conditions in the data center were being reaffirmed and AHUs were being restarted, the ambient temperature in isolated areas of the impacted suppression zone rose above normal operational parameters. Some systems in the impacted zone performed auto shutdowns or reboots triggered by internal thermal health monitoring to prevent overheating of those systems.
Microsoft was aware of the situation immediately, and within 35 minutes everything was back to normal, although some customers needed as long as seven hours to come fully back online. The company said that facility power was not affected, but additional time was required to troubleshoot and recover because some systems in the zone did not shut down in a controlled manner.