Last night, a planned update to the Microsoft Wide Area Network (WAN) affected Microsoft, Azure, Teams, and Outlook customers worldwide. Starting at about 11pm Pacific Time (between 07:05 UTC and 09:45 UTC, or 11:05pm and 1:45am PST, according to Azure Service Health), users around the globe “experienced issues with networking connectivity, manifesting as network latency and/or timeouts when attempting to connect to Azure resources in Public Azure regions, as well as other Microsoft services including M365 and PowerBI.”
Microsoft’s Azure Support Twitter account tweeted about the outages:
🛠️Engineers have confirmed the issue impacting connectivity to Azure resources has been mitigated. A detailed resolution statement can be found in the Status History at https://t.co/cMAHQp3Lj7
— Azure Support (@AzureSupport) January 25, 2023
Unfortunately for OnMSFT.com, our own little set of issues left us down for a bit longer than that. A PHP configuration change, made a week or so ago but mistakenly never applied with a VM reboot, took effect when our server was apparently rebooted, probably as part of Azure's mitigation attempts. The VM restarted but PHP did not, so instead of coming back online, OnMSFT.com returned a 502 error.
When we noticed the issues first thing this morning, we rebooted the server to no avail. We then used Azure Backups to restore a daily backup from 48 hours ago, from before the service outage began, to a new VM. However, since the PHP change was made over a week ago, this backup VM had the same problem.
This led us a bit astray in narrowing down the problem, especially as multiple users were still reporting Azure issues even though Microsoft had reported them fixed, and as our backup server, restored from a known good time, was also affected. Of course, booting the restored backup server applied the rogue PHP config, too.
As usual, the issues were finally spotted by digging through log files, and once the config file was restored, we were back up and running.
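For anyone wanting to catch this kind of "edited but never applied" config change before a surprise reboot does, here is a minimal Python sketch. It is purely illustrative and assumes you already know when the service last started. The function name `stale_configs` and its parameters are hypothetical, not part of any Azure or PHP tooling.

```python
import os
from typing import Iterable, List


def stale_configs(config_paths: Iterable[str], service_started_at: float) -> List[str]:
    """Return the config files modified after the service last started.

    Such files hold changes that will only take effect on the next restart,
    which is exactly when a bad edit would surface.
    """
    pending = []
    for path in config_paths:
        try:
            modified_at = os.path.getmtime(path)  # seconds since the epoch
        except OSError:
            continue  # missing or unreadable file: nothing to compare
        if modified_at > service_started_at:
            pending.append(path)
    return pending
```

Given the list of config files your PHP setup actually loads and the service start time as a Unix timestamp, any path this returns is a change still waiting for a restart, and worth reviewing before one happens.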
One note on using Azure: we made the move to an Azure VM just a bit over two years ago, and this is the first significant outage in all that time (even though a good portion of the outage was our own fault). We’ve had zero problems (until now) using Azure, and find the Azure Portal a clean, simple, and very useful entry into our Ubuntu VM. Azure backup and restore (to a new VM) is great: it usually takes less than 5 minutes to get a full working clone.
Anyway, glad to be back, and thanks for reading OnMSFT.com!
(Featured image via Smithsonian Open Access)