Developers using Microsoft’s Visual Studio suite of tools may have noticed a service interruption between 09:10 and 14:28 UTC yesterday, February 4, 2016. A few developers took to social media to complain that they could not log into their Team Services accounts.
To Microsoft’s credit, the Visual Studio team was quick to acknowledge the interruption, alerting customers to the outage on Twitter, on a dedicated Reddit page, and in a post on its own service blog. At the time, however, the team had neither a concrete estimate of when a fix would arrive nor a solid explanation of why customers couldn’t access their Team Services accounts.
@SQLbyoBI Hi Bill, we apologize for the inconvenience, it has been worked by our engineers and it should be working now.
— Visual Studio (@VisualStudio) February 4, 2016
With the issue now resolved, the Visual Studio team has explained what caused the outage and what it did to restore service for customers.
PRELIMINARY ROOT CAUSE: A SQL stored procedure that was being called was allocating too much memory in one of the critical backend SQL databases. After an extended period of time, this caused the SQL databases to fall into an unresponsive state and resulted in customers being unable to access their VSTS accounts.
MITIGATION: Engineers attempted to failover the SQL database which allowed for temporary mitigation, however, the same procedure was quickly allocating memory to the newly assigned databases, which in turn became unresponsive. Engineers manually assigned allocation limits for the procedure that was being called. This has ensured the backend SQL databases remain in a healthy state.
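To see why the failover helped only briefly while the allocation limit held, here is a toy model in Python. It is a sketch under loud assumptions, not a description of Microsoft's actual systems: the function `is_responsive`, the idea that memory pressure is simply concurrent calls times memory per call, and every number below are hypothetical and chosen only for illustration.

```python
from typing import Optional


def is_responsive(capacity_mb: int, concurrent_calls: int,
                  request_mb: int, limit_mb: Optional[int] = None) -> bool:
    """Toy check: a database stays responsive while total memory fits in capacity.

    All parameters are hypothetical; real SQL memory management is far more involved.
    """
    # With a limit in place, each call gets at most limit_mb.
    per_call = request_mb if limit_mb is None else min(request_mb, limit_mb)
    return concurrent_calls * per_call <= capacity_mb


# Invented workload: 40 concurrent calls, each asking for 2,000 MB, on a 64,000 MB database.
primary_ok = is_responsive(64_000, 40, 2_000)

# Failing over moves the same workload to an identical database, so nothing changes.
failover_ok = is_responsive(64_000, 40, 2_000)

# Capping each call at 1,000 MB keeps the total within capacity.
limited_ok = is_responsive(64_000, 40, 2_000, limit_mb=1_000)

print(primary_ok, failover_ok, limited_ok)
```

In this model the failover target inherits the same workload and so hits the same ceiling, which matches the report's account that the newly assigned databases also became unresponsive until a per-procedure limit was applied.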
Obviously, the Visual Studio team is apologetic about, and perhaps embarrassed by, the outage. Fortunately, the team was able to get things up and running again relatively quickly.