Microsoft has announced that its Office 365 and SharePoint Online platforms are back to normal after the outage that occurred on April 1. The company has blamed a spike in Domain Name System (DNS) traffic for the downtime. The disruption also caused its Bing, Azure, and Xbox Live services to be inaccessible to some users.
According to the latest update, the problem was caused by a spike in DNS requests targeting its domains on Azure. The server setup was supposed to absorb such a surge through a cache refresh sequence, but that mechanism failed in this instance, exposing a coding error. The problem was resolved by 22:00 UTC after the code defect was found and corrected. Microsoft also updated its traffic detection system to prevent future occurrences.
What is DNS?
DNS is a hierarchical naming system for computers and other resources connected to the internet. Its primary task is to convert domain names into the IP addresses that devices use to reach one another. While human beings rely on simple domain names (like OnMSFT.com, for example) to access web resources, internet devices address each other by number, and DNS undertakes the translation. In a nutshell, DNS eliminates the need to type hard-to-memorize IP addresses when visiting a specific site.
DNS servers, for their part, store the DNS records that map domain names to addresses, and they are a frequent target of attacks.
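The name-to-address translation described above can be observed from any machine. The following is a minimal sketch that asks the operating system's resolver for the addresses behind a name, using Python's standard `socket.getaddrinfo`. The helper name `resolve` is an illustrative choice, and the sketch shows only the client side of a lookup, not how a DNS server works internally.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses for hostname.

    The lookup is delegated to the operating system's resolver, which may
    consult local files (such as a hosts file) before it queries DNS.
    An empty list is returned when the name cannot be resolved.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address as a string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" is used so the example does not depend on network access.
    print(resolve("localhost"))
```

Passing a real domain name instead of `localhost` would return whatever addresses the configured DNS resolver reports at that moment, so the result varies by network and over time.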
How Substantial DNS Traffic Can Cause an Outage
In Microsoft’s case, a huge volume of DNS traffic is said to have momentarily overwhelmed its DNS infrastructure. The following scenarios outline a few ways in which traffic can degrade a DNS system.
A DNS NXDOMAIN Attack
In a DNS NXDOMAIN attack, a malicious actor inundates a DNS server with a rapid, sustained stream of requests for domain records that do not exist. Each failed lookup consumes server resources, and the cache eventually fills up with negative (NXDOMAIN) results. This slows down the response time for legitimate requests and can limit access to a site.
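The cache-filling effect can be illustrated with a toy model. The sketch below assumes a small least-recently-used cache that stores negative answers alongside positive ones; the class `ResolverCache`, the function `surviving_legit_entries`, and the example names and addresses are all hypothetical. Real resolvers are more elaborate (for instance, they apply time-to-live values to entries), so this is a simplified picture of the eviction problem, not a model of any particular server.

```python
from collections import OrderedDict

NXDOMAIN = "NXDOMAIN"  # marker stored for names that do not exist


class ResolverCache:
    """A tiny LRU cache holding both positive and negative answers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, name):
        """Return the cached answer, or None on a cache miss."""
        if name not in self.entries:
            return None
        self.entries.move_to_end(name)  # mark as most recently used
        return self.entries[name]

    def put(self, name, answer):
        """Store an answer, evicting the least recently used entries."""
        self.entries[name] = answer
        self.entries.move_to_end(name)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def surviving_legit_entries(capacity, legit, bogus_count):
    """Count legitimate entries still cached after a flood of bogus names."""
    cache = ResolverCache(capacity)
    for name, address in legit.items():
        cache.put(name, address)
    # Every bogus lookup fails, and its negative answer takes a cache slot.
    for i in range(bogus_count):
        cache.put(f"nx-{i}.example.invalid", NXDOMAIN)
    return sum(1 for name in legit if cache.get(name) is not None)


legit = {
    "www.example.com": "192.0.2.10",
    "mail.example.com": "192.0.2.20",
    "api.example.com": "192.0.2.30",
}
print(surviving_legit_entries(100, legit, bogus_count=10))    # prints 3
print(surviving_legit_entries(100, legit, bogus_count=1000))  # prints 0
```

With only ten bogus names the legitimate entries stay cached. With a thousand, every legitimate entry is evicted and would have to be looked up again at full cost.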
A Phantom Domain Attack
A phantom domain attack is a type of Denial-of-Service (DoS) attack that usually targets a recursive DNS resolver. It takes advantage of the DNS lookup mechanism to interrupt a service. When a resolver cannot answer a query from its own records or cache, it queries other nameservers on the client's behalf, a process referred to as recursive resolution. In a phantom domain attack, the attacker sets up domains whose nameservers respond very slowly or not at all, then floods the resolver with queries for those domains. A high volume of such requests ties up the resolver's lookup processes while it waits for answers, ultimately causing performance issues.
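The way unanswered lookups starve a server can be sketched with a small simulation. It assumes a server with a fixed number of lookup slots, where a query for an unresponsive domain holds its slot until a timeout expires. The function `count_dropped_legit` and its parameters are hypothetical and chosen for illustration; real DNS software manages outstanding queries in more sophisticated ways.

```python
import heapq


def count_dropped_legit(slots, timeout, arrivals, legit_latency=0.05):
    """Count legitimate queries dropped because every lookup slot was busy.

    arrivals is a list of (time, is_phantom) tuples. A phantom query never
    gets an answer, so it holds its slot until the timeout expires; a
    legitimate query frees its slot after legit_latency seconds.
    """
    busy = []  # min-heap of the times at which occupied slots become free
    dropped = 0
    for now, is_phantom in sorted(arrivals):
        # Release every slot whose lookup has finished or timed out.
        while busy and busy[0] <= now:
            heapq.heappop(busy)
        if len(busy) >= slots:
            if not is_phantom:
                dropped += 1
            continue
        hold = timeout if is_phantom else legit_latency
        heapq.heappush(busy, now + hold)
    return dropped


# Ten phantom queries arrive at t=0, then one legitimate query at t=1.
flood = [(0.0, True)] * 10 + [(1.0, False)]
print(count_dropped_legit(slots=10, timeout=5.0, arrivals=flood))  # prints 1
print(count_dropped_legit(slots=10, timeout=0.5, arrivals=flood))  # prints 0
```

In this toy model a five-second timeout leaves every slot occupied when the legitimate query arrives, while a half-second timeout has already freed them.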
How Traffic-based DNS Attacks Are Thwarted
One way of mitigating traffic-based DNS attacks is using a blackhole routing strategy. In this countermeasure, traffic is redirected to a ‘black hole’ and discarded. Parameters can be set to distinguish between legitimate and malicious traffic for redirection. Microsoft has confirmed using this technique to allow recovery of its infrastructure. In some cases, blocking a client IP address that generates a high number of NXRRSET, NXDOMAIN, or SERVFAIL responses also works. The other options are lowering the timeout period for name lookups and rate-limiting traffic.
Last but not least, a regular cache refresh will help to keep things running smoothly. It is important to note that the Microsoft DNS problem was not caused by an attack, but by a traffic surge that caused unexpected caching problems.
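The idea of blocking a client that produces too many failed lookups can be sketched as a sliding-window counter. The class `ErrorRateBlocker`, its thresholds, and the example addresses below are hypothetical, and timestamps are passed in explicitly to keep the example deterministic. This is an illustration of the general approach, not a description of how Microsoft or any particular DNS product implements it.

```python
from collections import defaultdict, deque

# Response codes treated as signs of abusive traffic in this sketch.
ERROR_RCODES = {"NXDOMAIN", "NXRRSET", "SERVFAIL"}


class ErrorRateBlocker:
    """Block clients that exceed max_errors error responses per window."""

    def __init__(self, max_errors, window_seconds):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self.errors = defaultdict(deque)  # client IP -> error timestamps
        self.blocked = set()

    def record(self, client_ip, rcode, now):
        """Record one response sent to client_ip at time now (seconds)."""
        if rcode not in ERROR_RCODES:
            return
        times = self.errors[client_ip]
        times.append(now)
        # Forget errors that have fallen out of the sliding window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) > self.max_errors:
            self.blocked.add(client_ip)

    def is_blocked(self, client_ip):
        return client_ip in self.blocked


blocker = ErrorRateBlocker(max_errors=5, window_seconds=10)
for second in range(6):
    # Six failed lookups in six seconds from one client.
    blocker.record("203.0.113.7", "NXDOMAIN", now=float(second))
    # Successful lookups from another client are never counted.
    blocker.record("198.51.100.2", "NOERROR", now=float(second))

print(blocker.is_blocked("203.0.113.7"))   # prints True
print(blocker.is_blocked("198.51.100.2"))  # prints False
```

A production system would also need a way to unblock clients after a cooling-off period and to avoid penalizing large shared resolvers, both of which are left out here.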