Microsoft Research uses Azure Log Analytics to manage deep neural networks

Kareem Anderson

data science virutal machine azure

Looking for more info on AI, Bing Chat, Chat GPT, or Microsoft's Copilots? Check out our AI / Copilot page for the latest builds from all the channels, information on the program, links, and more!

Microsoft’s Research division typically aims to solve far-off problems that include smart roads that help guide driverless cars, cures for cancer using machine learning, or improving DNA data storage, to list a few. At the center of Microsoft’s Research work sits Machine Learning and Deep Neural Networks (DNN) that help generate next generation server infrastructures that support Windows and Linux cluster environments for large scale data processing of test results and sensory content.

Microsoft service engineer Kris Zenter explains in more detail how the research team makes use of Azure Log Analytics, Linux Server System Resource monitoring, and custom Python applications in attempting to solve and tackle cutting-edge problems.

Azure Log Analytics
Azure Log Analytics

Enter Azure Log Analytics

Azure Log Analytics, a component of Microsoft Operations Management Suite, natively supports log search through billions of records, real-time metric collection, and rich custom visualizations across numerous sources. These out of the box features paired with the flexibility of available data sources made Log Analytics a great option to produce visibility & insights by correlating across DNN clusters & components.

Linux Server System Resource Monitoring

Deep Neural Networks traditionally run on Linux, and Log Analytics supports major Linux distributions as first class citizens. The OMS Agent for Linux was also recently made generally available, built on the open source log collector FluentD.  By leveraging the Linux agent, we were able to easily collect system metrics at 10-second interval and all of our Linux logs without any customization effort.

NVIDIA GPU Information

The Log Analytics platform is also extremely flexible, allowing users to send data via a recently released HTTP POST API. We were able to write a custom Python application to retrieve data from their NVIDIA GPUs and unlock the ability to alert based off of metrics such as GPU Temperature. Additionally, these metrics can be visualized with Custom Views to create rich performance graphs for the team to further monitor.

For more on Azure Log Analytics, the projects Microsoft’s Research division are currently tackling, or on the application of machine learning in science, visit Microsoft’s Azure blog here. Anyone attempting to take advantage of Azure Log Analytics can also get a walkthrough with Python codes at the MSOMS blog.