Azure's big data services expand, now include HDInsight on Linux

Kellogg Brengel

Microsoft’s Cloud and Enterprise group recently announced the general availability of HDInsight on Linux as well as two new Azure services: Azure Data Lake Store and Azure Data Lake Analytics. With HDInsight on Linux, organizations gain the flexibility to run and manage Hadoop clusters in the cloud on Ubuntu-based Linux.
HDInsight is a service developed by Microsoft in partnership with Hortonworks to deploy and manage Apache Hadoop clusters. For a quick explanation, a cluster is a group of servers that work together to store and process large sets of data, and Hadoop is a hugely popular, free, Java-based framework for processing large data sets across such a cluster. HDInsight is Microsoft’s service for managing, analyzing and reporting on big data in these clusters. Previously, however, HDInsight was only available for clusters running the Windows Server operating system.
That is, until now. With the general availability of HDInsight on Linux, organizations can use HDInsight to deploy, manage, and analyze their clusters of big data regardless of whether the servers housing that data are running Windows or Linux. HDInsight on Linux also brings broader support for Hadoop ecosystem partners to run within HDInsight. This gives information officers even greater flexibility and choice in the tools they use alongside HDInsight.
Microsoft’s announcement also stated that HDInsight on Linux supports additional capabilities such as cluster scaling, virtual network integration, and script actions. It further stated that users can run cluster types besides Hadoop on Linux, such as HBase for NoSQL workloads and Storm for real-time processing, both of which matter for needs such as IoT applications.
In addition to HDInsight on Linux, Microsoft also announced two brand new services, Azure Data Lake Store and Azure Data Lake Analytics.
Microsoft describes Azure Data Lake Store as:

…a hyper-scale HDFS repository designed specifically for big data analytics workloads in the cloud. Azure Data Lake Store solves the big data challenges of volume, variety, and velocity by enabling you to store data of any type, at any size, and process it at any scale.

With this new service, organizations can store any type of data, regardless of size, in the cloud for processing and mining with Azure HDInsight, or any other Hadoop-based analytics tool for that matter. Although it is only in private preview today, it is expected to become broadly available soon, and it could give organizations added flexibility for storing and processing data in the cloud without running their own server clusters.
The other brand new service offered by Microsoft is Azure Data Lake Analytics. As the name implies, it is a service for running analytics jobs on big data stored in Azure Data Lake Store. Interestingly, jobs are written in U-SQL, an evolution of the SQL syntax that lets users write declarative big data jobs. An added convenience is that the data can be easily managed from the Azure management portal.
The addition of these new services further illustrates how Microsoft is building bridges to deliver its take on best-in-class services for other programming frameworks and even operating systems. This cross-platform provision of services will enable organizations to do more with their data in a flexible and adaptable way.