Microsoft defines “big data,” and how it fits in with Excel

Business teams often define the term “big data” very differently from one another. Microsoft, however, has published a new blog post explaining exactly what the Excel Team thinks of the term and how it is best defined within their software.

Microsoft defines “big data” as data with high volume, high velocity, and high variety: information that arrives quickly, accumulates large numbers of items and dimensions, and is expected to evolve over time based on the needs of the team evaluating it. Beyond that, “big data” is expected to meet specific requirements:

  • Cost-effective processing—As we mentioned, many of the vendors claim they’ve been doing big data for decades. Technically this is accurate, however, many of these solutions rely on expensive scale-up machines with custom hardware and SAN storages underneath to get enough horsepower. The most promising aspect of big data is the innovation that allows a choice to trade off some aspects of a solution to gain unprecedented lower cost of building and deploying solutions.
  • Innovative types of analysis—Doing the same old analysis on more data is generally a good sign you’re doing scale-up and not big data.
  • Novel business value—Between this principle and the previous one, if a data set doesn’t really change how you do analysis or what you do with your analytic result, then it’s likely not big data.

Excel aims to work better with big data by providing processing tools that make it easier for users to approach. The most common need is exploratory, ad hoc analysis: searching for new information in stores of high-volume, high-velocity data. With the tools provided, Excel can also support more predictive and prescriptive analysis of unstructured data, such as turning a mess of social media posts into understandable information. One example is Yom-Tov's use of crowdsourced media and searches to better evaluate flu-related symptoms and spread.

External data can be brought into Excel through three different methods:

Import Data to Excel – Through Power Query, Excel offers a set of connectors for relational, HDFS, SaaS, and other sources. Power Query can transform large amounts of data by bringing it into a clean and organized set. Even when the data is unstructured and spread across several files in a folder, Power Query can deal with complex, nested, and hierarchical data, for example by extracting the structure from JSON-formatted files. Power Query is built in only as of Excel 2016, where the feature continues to receive additions, but a download for earlier versions is available on the website.
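To make the idea of extracting structure from nested JSON concrete, here is a minimal Python sketch. It is not Power Query or its M language, only an illustration of the same kind of flattening; the `flatten` helper and the sample record are hypothetical.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects so every leaf becomes a column.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical sample, standing in for one JSON file in a folder.
raw = '{"user": {"name": "Ana", "location": {"city": "Lisbon"}}, "likes": 12}'
row = flatten(json.loads(raw))
print(row)
```

Each flattened dictionary corresponds to one spreadsheet row, with the dotted keys acting as column headers.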

Live Query of an External Source – When the data is too large to import by other means, customers can send a query to the source and retrieve only the information they need. With OLAP PivotTables and PivotCharts, teams can query standalone databases for slices of data and work with them quickly.
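The principle behind a live query, letting the source do the heavy lifting and returning only a small slice, can be sketched in Python with an in-memory SQLite database. This is an assumption-laden stand-in, not how Excel talks to an OLAP source; the `sales` table, its rows, and the `query_slice` helper are all made up for illustration.

```python
import sqlite3

# In-memory database standing in for a large external source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 50.0)],
)


def query_slice(connection, region):
    """Ask the source to aggregate, so only one summary row comes back."""
    cursor = connection.execute(
        "SELECT region, SUM(amount) FROM sales WHERE region = ? GROUP BY region",
        (region,),
    )
    return cursor.fetchone()


east = query_slice(conn, "East")
print(east)
```

Only the aggregated row crosses the connection, which is why this approach suits data sets that are too large to import whole.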

Export from an Application to Excel – A growing practice for many applications, exporting a static list of data to Excel is well suited to data that changes rarely or not at all. Excel provides APIs for developers to implement this feature for customers, and the Excel Team plans to continue supporting it for future integrations with Excel Online.
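As a rough illustration of a static export, the sketch below writes a list of records as CSV, a format Excel can open directly. It does not use the Excel APIs mentioned above; the `export_rows` helper and the sample data are hypothetical.

```python
import csv
import io


def export_rows(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical application data to hand off as a static snapshot.
rows = [{"product": "Widget", "units": 3}, {"product": "Gadget", "units": 7}]
csv_text = export_rows(rows, ["product", "units"])
print(csv_text)
```

In a real application the text would be written to a `.csv` file or returned as a download; the snapshot does not update when the source data changes.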

Depending on your business team's big data needs, you can combine these features to keep your constantly evolving collections up to date.



What do you think about the Excel Team's definition of big data?