In our second Big Data technology guest blog post, we are thrilled to have Ashvini Sharma, a Group Program Manager in the SQL Server Business Intelligence group at Microsoft. Ashvini discusses how organizations can provide insights for everyone from any data, any size, anywhere by using Microsoft’s familiar BI stack with Hadoop.
Whether through blogs, Twitter, or technical articles, you’ve probably heard about Big Data, and about the growing recognition that organizations need to look beyond traditional databases to achieve the most cost-effective storage and processing of extremely large data sets, unstructured data, and data that arrives too fast. As the prevalence and importance of such data increase, many organizations are looking at how to leverage technologies such as those in the Apache Hadoop ecosystem. Recognizing that one size doesn’t fit all, we began detailing our approach to Big Data at the PASS Summit last October. Microsoft’s goal for Big Data is to provide insights to all users from structured or unstructured data of any size.
While very scalable, accommodating, and powerful, most Big Data solutions based on Hadoop require highly trained staff to deploy and manage. In addition, the benefits are limited to a few highly technical users who are as comfortable programming their requirements as they are using advanced statistical techniques to extract value. For those of us who have been around the BI industry for a few years, this may sound similar to the early ’90s, when the benefits of our field were limited to a few people within the corporation through Executive Information Systems.
Analysis on Hadoop for Everyone
Microsoft entered the Business Intelligence industry to enable orders of magnitude more users to make better decisions from the applications they use every day. This was the motivation behind being the first DBMS vendor to include an OLAP engine, with the release of SQL Server 7.0 OLAP Services, which enabled Excel users to ask business questions at the speed of thought. It remained the motivation behind PowerPivot in SQL Server 2008 R2, a self-service BI offering that allowed end users to build their own solutions without depending on IT, and that gave IT insight into how data was being consumed within the organization. And, with the release of Power View in SQL Server 2012, that goal will bring the power of rich interactive exploration directly into the hands of every user within an organization.
Enabling end users to merge data stored in a Hadoop deployment with data from other systems or with their own personal data is a natural next step. In fact, we also introduced a Hive ODBC driver, currently in Community Technology Preview, at the PASS Summit in October. This driver allows connectivity to Apache Hive, which in turn facilitates querying and managing large datasets residing in distributed storage by exposing them as a data warehouse.
This connector brings the benefits of the entire Microsoft BI stack and ecosystem to Hive. A few examples include:
- Bring Hive data directly to Excel through the Microsoft Hive Add-in for Excel
- Build a PowerPivot workbook using data in Hive
- Build Power View reports on top of Hive
- Instead of manually refreshing a PowerPivot workbook based on Hive on their desktop, users can use the scheduled data refresh feature in PowerPivot for SharePoint to refresh a central copy shared with others, without worrying about the time or resources it takes.
- BI professionals can build a BI Semantic Model or Reporting Services reports on Hive in SQL Server Data Tools
- Of course, all of the third-party client applications built on the Microsoft BI stack can now access Hive data as well!
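Because the driver speaks standard ODBC, the same connectivity these tools rely on should also be available to any ODBC-capable client. The following is a minimal sketch in Python, not an official sample. It assumes a DSN named HiveSample has already been configured for the Hive ODBC driver, that a Hive table named weblogs with a page column exists, and that the third-party pyodbc package is installed. All of those names are hypothetical and chosen only for illustration.

```python
"""Sketch: querying Hive over ODBC from Python.

Assumptions (not from the original post): a DSN has been configured for the
Hive ODBC driver, the queried table has a "page" column, and the third-party
pyodbc package is installed. All names used here are hypothetical.
"""


def build_connection_string(dsn, user=None, password=None):
    """Build an ODBC connection string for a pre-configured DSN."""
    parts = ["DSN=" + dsn]
    if user:
        parts.append("UID=" + user)
    if password:
        parts.append("PWD=" + password)
    return ";".join(parts)


def top_pages_query(table, limit):
    """Return a HiveQL query that counts rows per page, most frequent first."""
    # Only allow a plain identifier so the table name cannot inject SQL.
    if not table.isidentifier():
        raise ValueError("table must be a simple identifier")
    return (
        "SELECT page, COUNT(*) AS hits FROM {0} "
        "GROUP BY page ORDER BY hits DESC LIMIT {1}"
    ).format(table, int(limit))


def fetch_rows(connection_string, sql):
    """Run a query through the ODBC driver and return all result rows."""
    import pyodbc  # imported lazily so the helpers above work without it

    # Hive has no conventional transactions, so autocommit is assumed here.
    connection = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()
```

With a working DSN, calling fetch_rows(build_connection_string("HiveSample"), top_pages_query("weblogs", 10)) would return the ten most frequent pages. Hive compiles such queries into batch jobs, so for interactive analysis it generally makes more sense to load the result into a tool such as PowerPivot than to query Hive directly on every click.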
Klout is a great customer that’s leveraging the Microsoft BI stack on Big Data to provide mission-critical analysis for both internal users and its customers. In fact, Dave Mariani, the VP of Engineering at Klout, has taken some time out to describe how they use our technology. This is recommended viewing, not just to see examples of the applications that are possible but also to get a better understanding of how the new options complement technology you are already familiar with. Dave also blogged about their approach here.
Best of both worlds
As we mentioned at the beginning of this blog article, one size doesn’t fit all, and it’s important to recognize the inherent strengths of each available option in order to choose when to use which. Hadoop broadly provides:
- an inexpensive and highly scalable store for data in any shape,
- a robust execution infrastructure for data cleansing, shaping and analytical operations typically in a batch mode, and
- a growing ecosystem that provides highly skilled users many options to process data.
The Microsoft BI stack is targeted at a significantly larger user population and provides:
- functionality in tools such as Excel and SharePoint that users are already familiar with,
- interactive queries at the speed of thought,
- a business layer that allows users to understand the data, combine it with other sources, and express business logic in more accessible ways, and
- mechanisms to publish results for others to consume and build on themselves.
Successful projects may use both of these technologies in a complementary manner, as Klout does. Enabling this choice has been the primary motivator for providing Hive ODBC connectivity, as well as for investing in a Hadoop-based distribution for Windows Server and Windows Azure.
More Information
This is an exciting field, and we’re thrilled to be a top-tier Elite sponsor of the upcoming Strata Conference, held from February 28th to March 1st, 2012, in Santa Clara, California. If you’re attending the conference, you can find more information about the sessions here. We also look forward to meeting you at our booth to understand your needs.
Following that, on March 7th, we will be hosting an online event that will allow you to immerse yourself in the exciting New World of Data with SQL Server 2012. More details are here.
For more information on Microsoft’s Big Data offering, please visit http://www.microsoft.com/bigdata.