Confused about Big Data? Persistent has the definition. Introducing DataStack 3.0
Big Data has become a popular term in the industry, and while enterprises are still busy working out which Big Data use cases can deliver competitive advantage, there seems to be a lot of confusion about what Big Data is and how enterprises can leverage it within their organization. In this post we talk about the different DataStacks in an organization, present a database-architecture-centric view of Big Data, and explain how enterprises can take advantage of Big Data in the context of their current data infrastructure.
Initially, data was application-driven and largely structured, with the emphasis on storage. Relational databases were used as operational data stores to capture and store transactional data, and they proved to be an efficient way of managing what were, for those times, huge amounts of data measured in gigabytes. We characterize this as DataStack 1.0.
With the further explosion of data, a new set of applications that looked at aggregated data came into the fold. Organizations realized that data could be used as one of the tools to derive competitive advantage. OLTP (OnLine Transaction Processing) systems were paired with data warehouses for analysis of enterprise data. This allowed enterprises to aggregate data, form dimensional views and conduct analytics to support decisions. This is what we call DataStack 2.0.
In DataStack 1.0 and 2.0, only structured data with a fixed schema design can be analyzed. Today, however, 80% of new information growth is unstructured content, and enterprises house petabytes of this unstructured data, be it data about customer profiles, buying patterns or even logs generated by devices. Hence, to gain that competitive edge, enterprises need a new data stack that can analyze petabytes of unstructured data. We have named this new data stack DataStack 3.0. It gives businesses the ability to leverage the data being generated and glean intelligent insights from it.
DataStack 3.0 stands for a new set of technologies that allow the analysis of huge amounts of unstructured data through MapReduce programming on top of NoSQL databases. These technologies make use of a massively parallel processing paradigm and can provide enterprises with the analytical edge required to build a sustainable competitive advantage. Like DataStack 2.0, DataStack 3.0 stores historical information for analysis, and it is also capable of handling data sources not covered by DataStack 2.0.
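To make the MapReduce idea concrete, here is a minimal single-process sketch in Python that counts error entries per device in free-form log lines. It is an illustration only, not code from the whitepaper: the log format, the sample data and the function names (map_phase, shuffle, reduce_phase, run_job) are all assumptions made for this example. In a real deployment a framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log format (an assumption for this sketch):
# "<device-id> <LEVEL> <free-form message>"
SAMPLE_LOGS = [
    "device-7 ERROR disk full",
    "device-3 INFO heartbeat",
    "device-7 ERROR timeout",
    "device-3 ERROR timeout",
    "",  # blank and malformed lines are simply skipped
]


def map_phase(line):
    """Map step: emit (device_id, 1) for each line that reports an error."""
    parts = line.split()
    if len(parts) >= 2 and "ERROR" in parts[1:]:
        return [(parts[0], 1)]
    return []


def shuffle(pairs):
    """Shuffle step: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


error_counts = run_job(SAMPLE_LOGS)
print(error_counts)
```

Because each call to map_phase looks at one line in isolation and each call to reduce_phase looks at one key in isolation, the work can be split across many nodes, which is what lets this style of processing scale to very large volumes of unstructured data.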
As businesses explore Big Data, they have to recognize that DataStack 3.0 is a confluence of new Big Data technologies such as Cassandra, HBase and MapReduce with the existing DataStack 2.0. DataStack 3.0 should integrate with the existing DataStack 2.0 and should ideally act as a complementary extension to the existing data infrastructure.
Persistent Systems’ latest whitepaper, “Big Data Defined. Introduction to DataStack 3.0”, defines DataStack 3.0 in the context of historical trends in data and an enterprise’s existing data infrastructure. It helps enterprises think about how Big Data can fit into their organization’s current data infrastructure. The paper aims to help CIOs, MIS managers and functional heads take advantage of Big Data within their organization.
To get an overview of the whitepaper, watch Dr. Anand Deshpande, Chairman, MD and CEO of Persistent Systems, talk about Big Data and DataStack 3.0.