A New Kind of Commute – Data Architecture in the Cloud
Before I get into the technical aspects of this post, a quick observation about life in Pune. I grew up there and I love it. Not that long ago, the city was relatively small (by Indian standards) and everyone wanted to work in the exciting heart of it; downtown was where all the action was. When Persistent Systems built our new headquarters complex, we built it in the heart of the city and it was incredibly popular with employees. But then, as in so many other places in the world, people started moving to the suburbs, where they could get nicer, bigger, newer homes at more affordable prices. As this happened, working downtown, and the commute to get there, lost its appeal. So Persistent built new offices in the suburb of Hinjewadi, among other reasons, to be where the people were and to make things more efficient for everyone.
OK, I’m sure you’re thinking, that’s great, Avadhoot, but what does that have to do with data architecture in the cloud? Well, a few years back, when the sources of data were mostly on premises, the data architecture was on premises too. But times are changing. For good reasons, applications are increasingly being deployed in the cloud, along with the data they manage and their logs (one of the primary types of big data). This means that an ever-increasing share of organizations’ data actually resides in cloud-based SaaS and PaaS platforms like Salesforce and ServiceNow. Add to this that the data you want to consider for your analytics now includes ‘public’, ‘social’ and ‘acquired’ data, which also resides in the cloud.
What this means, essentially, is that the data no longer has a single center of gravity inside your own data center: it has become “suburban”.
We are seeing more and more requests to adapt Data Lakes, Data Warehouses, ETL and BI to the cloud. By the way, you can read our five-part series on Data Lakes here. It’s stating the obvious that companies want to move to the cloud for their data architecture. The advantages are evident, for example:
- Lower and flexible TCO – Pay as you go model
- Managed infrastructure – Pay on demand for IT capabilities
- Easy handling of fluctuating bandwidth and concurrency demands
- Better disaster recovery
- Better team collaboration through cloud-based collaboration tools
From the technology perspective, the space is already looking very crowded. Look at this quick snapshot –
So does moving to the cloud mean you’re simply running your on-premises ETL, DWH and BI in the cloud? Not at all. In fact, the cloud versions of the tools are sometimes very different and written specifically for the cloud environment, to deliver the advantages listed above while managing an ever-increasing amount of data. So there is going to be a huge difference between your on-premises solutions and your cloud solutions. Some examples are below:
- Storage-compute separation: In the cloud world, storage is cheap while compute can be acquired quickly. Most of the ‘cloud-first’ DWH systems store data on S3/blob storage and acquire compute nodes only as needed. This can significantly reduce your cloud service bill (Snowflake, Azure SQL, Qubole, EMR). However, the data flows need to be designed well to take advantage of these tools.
- Cloud ETL tools are available ‘as a service’. You no longer need to deploy them on VMs and do the operations management to move data to the cloud.
- The ‘code as a service’ offerings (like AWS Lambda) can be leveraged very effectively to perform tasks such as cleansing and transformation.
- The cheap storage (S3/blob storage) can be shared between the DWH and the Data Lake.
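To make the ‘code as a service’ point a little more concrete, here is a minimal sketch of what a cleansing step might look like as a Lambda-style function in Python. The `handler(event, context)` shape follows the usual AWS Lambda convention for Python; everything else is an assumption made for illustration: the `records` key in the event, the `id` and `email` field names, and the cleansing rules themselves. A real pipeline would more likely read from and write to S3/blob storage than pass records inline.

```python
from typing import Any, Dict, List, Optional


def clean_record(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Apply simple, hypothetical cleansing rules to one record.

    - Trim whitespace from string values; empty strings become None.
    - Reject (return None for) records that have no usable "id".
    - Lowercase the "email" field when it is present.
    """
    cleaned: Dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip() or None
        cleaned[key] = value

    if cleaned.get("id") is None:
        return None

    email = cleaned.get("email")
    if isinstance(email, str):
        cleaned["email"] = email.lower()
    return cleaned


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Lambda-style entry point: cleanse every record in the event.

    The event layout ({"records": [...]}) is an assumption for this sketch.
    """
    cleaned: List[Dict[str, Any]] = []
    rejected = 0
    for record in event.get("records", []):
        result = clean_record(record)
        if result is None:
            rejected += 1
        else:
            cleaned.append(result)
    return {"cleaned": cleaned, "rejected": rejected}
```

Because the function keeps no state between calls, the platform can run as many copies as the incoming data demands and none when there is nothing to process, which is the same pay-for-what-you-use idea behind storage-compute separation.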
At Persistent Systems, we believe that a data architecture in the cloud is an imperative for enterprises to transform themselves. The data architecture is part of your organization’s cloud-based digital transformation platform where your existing and other cloud-ready data assets can be leveraged by software built and deployed at speed. The digital transformation platform is part of a larger concept we call Software 4.0. Software 4.0 is a completely different mindset on how to approach a software-driven future. And it just happens that future is now and we’re helping you realize it!
Over 2,000 years ago, the Greek philosopher Heraclitus said “Change is the only constant in life.” And that’s never been truer than in the 21st century. Just as my hometown of Pune has changed dramatically in the past 20 years, I have a feeling that 20 years from now it will have changed even more. Going back to how I started this post, in the United States 40 years ago there was an exodus from the cities to the suburbs; well, now people are returning to the cities in huge numbers! I bring that up as something for you to think about as you plan your strategy; the key is embracing the change and knowing there will be more of it. In the software world, we’re focusing on continuous innovation and technology advancements that make a lot of that change, like moving to the cloud, easier and faster.
I’d love to hear what you think. Let’s keep this conversation going; in the coming weeks and months, you’re sure to be hearing from me and from my colleagues about the work we’re doing here at Persistent.