Reflecting on the AWS re:Invent 2018 conference in Las Vegas convinced me of the need to share a few key takeaways. This blog covers the most relevant insights gathered at the conference and explains why we need to sit up and take notice of them!
AWS CEO Andy Jassy’s inspiring keynote reinforced the message of using the ‘right tool for the use case’. As any pizza connoisseur will tell you, you simply cannot use pepper jack in lieu of mozzarella! But experience tells us that far too often, architects are forced to use a tool just because it is already in production for another use case and leadership wants to avoid management and maintenance overhead. We love this stand taken by AWS because we love the world of open source. Open source makes so many tools and technology options available that you can choose the right tool to solve your problem. And AWS’s fully managed offerings of these tools free up the architect to focus on solving the business problem at hand rather than worrying about the availability of the right tools and technologies. It doesn’t get more developer-focused than this!
I’m also happy to report that AWS continues its legacy of creating stacks meant for developers who like to write code. The many builder sessions and hands-on workshops held during the conference testified to this.
But Andy’s focus also expanded to include builders who need more consultative hand-holding. This was a clear indication of AWS validating Persistent’s focus on working with enterprises that build their own solutions rather than simply integrating off-the-shelf products. We work with startups, ISVs and enterprises side by side, offering our consultative hand in the form of products like Accelerite ShareInsights and offerings like Data Lake Buildout, DWH Modernization and Machine Learning Workshops.
Another instance of AWS validating Persistent’s thought process was their confirmation that repeatable solutions and patterns are an important aspect of the new paradigm of software development on the cloud. Their new Lambda enhancement, Lambda Layers, stresses exactly this. We follow a similar ideology when working with our customers: as we build solutions for them, we observe patterns in the way applications are built, and based on that experience we abstracted these repeatable modules into the AWS Design Pattern library that we showcased at our booth.
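To make the Layers idea concrete, here is a minimal sketch (using boto3; the layer name, zip file and function name are all hypothetical) of packaging a shared module once and attaching it to an existing function:

```python
# Minimal sketch: publish shared code once as a Lambda Layer and
# attach it to an existing function. Names and paths are hypothetical.
import boto3

lam = boto3.client("lambda")

# Publish the reusable module (for Python runtimes, zipped under python/).
with open("common-utils-layer.zip", "rb") as f:
    layer = lam.publish_layer_version(
        LayerName="common-utils",              # hypothetical layer name
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.7"],
    )

# Attach the layer; the function now imports the shared code without
# bundling it in its own deployment package.
lam.update_function_configuration(
    FunctionName="my-ingest-function",         # hypothetical function
    Layers=[layer["LayerVersionArn"]],
)
```

Every function that attaches the layer picks up the shared code without re-bundling it, which is exactly the kind of repeatability we bake into our pattern library.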
From the technology perspective, I was quite impressed with some key announcements and new product features. Here’s a quick list:
- First up is Amazon Aurora. This fully managed, high-performance and fault-tolerant database service, compatible with MySQL and PostgreSQL, looks very promising. Early in my career, I worked on a project where we re-wrote the MySQL storage engine and query kernel, and we were amazed by MySQL’s adoption in the high-tech industry back then. Now that AWS is making open-source databases enterprise-ready, I am sure people will start migrating from commercial databases to Aurora.
- When I attended a session on Amazon Neptune, a fully managed graph database, I noticed how intelligently AWS re-uses its fundamental building blocks. The key to making a database durable is a highly reliable storage engine, and AWS uses a common strategy (and, I am sure, a common code base) behind Aurora, Neptune and its other database services: a six-way replica of your data maintained in real time across three Availability Zones. The pattern of up to 15 read replicas is also common across these databases. This not only helps AWS but also lets users gain expertise once and reuse it across all of them.
- I was also impressed with the enhancements in DMS (Database Migration Service) and SCT (Schema Conversion Tool). Persistent regularly works on database migration projects, and such tools help speed up these migrations. But beyond migration, we leverage DMS in our Data Lake ingestion pipeline, so support for newer source databases is definitely handy (see the ingestion sketch after this list).
- I am eager to play with Lake Formation, a new AWS service, and see how it affects new Data Lake build-outs as well as existing Data Lake deployments on AWS.
- Timestream, a new database for time-series data, is another service I am excited about. We already leverage Apache Flink, Kinesis (and Kinesis Data Analytics), and Kafka with KSQL, and it would be good to try out Timestream to understand when to use what. MSK, a fully managed Kafka service, is another exciting service announced at this re:Invent.
- You can now monetize your Alexa skills by offering in-skill purchases. That one was expected but is still impressive.
- Like everywhere else, AI and ML are top priorities at AWS. SageMaker Ground Truth, SageMaker RL, and the new Personalize and Forecast services are all worth trying out.
- Textract seems nice, especially its table extraction and form extraction features. We will soon add Textract as one of the supported engines for our TEXT.AI offering (a quick sketch of the Textract API follows this list).
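On the DMS point above, here is a rough sketch (boto3; all ARNs, identifiers and the table mapping are placeholders) of the kind of full-load-plus-CDC task that lands source-database changes into an S3-based Data Lake:

```python
# Rough sketch: a DMS task doing a full load plus ongoing CDC from a
# source database into S3 (typical Data Lake ingestion). All ARNs,
# identifiers and the table mapping below are placeholders.
import json
import boto3

dms = boto3.client("dms")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-sales-schema",
        "object-locator": {"schema-name": "sales", "table-name": "%"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="sales-to-datalake",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:S3TARGET",  # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",   # initial load + ongoing changes
    TableMappings=json.dumps(table_mappings),
)
```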
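And on Textract, a minimal sketch, assuming a document sitting in S3 (the bucket and key are made up), of asking for table and form structure:

```python
# Minimal sketch: ask Textract for table and form structure in a
# document stored in S3. Bucket and key are hypothetical.
import boto3

textract = boto3.client("textract")

response = textract.analyze_document(
    Document={"S3Object": {"Bucket": "my-docs-bucket", "Name": "invoice.png"}},
    FeatureTypes=["TABLES", "FORMS"],   # table + form (key/value) extraction
)

# Blocks carry the detected structure: TABLE/CELL blocks for tables,
# KEY_VALUE_SET blocks for form fields.
for block in response["Blocks"]:
    if block["BlockType"] in ("TABLE", "KEY_VALUE_SET"):
        print(block["BlockType"], block.get("Confidence"))
```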
Although booth duty can be boring, my time manning the Persistent booth at re:Invent 2018 was fun, thanks to the several interesting conversations we had! I remember someone saying that the AWS cloud is successful not only because of its great technology but also because of its very good partner ecosystem.
- Persistent was chosen to showcase ML models on the newly launched SageMaker marketplace. We were one of the few partners chosen, and we launched 10 models across the Healthcare, Banking, Astronomy and Manufacturing domains.
- I was also glad to see the strong presence of our partner Snowflake, a DWH built for the cloud. We have acquired a great amount of expertise in their technology.
Before wrapping up, I’d love to share some learnings I gathered from attending the various builder sessions:
- GraphQL seems to be becoming more popular than directly consuming microservices (see the sketch after this list).
- Data Lakes are going to be the new DWH, and the DWH is going to be the new Data Mart, as Anurag Gupta put it.
- S3-based Data Lakes are becoming more popular than HDFS-based ones. Yet another validation of Persistent’s POV!
- Since data grows faster than its actual use, storage-compute separation is a must for a Data Lake, says Arpan Shah, based on his experience at Robinhood.
- Can Data Virtualization replace the Data Lake? Anurag’s opinion is a resounding NO. While Data Virtualization offers the short-term benefit of one mechanism to connect to all data sources, it can’t replace Data Lakes, primarily because OLTP sources tend to archive data very quickly, and without a Data Lake you could lose that data.
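On the GraphQL point above, here is a small illustration of the appeal (the endpoint and schema are entirely made up): a single query fetches what would otherwise take three separate microservice calls.

```python
# Illustration only: one GraphQL query replacing what would be three
# separate REST calls (customer, orders, recommendations).
# Endpoint and schema are hypothetical.
import json
import urllib.request

QUERY = """
{
  customer(id: "42") {
    name
    orders(last: 5) { id total }
    recommendations { productId }
  }
}
"""

req = urllib.request.Request(
    "https://api.example.com/graphql",       # hypothetical endpoint
    data=json.dumps({"query": QUERY}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```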
Finally, here is a quick summary of all our AWS offerings in the data space:
- Accelerite ShareInsights – a self-service data analytics platform
- Our ML Models at SageMaker Marketplace
- Our Data Lake Buildout, DWH Modernization, Cloud Analytics, AI/ML offerings
- Our Snowflake Partnership and offerings
And last but not least, the swag at re:Invent 2018 showed an interesting trend, with socks being the most popular giveaway! I wonder why? Anyone up for a quick analysis of this trend?
Banner image credits – Huffingtonpost