In the previous blog, we discussed the characteristics of Data Products and how to convert our existing inventory of Data Pipelines (ETL/ELT) and Data Sets into Data Products. In this blog, we will focus on how to make our data platform Data Mesh ready.
Here are the top 5 features of a Data Mesh ready platform.
Getting a Home Address for each data domain
Data Products are organized into data domains. Hence, it is imperative that each data domain gets its own home address. The home is where the domain's data products are built, and it is equipped with data ingestion, transformation, quality measurement, modeling, and data storage tools.
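As a rough illustration of the home-address idea, here is a minimal Python sketch of a registry that gives each domain exactly one home. It also reports which of the capabilities listed above a home still lacks. The `mesh://` addressing scheme, the class names, and the capability keys are hypothetical and were chosen only for this example.

```python
from dataclasses import dataclass, field

# Capabilities every domain home is expected to provide (keys are illustrative).
REQUIRED_CAPABILITIES = {"ingestion", "transformation", "quality", "modeling", "storage"}


@dataclass
class DomainHome:
    domain: str
    address: str  # hypothetical "mesh://<domain>" addressing scheme
    tools: dict = field(default_factory=dict)  # capability -> tool name

    def missing_capabilities(self) -> set:
        """Capabilities this home does not provide yet."""
        return REQUIRED_CAPABILITIES - set(self.tools)


class HomeRegistry:
    """Maps each data domain to exactly one home."""

    def __init__(self) -> None:
        self._homes = {}  # normalized domain name -> DomainHome

    def register(self, domain: str, tools: dict) -> DomainHome:
        key = domain.strip().lower()
        if key in self._homes:
            raise ValueError(f"domain {domain!r} already has a home")
        home = DomainHome(key, f"mesh://{key.replace(' ', '-')}", dict(tools))
        self._homes[key] = home
        return home

    def lookup(self, domain: str) -> DomainHome:
        return self._homes[domain.strip().lower()]
```

A registration that omits some tools still succeeds. `missing_capabilities()` then shows what the platform has yet to provision for that domain.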
Each Home gets its own isolated workspace
While every individual home is an isolated workspace, it is part of an organization-wide deployment of tools. Effectively, the platform needs to define a multi-tenancy strategy for each tool it uses.
In the architecture below:
- Multiple data domains, such as marketing and revenue growth, use shared infrastructure like Snowflake and dbt. The platform implements a multi-tenancy policy for these tools so that each domain remains in its own workspace.
- On the other hand, the platform creates multiple instances of the Data Quality and ELT services rather than sharing a single deployment.
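One way to make a per-tool multi-tenancy strategy concrete is to write it down as data the platform can act on. The sketch below assumes a hypothetical `TENANCY` table that mirrors the architecture just described: Snowflake and dbt are shared, while Data Quality and ELT get dedicated instances. From that table it generates Snowflake DDL that isolates a domain with its own role, database, and warehouse. The naming convention, the warehouse size, and the plan strings for dbt and the dedicated services are illustrative assumptions, not a prescribed setup.

```python
import re

# Hypothetical tenancy strategy: "shared" tools isolate domains inside one
# deployment; "dedicated" tools get one instance per domain.
TENANCY = {
    "snowflake": "shared",
    "dbt": "shared",
    "data_quality": "dedicated",
    "elt": "dedicated",
}


def ident(domain: str) -> str:
    """Turn a domain name such as 'revenue growth' into REVENUE_GROWTH."""
    return re.sub(r"[^A-Za-z0-9]+", "_", domain).strip("_").upper()


def snowflake_isolation(domain: str) -> list:
    """Snowflake DDL giving a domain its own role, database and warehouse."""
    d = ident(domain)
    return [
        f"CREATE ROLE IF NOT EXISTS {d}_ROLE;",
        f"CREATE DATABASE IF NOT EXISTS {d}_DB;",
        f"CREATE WAREHOUSE IF NOT EXISTS {d}_WH WITH WAREHOUSE_SIZE = 'XSMALL';",
        f"GRANT OWNERSHIP ON DATABASE {d}_DB TO ROLE {d}_ROLE;",
        f"GRANT USAGE ON WAREHOUSE {d}_WH TO ROLE {d}_ROLE;",
    ]


def plan_domain(domain: str) -> dict:
    """Provisioning plan for one domain under the tenancy table above."""
    plan = {}
    for tool, mode in TENANCY.items():
        if tool == "snowflake":
            plan[tool] = snowflake_isolation(domain)
        elif mode == "shared":
            # Assumption: shared dbt isolates domains via a per-domain target schema.
            plan[tool] = [f"use shared {tool} with target schema {ident(domain)}"]
        else:
            plan[tool] = [f"deploy instance {tool}-{ident(domain).lower()}"]
    return plan
```

Keeping the strategy in a table means onboarding a new domain is a single `plan_domain("<domain>")` call. Changing a tool from shared to dedicated becomes a one-line edit rather than a new runbook.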
Building Organizational Data Marketplace
Fundamentally, Data Products are data assets built for others to use. A successful Data Product is one that feeds several other pipelines and analytics. Hence, data products need to be discoverable, and a Data Marketplace, where all Data Products are listed, is an essential feature of a Data Mesh.
Further, the marketplace should have the essential features you would expect from a typical e-commerce site:
- Organization of Data Products by domain
- Easy search for products using their metadata
- Statistical tracking of product usage (views, likes, comments, subscriptions, and so on)
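A marketplace catalog with these three features can be sketched in a few lines of Python. This is an in-memory toy built on hypothetical `ProductListing` and `Marketplace` classes. A real platform would back it with a metadata catalog and a search index.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Usage events tracked per product (mirrors the list above).
EVENTS = {"view", "like", "comment", "subscribe"}


@dataclass
class ProductListing:
    name: str
    domain: str
    description: str
    tags: set = field(default_factory=set)
    usage: Counter = field(default_factory=Counter)


class Marketplace:
    def __init__(self) -> None:
        self._listings = {}  # product name -> ProductListing

    def publish(self, listing: ProductListing) -> None:
        self._listings[listing.name] = listing

    def by_domain(self) -> dict:
        """Product names grouped per domain, sorted for stable display."""
        groups = defaultdict(list)
        for p in self._listings.values():
            groups[p.domain].append(p.name)
        return {d: sorted(names) for d, names in groups.items()}

    def search(self, term: str) -> list:
        """Case-insensitive substring match on name, description and tags."""
        t = term.lower()
        return sorted(
            p.name for p in self._listings.values()
            if t in p.name.lower()
            or t in p.description.lower()
            or any(t in tag.lower() for tag in p.tags)
        )

    def record(self, name: str, event: str) -> None:
        """Count one usage event (view, like, comment, subscribe)."""
        if event not in EVENTS:
            raise ValueError(f"unknown event {event!r}")
        self._listings[name].usage[event] += 1

    def usage(self, name: str) -> dict:
        return dict(self._listings[name].usage)
```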
Integration and Shipping of Data Products
The data marketplace brings consumers closer to data products. For effective consumption, however, the platform needs to provide two main capabilities:
- Tight integration between the pipelines that build a data product and its entry in the marketplace. Operational metadata of the data product, such as last refresh time, data quality score, and current size, should be visible to users in the marketplace.
- Shipping of data products to the consumer's home. The consumer uses the data product within her own data domain infrastructure, so the data products she wants to consume should be accessible from her data domain. The platform should support access patterns such as a one-time download or an ongoing subscription. Data products should also be deliverable to the consumer's preferred analytics tool, whether a BI tool like Power BI, a database like Snowflake, or simply Microsoft Excel.
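The sketch below shows one way the two capabilities could fit together. It rests on two assumptions. First, a hypothetical `on_pipeline_success` hook, called by the building pipeline after each run, refreshes the operational metadata shown in the marketplace. Second, each hypothetical `Request` is either a one-time download or a subscription, delivered to a target such as Power BI, Snowflake, or Excel. Deliveries are returned as tuples rather than actually shipped.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Delivery targets taken from the examples in the text (keys are illustrative).
TARGETS = {"powerbi", "snowflake", "excel"}
MODES = {"one_time", "subscription"}


@dataclass
class OperationalMetadata:
    last_refresh: Optional[datetime] = None
    quality_score: Optional[float] = None
    size_bytes: Optional[int] = None


@dataclass
class Request:
    consumer_domain: str
    product: str
    mode: str    # "one_time" or "subscription"
    target: str  # one of TARGETS


class Shipping:
    def __init__(self) -> None:
        self.metadata = {}       # product -> OperationalMetadata shown in the marketplace
        self.subscriptions = []  # standing subscription requests

    def request(self, req: Request) -> list:
        """Ship the current version now; subscriptions also get future refreshes."""
        if req.mode not in MODES or req.target not in TARGETS:
            raise ValueError(f"unsupported request: {req}")
        if req.mode == "subscription":
            self.subscriptions.append(req)
        return [(req.consumer_domain, req.product, req.target)]

    def on_pipeline_success(self, product, quality_score, size_bytes, when=None) -> list:
        """Hypothetical hook the building pipeline calls after each successful run."""
        self.metadata[product] = OperationalMetadata(
            last_refresh=when or datetime.now(timezone.utc),
            quality_score=quality_score,
            size_bytes=size_bytes,
        )
        # Push the refreshed product to every subscriber's home.
        return [(s.consumer_domain, s.product, s.target)
                for s in self.subscriptions if s.product == product]
```

Putting the metadata update and the subscription fan-out in the same hook means the marketplace entry and the consumers' copies cannot drift apart after a refresh.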
Infrastructure management (cloud ops) is assumed to be the platform's responsibility. On top of that, the platform should make it easy for Data Product owners to manage the operations they are responsible for:
- Data Quality Management: the platform should provide tools to define data quality, measure it periodically, and let product owners and consumers visualize it.
- Data Freshness Management: the platform should track lineage and make sure that data is refreshed at the promised frequency. Any anomalies should be highlighted.
- Schema Drift and Versioning: a data product's schema may need to change over time. The platform should support versioning of data products so that consumers are protected from schema drift.
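The three operational checks can each be sketched as a small function. The sketch makes three assumptions: quality rules are plain predicates over rows, freshness is a promised refresh interval, and versions follow a semantic-versioning-style `major.minor` convention. Under that convention, removed or retyped columns count as breaking changes and added columns do not. The post does not prescribe any of these conventions. They are one reasonable choice.

```python
from datetime import datetime, timedelta, timezone


def quality_score(rows, rules) -> float:
    """Fraction of (row, rule) checks that pass; each rule is a predicate."""
    checks = [rule(row) for row in rows for rule in rules]
    return sum(checks) / len(checks) if checks else 1.0


def is_stale(last_refresh: datetime, promised_every: timedelta, now=None) -> bool:
    """True when the product has missed its promised refresh interval."""
    now = now or datetime.now(timezone.utc)
    return now - last_refresh > promised_every


def next_version(version: str, old_schema: dict, new_schema: dict) -> str:
    """Assumed convention: removed or retyped columns -> new major version,
    added columns only -> new minor version, no change -> same version."""
    major, minor = (int(x) for x in version.split("."))
    breaking = any(
        col not in new_schema or new_schema[col] != typ
        for col, typ in old_schema.items()
    )
    if breaking:
        return f"{major + 1}.0"
    if set(new_schema) - set(old_schema):
        return f"{major}.{minor + 1}"
    return version
```

With a major-version bump on breaking changes, consumers pinned to `1.x` keep receiving a compatible schema while `2.0` is published alongside it.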
Since the platform is decentralized, every data domain needs to understand its infrastructure and operations costs. The platform needs to provide reports and actionable insights on the usage (and corresponding cost) of each tool per data domain.
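At its core, a per-domain cost report is an aggregation of usage records. The sketch below assumes hypothetical usage records that carry a domain, a tool, the units consumed, and a unit price. Actual billing data would come from each tool's own usage exports. The `top_cost_tool` helper is one simple example of an "actionable insight": it names the tool driving most of a domain's spend.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    domain: str
    tool: str
    units: float       # e.g. credits or compute hours (assumption)
    unit_price: float  # cost per unit (assumption)


def cost_report(records) -> dict:
    """Cost per domain, broken down by tool."""
    report = defaultdict(lambda: defaultdict(float))
    for r in records:
        report[r.domain][r.tool] += r.units * r.unit_price
    return {domain: dict(tools) for domain, tools in report.items()}


def domain_totals(report: dict) -> dict:
    """Total cost per domain."""
    return {domain: sum(tools.values()) for domain, tools in report.items()}


def top_cost_tool(report: dict, domain: str) -> str:
    """The tool responsible for the largest share of a domain's cost."""
    tools = report[domain]
    return max(tools, key=tools.get)
```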
The Data Product concept helps democratize data and enables citizen data scientists to make data-driven decisions. However, operationalizing data products requires strong support from the underlying data platform.
In this blog, we discussed the features your data platform needs in order to become Data Mesh ready. In the next blog, we will explore how Persistent Data Foundry can turn your existing data platform into a Data Mesh platform.