
Do You Have an Accurate Data Inventory?

Co-Authors:
Devesh Salvi
Product Analyst at Modak
Aastha Jha
Content Manager at Modak
Maintaining an accurate inventory of data is crucial, especially in today's remote work and cloud-based application environment. Organizations today sit on stacks of data, both structured and unstructured, scattered across different locations within the company and in the cloud. Understanding and managing this data is crucial for efficient usage and safeguarding. Having a thorough data inventory is the first step in gaining an understanding of what data an organization owns, where it is located, and how it can be used.

The research firm Gartner estimates that 80% of customers do not currently have an accurate inventory of their data. This underscores the need for organizations to take their data seriously and treat it as a strategic asset.

In this blog, we will explore what data inventory is and how it can benefit an organization’s overall operations and growth.
What is Data Inventory?
A data inventory is not just a simple list of data assets that an organization maintains. It is a comprehensive and structured document that provides detailed information about each data source and how it is used within the organization. The data inventory includes metadata such as data ownership, format, location, access controls, data classification, and retention policies.

Data classification is a key component of a data inventory. It involves categorizing data according to its sensitivity, importance, and value to the organization. This enables the organization to determine the appropriate level of protection and access controls that should be applied to each type of data. For example, sensitive data such as financial information or personally identifiable information (PII) may require stronger security controls and stricter access restrictions than non-sensitive data.
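To make this concrete, here is a minimal sketch of what a single inventory entry might look like in code. The field names and sensitivity tiers below are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field

# Illustrative sensitivity tiers; real classification schemes vary by organization.
SENSITIVITY_LEVELS = ("public", "internal", "confidential", "restricted")

@dataclass
class InventoryEntry:
    """One record in a data inventory, capturing metadata about a data asset."""
    name: str                    # logical name of the data asset
    owner: str                   # accountable data owner
    location: str                # system or path where the data lives
    data_format: str             # e.g. "parquet", "csv", "table"
    classification: str          # one of SENSITIVITY_LEVELS
    contains_pii: bool = False   # PII flags stricter access controls
    retention_days: int = 365    # retention policy
    access_roles: list = field(default_factory=list)

    def __post_init__(self):
        # Reject classifications outside the agreed scheme.
        if self.classification not in SENSITIVITY_LEVELS:
            raise ValueError(f"unknown classification: {self.classification}")

entry = InventoryEntry(
    name="customer_payments",
    owner="finance-data-team",
    location="s3://warehouse/finance/payments/",
    data_format="parquet",
    classification="restricted",
    contains_pii=True,
    access_roles=["finance-analyst", "dpo"],
)
print(entry.classification)  # restricted
```

Even a simple structure like this enforces the classification scheme at the point of entry, so sensitive assets cannot be catalogued without a protection level.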

In addition to the above, a data inventory should also include information about the relationships between different data sources, such as how data flows between different systems, and how it is transformed and processed. This is important for identifying dependencies and ensuring that data is being used appropriately across the organization.

Overall, a comprehensive data inventory is a valuable tool for managing data assets, improving data quality, and minimizing risks associated with data loss, privacy breaches, or non-compliance with regulations. It also helps organizations to make informed decisions about how to use data effectively and strategically to achieve their business objectives.
Why is Data Inventory Important?
Data has become a strategic asset for organizations, with McKinsey research showing that enterprises that are “datafied” are 23 times more likely to acquire customers, 6 times as likely to retain customers, and 19 times more likely to be profitable. With the growing number of IT systems, companies may have little awareness of where they house sensitive information. Compiling a data inventory is essential for understanding the value and whereabouts of an organization's data resources and metadata, which can help decrease risk and ensure compliance with privacy and regulatory requirements.

Data inventory is an important aspect of an organization's data management that provides immediate visibility into all its data sources, the information they acquire, where the data is stored, and what happens to it in the end. In addition to the benefits mentioned earlier, a comprehensive data inventory also helps organizations comply with regulations such as GDPR and CCPA, which require them to know what personal data they hold and how it's being processed.

Data inventory can also help organizations manage risks associated with unauthorized access, data breaches, or data loss by identifying and mitigating potential risks. It is an essential part of data governance, which involves managing data to ensure its accuracy, completeness, consistency, and security. With a data inventory, organizations can ensure that their data is managed according to their data governance policies and standards.
What are the Benefits of Data Inventory?
A comprehensive data inventory can provide numerous benefits for organizations, including:
  • Revealing the data currently held, including hidden or obscure data. 
  • Determining the reliability of data sources. 
  • Identifying sensitive data subject to legal or administrative regulations. 
  • Locating valuable data that is underutilized or under monetized. 
  • Recognizing dangerous information whose protection is not proportional to its risk. 
  • Viewing information subject to additional restrictions like legal holds or investigations. 
  • Defining roles and responsibilities so the business can make sound decisions about maximizing the value of data, reducing risks, and avoiding legal or regulatory issues. 
How to Create an Effective Data Inventory?
To create an effective data inventory, organizations should follow these steps:
https://modak.com/wp-content/uploads/2023/03/001.-Modak-Do-You-Have-an-Accurate-Data-Inventory-002-768x430.png
Key Takeaways
A thorough data inventory is a crucial resource for enterprises in today's complicated and fast-evolving data landscape. A complete inventory offers a single source of truth that enables organizations to identify sensitive information subject to regulations, locate important but underutilized data, assign responsibilities, and maximize the value of data while minimizing risks. Organizations can construct an effective data inventory, and use data as a strategic asset, by establishing a monitoring authority, carrying out routine updates, and employing data mapping. With an accurate inventory of their data, organizations are better prepared to make data-driven decisions, retain customers, attract new ones, and boost profitability.
About Modak
Modak is a solutions company that enables enterprises to manage and utilize their data landscape effectively. We provide cloud-agnostic software and services to accelerate data migration initiatives. We use machine learning (ML) techniques to transform how structured and unstructured data is prepared, consumed, and shared.

Modak’s portfolio of Data Engineering Studio provides best-in-class delivery services, managed data operations, data mesh, data fabric, augmented data preparation, data quality, and governed data lake solutions.

To learn more, please download: https://modak.com/modak-nabu-solution/

Background
The US Center for Medicare and Medicaid Services (CMS) has taken a step forward in advancing the interoperability and authorization process for the US Healthcare industry by advocating the adoption of the United States Core Data for Interoperability (USCDI) standard. This standard provides a set of health data classes and data elements to be included in patient records for sharing within the health information exchange, allowing insurers and providers to share patient data throughout their healthcare journey. As a result, when a patient wants to compare health plans to switch from one insurer to another, the patient can easily review the options available to make an informed choice, assuming the patient has consented to data sharing.

Healthcare insurance companies, which are custodians of information for millions of Americans, are now required to meet the standards set out by CMS. In addition, CMS has implemented price transparency, enabling consumers to compare insurer plans and make informed decisions based on the plans offered. Failure to comply with the CMS guidelines carries a significant penalty for the insurer on a per-member-per-day basis.
Challenges
Within this context, a large US healthcare insurer set out to extract and process data from disparate internal systems to create standardized data sets compliant with the USCDI standard across 25m+ members. The volume of data to be processed was significant: over 500 terabytes, representing approximately 500 billion rows of member records. Working with a leading system integrator, the client adopted an incumbent software package to ingest the data and used cloud provider big data services to profile and format it into the common data format in order to meet the deadline set by CMS.

However, the client ran into massive last-minute issues with the project, incurring cloud processing costs in the hundreds of thousands of dollars for a few hours of processing time, and faced the possibility of missing the timeline set by CMS and, as a result, incurring penalties.
Solution
The client approached Modak on a Friday afternoon to review the approach taken by their strategic System Integrator (SI) and to determine whether Modak could provide a solution to (a) resolve the technical issues, (b) reduce the cloud costs, and (c) meet the timelines set by CMS.

Modak’s leadership and data engineering team spent the week reviewing the cloud services configuration and the code created by the SI. Within that week, the Modak team had re-written the code and demonstrated that the output met the USCDI standard specifications. Furthermore, the cloud processing costs were reduced to a few thousand dollars.
Impact
The solution delivered by Modak helped the Healthcare Insurance provider achieve the following:
https://modak.com/wp-content/uploads/2023/02/001.-Modak-Building-interoperable-data-fabric-at-scale-1-768x430.png
  • Reduced cloud processing costs by 99%
  • Improved processing times by 90%
  • Successful deployment of the solution into production within 3 weeks
  • Client avoided US CMS penalty fees of millions of dollars and escalation of the issue to the Office of the CEO

https://modak.com/wp-content/uploads/2023/01/002.-Modak-Neo4j-001-1-e1674624243730.png

Data leaders are currently facing the challenge of not only managing large volumes of data, but also extracting meaningful insights from that data. In many cases, the connections and relationships between data points are more important than the data points themselves. To effectively analyze and understand complex datasets, organizations need to use graph database technology to capture those relationships.

Many organizations currently rely on Relational Database Management Systems (RDBMS) to store their structured data. However, the fixed, table-based structure of an RDBMS can make it difficult to capture and represent the complex relationships between data points, so these systems are often inadequate for relationship-centric analysis.

Graph databases are designed to efficiently store and query connected data using a node-and-relationship format, making them particularly well equipped to solve problems where understanding those connections is critical.

One of the key advantages of graph databases is that they can mimic the way the human brain processes and understands associations. By representing data as nodes and relationships, graph databases provide a more intuitive and natural way of working with connected data.
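To illustrate the node-and-relationship model described above, here is a minimal, library-free Python sketch; a real graph database such as Neo4j adds native storage, indexing, and a query language on top of this idea, and the data here is invented:

```python
# A toy property graph: nodes carry labels and properties,
# relationships are typed, directed edges between node ids.
nodes = {
    1: {"label": "Person", "name": "Alice"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Company", "name": "Acme"},
}
relationships = [
    (1, "WORKS_AT", 3),
    (2, "WORKS_AT", 3),
    (1, "KNOWS", 2),
]

def neighbors(node_id, rel_type):
    """Follow outgoing relationships of one type - a single 'hop' in the graph."""
    return [dst for src, rel, dst in relationships
            if src == node_id and rel == rel_type]

# "Who are Alice's colleagues?" = other people who work at the same company.
alice_employers = neighbors(1, "WORKS_AT")
colleagues = [nodes[p]["name"]
              for p in nodes
              if p != 1 and any(e in neighbors(p, "WORKS_AT") for e in alice_employers)]
print(colleagues)  # ['Bob']
```

Notice that the "colleagues" question is answered by traversing relationships directly, with no join tables; this traversal-first style is what graph queries make cheap.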

However, before this data can be analyzed and queried, it often needs to be migrated and prepared for use with a graph database. This process, known as data orchestration, involves cleaning and organizing the data, as well as defining the relationships between different data points.

To fully leverage the power of graph analytics, organizations need to develop a robust data orchestration strategy that ensures their data is clean, organized, and ready to use. This can be a challenging task for many organizations, especially at a large scale.

The data orchestration process often involves a range of activities, such as discovering, ingesting, profiling, tagging, and transforming data. At a large scale, this journey can take months or even years to be completed.
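As a rough illustration, the activities listed above can be composed into a staged pipeline. The stage bodies below are placeholders with invented sample data, not Modak Nabu™ functionality:

```python
# Each stage takes a list of records and returns a (possibly annotated) list.
def discover(source):
    # Hypothetical discovery step: in practice this would scan catalogs/sources.
    return [{"id": 1, "name": " Alice "}, {"id": 2, "name": "Bob"}]

def ingest(records):
    return list(records)  # placeholder for copying data into the platform

def profile(records):
    for r in records:
        r["name_length"] = len(r["name"].strip())  # simple profiling statistic
    return records

def tag(records):
    for r in records:
        r["tags"] = ["person"]  # attach classification metadata
    return records

def transform(records):
    for r in records:
        r["name"] = r["name"].strip().upper()  # normalize for downstream use
    return records

def orchestrate(source):
    records = discover(source)
    for stage in (ingest, profile, tag, transform):
        records = stage(records)
    return records

result = orchestrate("crm")
print(result[0]["name"])  # ALICE
```

Real orchestration adds scheduling, retries, and lineage tracking around the same basic shape: an ordered chain of stages that each enrich or clean the data.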

To make the process more efficient, organizations need a modern data platform that can support their data preparation and orchestration efforts. By using graph database technology, organizations can ensure their data is ready for analysis and can be easily queried.

https://modak.com/wp-content/uploads/2023/01/002.-Modak-Neo4j-002-768x768.jpg
How Graph Analytics Simplifies Data Visualization
Graph analytics provide a visual representation of data and relationships between data elements. This visualization allows data scientists and analysts to quickly understand the structure and content of their data, and to identify patterns and trends that may not be immediately apparent from looking at raw datasets.

With graph analytics, data scientists and analysts can create visually appealing and intuitive data visualizations using graphs, charts, and maps. This helps effectively communicate and share insights with others and can facilitate collaboration and decision making within an organization.

In addition, graph analytics provide real-time insights into the performance and efficiency of data visualization, allowing the end user to identify and address potential issues before they impact the overall effectiveness of their research.

Ultimately, graph analytics is an invaluable tool for data analysis.
https://modak.com/wp-content/uploads/2023/01/002.-Modak-Neo4j-003.png
Modak + Neo4j: Data Orchestration and Graph Analytics
Modak Nabu™ is a modern data engineering platform that significantly speeds up data preparation and improves the performance of analytics. It achieves this by converging a range of data management and analytics capabilities, such as data ingestion, profiling, indexing, curation, and exploration.
https://modak.com/wp-content/uploads/2023/01/002.-Modak-Neo4j-004.png

Neo4j is a leading graph data platform for building intelligent applications. It is the only enterprise-grade graph database that offers native graph storage, a scalable and performance-optimized architecture, and support for ACID compliance. By using Neo4j, business teams can easily work with connected data and reduce complex and time-consuming queries.

Together, Modak Nabu™ and Neo4j provide a powerful solution for data preparation, visualization, and orchestration, enabling organizations to prepare their data quickly and effectively for analysis using graph technology.

The partnership between Modak and Neo4j brings significant benefits to enterprises across industries. Graph visualization enables faster relationship and pattern discovery in datasets, while the Cypher query language simplifies querying. It yields consumption-ready curated data products, provides self-service data engineering using a no-code/low-code platform, and supports multi-cloud and hybrid-cloud data engineering.

This partnership allows enterprises to take advantage of the powerful data management and analysis capabilities of both Modak Nabu™ and Neo4j, and drive greater business value from their data, lowering costs and accelerating this complex process.

About Neo4j
Neo4j is the world's leading graph data platform. We help organizations – including Comcast, ICIJ, NASA, UBS, and Volvo Cars – capture the rich context of the real world that exists in their data to solve challenges of any size and scale. Our customers transform their industries by curbing financial fraud and cybercrime, optimizing global networks, accelerating breakthrough research, and providing better recommendations. Neo4j delivers real-time transaction processing, advanced AI/ML, intuitive data visualization, and more.

To learn more, please download: https://modak.com/modak-nabu-solution/


The field of data engineering is constantly evolving, and it can be challenging for professionals to keep up with the latest best practices. In this article, we will explore the top 6 data engineering best practices for 2023. From understanding the importance of data quality to leveraging the power of automation, these best practices will help data engineers stay ahead of the curve and drive success for their organizations. Whether you are just starting out in the field of data engineering or have been working in the industry for years, these best practices will provide valuable insights and guidance to help you excel in your role.

The Rise of Data Engineering in the Age of Modern Data Platform

Data engineering is the process of designing, building, maintaining, and testing systems for storing, processing, and analyzing data. This involves a wide range of activities, including data integration, data quality management, data warehousing, and data management.

Several factors have contributed to the rise of data engineering alongside the Modern Data Platform:


  • The increasing volume, complexity, and value of the data that organizations generate and collect has raised the need for dedicated professionals who can design, build, and maintain systems for storing, processing, and analyzing it.
  • Data engineers are responsible for developing and implementing the infrastructure and processes that enable organizations to extract insights and value from their data as the reliance on data-driven decision-making increases.
  • The availability of powerful and scalable data management platforms, such as Hadoop and Spark, has made it easier for organizations to work with large and complex data sets. This, in turn, has increased the demand for data engineers who are skilled in using these technologies and tools.
https://modak.com/wp-content/uploads/2023/01/001.-Modak-6-Data-Engineering-Best-Practices-1.png

Data Engineering Best Practices for 2023

According to a report by ResearchAndMarkets, the global big data and analytics market is expected to reach $103 billion by 2027. As organizations continue to generate and collect large volumes of data, the role of data engineering has become increasingly important. In the coming years, data engineering best practices are likely to evolve and adapt to meet the changing needs of organizations and the broader data landscape. Let’s explore some of the key best practices that data engineers should consider as they plan and implement data management and analysis systems in 2023 and beyond.


Focus on data quality and consistency

As a data engineer, it is essential to focus on data quality and consistency to ensure that the data being used is accurate and reliable. This can be achieved through regular testing and validation of the data, as well as implementing strict data governance and management processes to maintain high standards of data quality. By focusing on data quality, data engineers can help to ensure that the data being used is fit for its intended purpose, whether that be for analysis, reporting, or decision making.
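As one illustration, regular validation can be a set of automated checks run against every batch. The rules and sample records below are examples, not a complete data quality framework:

```python
def validate(records):
    """Run basic quality checks; returns a list of (record_index, problem) pairs."""
    problems = []
    seen_ids = set()
    for i, r in enumerate(records):
        # Completeness and uniqueness checks on the primary key.
        if r.get("id") is None:
            problems.append((i, "missing id"))
        elif r["id"] in seen_ids:
            problems.append((i, "duplicate id"))
        else:
            seen_ids.add(r["id"])
        # A crude format check; real pipelines use stricter validators.
        if not r.get("email") or "@" not in r["email"]:
            problems.append((i, "invalid email"))
    return problems

batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": "b@example.com"},   # duplicate id
    {"id": 2, "email": "not-an-email"},    # malformed email
]
print(validate(batch))  # [(1, 'duplicate id'), (2, 'invalid email')]
```

Wiring checks like these into the pipeline means bad batches are flagged before they reach reports or models, rather than discovered afterwards.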

Implement data governance and management processes

Implementing data governance and management processes is an important part of a data engineer's role. These processes help to ensure that data is collected, stored, and accessed in a controlled and consistent manner. This can include establishing protocols for how data is collected and entered into the system, defining roles and responsibilities for managing data, and implementing processes for maintaining data quality and security.

Use modern and scalable data management technologies and platforms

Using modern and scalable data management technologies is vital to support large volumes of data and complex data management processes. These technologies can help to automate many of the processes involved in data management, such as data cleaning and transformation, and can also help to handle large volumes of data more efficiently. Additionally, using modern data management technologies can help to improve the reliability and performance of data systems, and can enable data engineers to more easily integrate data from multiple sources.


Develop data pipelines and workflows

One of the key responsibilities of a data engineer is to develop data pipelines and workflows. This involves designing and implementing processes for extracting, transforming, and loading data from various sources into the organization's data systems. This can include using tools and technologies such as data lakes and data warehouses to manage and process data. By developing these pipelines and workflows, data engineers can help to ensure that data is being collected, processed, and stored in a consistent and efficient manner.
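A minimal extract-transform-load sketch of such a pipeline, using in-memory stand-ins for the real source and warehouse:

```python
import csv
import io

# Stand-in for a real source system; in practice this comes from a file or API.
RAW = "order_id,amount\n1,10.50\n2,3.25\n3,not_a_number\n"

def extract(raw_csv):
    """Read source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Cast fields to proper types; drop rows that fail conversion."""
    clean = []
    for row in rows:
        try:
            clean.append({"order_id": int(row["order_id"]),
                          "amount": float(row["amount"])})
        except ValueError:
            continue  # a real pipeline would route bad rows to a dead-letter store
    return clean

def load(rows, warehouse):
    """Append cleaned rows to the target store; return the count loaded."""
    warehouse.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract(RAW)), warehouse)
print(loaded, sum(r["amount"] for r in warehouse))  # 2 13.75
```

The same extract/transform/load separation scales up to data lakes and warehouses; only the endpoints and the volume change.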

Use data visualization to communicate and share insights

Data visualization is an essential tool for data engineers to communicate and share insights. By creating graphical representations of data, data engineers can quickly and effectively share their findings with others. This can help facilitate collaboration and decision making within an organization. In addition, data visualization can help to identify patterns and trends in data that may not be immediately apparent from looking at raw numbers. This can help data engineers to gain a deeper understanding of the data they are working with, and to make more informed decisions about how to analyze and use it.

Monitor and optimize the performance, usage, and cost of the Modern Data Platform

Monitoring and optimizing data management performance is an important responsibility for data engineers. Data management systems can become slow or inefficient over time, and it is up to data engineers to identify and address these issues. By regularly monitoring the performance of data management systems, data engineers can identify bottlenecks and other issues that may be impacting their performance. They can then take steps to optimize these systems, such as by implementing indexing or other performance-enhancing techniques. In addition, data engineers can use tools and techniques such as load testing to simulate high-traffic scenarios and identify potential performance issues before they occur.
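One small sketch of the monitoring idea: wrapping queries so that calls exceeding a latency budget are recorded for later review. The threshold and query functions here are illustrative:

```python
import time
from functools import wraps

SLOW_QUERIES = []  # collected for later review

def monitored(threshold_s=0.05):
    """Decorator that records calls exceeding a latency threshold."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if elapsed > threshold_s:
                SLOW_QUERIES.append((fn.__name__, elapsed))
            return result
        return wrapper
    return decorate

@monitored(threshold_s=0.01)
def slow_query():
    time.sleep(0.02)  # stand-in for an unindexed scan
    return "rows"

@monitored(threshold_s=0.01)
def fast_query():
    return "rows"

slow_query()
fast_query()
print([name for name, _ in SLOW_QUERIES])  # ['slow_query']
```

Production systems use dedicated observability tooling, but the principle is the same: measure every call, compare against a budget, and surface the outliers for optimization.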

How Modak Nabu™ Can Assist Data Engineering Teams

Modak Nabu™ is a modern data engineering platform that significantly speeds up data preparation and improves the performance of data analytics. It achieves this by converging a range of data management and analysis capabilities, such as data ingestion, profiling, indexing, curation, and exploration.

By providing a single, integrated platform for data management and analysis, Modak Nabu™ enables data engineers to manage and analyze their data more efficiently and effectively. With Modak Nabu™, data engineers can quickly and easily ingest, profile, and index their data, reducing the time and effort required to prepare data for analysis. In addition, Modak Nabu™ provides powerful tools for data curation and exploration, allowing data engineers to quickly identify and address issues with their data, and to gain valuable insights from it. Overall, Modak Nabu™ is a valuable tool for data engineers, helping them to improve the performance and efficiency of their data management and analysis processes and drive business value from their insights.

https://modak.com/wp-content/uploads/2023/01/002.-Modak-6-Data-Engineering-Best-Practices-1-e1674455585675-640x415.png

Check out our video on Modak Nabu™ to know more!

Co-Authors:
Devesh Salvi
Product Analyst at Modak
Aastha Jha
Content Manager at Modak
YOUR RECIPE TO BUILD REPEATABLE DATA PRODUCTS

Co-Authors:
Baz Khuti
President, Modak USA
Aastha Jha
Content Manager at Modak
We're all familiar with movie studios: places where stories are choreographed with pre-assembled sets, immersive CGI animations, and talented actors, all working together to create films that entertain and captivate us. Every year, as new scripts are filmed, edited, and distributed, a typical studio set is reused and changed out for multiple films. Modak leveraged this concept in developing its own ‘Data Engineering Studio’: a pre-set, tested, and proven collection of methodology, software, and processes with which enterprise customers can transform their data journeys from silos into value-driven data assets.

In this blog, we’ll walk you through our groundbreaking concept to deliver best-in-class data products for business consumption and explain why we believe Modak’s Data Engineering Studio ™ will revolutionize data engineering forever.

For the first time, Modak's Data Engineering Studio™ has captured the learnings from decades of experience, hundreds of projects, and thousands of data pipelines, and packaged them into one cohesive methodology: a brand-new set of pre-packaged, tested, and proven methodologies, tools, training, integrations, and practices that enables enterprises to build continuous data flywheels.

Modak’s Data Engineering Studio™ bridges the gap between analytical, business, IT infrastructure, data platform, and data processing teams, and is built on the industry-standard Scaled Agile Framework (SAFe™). It accelerates the delivery of federated data domains for consumption by analytical teams and AI. Furthermore, the studio approach ensures continuous delivery of data products as a service, with monitoring and skilled managed-service teams to institutionalize a DataOps culture.

With Modak’s Data Engineering Studio™, enterprises can easily implement Modern Data Platforms that are innovation-ready and support large digital transformation efforts. Modak provides best-in-class templates, tools, processes, expertise, and data domain knowledge to enable data orchestration across cloud providers. The capabilities provided by Modak’s Data Engineering Studio™ are enabled by Modak Nabu™, an intelligent data orchestration platform.

Let’s take a look at the deliverables of Modak’s Data Engineering Studio:

Integrated Pod Structure

The Modak integrated POD is a self-organized, cross-functional team comprising Data Engineers, DataOps Engineers, SRE Engineers, Technical Leads, SMEs, and DB Administrators, with diverse and extensive experience in data software tools such as Kafka and Spark and in cloud technologies (Microsoft Azure, AWS, and Google Cloud). Modak works with the Scaled Agile Framework® (SAFe) for software development and delivery.


Cloud 3.0: Multi- Hybrid Cloud Strategy

Modak works with big data cloud software providers and cloud configuration tools to install, configure, and manage cloud provider products such as Microsoft Azure Data Lake, Microsoft Synapse, AWS, and GCP. Data can be moved to a single cloud platform or to multiple cloud platforms, based on landing areas such as AWS S3, Microsoft Azure ADLS, or Google Bigtable.


Data Products

Modak Nabu™ provides workspaces where collaboration among business domain experts, data engineers, and data stewards is enabled through a low-code UI to create data domain products for consumption. Modak teams design, develop, and test automated data ingestion and curation pipelines from on-prem data sources to the cloud.


Managed DataOps

The Managed DataOps team comprises highly experienced and certified engineers who support and manage Microsoft Azure, AWS, and GCP data platforms. They periodically monitor all cloud platforms for alerts and warnings, troubleshoot any identified issues per the agreed SLAs and SLOs, and optimize performance and cost. Within this function, Site Reliability Engineering monitors cloud data platform uptime, performance, and other components, including dependencies on other software components.


Deep Data Domain Knowledge

Modak has extensive domain and technical experience converting legacy data into appropriate formats. With Modak Nabu’s Data Spiders and BOTs capabilities, our data teams can rapidly create an active-metadata-driven data fabric with over 15k pre-built transformation functions. The team brings a deep understanding of ingesting and processing data sets, along with years of experience working with complex data formats, types, and transformations, and building large-scale data assets, including complex R&D genome data assets.

https://modak.com/wp-content/uploads/2022/10/Data-Engineering-Studio-768x498.png
DIGITAL ACCELERATORS FOR A MODERN DATA PLATFORM

A Modern Data Platform is a new architectural pattern for data management. It provides an automated data infrastructure that continuously feeds analytical models and AI algorithms through standardized data products that evolve as more data is fed into them, hence the “data flywheel” analogy. One of the tenets of a modern data platform is a focus on the entire source data landscape and on tackling multiple use cases, versus the traditional approach of limiting scope to project-level or functional-level requirements.

Modak Nabu™ allows enterprises to automate data ingestion, profiling, and curation tasks. Modak Nabu™ joins multiple heterogeneous datasets and creates a data fabric that enables data lake creation. Once data has been profiled, Modak Nabu™ allows domain-driven data products to be curated through a data mesh framework built on Workspaces. We believe that data fabric and data mesh should operate together, not as independent approaches.

Let’s understand the core elements of a modern data platform:

a) Data Fabric

The data fabric provides the data services from the source data through to the delivery of data products, aligning well with the first and second elements of the modern data platform architecture. Modak Nabu’s Data Fabric provides a “net” that is cast to stitch together multiple heterogeneous data sources and types through automated data pipelines that populate an active metadata repository.
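To make the “active metadata repository” idea concrete, here is a minimal sketch in which automated pipelines register what they ingested on every run, so the fabric layer can answer questions about the whole landscape. All class, field, and source names are hypothetical illustrations, not Modak Nabu’s actual API:

```python
# Illustrative sketch of an "active metadata" registry. Each automated
# pipeline records what it ingested, keeping the repository current
# rather than a one-off catalog snapshot. Names are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class SourceMetadata:
    name: str        # e.g. "oracle_claims", "s3_member_events"
    kind: str        # "rdbms", "object_store", "api", ...
    row_count: int
    last_ingested: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class MetadataRepository:
    def __init__(self):
        self._sources: dict[str, SourceMetadata] = {}

    def register(self, meta: SourceMetadata) -> None:
        # Pipelines call this after every ingestion run.
        self._sources[meta.name] = meta

    def sources_of_kind(self, kind: str) -> list[str]:
        return [m.name for m in self._sources.values() if m.kind == kind]

repo = MetadataRepository()
repo.register(SourceMetadata("oracle_claims", "rdbms", 10_000_000))
repo.register(SourceMetadata("s3_member_events", "object_store", 250_000))
print(repo.sources_of_kind("rdbms"))  # ['oracle_claims']
```

Because every pipeline run updates the registry, downstream tools always see the current state of the data landscape instead of a stale inventory.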

b) Data Lake

A data lake is a central repository that enables you to store all of your structured and unstructured data at any scale. Modak Nabu’s automated data pipelines accelerate the data ingestion process and reduce the time required for data lake creation.

c) Data Mesh

Data mesh aims to connect the two planes of operational and analytical data sets and deliver business-owned data products with a lifecycle (just as software), consumed through APIs. Modak Nabu delivers domain-driven data products based on these principles, consumed by data and business users.

Summary

Let’s circle back to the movie studio analogy: shooting in a studio exponentially simplifies the filmmaking process. It saves time, capital, and human resources, and the customizable sets offer great visual appeal. Similarly, Modak’s portfolio of Data Engineering Studio services enables companies to avoid wasting resources and to focus on the ‘what’ and ‘why’ that drive business value, rather than struggling with the ‘how’.

Data Engineering Studio by Modak provides best-in-class delivery, managed data operations, enterprise data lake, data fabric, data mesh, augmented data preparation, data quality, and governed data lake solutions to efficiently manage data and future-proof your business.

About Modak

Modak is a solutions company that enables enterprises to manage and utilize their data landscape effectively. We provide technology, cloud, and vendor-agnostic software and services to accelerate data migration initiatives. We use machine learning (ML) techniques to transform how structured and unstructured data is prepared, consumed, and shared.

For further information please visit: https://modak.com/modak-nabu-solution

Co-Authors:

https://modak.com/wp-content/uploads/2022/04/Mask-group-2.svg
Baz Khuti
President, Modak USA
https://modak.com/wp-content/uploads/2021/09/Aastha-Pic-160x160.png
Aastha Jha
Content Manager at Modak

Author:

https://modak.com/wp-content/uploads/2022/04/Author-Name-Devesh-Salvi-160x160.jpg
Devesh Salvi
Product Analyst at Modak

Prior to the advent of cloud computing, enterprises kept all their data and applications in on-prem data centers or with hosting providers. To scale, companies had two choices: expand their existing data center capacity, an expensive and time-consuming proposition, or expand with hosting providers, again an expensive approach. With the cloud removing these inhibitors to scaling, plus the availability of cloud-managed SaaS applications, the growth and adoption of cloud computing has been exponential. Many enterprises now refer to these early journeys as Cloud 1.0 or Cloud 2.0: focused on lift & shift of applications and data from on-prem to the cloud, building secure extensions of their private networks, and leveraging cloud provider data processing and analytical services. However, we now appear to be at an inflection point, as enterprises come to the realization that:

(a) Not all workloads will move to the cloud; research from IBM shows up to 55% of workloads will remain on-prem. Why? Security, compliance, and investment in large-scale on-prem infrastructure that is as cost-effective as operating in the cloud.

(b) The dread of cloud provider lock-in. As cloud providers continue to extend their services and products beyond the initial Infrastructure as a Service (compute and storage) to databases, middleware, security, and more, a plethora of services become tied to ONE provider and do not interoperate with other cloud providers. Enterprises fear that once locked in to a cloud provider, it becomes impossible to untangle, they lose the ability to negotiate reduced pricing, and they cannot benefit from different cloud providers’ services to drive innovation and reduce cost.

We believe enterprises are now embarking on a Cloud 3.0 future, a horizon that requires “interoperability” and “orchestration” at the very heart of any strategy and architecture. This future will require on-prem applications and data to operate across MULTIPLE cloud providers, giving enterprises the optionality and flexibility to meet their business objectives and to control the influence and dominance of the cloud providers – in our view, to take back control!

A multi-hybrid cloud strategy provides the freedom to choose multiple cloud service providers based on the data workload and end-user requirements. It offers benefits such as no vendor lock-in, improved data workload management, enhanced data security, and improved ROI with a mix of on-prem data centers and multiple private and public clouds.

Defining Multi-Cloud and Hybrid Cloud

Before we delve into the benefits, let’s first understand the difference between Multi-Cloud and Hybrid-Cloud strategy.

Multi-Cloud:

A multi-cloud strategy uses more than one public cloud provider, usually to perform different data and application operations.

Hybrid Cloud:

A hybrid cloud leverages the sunk costs and infrastructure of on-prem data centers, applications, and data, and ensures security and compliance requirements are not compromised, while seamlessly interoperating with multi-cloud services.

https://modak.com/wp-content/uploads/2022/10/Multi-cloud.png
https://modak.com/wp-content/uploads/2022/10/Hybrid-Cloud.png
What are the benefits of a multi-hybrid cloud strategy?

  • Cost Optimization: By utilizing multiple cloud providers, enterprises benefit from different pricing options for computing and storage resources. Enterprises can allocate IT resources to the most cost-effective provider based on storage and workload needs.

  • Performance Optimization: Enterprises can run data workloads in multiple cloud environments per the specific use-case requirements, leveraging more than one public cloud provider for specific data workloads to optimize performance and scalability at controlled costs.

  • Avoid Vendor Lock-in: One of the topmost priorities of enterprises is to avoid vendor lock-in. If their needs are not met, organizations want the freedom to switch cloud service providers. Businesses that use a multi-cloud strategy have options and are not restricted to a single cloud service provider.

  • Risk Mitigation: If a vendor experiences an attack or infrastructure downtime, a multi-cloud user can quickly switch to another cloud service provider or fall back to a private cloud.

  • Innovation: Enterprises benefit from the investments and strengths of each cloud provider to drive innovation; for example, Google GCP is recognized as a leader in AI/ML services due to its heritage.

  • Security: Enterprises are concerned about losing control over critical data and applications in the cloud environment. In a hybrid cloud strategy, an enterprise can use an on-prem data center or a private cloud to host critical data or applications, gaining more security and control over data assets.

  • Cloud Bursting: Cloud bursting is a way to manage data workloads with a combination of public and private clouds. If an enterprise has used its private cloud to full capacity and data traffic increases, it can route the excess traffic to the public cloud without service interruptions. A hybrid cloud provides better workload management and cost efficiency with cloud bursting.
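The cloud-bursting decision can be sketched in a few lines of Python; the capacity figure and the private/public split are illustrative assumptions, not a real scheduler:

```python
# Minimal sketch of a cloud-bursting router: workloads stay on the
# private cloud until its capacity is exhausted, and the excess "bursts"
# to a public cloud. Capacity units and provider names are illustrative.
PRIVATE_CAPACITY = 100  # arbitrary units of concurrent workload

def route_workloads(demands: list[int]) -> list[str]:
    used = 0
    placements = []
    for demand in demands:
        if used + demand <= PRIVATE_CAPACITY:
            used += demand
            placements.append("private")
        else:
            # Private cloud is full: burst this workload to the public
            # cloud so service continues without interruption.
            placements.append("public")
    return placements

print(route_workloads([40, 40, 30, 10]))
# ['private', 'private', 'public', 'private']
```

Note that the third workload bursts to the public cloud, yet the smaller fourth workload still fits on-prem, which is exactly the cost-efficient behavior the bullet above describes.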

https://modak.com/wp-content/uploads/2022/10/Cloud-benefits.png
The benefits of a multi-hybrid cloud strategy are significant, and the smart enterprises of tomorrow are now moving in this direction – in our view, an uncharted and brave new world of Cloud 3.0. The technical and cultural changes are significant, further compounded by System Integrators who are incentivized by, and have formed strategic alliances with, cloud providers and are instrumental in influencing large enterprises to drive transformation. IT leaders need to acknowledge the importance of neutrality and impartiality in strategic decisions to ensure they retain control and that Cloud 3.0 becomes a practical reality.

At Modak we believe that intelligent orchestration allowing interoperability across cloud providers is the genesis for Cloud 3.0. Modak’s investment and the influence of large enterprises in the development of our flagship product – Modak Nabu™ crystallizes the ability to deliver intelligent data orchestration in a multi-hybrid cloud future. The vision is now a reality, with Modak Nabu™ deployed at Healthcare and Life Science customers enabling Cloud 3.0.
Case Study:

Background:

A Top 5 US Healthcare Insurance provider, with 90k+ employees, has adopted a Cloud 3.0 strategy with multi-cloud providers and a hybrid cloud.

Challenges:

The client was struggling in their cloud data migration journey with legacy ETL tools and an on-premises data lake. Additionally, they found it challenging to control the costs of cloud operations due to a lack of visibility into cloud resource usage. The absence of proactive monitoring and alerting services was leading to cloud resource wastage. Data processing was slow with the client’s home-grown and incumbent tools, and they faced difficulties scaling and automating their data processing tasks.

The client faced the following challenges:
  • Manual processes and the absence of automation for data operations
  • Unclear service level objectives and indicators
  • Dependency on on-prem data lake and ETL tools impacting the speed of data orchestration and migration
  • Legacy tools taking over 25 hours to process data
  • Higher data processing time with existing infrastructure
https://modak.com/wp-content/uploads/2022/10/Cloud-challenges.png
Solution:

After evaluating incumbents, third-party software, and cloud provider tools, the company selected Modak Nabu™, a data engineering platform that accelerates the ingestion, profiling, and migration of data to any cloud provider. Modak Nabu™ accelerated the client’s cloud migration journey and reduced cloud costs by:

  • Automated creation of monitoring dashboards
  • Removing the dependency on the on-prem data lake and legacy ELT/ETL tools
  • Automation of data pipelines to accelerate data orchestration in the cloud
  • Periodic review and monitoring of unused resources
  • Automated restart of services along with RCA
  • Creation of runbooks for every issue, which reduced issue resolution time by 50%
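As a hedged illustration of the runbook idea (the issue names and remediation steps here are invented for the example), known issues can be mapped to ordered, repeatable procedures so responders execute a vetted sequence instead of improvising:

```python
# Sketch of runbook-driven remediation: each known issue maps to an
# ordered list of automated steps. Issue names and steps are hypothetical.
RUNBOOKS = {
    "service_down": ["capture_logs", "restart_service", "verify_health", "file_rca"],
    "disk_full":    ["capture_logs", "purge_temp_files", "verify_health"],
}

def execute_runbook(issue: str) -> list[str]:
    steps = RUNBOOKS.get(issue)
    if steps is None:
        raise KeyError(f"No runbook for issue: {issue}")
    executed = []
    for step in steps:
        # In a real system each step would invoke an automation hook;
        # here we simply record the order of execution.
        executed.append(step)
    return executed

print(execute_runbook("service_down"))
# ['capture_logs', 'restart_service', 'verify_health', 'file_rca']
```

Codifying the procedure is what makes the resolution-time reduction repeatable: the same vetted steps run the same way on every occurrence.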
Impact:

With Modak Nabu™ the enterprise client implemented Cloud 3.0- a Hybrid/Multi-cloud strategy and accelerated the data movement workflow from on-prem to the cloud. Modak Nabu™ optimized cloud operation costs and improved data operations and services.

The enterprise client recognized the following benefits:

  • Cost optimization: savings of 65% by removing unused resources from cloud providers' infrastructure
  • Real-time monitoring of all data engineering services
  • Average data processing time improved by 85%, from hours to minutes
  • Eliminated the dependency on legacy ELT/ETL tools
  • Proactive alerting, resulting in quicker issue resolution
  • Saved time and resources through automated data operations
  • Resolved 95%+ of issues within SLA with SLI and SLO monitoring


Author:

https://modak.com/wp-content/uploads/2022/04/Author-Name-Devesh-Salvi-160x160.jpg
Devesh Salvi
Product Analyst at Modak
Author:
https://modak.com/wp-content/uploads/2021/09/aarti-1-160x160.png
Aarti Joshi
Chief Executive Officer, Modak
The Chief Data Officer (CDO) is the most senior executive responsible for advocating and promoting data as a strategic enterprise asset. The CDO role is rapidly evolving, and their success is critical to driving an organization's growth and innovation charter. CDOs must embrace their role as change agents, and shift from a defensive mindset of data governance & technical expertise to an offensive data strategy by identifying and driving a portfolio of business use cases.

According to a recent Gartner report, 50% of CDOs will fail due to a combination of internal and external factors. Because many external factors are beyond their direct control, the CDO must be aware of key internal impediments to success. The following is a guide to identifying the behavioral habits that we believe CDOs should have.

Habit One – Ownership

Takes responsibility for acting as a catalyst across the organization to identify the highest value portfolio of use cases and how these use cases can be delivered as curated data products for consumption by AI, BI, and analytical teams.

Habit Two – Collaborator

Builds relationships and communication channels that facilitate constructive collaboration, prioritizes business outcomes, discovers the data landscape, and maintains an active inventory of enterprise data sets.

Habit Three – Storyteller

The ability to craft and deliver a narrative to multiple stakeholders to build empathy and support for how data can be profiled to drive business outcomes.

Habit Four – Bias for Action

The pace is set on multiple fronts: technical platforms leveraging pre-assembled cloud services, driving multiple use cases, and engaging across multiple business lines. Starting small but scaling quickly to demonstrate value, failing fast, with no blame and continuous learning.

Habit Five – Bridge Builder

Bridges data silos within the organization and with external providers to populate an active metadata repository, while also ensuring interoperability between incumbent and cloud tool providers.

Habit Six – Advocate

To realize the value of data, CDOs must democratize it and build an insights-driven organization through Data Products, making them available through a Data Marketplace and allowing the monetization of data assets across the organization.

Habit Seven – Monetizes Data Products

Data not only fuels analytics but also unlocks insights generated by AI and machine learning algorithms to help answer the questions of tomorrow. As these algorithms improve the accuracy and types of insights generated with more data, the CDO needs to build “data flywheels” that continuously fuel AI models.


Modak Nabu™

Modak Nabu™ enables enterprises to automate data ingestion, curation, and consumption processes at a petabyte-scale. Modak Nabu™ empowers tomorrow's smart enterprises to create repeatable and scalable business data domain products that improve the efficiency and effectiveness of business users, data scientists, and BI analysts in finding the appropriate data, at the right time, and in the right context.

Author:

https://modak.com/wp-content/uploads/2021/09/aarti-1-160x160.png
Aarti Joshi
Chief Executive Officer, Modak

Co-Authors:

https://modak.com/wp-content/uploads/2022/07/MicrosoftTeams-image-160x160.jpg
Vishrut Mishra
Site Reliability Engineer Lead at Modak
https://modak.com/wp-content/uploads/2022/04/Author-Name-Devesh-Salvi-160x160.jpg
Devesh Salvi
Product Analyst at Modak
The promise of cloud computing is to provide pay-as-you-go pricing and large-scale computing and storage infrastructures that scale on-demand. As Enterprises accelerate their journey to the Cloud to process more and more data, the IT operational (OPEX) costs are skyrocketing, mainly driven by:



  • Increasing AI and machine learning workloads fuelled by more data
  • The need to process and store any type of data to feed AI models with structured and unstructured data
  • The increasing complexity of managing large-scale compute and data platforms on the cloud
  • Cloud provider services that drive adoption and innovation but lock in customer data and workloads, with unforeseen costs at scale
  • Lack of cloud operational expertise to manage and optimize cloud infrastructure and cost
  • Regulations and compliance mandates to retain data for auditability purposes
  • Financially driven shutdowns of legacy on-prem data centres to realize immediate cost savings, without a clear and cohesive hybrid cloud strategy
https://modak.com/wp-content/uploads/2022/07/MicrosoftTeams-image-7.png
As such, IT departments are struggling to manage their Cloud costs and are re-thinking their approach to a Cloud-first strategy and how to optimize IT budgets by leveraging investments in existing on-prem data centers. CIOs and CDOs are now re-framing their approach by moving to a multi-cloud and hybrid cloud architecture to provide:
https://modak.com/wp-content/uploads/2022/07/MicrosoftTeams-image-5.png
A modern data platform that inherently includes a multi-hybrid cloud strategy requires interoperability and security to enable the orchestration of data to the cloud.

Modak Nabu™ is an integrated data engineering software platform that allows enterprises to operate on any cloud provider and manage data from on-prem data sources and applications. Modak Nabu™ empowers enterprises to successfully execute multi and hybrid cloud strategies.
Case Study:

Background:

A Top 5 US Healthcare Insurance provider, with 90k+ employees, has adopted a Cloud 3.0 strategy with multi-cloud providers and a hybrid cloud.


Challenges:

The client was struggling in their cloud data migration journey with legacy ETL tools and an on-premises data lake. Additionally, they found it challenging to control the costs of cloud operations due to a lack of visibility into cloud resource usage. The absence of proactive monitoring and alerting services was leading to cloud resource wastage.

The client faced the following challenges:

  • Manual processes and the absence of automation for data operations
  • Unclear service level objectives and indicators
  • Dependency on on-prem data lake and ETL tools impacting the speed of data orchestration and migration
  • Higher data processing time with existing infrastructure


Solution:
After evaluating incumbents, third-party software, and cloud provider tools, the company selected Modak Nabu™, a data engineering platform that accelerates the ingestion, profiling, and migration of data to any cloud provider.

Modak Nabu™ accelerated the client’s cloud migration journey and reduced cloud costs by:

  • Automated creation of monitoring dashboards
  • Removing the dependency on the on-prem data lake and legacy ELT/ETL tools
  • Automation of data pipelines to accelerate data orchestration in the cloud
  • Periodic review and monitoring of unused resources
  • Automated restart of services along with RCA
  • Creation of runbooks for every issue, which reduced issue resolution time by 50%
https://modak.com/wp-content/uploads/2022/07/MicrosoftTeams-image-6.png
Impact:

With Modak Nabu™ the enterprise client implemented Cloud 3.0- a Hybrid/Multi-cloud strategy and accelerated the data movement workflow from on-prem to the cloud. Modak Nabu™ optimized cloud operation costs and improved data operations and services.


A Top 5 US Healthcare Insurance provider recognized the following benefits:

  • Cost optimization: savings of 65% by removing unused resources from cloud providers' infrastructure
  • Real-time monitoring of all data engineering services
  • Average data processing time improved by 85%, from hours to minutes
  • Eliminated the dependency on legacy ELT/ETL tools
  • Proactive alerting, resulting in quicker issue resolution
  • Saved time and resources through automated data operations
  • Resolved 95%+ of issues within SLA with SLI and SLO monitoring


To know more about Modak Nabu™: https://modak.com/modak-nabu-solution


Co-Authors:

https://modak.com/wp-content/uploads/2022/07/MicrosoftTeams-image-160x160.jpg
Vishrut Mishra
Site Reliability Engineer Lead at Modak
https://modak.com/wp-content/uploads/2022/04/Author-Name-Devesh-Salvi-160x160.jpg
Devesh Salvi
Product Analyst at Modak
https://modak.com/wp-content/uploads/2022/06/Rajesh-Vassey-image-160x160.jpg
Rajesh Vassey
Technical Program Manager, Modak
Healthcare insurance companies depend on capturing their members’ data: healthcare plans, medical data from healthcare providers, reviews and approvals of medicines, and the management of Medicare plans for millions of Americans. The volume and variety of data are huge, and ensuring member data is captured, stored, and processed for analytical purposes, so that the right care is delivered in the right place at the best possible cost and quality, is core to their business model.

A Top 5 Healthcare Insurance company provides medical and specialty insurance products that allow members to access healthcare services through a network of care providers such as physicians, hospitals, and other healthcare providers. As such, data interoperability is at the core of interacting and delivering services to their members.
The critical data assets, often referred to as the ‘crown jewels’ of healthcare insurance companies, are member and claims data. Historically, the healthcare insurance company had built on-prem data lakes populated by transactional systems and external data providers to provide a single repository for analytical consumption. The software tools and data storage infrastructure were assembled from legacy ETL tools and custom programs, hosted on Hadoop. Over time, the complexity, lack of scalability, and investment required to maintain and support such an on-prem infrastructure became inflexible and cost-prohibitive. The challenge was to consider options for modernizing the software tools and migrating to the cloud, thereby moving away from the current processes and dependency on Hadoop.

The data volumes and business integrity checks are significant, with approximately 10+ billion records to be migrated, legacy tools taking over 25 hours to process them, frequently failing, and requiring a dedicated team of contractors to manage and support.
https://modak.com/wp-content/uploads/2022/06/001.-Nabu-in-Action-blog-01.png
The design and development team set out the following objectives:

  • Support the client’s multi-cloud strategy to securely operate across multi-cloud providers

  • Ensure interoperability of data across on-prem and cloud provider systems

  • Meet and improve on current business and IT service level agreements

  • Ensure data compliance and regulatory needs are not compromised

  • Provide a cloud data platform to fuel analytics and innovation


After evaluating incumbents, third-party software, and cloud provider tools, the team selected Modak Nabu™, an integrated data engineering platform that accelerates the ingestion, profiling, and migration of data to any cloud provider. The Modak Nabu™ software provides data spiders to crawl, index, and profile large data sets, and automates the creation of data pipelines and Smart BOTs to orchestrate the data movement workflows from the on-prem enterprise source systems to the MS Azure Cloud ADLS2 platform.
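The crawl-and-profile step can be illustrated with a generic sketch. This is not Modak Nabu’s actual Data Spider code; it just shows the kind of per-column statistics such a pass might record into a metadata repository:

```python
# Generic profiling pass in the spirit of "crawl, index, profile":
# given rows of a data set, compute simple per-column statistics
# (null counts, distinct values) that a catalog could store.
def profile(rows: list[dict]) -> dict:
    columns: dict[str, dict] = {}
    for row in rows:
        for col, value in row.items():
            stats = columns.setdefault(col, {"nulls": 0, "distinct": set()})
            if value is None:
                stats["nulls"] += 1
            else:
                stats["distinct"].add(value)
    return {
        "row_count": len(rows),
        "columns": {c: {"nulls": s["nulls"], "distinct": len(s["distinct"])}
                    for c, s in columns.items()},
    }

sample = [
    {"member_id": 1, "state": "TX"},
    {"member_id": 2, "state": None},
    {"member_id": 3, "state": "TX"},
]
print(profile(sample))
# row_count is 3; 'state' has 1 null and 1 distinct value
```

Statistics like these are what make downstream automation possible: null-heavy or low-cardinality columns can be flagged before pipelines are generated, rather than discovered in production.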
https://modak.com/wp-content/uploads/2022/06/002.-Nabu-in-Action-blog-01.png
Due to the client’s Cloud 3.0 multi-cloud strategy, and to leverage existing investment, Google’s Dataproc engine (Spark) was used as the compute processing engine to enable the migration and provide resiliency and performance.
The outcomes and impact of the implementation of Modak Nabu™ are summarized as follows:

  • Reduced cost and improved service due to the dependency on on-prem Hadoop removed

  • Average data processing time improved by 85% from hours to minutes

  • Eliminated the dependency on legacy ELT/ETL tools

  • Less stress on source systems through the usage of parallel workloads implemented with Modak Nabu™

  • Alignment with the clients’ hybrid cloud and multi-cloud strategy

  • Availability and refresh of data into the MS Azure Data Lake within minutes for analysis

  • Implementation of automated data pipelines, with a robust message-driven fabric and real-time monitoring to meet SLAs.

  • An active metadata repository enabling the automation of data pipelines and BOTs to orchestrate data migration.

  • Automatic identification of source data schemas enabling data pipelines, eliminating manual intervention and downtime.


https://modak.com/wp-content/uploads/2022/06/004.-Nabu-in-Action-blog.png


Author:

https://modak.com/wp-content/uploads/2022/06/Rajesh-Vassey-image-160x160.jpg
Rajesh Vassey
Technical Program Manager, Modak
Co-Authors:
https://modak.com/wp-content/uploads/2022/04/mayank-160x160.png
Mayank Mehra
Modak - Head of Product Management
https://modak.com/wp-content/uploads/2022/04/Screenshot-2022-04-28-134610.png
Adrian Estala
Starburst - VP, Data Mesh Consulting Services

Data Fabric and Data Mesh concepts are front and center for many data-driven organizations and are routinely compared in data management and engineering circles. If you want some practical ideas to accelerate your data strategy, look for opportunities to learn from both approaches and leverage the best for your design.

A simpler and faster pathway to decentralized data sources

There are numerous articles and videos on mesh vs fabric; many of them offer useful opinions on the pros and cons. While most present the two as competing ideas, we propose that they can work together. They are both great concepts, and while there are differences in approach, they share some key principles:

  • Eliminating data silos and enabling data democratization across the enterprise.

  • Enabling access to decentralized data sources in a multi-cloud/hybrid-cloud environment with the agility and scale that our business teams demand. Centralization is not a requirement, and for many organizations, it is not effective.

  • Simplifying the ETL process to eliminate the bottleneck that the current centralized teams present.


In this article, we are going to focus on three capabilities: Artificial Intelligence, Domains and Data Products, and Governance. Certainly, there is a lot more to discuss and more opportunities to leverage the best of both worlds but let this be our first step towards a more enriching conversation in the near future.

How a Data Fabric Leverages Artificial Intelligence

A Data Fabric uses artificial intelligence to integrate data sets across different data sources. The fabric relies on active metadata, knowledge graphs, and machine learning to drive recommendations for integration and analytics. This approach automates your discovery of new logical groupings to create virtual data domains. If you have good metadata and are working across large data sets, this is a sensible approach.

For anyone building a fabric or a mesh, look for ways to leverage AI to automate data discovery and integration. The effectiveness of the AI engine will depend greatly on the metadata and your knowledge of the data sets; you need to ‘teach’ the engine and keep an eye on data quality. If you have implemented a Data Mesh and are looking for new ways to analyze, improve the quality, or categorize your data sets, look into AI capabilities.
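As a deliberately simple stand-in for the ML-driven recommendations described above (real fabrics use knowledge graphs and learned models; the data set names and tags here are invented), even overlapping metadata tags can suggest candidate virtual domains:

```python
# Toy illustration of metadata-driven domain discovery: propose that two
# data sets belong to the same virtual domain when their metadata tags
# overlap strongly (Jaccard similarity). Real fabrics use far richer
# signals; this only sketches the idea.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest_groupings(datasets: dict[str, set], threshold: float = 0.5):
    names = sorted(datasets)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if jaccard(datasets[x], datasets[y]) >= threshold:
                pairs.append((x, y))  # candidate same-domain pair
    return pairs

catalog = {
    "claims_2023": {"claims", "member", "pii"},
    "claims_2024": {"claims", "member", "pii"},
    "web_logs":    {"clickstream", "sessions"},
}
print(suggest_groupings(catalog))  # [('claims_2023', 'claims_2024')]
```

This also shows why the quality of the metadata matters so much: with sparse or inconsistent tags, no engine, simple or sophisticated, can propose sensible groupings.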

Data Mesh Domains Serve Up Data Products

The biggest difference between a Data Fabric and a Data Mesh is how they each address the concept of domains and data products. The fabric creates a virtual management layer that sits on top of the data sources to create logical domains. Whether it is recommended by AI or designed by an engineer, in a fabric, the domain is managed within a central virtual layer.

A mesh can also rely on a virtual layer to create logical domains and products, but it moves management and delivery closer to the consumer. The Data Mesh adds people and processes to the domain and product concepts. In a mesh, distributed domains are managed in a self-service manner by autonomous domain teams. Each domain team designs and builds data products for its consumers; its primary purpose is to simplify consumer reuse and incentivize sharing. The teams closest to the business problem and the business data manage the domain.

For teams building a fabric or a mesh, you should empower the consumer. Data products should be curated and offered in a manner that allows the consumer to quickly find them, use them, and share them. Self-service capabilities empower domain teams to build their own data products, and some autonomy allows them to make rapid governance decisions. If you have built a Data Fabric and are looking for ways to accelerate consumer adoption, consider empowering them to manage their own domains and products.

Governance

A Data Fabric can be described as employing a top-down approach to governance. In a fabric, the metadata and virtual layers are centrally managed. A Data Mesh more closely resembles a bottom-up approach, with distributed domain teams each managing their own data governance. Whether you are implementing a fabric or a mesh, you should adapt your governance approach to meet the risk vs value profile that best fits the use case. A Data Mesh promotes autonomy to enable domain teams to govern their own areas. A domain with higher risk data may employ strict controls, whereas another domain may choose an open-access approach.

Whether you have started your mesh or fabric or are still thinking about how to get started, you have an opportunity to drive continuous improvement and consumer value by learning from the collective experiences and capabilities of both concepts.

About Modak

Modak is a solutions company that enables enterprises to manage and utilize their data landscape effectively. We provide technology, cloud, and vendor-agnostic software and services to accelerate data migration initiatives. We use machine learning (ML) techniques to transform how structured and unstructured data is prepared, consumed, and shared.

Modak’s Data Engineering Studio portfolio provides best-in-class delivery services, managed data operations, enterprise data lake, data mesh, data fabric, augmented data preparation, data quality, and governed data lake solutions.

Modak Nabu™

Modak Nabu™ enables enterprises to automate data ingestion, curation, and consumption processes at petabyte scale. Modak Nabu™ empowers tomorrow's smart enterprises to create repeatable and scalable business data domain products that improve the efficiency and effectiveness of business users, data scientists, and BI analysts in finding the appropriate data, at the right time, and in the right context.

Co-Authors:

https://modak.com/wp-content/uploads/2022/04/mayank-160x160.png
Mayank Mehra
Modak - Head of Product Management
Contact: [email protected]
https://modak.com/wp-content/uploads/2022/04/Screenshot-2022-04-28-134610.png
Adrian Estala
Starburst - VP, Data Mesh Consulting Services
Originally published on Starburst.io.
Author:
https://modak.com/wp-content/uploads/2022/04/Author-Name-Devesh-Salvi-160x160.jpg
Devesh Salvi
Product Analyst at Modak

The terms Data Fabric and Data Mesh are now routinely used in data management and engineering circles. Given the hype and marketing, reaching agreement on their definitions and usage patterns is proving difficult. The purpose of this blog is to provide clarity from an adoption perspective.

Context

Data is dispersed throughout an enterprise in a variety of structures and formats, spanning numerous applications, databases, data warehouses, and data lakes. The migration of on-premises data repositories to the cloud extends the data landscape even further, and with data scientists requiring external data sets to continuously feed self-learning models, the complexity of managing data is increasing exponentially. The need to think about new data management designs and practices is now front and center in the industry.

What is a Data Fabric?

A Data Fabric should be seen from a data management design viewpoint, not from an implementation perspective. No single solution can provide a comprehensive one-stop shop for a Data Fabric. Instead, multiple providers and consumers of data need to be brought together around three core tenets: agility, integration, and automation. These are supported by an active metadata repository that captures the source technical and business metadata, visualized through semantic knowledge graphs. A Data Fabric provides data engineers and subject matter experts with the foundations to curate and deliver data domain products.

The main objective of a Data Fabric is to provide a “net” that is cast to stitch together multiple heterogeneous data sources and types through automated data pipelines that populate an active metadata repository.
https://modak.com/wp-content/uploads/2021/09/Fabric.png
This allows for logical groupings (without moving the data) that create virtual data domains, where augmentation techniques can apply tags (for example, classifying PHI data) and ML algorithms can automate data quality checks and the cataloging of data sets.
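A toy version of such automated tagging might start with simple column-name heuristics before layering on ML classifiers. The rules and column names below are illustrative assumptions, not a real catalog's rule set.

```python
import re

# Hypothetical name-based heuristics; a real fabric would combine these with
# profiling of sample values and ML classifiers.
RULES = {
    "PII": re.compile(r"ssn|email|phone|birth", re.IGNORECASE),
    "PHI": re.compile(r"diagnosis|icd|mrn|patient", re.IGNORECASE),
}

def tag_columns(columns):
    """Return the classification labels triggered by each column name."""
    return {col: [label for label, rx in RULES.items() if rx.search(col)]
            for col in columns}

print(tag_columns(["patient_id", "email", "order_total"]))
# → {'patient_id': ['PHI'], 'email': ['PII'], 'order_total': []}
```

Because the tags land in the metadata layer rather than the data itself, no physical data movement is required to classify and catalog a source.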

As such, a Data Fabric design is a collection of data services that deliver agile and consistent data integration capabilities across a variety of endpoints throughout hybrid and multi-cloud environments. Further, a Data Fabric adds a layer of abstraction: data can remain distributed, with no movement of the physical data aside from crawling and profiling to create a logical map of the data landscape. This removes the need to replicate data for no outcome-driven reason.

Many organizations know that point-to-point integration patterns scale poorly as the number of integrations grows. What starts out as a handful of integrations quickly morphs into a spaghetti of integration points. A good Data Fabric design aims to eliminate this nightmare scenario with an active metadata catalog that lets existing data pipelines be repurposed. Furthermore, enhancing the productivity of scarce data engineers by shifting away from manual, time-consuming, and error-prone ETL tools toward low-code, UI-driven data pipeline creation saves time and money.
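The scaling argument is easy to quantify: connecting n systems pairwise needs on the order of n(n-1)/2 links, while routing each system once through a shared metadata-driven layer grows linearly. A quick back-of-the-envelope check:

```python
def point_to_point(n: int) -> int:
    """Links needed to connect every pair of n systems directly: n(n-1)/2."""
    return n * (n - 1) // 2

def via_shared_layer(n: int) -> int:
    """Links needed when each system connects once to a shared layer."""
    return n

for n in (5, 20, 50):
    print(f"{n} systems: {point_to_point(n)} direct links vs {via_shared_layer(n)} via a shared layer")
# 50 systems already means 1225 potential point-to-point links vs 50
```

The quadratic growth is why even modest estates of a few dozen systems become unmaintainable without a shared integration layer.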

In summary, a Data Fabric provides data architects and engineers with a design pattern where the focus is on communication and collaboration with business users around high-value use cases, and less on the data infrastructure.

What is a Data Mesh?

The term Data Mesh was coined by Thoughtworks to address the move from monolithic data platforms to distributed data management. A Data Mesh aims to connect the two planes of operational and analytical data sets and to deliver business-owned data products that have a lifecycle (just as software does) and are consumed through APIs.

Consequently, a Data Mesh can be thought of as a consulting-driven data implementation paradigm that requires customers to balance the pendulum between decentralized and centralized data domain creation, orchestration, governance, and management.

The development of domain-specific data products follows three principles: they are discoverable via a self-service data marketplace, trustworthy because the business has validated them, and interoperable with other data domains and data sets.
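One lightweight way to operationalize the three principles is a publish gate that flags unmet checks before a product reaches the marketplace. The sketch below is an assumption about how such a gate could look; the field names are hypothetical.

```python
def publish_blockers(product: dict) -> list:
    """Return unmet checks against the three data-product principles."""
    blockers = []
    if not product.get("catalog_entry"):          # discoverable
        blockers.append("not discoverable: no marketplace catalog entry")
    if not product.get("validated_by_business"):  # trustworthy
        blockers.append("not trustworthy: business validation missing")
    if not product.get("schema"):                 # interoperable
        blockers.append("not interoperable: no published schema")
    return blockers

draft = {"catalog_entry": True, "validated_by_business": True, "schema": None}
print(publish_blockers(draft))  # → ['not interoperable: no published schema']
```

A domain team would run a gate like this as part of the product's release lifecycle, mirroring how software ships only after its checks pass.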

A data domain product can be regarded as a “dossier” of institutionalized business knowledge that has been collaboratively curated and made available to a wide range of users. Such products complement today's limited, narrowly focused data marts, which serve specialized or targeted use cases and are based on structured (relational) data sets. Domain-driven data products, on the other hand, carry a broader and richer mix of structured and unstructured data to suit a variety of use cases, including fueling AI model design and development to answer the questions of tomorrow.
https://modak.com/wp-content/uploads/2021/09/Data-Mesh.png



Research firm Gartner predicts: “Through 2025, 80% of organizations seeking to scale digital business will fail because they do not take a modern approach to data and analytics governance.”

Why such a high failure rate?

In our opinion, incumbent and traditional data platforms managed by IT organizations are primarily focused on narrow data sets, structured data, and centralized governance, and have historically been deployed on-premises. With the proliferation of cloud solutions, the need for hybrid cloud configurations, and the growing demand for wider and continuous data sources, encompassing unstructured and semi-structured data to fuel AI models, we believe we have reached a tipping point where an alternative approach should be considered. A modern data platform is designed to accommodate not only multi-cloud and hybrid cloud capabilities but also automated data product delivery as a service, and to enable multiple use cases.

What is a Modern Data Platform (MDP)?

An MDP is a new approach to, and architectural pattern for, data management. A Modern Data Platform provides an automated data infrastructure that continuously feeds analytical models and AI algorithms, which learn and evolve as more data is fed into them.

Key Principles of MDP

  • Inventory the full data landscape, covering all data sources rather than being selective to solve specific, targeted use cases.
  • Consolidate data into cloud-enabled infrastructure to provide a “Data Lake”, enabling flexible deployment on multi-cloud and hybrid cloud infrastructure.
  • Enable a portfolio of use cases to be created, delivered, and prioritized against business needs and outcomes.
  • Apply advanced AI- and ML-driven techniques to automate the standardization and harmonization of data sets into data domain assets, and democratize data through business-owned data products.
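As a minimal illustration of the first principle, an inventory pass simply enumerates every registered source, on-premises and cloud alike, rather than cherry-picking for one use case. The source registry below is invented for the example.

```python
# Hypothetical source registry spanning on-premises and cloud systems.
SOURCES = [
    {"name": "erp_db",    "kind": "relational",   "location": "on-prem"},
    {"name": "crm_saas",  "kind": "api",          "location": "cloud"},
    {"name": "doc_store", "kind": "unstructured", "location": "cloud"},
]

def inventory_summary(sources):
    """Count sources by kind: the full landscape, not a selective subset."""
    counts = {}
    for src in sources:
        counts[src["kind"]] = counts.get(src["kind"], 0) + 1
    return counts

print(inventory_summary(SOURCES))
# → {'relational': 1, 'api': 1, 'unstructured': 1}
```

In practice this registry would be fed by automated crawlers and profilers, but the principle is the same: every source appears in the inventory before any use-case filtering happens.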

Benefiting from an MDP requires a parallel shift in culture:

  • Adoption of a data-driven culture that gives business users data ownership and accessibility.
  • Senior executives providing subject matter expertise, mentorship, and transparency in decision-making and outcomes.
  • A sense of urgency to move faster, with a start-up founder’s mentality to rapidly generate value.

To learn more, please download the Building Blocks of a Modern Data Platform white paper:

Co-Authors:

https://modak.com/wp-content/uploads/2022/04/Mask-group-4.svg
Milind Chitgupakar
Chief Analytics Officer &
Co-founder, Modak
https://modak.com/wp-content/uploads/2022/04/Mask-group-1.svg
Mark Ramsey
Ramsey International
Managing Partner
https://modak.com/wp-content/uploads/2022/04/Mask-group-2.svg
Baz Khuti
President, Modak USA