Decisions, decisions

Editorial Type: Opinion Date: 2022-05-26
Simon Spring, Operations Director, EMEA at WhereScape, explains how organisations should choose between data hubs, lakes & warehouses

Identifying and implementing the correct data architecture is essential for any organisation focused on becoming 'data driven', and there is no shortage of solutions available to satisfy each use case. Whether an organisation employs data hubs, lakes or warehouses, the core objective is to ingest and manage data effectively so that it delivers the insight-driven capabilities the business requires.

It's clear, however, that in making these choices some data managers and organisations are working with a knowledge gap that could throw their plans off course. This challenge was brought into sharp focus by Gartner's 2020 report, 'Data Hubs, Data Lakes and Data Warehouses: How They Are Different and Why They Are Better Together', which underlines the importance of using the right infrastructure for the right purpose.

According to the report, data and analytics leaders often confuse data lakes, data warehouses and data hubs: "For example, while Gartner client inquiries referring to data hubs increased by 20% from 2018 through 2019, more than 25% of these inquiries were actually about data lake concepts." Such confusion is understandable, but it's increasingly important that decision-makers fully appreciate the role of each approach and how the three can be combined to make the most of the huge investments being made.

SPOT THE DIFFERENCE
So, where do the differences lie and how can organisations ensure they are heading down the right path? Fundamentally, data warehouses should be used to analyse structured data, data lakes to analyse unstructured or semi-structured data and data hubs to communicate the resultant Business Intelligence to those who need to act on it.

The problem is that people work under the mistaken belief that these three approaches are interchangeable and all accomplish the same job in different ways. It's critical, however, that business executives not only understand the role of each for themselves but also convey it to the rest of the organisation to democratise data use.

For instance, the value of employing data lakes and the exploratory technologies that unstructured big data enables can only be fully realised if organisations can apply their findings in a structured environment. This is where the data warehouse becomes key: a data lake can be added as a source to a data warehouse, and when its data is combined with other real-time and batch sources, the result is rich, contextualised business insight.

The role of the data hub is not only to share BI but also to make it available for governance by those responsible for it and, as the name suggests, to enable data to flow between diverse endpoints. Given its importance, it's unfortunate that the data hub is arguably the least understood of the three.

One of the main recommendations of Gartner's report is to: "Maximize your ability to support a broader range of diverse use cases by identifying the ways that these structures can be used in combination. For example, data can be delivered to analytic structures (Data Warehouses and Data Lakes) using a Data Hub as a point of mediation and governance."

COPING WITH COMPLEXITY
While the exponential growth in the collection, management and analysis of data makes more insight available, it inevitably means that the infrastructure supporting these functions becomes much more complex. Moreover, that infrastructure must adapt as new demands continually emerge and as data sources evolve. Organisations must avoid assuming that they can create a data infrastructure that won't need to change over time.

Indeed, the Gartner report points out the value of adopting an agile approach to ingesting new data from different sources and in different formats. Embracing the complexity and disruption this brings can enable organisations to uncover new insight and monetise it before their competitors do.

These are important considerations because knowledge gaps can also result in conflicting expectations, where those leading the data department see the role and importance of certain infrastructure types differently from those who build and use them day-to-day. By removing this ambiguity, organisations put themselves on a sure footing to drive positive impact from their data strategies.

More info: www.wherescape.com