If you’re investigating the potential of a modern data warehouse to help your organisation become more data-driven in its decision making, you’re probably coming across references to an alternative approach called a data lake. A data lake is designed to address some of the shortcomings of a data warehouse, but it presents its own set of disadvantages. So just what’s the difference between the two — and which one is going to be best for your business?
What is an Enterprise Data Warehouse?
An Enterprise Data Warehouse (EDW) is a system for consolidating and analysing large volumes of structured and semi-structured data from a variety of disparate sources and then serving it to users to provide them with actionable business insights.
There are a number of drawbacks to a data warehouse. While it can draw data from multiple operational systems across an organisation and intelligently combine data coming from different sources, there’s a high cost in terms of both time and effort, because the data needs to be cleansed and transformed before it can be stored. The result is that the output of a data warehouse tends to be focused on providing a fairly narrow range of reports, such as executive dashboards and sales reports, with poor support for ad hoc reporting or for uncovering hidden patterns in your data.
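To make that cleanse-and-transform cost concrete, here is a minimal sketch in Python of what has to happen before a record is allowed into a warehouse table. The `SALES_SCHEMA` columns, the sample rows and the function names are all hypothetical, invented purely for illustration. Real warehouses do this with dedicated ETL or ELT tooling rather than hand-written code like this.

```python
from datetime import date

# Hypothetical target schema: column name -> function that cleanses a raw value.
SALES_SCHEMA = {
    "order_id": int,
    "region": lambda v: v.strip().upper(),
    "amount": lambda v: round(float(v), 2),
    "order_date": date.fromisoformat,
}

def transform(raw_row):
    """Cleanse one raw record so it fits the schema, or raise ValueError."""
    try:
        return {col: convert(raw_row[col]) for col, convert in SALES_SCHEMA.items()}
    except (KeyError, TypeError, ValueError, AttributeError) as exc:
        raise ValueError(f"row rejected: {raw_row!r}") from exc

def load(raw_rows):
    """Split rows into (accepted, rejected); only accepted rows would be stored."""
    accepted, rejected = [], []
    for row in raw_rows:
        try:
            accepted.append(transform(row))
        except ValueError:
            rejected.append(row)
    return accepted, rejected

# Made-up sample data: the second row has an amount that cannot be parsed.
raw = [
    {"order_id": "1001", "region": " emea ", "amount": "249.50", "order_date": "2023-04-01"},
    {"order_id": "1002", "region": "apac", "amount": "n/a", "order_date": "2023-04-02"},
]
accepted, rejected = load(raw)
```

Every new source needs its own version of these rules, and anything that doesn't fit the schema is turned away at the door. That up-front work is where the time and effort goes.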
Data warehouses also aren’t good at handling unstructured data, such as voice, video, social media content and log files from IoT devices. By some estimates, unstructured data now makes up around 80% of all the data generated each day. If there are useful insights to be gleaned from that data, a data warehouse won’t do much to help you find them.
What is a Data Lake?
A Data Lake, on the other hand, is simply a type of data store optimised to allow cost-effective ingestion of high volumes of structured, semi-structured and unstructured data. Usually implemented using technologies such as Amazon S3, Google Cloud Storage and Hadoop HDFS, a data lake solves many of the challenges around data ingestion but creates new problems of its own.
The main disadvantage of data lake solutions is that they typically don’t provide good support for analysis and presentation, so organisations struggle to manage, explore and exploit the data they’ve collected. It’s also all too easy to load data into a data lake willy-nilly, which is how you end up with a data swamp: a store so disorganised that nobody can find, or trust, what’s in it.
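The contrast with a warehouse can be sketched in a few lines of Python. In this illustration, a plain dictionary stands in for an object store such as Amazon S3, and a second dictionary plays the part of a simple metadata catalogue. The keys, sources and function names are hypothetical, and this is a toy model of the idea rather than how any particular product works.

```python
import json

lake = {}      # stand-in for an object store: key -> raw bytes
catalog = {}   # hypothetical metadata index: key -> description of the object

def ingest(key, payload, source, fmt):
    """Store the payload untouched and record what it is and where it came from."""
    lake[key] = payload
    catalog[key] = {"source": source, "format": fmt, "size_bytes": len(payload)}

def find(fmt):
    """List the keys of catalogued objects in a given format."""
    return sorted(k for k, meta in catalog.items() if meta["format"] == fmt)

def read_json_lines(key):
    """Apply structure only at read time: parse one JSON object per line."""
    text = lake[key].decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Ingestion is cheap: structured and unstructured payloads go in as-is.
ingest(
    "iot/2023-04-01.jsonl",
    b'{"device": "a1", "temp": 21.5}\n{"device": "b2", "temp": 19.0}\n',
    source="iot-gateway",
    fmt="jsonl",
)
ingest("calls/0001.wav", b"RIFF....", source="call-centre", fmt="wav")

readings = read_json_lines(find("jsonl")[0])
```

Nothing is cleansed or rejected on the way in, which is why ingestion is so easy. The catch is that the data only stays findable if something like the catalogue entry is written every time. Skip that step and the swamp begins to form.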
How can the cloud combine the two?
In fact, what you need from your data analytics solution is a platform that combines the flexibility, scale, and ease of ingestion of a Data Lake with the powerful organisation and analytics capabilities of an Enterprise Data Warehouse. Cloud computing is making this possible, with scalable, affordable compute and storage solutions that are surrounded by ecosystems of cloud-based services optimised for ingesting, storing, analysing and visualising structured and unstructured data.
Take the example of Google Cloud. Sitting at the heart of Google’s modern data analytics platform, Google BigQuery — which can be supplemented by other Google Cloud storage options for unstructured data — provides you with a secure, hyper-scalable data store that’s easy to manage. Around that core, a variety of code and no-code options, along with third-party solutions such as Fivetran and dbt, can handle all your data ingestion needs and allow you to quickly create robust data transformation pipelines. With these tools in place, it’s then easy to organise your data and prevent your data lake from turning into a data swamp.
Google Cloud then provides solutions to serve up the timely, engaging, action-orientated insights that allow your business users to make better decisions more quickly. Tools like Data Studio and Looker let you build highly scalable and responsive dashboards and visualisations to empower key decision makers. And Google’s BigQuery ML and Cloud AutoML allow teams with limited machine learning expertise to exploit machine learning to extract value from unstructured data or take advantage of predictive analytics.
To find out more about how a modern data analytics platform combines the best features of both a data warehouse and a data lake, read our white paper on the 7 rules for a successful modern data platform — simply click on the download link above. Or come and talk to the experts in our Data Analytics team about your specific challenges and the opportunities that are open to you.