When you first decide to create a data analytics platform, you probably have a handful of use cases in mind. But anyone who’s been running a data warehouse for any length of time can tell you that things won’t stay that way.
Maybe you want to start using clickstream data from your e-commerce site to generate recommendations for customers, based on what others looked at or bought after viewing the same item. Maybe you want to identify the characteristics of customers with a high lifetime value so you can target your marketing activities more effectively. Or perhaps you’re interested in using IoT sensor data to create individualised maintenance schedules for your products.
What’s certain is that, over time, the “five Vs” of the data you want to analyse — its variety, volume, velocity, veracity and value — will change. So future-proofing your data ingestion and data transformation pipelines should be a key consideration when selecting your data analytics platform.
If you choose the Google Cloud Platform (GCP), you’ll be in good hands. Here are five ways GCP has been designed to provide future-proofed data ingestion.
1. A variety of options and tools for data ingestion
GCP includes both code-based and no-code options that can handle batch, near-real-time and continuously streamed data, whether structured or unstructured, from a broad range of data sources. These options include:
- BigQuery Data Transfer Service (DTS), which allows you to load data from Google SaaS apps such as Google Ads and YouTube Channel Reports, as well as external sources such as Google Cloud Storage, Amazon S3 and data warehouses like Teradata.
- Federated queries, for querying data held externally in spreadsheets and in CSV and JSON files. These can be stored in Google Cloud Storage, Google Drive or another supported source.
- Cloud Pub/Sub and Cloud Dataflow, which handle event-driven and streaming data from IoT and mobile devices.
- Cloud Data Fusion, a code-free, point-and-click solution that lets you easily extract data from on-prem databases and third-party APIs, and Cloud Composer, a managed Apache Airflow service for orchestrating those pipelines.
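To make the streaming option concrete, here is a minimal Python sketch of publishing an IoT sensor reading to Cloud Pub/Sub. It assumes the google-cloud-pubsub client library; the project, topic, device and field names are all hypothetical, and the JSON-encoding helper uses only the standard library.

```python
import json
from datetime import datetime, timezone


def encode_reading(device_id: str, temperature_c: float) -> bytes:
    """Serialise a sensor reading as a UTF-8 JSON payload.

    Pub/Sub message bodies are raw bytes, so the reading is
    JSON-encoded before publishing.
    """
    event = {
        "device_id": device_id,
        "temperature_c": temperature_c,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(event).encode("utf-8")


def publish_reading(project_id: str, topic_id: str, data: bytes) -> str:
    """Publish one message and return its server-assigned message ID.

    Requires the google-cloud-pubsub package and GCP credentials;
    the import is deferred so the encoding helper above can be used
    without the client library installed.
    """
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    return publisher.publish(topic_path, data).result()
```

A subscriber — typically a Cloud Dataflow pipeline — can then decode each message and write it on to BigQuery or Cloud Storage.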
2. An ecosystem of third-party ingestion and transformation tools
The options available to you become even richer when you consider the wider ecosystem of ingestion and transformation tools designed to work with GCP. Whatever your need, there’s sure to be a tool — designed to work seamlessly with Google BigQuery — that can handle it.
- Fivetran makes it easy to deliver near-real-time streaming with no development or ongoing maintenance of complex pipelines. It supports ingestion from more than 100 SaaS providers, including Salesforce and SAP.
- dbt is an open-source tool for creating and managing SQL-based data transformations. It enables anyone familiar with SQL to quickly build robust data pipelines and automate common or complex tasks.
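To illustrate, a dbt model is simply a SQL SELECT statement saved as a file in the project; dbt handles the materialisation and dependency ordering. The model, table and column names below are hypothetical.

```sql
-- models/customer_lifetime_value.sql (hypothetical model)
{{ config(materialized='table') }}

select
    customer_id,
    count(order_id)  as order_count,
    sum(order_total) as lifetime_value
from {{ ref('stg_orders') }}  -- dependency on an upstream staging model
group by customer_id
```

Running `dbt run` builds this model (and anything it depends on) directly in the warehouse, such as BigQuery.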
3. A hyper-scalable underlying infrastructure
The data analytics tools provided in GCP and by its ecosystem of partners typically take advantage of GCP’s serverless architecture. This lets you seamlessly scale as you add new data sources or need to process increasing volumes of data, and maintains performance at busy times, by automatically spinning up extra compute power when needed.
You don’t have to worry about clusters, nodes and configuration, and the built-in load balancer and autoscaler are designed to tolerate extreme spikes in traffic, so you can automatically scale from no traffic to millions of requests per second, in seconds. That means you can focus on data and analysis rather than worrying about upgrading, securing or managing the infrastructure.
4. Rapid deployment of new tools
You can get up and running with most of the tools in GCP immediately. You don’t need to wait for servers to be provisioned, resources to be configured and software to be installed. You can even try out many of the tools in GCP for free as long as you don’t exceed monthly limits, and new customers get free credits to allow them to carry out a full assessment before they commit.
5. An affordable solution at every scale
Solutions running on GCP typically come with consumption-based pricing. You only pay for the resources you use, for as long as you use them. There are no high set-up costs or sudden steep increases when you cross the threshold into the next licence tier or exceed an included usage cap. So it’s easy to start small with a proof-of-concept project, while having the confidence GCP will support your long-term data analytics strategy.
Working with our data analytics and AI team
Our Data, Analytics and AI practice brings together a highly committed team of experienced data scientists, mathematicians and engineers. We pride ourselves on collaborating with and empowering client teams to deliver leading-edge data analytics and machine learning solutions on the Google Cloud Platform.
We operate at the edge of modern data warehousing, machine learning and AI, regularly participating in Google Cloud alpha programs to trial new products and features and to future-proof our client solutions.
We have support from an in-house, award-winning application development practice to deliver embedded analytics incorporating beautifully designed UIs. We are leaders in geospatial data and one of the first companies globally to achieve the Google Cloud Location-based Services specialisation.
If you'd like to find out more about how we can help you build your own modern data and analytics platform, why not take a look at some of our customer stories or browse our resources? And if you'd like more practical support and guidance, please get in touch with our team.