Machine Learning (ML) is no longer just for big companies with deep pockets that can afford to hire in the expertise. Cloud computing and a new generation of tools, such as Google BigQuery ML and Cloud AutoML, have made ML accessible and affordable for businesses of every size.
Yet developing a successful ML programme is still not as straightforward as you might think. Here are six steps to consider when planning your first ML project.
1. Choose the right proof-of-concept project
Your first project should focus on providing high-value insights. This will help establish early buy-in from influential stakeholders and allow you to demonstrate how ML can make a real difference to the business. Of course, it will also allow you to prove that the technology is viable and give you a chance to set off on the right foot when it comes to complying with existing governance and change control procedures.
2. Choose the right platform
More than most data analytics applications, ML is resource-hungry. You need to build your ML applications on a hyper-scalable platform that has the compute, storage and network power you need, when you need it — at a price you can afford. Google BigQuery, for example, is a hyper-scale cloud data warehouse capable of executing SQL queries over petabytes of data and of running your ML models in BigQuery itself, so there’s no need to stage data to another system. It will automatically scale on demand to match your needs, and offers consumption-based pricing, so you only pay for the compute and storage resources you use, when you use them.
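To make this concrete, here’s a minimal sketch of what running ML inside BigQuery looks like. The dataset, table and column names below are hypothetical, and the statements are shown as Python strings rather than being submitted to the service:

```python
# Sketch: BigQuery ML keeps training and prediction inside the
# warehouse, expressed as SQL. All dataset, table and column names
# here are hypothetical examples.

train_sql = """
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (model_type = 'logistic_reg',
         input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_spend, churned
FROM `mydataset.customers`
"""

predict_sql = """
SELECT *
FROM ML.PREDICT(MODEL `mydataset.churn_model`,
                (SELECT tenure_months, monthly_spend
                 FROM `mydataset.new_customers`))
"""

# With GCP credentials in place, each statement would be submitted
# with the google-cloud-bigquery client, e.g.:
#   from google.cloud import bigquery
#   bigquery.Client().query(train_sql).result()
```

Because both statements run where the data already lives, there’s no export step and no separate training cluster to manage.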
3. Get your data in shape
The output of your ML model — whether you’re generating recommendations for shoppers on your e-commerce site, using it to target your marketing to attract more profitable customers, or optimising operations in your factories — will only be as good as the data you use to train, validate and test your model. You need to make sure that your data is a representative subset of your operational data and includes plenty of edge cases. The operational data you feed to the model, when in production, also needs to be in good shape. You need to choose ingestion and data transformation tools that let you build robust, fast pipelines — and then make sure you’re organising your cleansed data into the right data assets to support your use case.
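As a simple illustration of the kind of cleansing such a pipeline performs, here’s a minimal sketch with hypothetical field names; a production pipeline would do the same thing at scale in a tool such as Dataflow or in BigQuery itself:

```python
# Sketch: a minimal cleaning step before training data reaches a model.
# Field names ("price", "quantity") are hypothetical examples.

def clean_records(records):
    """Drop incomplete rows and coerce fields to consistent types."""
    cleaned = []
    for r in records:
        if r.get("price") is None or r.get("quantity") is None:
            continue  # incomplete rows would skew training
        cleaned.append({"price": float(r["price"]),
                        "quantity": int(r["quantity"])})
    return cleaned

raw = [{"price": "10.0", "quantity": 2},
       {"price": None, "quantity": 1},
       {"price": "3.5", "quantity": 4}]
rows = clean_records(raw)  # the incomplete middle row is dropped
```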
4. Choose the right model
You need to choose an ML algorithm or model that’s appropriate for the business challenge you’re trying to solve, whether that’s forecasting future demand, segmenting customers or identifying defective items coming off a production line. Selecting the right model also means weighing up how much data you’ll have to train it, how much data you’ll need to process once you’re in production, whether you need highly accurate results or whether a close approximation will do, and how quickly the model needs to run to deliver results in time for them to be useful.
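The core of that comparison — evaluating candidate models on held-out data and keeping the one with the lowest error — can be sketched in a few lines. The two forecasters below are deliberately simple stand-ins; in practice the candidates might be different BigQuery ML model types:

```python
# Sketch: picking between candidate forecasting models on a holdout
# set. The "models" and demand figures are toy examples.

def last_value(history):           # naive forecast: repeat last point
    return history[-1]

def moving_average(history, k=3):  # smoother forecast: mean of last k
    return sum(history[-k:]) / k

def holdout_error(model, series, n_test=4):
    """Mean absolute error of one-step-ahead forecasts on the tail."""
    errors = []
    for i in range(len(series) - n_test, len(series)):
        pred = model(series[:i])
        errors.append(abs(pred - series[i]))
    return sum(errors) / len(errors)

demand = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
candidates = {"last_value": last_value, "moving_average": moving_average}
best = min(candidates, key=lambda name: holdout_error(candidates[name], demand))
```

On this noisy-but-trending series the smoother model wins; with different data the naive model might, which is exactly why the comparison is worth automating.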
5. Productionise your model
We’ve already mentioned that you need to create robust, fast pipelines to deliver your data to the model in good shape. You also need to make sure the model itself is correctly configured, to give you the best balance of accuracy and speed. Most models expose a range of parameters that let you tune how they behave, from the error tolerance to the number of iterations performed to the weighting applied to the different variables in your data. Finally, you need to take advantage of the capabilities of your data analytics platform to automate execution of your model: scheduling the model to run at set times, for example, and automatically opening up and closing down compute resources as and when they’re needed.
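Tuning those parameters usually comes down to evaluating a handful of candidate values against validation data and keeping the best one. A minimal sketch, using a toy exponential smoother as the model and a made-up demand series:

```python
# Sketch: a simple parameter sweep on validation data before a model
# goes into production. The smoother and the series are toy examples;
# the tuning loop is the part that generalises.

def smooth_forecast(history, alpha):
    """One-step forecast from exponential smoothing with factor alpha."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def validation_error(series, alpha, n_test=3):
    """Mean absolute error of one-step forecasts on the last n_test points."""
    errs = [abs(smooth_forecast(series[:i], alpha) - series[i])
            for i in range(len(series) - n_test, len(series))]
    return sum(errs) / len(errs)

series = [10, 11, 13, 12, 14, 15, 17, 16, 18]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
best_alpha = min(grid, key=lambda a: validation_error(series, a))
```

The same pattern scales up to richer search strategies and real models; the point is that the winning configuration is chosen by measurement, not guesswork.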
6. Present the results in an easy-to-digest way
All this effort will be for nothing if you don’t make it easy for users to act on the results at the right time. Of course, if you’ve made the right choices in the previous steps, you should be getting good results in a timely fashion. You now need to choose tools that make it easy to create engaging dashboards and visualisations or allow the insights and recommendations to be seamlessly incorporated into applications, such as e-commerce sites and social media platforms. Naturally, you also need presentation and visualisation tools that are highly scalable and continue to provide a responsive experience to users no matter how much data you’re handling.
How Play Sports Network achieved success through Machine Learning
The value of making the right choices at each step is demonstrated in some work we did with Play Sports Network, the world’s largest cycling media company and community. It provides a mobile app called GCN, a social media platform for cyclists, allowing like-minded cycling enthusiasts to upload, share, comment on and consume content.
Play Sports launched the app with a very basic approach to feeds: every user saw exactly the same content. However, the company always knew it wanted to create more personalised feeds for each user. Its small in-house data science team began using ML to develop a content-based recommendation engine, but soon ran into performance issues, with the model taking too long to run.
With training and coaching from Ancoris, the in-house team was able to complete the recommendation engine, allowing the ML model to be trained with new data every day in order to generate fresh personalised recommendations for millions of users in just 2.5 hours — at a cost of less than $100.
Ancoris also helped Play Sports get its data into better shape before feeding it into the ML model. Firstly, we helped them build an engagement scoring algorithm that sets the right weighting for various user actions tracked by the GCN app, such as liking or sharing a piece of content. The second step was to create a master index or Single Customer View (SCV) that consolidates all the different user IDs that belong to the same individual across all of Play Sports channels, to give complete insight into each unique user’s interactions.
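A minimal sketch of those two ideas, with made-up action weights and IDs: a weighted engagement score over tracked user actions, and a union-find pass that consolidates the channel IDs belonging to one person into a Single Customer View:

```python
# Sketch of the two data-shaping steps described above. The weights,
# actions and IDs are hypothetical illustrations, not Play Sports' own.

ACTION_WEIGHTS = {"view": 1, "like": 3, "comment": 4, "share": 5}

def engagement_score(actions):
    """Weighted sum of a user's tracked actions."""
    return sum(ACTION_WEIGHTS.get(a, 0) for a in actions)

# --- Single Customer View: merge linked IDs with union-find ---
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps lookups fast
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Hypothetical ID links observed across channels (app, web, YouTube).
for a, b in [("app:42", "web:alice"), ("web:alice", "yt:al1ce")]:
    union(a, b)

score = engagement_score(["view", "like", "share"])  # 1 + 3 + 5 = 9
same_person = find("app:42") == find("yt:al1ce")     # True
```

With all of a person’s IDs mapped to one canonical ID, their engagement events can be summed across channels before being fed to the recommendation model.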
Ancoris provided other useful insights, such as how to programmatically open up and close down compute resources on GCP, so Play Sports only pays for what it needs. David Taylor, technical lead at Play Sports, says: “In three months, we’ve been able to save up to a year of work, as well as develop the infrastructure needed to support our plans for the next two years.”
Working with our data analytics and AI team
Our Data, Analytics and AI practice brings together a highly committed team of experienced data scientists, mathematicians and engineers. We pride ourselves on collaborating with and empowering client teams to deliver leading-edge data analytics and machine learning solutions on the Google Cloud Platform.
We operate at the edge of modern data warehousing, machine learning and AI, regularly participating in Google Cloud alpha programs to trial new products and features and to future-proof our client solutions.
We are supported by an in-house, award-winning application development practice to deliver embedded analytics with beautifully designed UIs. We are leaders in geospatial data and one of the first companies globally to achieve the Google Cloud Location-based Services specialisation.
If you'd like to find out more about how we can help you build your own modern data and analytics platform, why not take a look at some of our customer stories or browse our resources? And of course, get in touch with our team if you'd like more practical support and guidance.