Published in Data content in November 2020. 12 minute read

Building a collaborative-based recommendation engine for Play Sports Network

Today’s post is from David Taylor, technical lead at Play Sports Network. Play Sports is the world’s largest cycling media company and community, reaching and engaging with more than 40 million cycling fans and riders around the world. To complement Play Sports’ series of YouTube channels, the company provides a mobile app called GCN, a social media platform for cyclists, allowing like-minded cycling enthusiasts to upload, share, comment on and consume content.

With our GCN app, our aim is to encourage people to become more engaged with our various brands, by keeping them on the app for longer and sparking conversations. We have a very varied user base, from people who want to know all about the latest in cycling technology to weekend riders who just want to get out on their bikes on Sunday mornings — and even people who don’t cycle but follow professional road racing events like the Tour de France.

From generic to personalised feeds

We launched the app with a very basic approach to feeds, with all users seeing exactly the same content. However, we always knew we wanted to be able to create more personalised feeds for each user. We set up a small data science team and began developing slightly more personalised feeds: taking account of factors such as the user’s language, for example. The next step was to create a recommendation engine to provide users with a truly personalised feed showing them the content most likely to interest them.

There are two kinds of recommendation engines we could use: content-based and collaborative. The content-based approach uses selections made by each user to filter what they see. A collaborative engine, by contrast, “crowdsources” its recommendations, identifying what content is likely to appeal to any individual user by looking at what other users with similar characteristics have liked. Neither is easy to develop, but collaborative filtering is computationally very intensive, so we focused initially on developing a content-based solution.
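
To make the collaborative idea concrete, here is a minimal sketch in Python. It is not the engine we built: the user names, content IDs and the choice of cosine similarity over liked items are all assumptions made purely for illustration. It scores content a user has not yet seen by how similar that user is to the people who liked it.

```python
from math import sqrt

# Hypothetical likes: user -> set of content IDs (illustrative data only).
likes = {
    "ana": {"tech_review", "tour_stage_1", "bike_fit"},
    "ben": {"tech_review", "bike_fit", "tyre_test"},
    "cat": {"tour_stage_1", "tour_stage_2"},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sets of liked items."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, likes, top_n=3):
    """Rank unseen items by the summed similarity of the users who liked them."""
    seen = likes[user]
    scores = {}
    for other, other_likes in likes.items():
        if other == user:
            continue
        sim = cosine_similarity(seen, other_likes)
        if sim == 0.0:
            continue  # users with nothing in common contribute nothing
        for item in other_likes - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

Every user has to be compared with every other user, which hints at why this approach becomes computationally expensive as the user base grows.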

Selecting a future-proofed, scalable infrastructure

Because we’re working with high volumes of constantly updated data about content and user activity, we looked for a cloud-based DBMS that could provide scalability and performance without significant up-front investment or ongoing costs. We chose Google BigQuery, but we found ourselves running into performance issues and our model was taking too long to run. We realised we needed help in figuring out how to make effective use of BigQuery, as well as other features in the Google Cloud Platform, to productionise the solution.

The Ancoris team's impact 

Once we began talking to Ancoris, however, they took the project to another level. They not only offered us the training and coaching we needed to complete our content-based recommendation engine and get it into production but also offered to build a collaborative engine for us in parallel.

As a result, in three months we’ve saved up to a year of work and built the infrastructure needed to support our plans for the next two years.

As part of the project, Ancoris introduced us to BigQuery ML, which makes it easy for data scientists to use and train a variety of Machine Learning models. We can now use our data, which is ingested hourly, to train a matrix factorisation model on a daily basis. The model generates a multi-billion row table containing personalised recommendations in just 2.5 hours — at a cost of less than $100.
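
For readers unfamiliar with matrix factorisation, the sketch below shows the underlying idea in plain Python. It is not how BigQuery ML implements it and not our production code: the sample scores, the hyperparameters and the use of simple stochastic gradient descent are assumptions chosen to keep the example short. Each user and each piece of content gets a small vector, and the dot product of the two approximates the observed engagement score.

```python
import random

# Hypothetical (user, content, engagement score) triples for illustration.
scores = [
    ("ana", "tech", 5.0), ("ana", "tour", 1.0),
    ("ben", "tech", 4.0), ("ben", "fit", 5.0),
    ("cat", "tour", 5.0), ("cat", "fit", 1.0),
]


def factorise(ratings, n_factors=2, epochs=200, lr=0.05, reg=0.02, seed=0):
    """Learn user and item vectors with plain SGD on the observed scores."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(p, q, u, i)
            for k in range(n_factors):
                pu, qi = p[u][k], q[i][k]
                # Gradient step with L2 regularisation on both vectors.
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return p, q


def predict(p, q, user, item):
    """Predicted score: dot product of the user and item vectors."""
    return sum(a * b for a, b in zip(p[user], q[item]))


def mse(ratings, p, q):
    """Mean squared error over the observed scores."""
    return sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)
```

Once trained, `predict(p, q, "ana", "fit")` gives a score for a pairing that never appeared in the data, which is what makes the model usable for recommendations.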

With support from Ancoris, delivered through a workshop approach with two-hour video meetings held three times a week, we’ve also been able to complete our content-based engine. We’re about to go live with that in the GCN app, and will then look at moving the collaborative engine into production.

Some of the useful insights provided by Ancoris involved learning how to programmatically open up and close down compute resources on GCP, so we only pay for what we need, and how to use a third-party tool called dbt for our data ingestion. In fact, we’re so impressed by dbt and how easy it makes it to set up data pipelines and schedule them to run as often as needed that we’ll be using it in other projects.

Engagement scoring algorithm

To underpin these developments, Ancoris has helped us deliver two other pieces of core infrastructure. The first is an algorithm that scores engagement. For example, if a user bookmarks a bit of content, that scores more highly than if the user just likes it. Accurate engagement scoring is an important input for any recommendation engine model. Through an iterative process, Ancoris helped us work out the best way to score actions tracked in the user activity data captured by the app.
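
As a rough illustration of the idea, the sketch below sums a weight for each tracked action per user and piece of content. The action names and weights are hypothetical; the only detail taken from our actual scoring is that a bookmark counts for more than a like.

```python
# Hypothetical weights: only "bookmark outranks like" reflects the real scheme.
ACTION_WEIGHTS = {"view": 1.0, "like": 2.0, "comment": 3.0, "bookmark": 4.0}


def engagement_scores(events, weights=ACTION_WEIGHTS):
    """Sum action weights per (user, content) pair; unknown actions score 0."""
    totals = {}
    for user, content, action in events:
        key = (user, content)
        totals[key] = totals.get(key, 0.0) + weights.get(action, 0.0)
    return totals


# Example activity log as (user, content, action) tuples.
events = [
    ("u1", "c1", "like"),
    ("u1", "c1", "bookmark"),
    ("u2", "c1", "like"),
]
```

Here `engagement_scores(events)` gives user `u1` a higher score for `c1` than user `u2`, because `u1` both liked and bookmarked it. Scores like these are the kind of input a recommendation model is trained on.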

Single Customer View

The second element is a master index or Single Customer View (SCV) that consolidates all the different user IDs that belong to the same individual across all our channels. This gives us a complete view of each unique user’s interactions with us, which improves both the accuracy of the recommendations we generate and the accuracy of the internal reporting used at board level.
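
One common way to consolidate IDs like this is to treat each known link between two IDs as an edge and group everything that is connected. The sketch below does that with a small union-find structure. It is an assumption about how such an index could be built, not a description of our SCV, and the ID formats are invented for the example.

```python
def build_customer_index(links):
    """Map every ID to a master ID shared by all IDs linked to it."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller ID as the master ID.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {identifier: find(identifier) for identifier in list(parent)}


# Hypothetical links, e.g. an app account and a web login sharing an email.
links = [
    ("app:42", "web:alice"),
    ("web:alice", "yt:abc"),
    ("app:77", "web:bob"),
]
```

With these links, `build_customer_index(links)` maps `app:42`, `web:alice` and `yt:abc` to the same master ID, so activity from all three can be attributed to one person.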

But these technical solutions are only one part of what has made this project a success for us. Equally valuable has been the way it’s accelerated our work. While our data scientists are extremely capable, they’re a very small team and focused on the data problem. The training, mentoring and knowledge transfer Ancoris has provided around tools, productionisation and best practices have been just as important. They’ve saved us from making a few mistakes and from taking longer to figure out how to use the available tools to meet our goals.

 
