Getting Started with the Google Prediction API

By Matt Mombrea

The Google Prediction API is a Google Labs project that can aid you in many types of predictive analysis and content recommendation. The nature of the software makes it difficult to explain exactly what it is and how it works. In fact, Google itself does not supply concrete definitions, only examples of its uses.

In a nutshell, you supply Google with a file full of historical data points that influence a single "answer" result. Google then applies machine learning techniques to predict a likely future outcome.

The first and most important step is to figure out the "answer" you need the Prediction API to return. The answer will be the first data field in your training model and can be thought of as the prediction. There are two types of answers: categorical (text based) and regression (number based). For a radio recommendation site, for instance, the answer might be a radio program name. You should spend a good deal of time considering all possible options for your answer, since you can only have one. A good answer will tell you as much as possible about a prediction.
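To make the two answer types concrete, here is a minimal sketch that checks whether a column of known answers looks categorical or numeric. The function name `answer_type` and the sample values are hypothetical, and the rule it applies (every answer parses as a number) is only a rule of thumb for planning your data, not a description of how Google decides.

```python
def answer_type(answers):
    """Guess whether a list of known answers is 'regression' or 'categorical'.

    Assumption: answers are strings as they would appear in a CSV file.
    """
    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    # Treat the column as regression only if it is non-empty and fully numeric.
    if answers and all(is_number(a) for a in answers):
        return "regression"
    return "categorical"


program_answers = ["NPR", "Program1"]    # text answers, e.g. program names
rating_answers = ["3.5", "4", "2.25"]    # hypothetical numeric answers

print(answer_type(program_answers))
print(answer_type(rating_answers))
```

If a single column mixes text and numbers, that is usually a sign the answer has not been defined clearly enough yet.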

Once you define your answer, you need to create your training model. The training model should consist of known scenarios, each including a known answer and all data points related to that decision. Again, take care when designing this model, as the prediction is only as good as the data you train it on. Garbage in, garbage out. Another important consideration is that you should only use data points in the model that you will also have access to at prediction time (excluding the answer).

For example, say you have a website that allows users to listen to radio programs online. A user listened to an NPR program first, then 5 others. NPR is your known answer and the other programs are the related data points:

"NPR","Program1","Program2","Program3","Program4","Program5"

You would then create another csv line for every known match up you currently have recorded (typically this is an automated process).
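As a rough illustration of automating that step, the sketch below turns recorded listening histories into CSV lines with the known answer in the first field. The helper name `training_row`, the in-memory records, and the `ProgramA` through `ProgramE` and `OtherShow` names are all made up for this example; in practice the records would come from your own database.

```python
import csv
import io


def training_row(answer, data_points):
    """Build one training line: the known answer first, then its data points."""
    return [answer, *data_points]


# Hypothetical recorded match-ups: (known answer, related programs).
listening_history = [
    ("NPR", ["Program1", "Program2", "Program3", "Program4", "Program5"]),
    ("OtherShow", ["ProgramA", "ProgramB", "ProgramC", "ProgramD", "ProgramE"]),
]

# Write to an in-memory buffer here; swap in open("training.csv", "w", newline="")
# to produce a real file.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)  # quote text fields
for answer, programs in listening_history:
    writer.writerow(training_row(answer, programs))

print(buffer.getvalue())
```

Quoting every text field is a choice made for this sketch so that program names containing commas stay in one field.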

Once you have a file full of these lines, you can train your model on that file. Once your model is trained, you can ask it to predict. A prediction request looks basically the same as a training line, except you leave off the first data point, since that is what you're asking to be predicted. If a new user comes to your site and listens to Program1 through Program5 (or something close to it) and you submit a prediction request with the 5 items they have listened to, the system would return "NPR", which you could then recommend to the user.
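The shape of that exchange can be sketched without calling the real service. In the example below, `toy_predict` is a hypothetical stand-in that simply picks the training line sharing the most data points with the request; it is not the Prediction API and says nothing about the machine learning Google actually applies. The second training line is invented for illustration.

```python
def toy_predict(training_rows, features):
    """Return the answer of the training row that overlaps most with features.

    Each training row is [answer, data_point, data_point, ...].
    This is a toy overlap count, not Google's algorithm.
    """
    wanted = set(features)
    best_row = max(training_rows, key=lambda row: len(wanted & set(row[1:])))
    return best_row[0]


training_rows = [
    ["NPR", "Program1", "Program2", "Program3", "Program4", "Program5"],
    ["OtherShow", "ProgramA", "ProgramB", "ProgramC", "ProgramD", "ProgramE"],
]

# A prediction request is a training line without the first (answer) field.
new_user_request = ["Program1", "Program2", "Program3", "Program4", "Program5"]

print(toy_predict(training_rows, new_user_request))  # prints "NPR"
```

A request that is only "close to" the training line, such as one containing four of the five programs, still resolves to "NPR" in this toy version because it has the largest overlap.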

The technical requirements for actually accomplishing this are fairly complicated. I'll design a tutorial for creating a working prediction example if I get any requests for one.

If you've got an hour to burn, check out the Google I/O discussion on the Prediction API.


Meet the Author

CTO / Partner

Matthew Mombrea

Matt is our Chief Technology Officer and one of the founders of our agency. He started Cypress North in 2010 with Greg Finn, and now leads our Buffalo office. As the head of our development team, Matt oversees all of our technical strategy and software and systems design efforts.

With more than 19 years of software engineering experience, Matt has the knowledge and expertise to help our clients find solutions that will solve their problems and help them reach their goals. He is dedicated to doing things the right way and finding the right custom solution for each client, all while accounting for long-term maintainability and technical debt.

Matt is a Buffalo native and graduated from St. Bonaventure University, where he studied computer science.

When he’s not at work, Matt enjoys spending time with his kids and his dog. He also likes to golf, snowboard, and roast coffee.