Getting Started with the Google Prediction API

September 29, 2011

By Matt Mombrea
The Google Prediction API is a Google Labs project that can help with many types of predictive analysis and content recommendation. The nature of the software makes it difficult to explain exactly what it is and how it works. In fact, Google itself does not supply a concrete definition, only examples of its uses.

In a nutshell, you supply Google with a file full of historical data points that influence a single “answer” result. Google then applies machine learning techniques to predict a likely future outcome.

The first and most important step is to figure out the “answer” you need the Prediction API to return. The answer is the first data field in each row of your training model and can be thought of as the prediction. There are two types of answers: categorical (text based) and regression (number based). The answer in your scenario might be, for example, a radio program name. Spend a good deal of time considering the possible options for your answer, since you can only have one. A good answer tells you as much as possible about the prediction.
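To make the two answer types concrete, here is a minimal sketch in Python. The `answer_type` helper is hypothetical and only illustrates the distinction described above; it is not how the Prediction API itself decides which kind of model to build.

```python
def answer_type(answer):
    """Classify an answer value as 'regression' (numeric) or 'categorical' (text).

    Hypothetical helper for illustration only; not part of the Prediction API.
    """
    # bool is a subclass of int in Python, so treat it as a label, not a number.
    if isinstance(answer, bool):
        return "categorical"
    if isinstance(answer, (int, float)):
        return "regression"
    return "categorical"


# A radio program name is a text label, so it is a categorical answer.
program_answer = answer_type("NPR")

# A numeric answer, such as minutes listened, would be a regression answer.
minutes_answer = answer_type(42.5)
```

Deciding which of the two you need up front matters, because every row in the training file has to carry the same kind of answer in its first field.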

Once you define your answer, you need to create your training model. The training model should consist of known scenarios, each including a known answer and all data points related to that decision. Again, take care when designing this model, as the prediction is only as good as the data you train it on: garbage in, garbage out. Another important consideration is that you should only use data points in the model that you will also have access to when making a prediction (excluding the answer).

For example, say you have a website that allows users to listen to radio programs online. A user listened to an NPR program first, then five others. NPR is your known answer and the other programs are the related data points:

"NPR","Program1","Program2","Program3","Program4","Program5"

You would then create another CSV line for every known matchup you currently have recorded (typically this is an automated process).
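As a rough sketch of that automated process, the Python below turns recorded listening histories into CSV training lines with the answer in the first field. The `histories` data, the function names, and the rule of skipping histories with no related data points are all assumptions made for illustration.

```python
import csv
import io


def training_rows(listening_histories):
    """Yield one training row per history: the answer first, then its data points."""
    for history in listening_histories:
        # Assumption: a history with only one program has no related data
        # points to learn from, so it is skipped.
        if len(history) < 2:
            continue
        answer, *features = history
        yield [answer, *features]


def to_csv(rows):
    """Serialize rows as CSV text, quoting text fields like the example line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical recorded histories; the first program in each list is the answer.
histories = [
    ["NPR", "Program1", "Program2", "Program3", "Program4", "Program5"],
    ["Program6", "Program2", "Program7"],
    ["OnlyOne"],  # skipped: nothing to learn from
]

training_csv = to_csv(training_rows(histories))
```

`csv.QUOTE_NONNUMERIC` quotes text fields but leaves numbers bare, so the same code would also work if your answer were a number.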

Once you have a file full of these, you can train your model on that file. Once your model is trained, you can ask it to predict. A prediction request is basically the same as a training line, except you leave off the first data point, since that is what you’re asking to be predicted. If a new user comes to your site and listens to Program1 through Program5 (or close to it) and you submit a prediction request with the five items they have listened to, the system would return “NPR”, which you could then recommend to the user.
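The shape of a prediction request can be illustrated with a toy stand-in written in Python. This is not the Prediction API and not Google's algorithm: the `predict` function below is a hypothetical overlap count, used only to show that the request is a training row without its first field.

```python
from collections import Counter


def predict(training_rows, listened):
    """Toy stand-in for a trained model: score each known answer by how many
    of the requested data points appear alongside it in the training rows."""
    listened = set(listened)
    scores = Counter()
    for answer, *features in training_rows:
        scores[answer] += len(listened & set(features))
    # No overlap with any training row means there is nothing to recommend.
    if not scores or max(scores.values()) == 0:
        return None
    return scores.most_common(1)[0][0]


# Hypothetical training rows: the answer first, then the related data points.
rows = [
    ["NPR", "Program1", "Program2", "Program3", "Program4", "Program5"],
    ["Program6", "Program2", "Program7"],
]

# The prediction request is a training row with the answer left off.
request = ["Program1", "Program2", "Program3", "Program4", "Program5"]
recommendation = predict(rows, request)
```

With these rows, the request overlaps the NPR row on all five programs and the other row on only one, so the toy function recommends "NPR". A real service would use a trained model in place of this counting.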

The technical requirements for actually accomplishing this are fairly complicated. I’ll design a tutorial for creating a working prediction example if I get any requests for one.

If you’ve got an hour to burn, check out the Google I/O discussion on the prediction API:

Matt Mombrea

Matt is a longtime entrepreneur and software engineer. He also sits on the board of Computer Science at SUNY Fredonia. Born and raised in Buffalo, Matt is passionate about building and maintaining great businesses in Buffalo. Apart from leading Cypress North, Matt architects our company’s datacenter, engineers dozens of custom applications, and directs the rest of the development team.
