Build a real-time recommendation engine with IBM Bluemix and PubNub – Part 1

By Shyam Purkayastha | Machine Learning

Feb 22
Image, Courtesy 123RF. Copyright: alfaphoto / 123RF Stock Photo

In this two-part blog series, I am going to show you how to build a real-time recommendation engine for advising on travel decisions. Travel time or duration is something that is always subject to a lot of speculation. Based on your travel experience along a particular route, you always bet on your intuition to decide when to start your journey. But if you are planning to travel for an all-important meeting to close a deal, a job interview that you have to crack at any cost, or a public event that you have to witness from the very beginning, then you better be prepared and alert, but how?

Note: This post was originally published on the IBM Bluemix Blog.

Rise of recommendation engines

The world is like a massive multiplayer game. Any action performed by a small subset of people can have a noticeable impact on a larger population. Many effects are gradual, like global warming. But there are some that occur at a much faster pace, sometimes almost instantly, so quick that they can change many times during a day. Think of changing traffic situations, the fluctuations in stock prices, or discount deals during a new product launch.

Our brains can make a qualitative assessment of these impacts and are tempted to make a decision in our favor, based on a few facts and observations. Over the past couple of decades, we have also learned how to collect quantifiable data from the events relating to various systems, processes, and phenomena happening around the world. But our brains cannot work with such massive amounts of data. They can neither store such data nor are they designed to recall and process it to derive quantitative insights.

Enter cognitive computing. We now have a new way of looking at data and solving this problem. Computers can store huge amounts of data. They can be programmed to apply numerical methods to solve a problem based on data, and they are fast! And we have now made some advances in this basic cognitive model by enabling the computer to mimic the human brain, by learning from past data and then making decisions based on that. So what can we do with that?

With cognitive computing, we have leapfrogged beyond building dumb software that is programmed merely to act as our assistant, storing our data and running repetitive, automated chores. One of the applications that has come into existence with the advent of cognitive computing is the recommendation engine. Recommendation engines can help us in many ways, like making more informed decisions, correcting our actions, or even limiting our choices according to our preferences. Their principal goal is to provide the best advice for performing a task, based on the available data and scenario. This advice mechanism provides an extra layer of intelligence that is not merely qualitative advice driven by a vague intuition of the mind, but an actionable insight backed by data and accurate measurements, which is what makes it a trusted recommendation.

The Travel Time Optimizer (TTO)

Traffic is a constantly changing phenomenon, and it is not humanly possible to predict the pattern based on a few trips down the road. At a broad level, you can use your intuition to make generic decisions like avoiding a busy hour. But what if you cannot choose the arrival time? When you know that you have to reach your destination on or before a certain predetermined time, the question that keeps hovering in your mind is “When should I start?” This problem compounds with changing external factors like weather. In such cases, the questions become more complex: Should I start a little early? How early? In such situations, getting real-time recommendations about the start time of your journey, based on the changing external factors along the travel route, can be a great help.

To showcase the functionality of such a recommendation system, we have built an Android mobile app called Travel Time Optimizer (TTO). This app can be a handy assistant to users when they face such unpredictable questions related to their travel. The app works in unison with a back-end recommendation engine that is responsible for predicting the travel duration of a route, in real time, based on historical traffic data and prevailing external factors.

Factors influencing real-time travel recommendation

To build such a recommendation engine, we have taken into account some common factors affecting travel time, which are based on our day-to-day observations. These are:

  • Weather: As we all know, weather can have a significant impact on the traffic. Most of the time, when the weather worsens, we tend to make alternate arrangements.
  • Temperature: General observation suggests that temperature also plays a part in influencing our travel plans. However, it does not have as much influence as the weather.
  • Time of day / day of week: We all know how pleasant it is to drive on a weekend or go on an early morning drive, compared to getting stuck in a rush-hour logjam on a weekday. The time and the day of our travel do have an impact on our travel time.
  • Incidents: Any accident, road repair, or construction activity can have a significant impact on the travel time and can cause a traffic buildup.
  • Traffic: The historical traffic data itself provides some insights about the traffic pattern for a specific route.
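To feed factors like these into a prediction algorithm, each one has to become a number. The following is a minimal sketch of one possible encoding, not the encoding used by TTO: the `Observation` fields, the `WEATHER_SEVERITY` categories, and the choice of units are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical weather categories; a real weather feed has its own codes.
WEATHER_SEVERITY = {"clear": 0, "cloudy": 1, "rain": 2, "snow": 3}


@dataclass
class Observation:
    """One sample of the factors for a route (field names are illustrative)."""
    timestamp: datetime
    weather: str          # a key of WEATHER_SEVERITY
    temperature_f: float  # degrees Fahrenheit
    incident_count: int   # incidents reported along the route


def to_features(obs: Observation) -> list[float]:
    """Turn an observation into a numeric vector, one number per factor."""
    minutes_of_day = obs.timestamp.hour * 60 + obs.timestamp.minute
    return [
        float(WEATHER_SEVERITY[obs.weather]),
        obs.temperature_f,
        float(minutes_of_day),
        float(obs.timestamp.weekday()),  # Monday=0 ... Sunday=6
        float(obs.incident_count),
    ]


# A rainy Monday morning with one reported incident.
sample = Observation(datetime(2016, 3, 21, 8, 30), "rain", 41.0, 1)
features = to_features(sample)
```

The historical travel time itself is not part of this vector because it is the value the engine tries to predict.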

Analysis of historical traffic patterns

To validate our assumption about the factors that influence traffic, we have done some tests by capturing the historical travel time for a few predefined routes and have plotted it against the factors. Given below is a plot of the historical travel time between Newark and Edison, New Jersey during the second half of March 2016, along with the temperature at Newark and Edison during that period. The data is plotted at 10-minute intervals.

Similarly, we also plotted the weather condition at both the locations for the same period.

Now, let’s also look at how traffic is impacted by day of the week.

It is tough to identify any pattern between the travel time and the various factors plotted in the graphs, but taking a closer look at the previous day-of-the-week plot, you can clearly see that the travel time has several peaks during the weekdays, whereas during weekends it is relatively flat.

Based on these findings, we can infer that at least some of these factors, most visibly the day of the week, do correlate with the travel duration. However, it is impossible to plot all the factors in a multidimensional graph, and a numerical analysis of this is beyond the scope of this post. So we will stick to our original set of assumptions based on common observations and build the recommendation engine based on the affecting factors.

Components of Travel Time Optimizer (TTO)

There are three components of the Travel Time Optimizer application:

  • TTO recommendation engine: This is responsible for generating the recommendations and serving user requests sent from the TTO app.
  • TTO background data capture: This is a separate process that runs in the background and captures travel time for a few specific routes and the factors influencing traffic at that time. The data is stored in a MongoDB database.
  • TTO app: The mobile app that assists users in posting their travel queries to the TTO recommendation engine.

TTO recommendation engine

The TTO recommendation engine runs on IBM Bluemix. The IBM Bluemix cloud platform offers on-demand computing resources and supports many programming languages and platforms. This makes it well-suited for building recommendation applications that require CPU-intensive calculations and procedures.

TTO background data capture

The background data capture process serves as the backbone of this application because it is responsible for capturing all the data for measuring the factors that affect traffic. It also generates the travel duration predictions based on that data.

The data is captured for three predefined routes, namely:

  • Newark to Edison, New Jersey
  • Brooklyn, New York to Denville, New Jersey
  • Mount Zion to San Francisco General Hospital in San Francisco, California

For capturing data, this process relies on two public APIs:

  • MapQuest: For getting the current travel time and incidents reported, if any, for a route.
  • Yahoo Weather API: For getting the weather and temperature condition for the route.
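As a rough sketch of what one captured sample might look like, the code below merges a traffic reading and a weather reading into a single record aligned to a 10-minute capture slot. It does not call MapQuest or the Yahoo Weather API. The function arguments stand in for values already parsed from their responses, and every field name is hypothetical.

```python
from datetime import datetime, timedelta

CAPTURE_INTERVAL_MIN = 10  # assumed capture period, in minutes


def floor_to_interval(ts: datetime, minutes: int = CAPTURE_INTERVAL_MIN) -> datetime:
    """Round a timestamp down to the start of its capture interval."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def build_record(route: str, ts: datetime, travel_time_min: float,
                 incidents: int, weather: str, temperature_f: float) -> dict:
    """Combine one traffic reading and one weather reading into a document.

    The arguments are placeholders for values parsed from the two API
    responses; the field names are illustrative, not the TTO schema.
    """
    return {
        "route": route,
        "captured_at": floor_to_interval(ts).isoformat(),
        "travel_time_min": travel_time_min,
        "incidents": incidents,
        "weather": weather,
        "temperature_f": temperature_f,
    }


record = build_record("Newark-Edison", datetime(2016, 3, 21, 8, 37, 12),
                      42.5, 1, "rain", 41.0)
```

A plain dictionary like this maps naturally onto a document in a store such as MongoDB, and aligning timestamps to the capture slot keeps samples from different days comparable.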

TTO app

The TTO app is a standard Cordova-based Android app. It provides a simple UI for the users to ask their travel-related questions, which can be answered by the server in the form of recommendations.

PubNub as communication middleware between recommendation engine and application

PubNub acts as the communication middleware between the TTO recommendation engine and the TTO app. PubNub provides a cloud-based real-time Data Stream Network with more than 70 SDKs, so it can enable any device to communicate with any other device on the Internet. This application uses two of PubNub’s SDKs so that all components can communicate seamlessly with each other.

PubNub works on the concept of a channel, which is a secured pathway for sending messages from one application to another application. This application relies on several PubNub channels to enable communication between the components. The PubNub request channel is used by the TTO app to send a user-specific request to the TTO recommendation engine. The PubNub recommendation channel is a private, user-specific channel that is used to send recommendations to each user.
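The request and reply pattern described above can be sketched without any PubNub calls. In the sketch below, the channel names, the message fields, and the idea of carrying the reply channel inside the request are all assumptions for illustration. In a real deployment the serialized messages would be published and received through the PubNub SDKs.

```python
import json

REQUEST_CHANNEL = "tto_request"  # hypothetical name for the shared request channel


def recommendation_channel(user_id: str) -> str:
    """Hypothetical naming scheme for a user's private recommendation channel."""
    return f"tto_recommendation_{user_id}"


def build_request(user_id: str, route: str, arrival_time: str) -> str:
    """App side: serialize a travel query to send on REQUEST_CHANNEL."""
    return json.dumps({
        "user_id": user_id,
        "route": route,
        "arrival_time": arrival_time,
        "reply_channel": recommendation_channel(user_id),
    })


def handle_request(raw: str) -> tuple[str, dict]:
    """Engine side: parse a request and decide where the reply should go.

    The recommendation itself is left as a placeholder.
    """
    request = json.loads(raw)
    reply = {"route": request["route"], "recommendation": None}
    return request["reply_channel"], reply


message = build_request("user42", "Newark-Edison", "2016-03-21T09:30:00")
channel, reply = handle_request(message)
```

With one shared request channel, the engine needs only a single subscription. The per-user reply channel ensures that each user sees only their own recommendations.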

As you have seen, traffic situations can also change within a few minutes based on the interplay of several factors. Therefore, it is also important to alert the users with real-time recommendation updates. Getting real-time recommendation updates on the changing external factors is the key to making a more informed decision. PubNub’s real-time messaging network can deliver messages from one device to another device in a secure and reliable way. By using PubNub as the backbone for this application, we can cater to millions of users, delivering trusted recommendations instantly.

Steps for generating recommendation

To offer recommendations, the TTO recommendation engine must have a prediction for travel duration. The data captured by the TTO background process contains the historical travel durations for the predefined routes at 10-minute intervals. This data can span anywhere from a few days to a few months, depending on how long the background process has been running. Historical travel duration alone is not sufficient to predict a future travel duration for a particular time of day, as you have already seen the impact of changing factors. So, this requires an N-dimensional prediction based on all the factors that affect travel duration.

For this purpose, we have used the Scikit-learn Python library and its k-nearest neighbors (KNN) algorithm to derive a prediction that takes all the factors into account. As the volume of historical data grows to cover more combinations of factors, the algorithm gets better at predicting.
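To give a feel for how a KNN prediction works, here is a dependency-free sketch of KNN regression. It is not the Scikit-learn implementation used by TTO, where the closest counterpart would be an estimator such as `KNeighborsRegressor`. The history values are made up, and the feature layout, the min-max scaling, and the choice of k are assumptions.

```python
import math

# Each row: (weather severity, temperature F, minutes since midnight,
# day of week, incident count) -> travel time in minutes. Made-up values.
HISTORY = [
    ((0, 50.0, 480, 0, 0), 40.0),
    ((2, 42.0, 490, 1, 1), 55.0),
    ((0, 55.0, 840, 5, 0), 30.0),
    ((1, 48.0, 500, 2, 0), 45.0),
    ((0, 60.0, 860, 6, 0), 28.0),
]


def scale(features, lows, highs):
    """Min-max scale each feature to [0, 1] so no single factor dominates."""
    return [(f - lo) / (hi - lo) if hi > lo else 0.0
            for f, lo, hi in zip(features, lows, highs)]


def knn_predict(history, query, k=3):
    """Average the travel times of the k historical samples nearest to query."""
    columns = list(zip(*(features for features, _ in history)))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    q = scale(query, lows, highs)
    by_distance = sorted(
        (math.dist(scale(features, lows, highs), q), minutes)
        for features, minutes in history
    )
    nearest = by_distance[:k]
    return sum(minutes for _, minutes in nearest) / len(nearest)


# A rainy Tuesday morning with one incident.
predicted = knn_predict(HISTORY, (2, 44.0, 485, 1, 1), k=3)
```

Treating the day of week and the weather category as plain numbers is a simplification. The sketch only shows the core idea: find the most similar past situations and average their travel times.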

What’s next

Once the predictions for travel duration are available, the process of recommendation boils down to two steps:

  • Deriving predictions for travel duration for a period into the future.
  • Generating a recommendation based on the predictions, as per the user-requested time.
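As a preview, the second step can be sketched as picking the latest departure slot whose predicted arrival still meets the user's deadline. This is only one plausible rule, not necessarily the one TTO applies. The `recommend_departure` function and the predicted durations below are invented for illustration.

```python
from datetime import datetime, timedelta


def recommend_departure(predictions, arrive_by):
    """Return the latest departure whose predicted arrival meets the deadline.

    predictions maps a departure time to a predicted duration in minutes.
    Returns None when no listed departure arrives in time.
    """
    feasible = [
        depart for depart, minutes in predictions.items()
        if depart + timedelta(minutes=minutes) <= arrive_by
    ]
    return max(feasible) if feasible else None


# Made-up predictions at 10-minute intervals for one route.
start = datetime(2016, 3, 21, 8, 0)
durations = [35, 40, 50, 55, 45, 40]
predictions = {start + timedelta(minutes=10 * i): d
               for i, d in enumerate(durations)}

best = recommend_departure(predictions, arrive_by=datetime(2016, 3, 21, 9, 15))
```

Because the predictions change as weather and incidents change, rerunning this selection on fresh predictions is what would trigger an updated recommendation for the user.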

In the second part of this post, we will take a closer look at both of the steps in achieving a functioning recommendation engine and also a functioning TTO app from the user’s point of view.


About the Author

Shyam is the Creator-in-Chief at RadioStudio. He is a technology buff and is passionate about bringing forth emerging technologies to showcase their true potential to the world. Shyam guides the team at RadioStudio, a bunch of technoholiks, to imagine, conceptualize and build ideas around emerging trends in information and communication technologies.