How Day-to-Day Data Becomes Predictive Intelligence

Although predictive analytics systems have become more popular over the last couple of years, both the term and the systems themselves are still surrounded by a great deal of mystique. In this post I will walk through the best-practice process we follow when delivering our predictive analytics solution, in an effort to remove some of that mystique.

Let me set the stage by defining what predictive analytics is and what information it needs. As the name suggests, predictive analytics systems attempt to forecast trends and behavior based on historical information. Essentially, they predict what will happen given past experience. A good marketing example is product bundling, or cross-selling. If many customers buy a Blu-ray player together with a Spider-Man DVD, the predictive analytics system will report the correlation and may prompt a new campaign offering a movie-and-player bundle.
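To make the bundling example concrete, here is a minimal sketch of how such a correlation could be surfaced by counting which products show up together in the same order. It is an illustration only, not the method our solution uses: the function name `pair_counts` and the sample orders are made up for this post, and a real system would also weigh how popular each product is on its own.

```python
from collections import Counter
from itertools import combinations


def pair_counts(orders):
    """Count how often each pair of products appears in the same order."""
    counts = Counter()
    for items in orders:
        # De-duplicate and sort so each pair is counted once per order
        # and always appears in the same (alphabetical) orientation.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


# Hypothetical order history: one list of products per order.
orders = [
    ["Blu-ray player", "Spider-Man DVD"],
    ["Blu-ray player", "Spider-Man DVD", "HDMI cable"],
    ["Spider-Man DVD"],
    ["Blu-ray player", "HDMI cable"],
    ["Blu-ray player", "Spider-Man DVD"],
]

counts = pair_counts(orders)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

The pair that is bought together most often is a natural candidate for a bundle campaign.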

Not surprisingly, a predictive analytics solution is built on a foundation of data, specifically operational data. Operational data is a collective term with several definitions, but for now we will define it as any data originating from a business operations system. Customer order information, online shopping activity, and direct-mail responses are all examples of operational data.

There you have it. Predictive systems use your operational data to predict the future. In our case, we are forecasting customer-specific trends and predicting how your customers will behave in various marketing scenarios. Now that we know what we are dealing with, let’s get into how your data is turned into a valuable analytics solution. The first step involves finding the right data to work with.

Data Selection and Retrieval

As you can imagine, even a small business generates vast amounts of operational data, so we must filter out the noise by locating and identifying the data relevant to our predictive analytics solution. Much like drawing up a grocery list before shopping, this step requires a human to review the available data sources (online traffic logs, historical orders, and customer portfolios) and then grade each one by fidelity and quality. The data grading checklist used by Istobe is too comprehensive to discuss in this post, but here are some example questions to help you do the same:

  • Is the data redundant (e.g., do multiple account or customer numbers exist for the same customer)?
  • Is the data updated by a human or a machine?
  • What is the data’s lifetime, i.e., how long does the data stay intact?
  • If the data is related to another source, how is the relation made?
  • Does the data drive any business decisions or is it directly used in any reports?
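Some of these questions can be partly answered with a quick script. The sketch below checks the first one, redundancy, by looking for customers that appear under more than one customer number. The field names `customer_number` and `email`, and the idea of matching on a normalized email address, are assumptions made for illustration; your own sources may need a different matching key.

```python
from collections import defaultdict


def find_redundant_customers(customers):
    """Return emails that are linked to more than one customer number."""
    numbers_by_email = defaultdict(set)
    for row in customers:
        # Normalize the email so trivial differences do not hide a match.
        email = row["email"].strip().lower()
        numbers_by_email[email].add(row["customer_number"])
    return {
        email: sorted(numbers)
        for email, numbers in numbers_by_email.items()
        if len(numbers) > 1
    }


# Hypothetical customer records pulled from a single data source.
customers = [
    {"customer_number": "1001", "email": "pat@example.com"},
    {"customer_number": "2040", "email": " Pat@Example.com "},
    {"customer_number": "1002", "email": "lee@example.com"},
]

redundant = find_redundant_customers(customers)
print(redundant)
```

A source with many such matches would score lower on quality and would need more attention during cleansing.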

After each data source is graded, we can decide what to keep and how to improve it. The data sources we keep usually need to go through a cleansing process that filters out dirty data. You may be surprised to hear that your data is probably very dirty, but dirty data exists even in third-party systems. The following scenarios should give you a feel for the hundreds of ways dirty data can find its way into your data sources:

  • Users trying out new features in a CRM system
  • Test data inserted for quality control
  • Data entry errors
  • Historical data that was updated but never removed
  • System upgrades or merges

In its most basic form the cleansing process sets out to eliminate the dirt by:

  • Standardizing specific values (e.g., date and time formats)
  • Removing duplicate information
  • Removing inconsistent data (e.g., orders that were never completed)
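The three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our production cleansing process: the record fields (`order_id`, `customer_id`, `order_date`, `status`), the list of accepted date formats, and the choice to drop rows whose date cannot be parsed are all hypothetical.

```python
from datetime import datetime

# Assumed input formats; extend this tuple to match your own sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def standardize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def cleanse(records):
    """Standardize dates, drop incomplete orders, and remove duplicates."""
    seen = set()
    clean = []
    for rec in records:
        if rec.get("status") != "completed":
            continue  # inconsistent data: the order was never completed
        date = standardize_date(rec.get("order_date", ""))
        if date is None:
            continue  # assumption: unparseable dates are treated as dirty
        key = (rec["order_id"], rec["customer_id"])
        if key in seen:
            continue  # duplicate of an order we already kept
        seen.add(key)
        clean.append({**rec, "order_date": date})
    return clean


# Hypothetical raw order records.
records = [
    {"order_id": 1, "customer_id": "C1", "order_date": "2012-03-05", "status": "completed"},
    {"order_id": 1, "customer_id": "C1", "order_date": "03/05/2012", "status": "completed"},
    {"order_id": 2, "customer_id": "C2", "order_date": "03/07/2012", "status": "abandoned"},
    {"order_id": 3, "customer_id": "C1", "order_date": "09.03.2012", "status": "completed"},
]

cleaned = cleanse(records)
for rec in cleaned:
    print(rec["order_id"], rec["order_date"])
```

In practice each of these rules should be agreed with the custodians of the data, since what counts as a duplicate or an incomplete order differs from one system to the next.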

The Data Selection and Retrieval phase is the most intrusive, since it requires collaboration between the custodians of the data and the group building the predictive analytics system. It is also the most important, because it lays the foundation on which everything else is built.

In the next post I will discuss how the cleansed data is used in the Knowledge Creation phase.
