Predicting the future

26th March 2015 • Features

by James Lawson, Contributing Editor

Predictive analytics may have come into existence as a way to make direct mail campaigns more effective, but predictive models are now being used at every step of the customer journey. However, companies must work to fully grasp what really influences the critical decisions their customers make, and advancements in data storage have a major part to play.

Rob Fuller | VP, Marketing Solutions, Strategy, Harte Hanks
Gerhard Heide | Director, Global Market Strategy, Pitney Bowes
David Schweer | Senior Product Marketing Manager for Campaigns and Analytics, SDL
Jeremy Stanley | Chief Data Scientist, Sailthru

There’s no doubt about it. Whether it’s which product to offer or how long to wait between contacts, predictive analytics are the most effective way to optimise individual marketing decisions. From their beginnings in direct mail, predictive models are now being applied at critical points across the whole customer journey.

Potent they might be, but with advanced techniques like marketing optimisation requiring tens or even hundreds of models for each combination of channel, offer, customer and product, companies are looking for ways to reduce the effort and investment required to apply predictive modelling to their own campaigns.

“You definitely have a drive towards ‘we need more analytics everywhere’,” says Gerhard Heide, Director of Global Market Strategy at Pitney Bowes. “But there isn’t an analytics team in the land that has spare capacity, so companies need smarter, more efficient ways to create and deploy appropriate models more quickly. I use the phrase ‘pragmatic analytics’ to describe it.”

Providing simple tools for non-analysts to run basic what-if analysis is part of this pragmatic approach. “It helps them answer business questions themselves and frees up the analytics team to do more vital work,” says Heide.

Modern systems and software have a major contribution to make, offering hugely enhanced performance and ease of use over only a few years ago. Rob Fuller, VP, Marketing Solutions Strategy at Harte Hanks, sees the combination of contemporary database platforms and cheap processing power as opening up predictive analytics to a wider user base.

“Traditionally you needed a very large, separate analytics database and a complete reporting and analytics environment,” he says. “Tools like Hadoop offer great functionality straight out of the box and mean that there’s a much lower cost barrier than there was.”

Built on techniques famously pioneered by Google, Hadoop allows huge volumes of data to be spread over large clusters of commodity servers and processed in parallel. More recent arrivals like Splice Machine have moved the game on further.

“Today, you can use high-speed database platforms like Splice Machine that are easily scalable with commodity hardware to the petabyte level, but still retain a relational structure,” says Fuller. “From an end user analyst’s perspective, there’s no change from the original relational database.”

Harte Hanks now uses this platform to run its clients’ analytics work and operational marketing from a single database. “It works for everything but the very largest implementations,” says Fuller. “This gives clients way more performance without any massive investment and you just bolt on more SSDs and servers as data volumes grow.”

Given the vast volumes of digital data that modern marketers have to deal with, systems like these are vital but are far from the whole answer. One prime example is the challenge of preparing gigabytes or even terabytes of data to produce a training dataset from which to build and test a model. This data manipulation – dealing with outliers, creating derived variables, excluding irrelevant data – can occupy days of an analyst’s time.

“There hasn’t been much investment to handle data prep at scale,” says Jeremy Stanley, Chief Data Scientist at Sailthru. “It’s still common to use SAS or a SQL database to do it instead of a scripting language.”
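The chores described above – excluding irrelevant records, taming outliers, deriving new variables – are exactly what a scripting language makes repeatable. A minimal pandas sketch, with invented data, column names and thresholds purely for illustration:

```python
import pandas as pd

# Illustrative order data; in practice this would be loaded at scale
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "order_date": pd.to_datetime(
        ["2015-01-01", "2015-01-20", "2015-01-05", "2015-01-06", "2015-02-01"]),
    "order_value": [25.0, 30.0, 0.0, 5000.0, 40.0],
    "is_test_account": [False, False, False, False, False],
})

# Exclude irrelevant data: zero-value orders and internal test accounts
df = df[(df["order_value"] > 0) & (~df["is_test_account"])]

# Deal with outliers: cap values at the 99th percentile rather than drop them
cap = df["order_value"].quantile(0.99)
df["order_value"] = df["order_value"].clip(upper=cap)

# Create a derived variable: days since the customer's previous order
df = df.sort_values(["customer_id", "order_date"])
df["days_since_last"] = df.groupby("customer_id")["order_date"].diff().dt.days
```

Each step is a one-liner that can be re-run unchanged as new data arrives, which is precisely the maintainability advantage over ad-hoc SQL.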

Open source scripting

Stanley cites open-source scripting tools as currently revolutionising data preparation. Packages like dplyr and its foundation, the R statistical programming environment, offer easy automation and the ability to tap into vast shared libraries of data manipulation, statistical modelling and visualisation tools.

“You used to have to write pages of code to handle basic operations, which are hard to update, maintain and re-use,” says Stanley. “Tools like dplyr let you express what you want to do much faster and in a more concise way. My own productivity has gone up by a factor of at least ten in the last five years.”
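dplyr itself belongs to the R ecosystem, but the pipeline style Stanley describes has a close analogue in pandas method chaining. As a hedged sketch with made-up data, a filter-group-summarise job that once took a page of loops collapses into a single readable expression:

```python
import pandas as pd

# Invented order records, for illustration only
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_value": [20.0, 35.0, 15.0, 10.0, 50.0, 25.0],
})

# One chained pipeline: group, summarise, filter, sort
summary = (
    orders
    .groupby("customer_id", as_index=False)
    .agg(order_count=("order_value", "size"),
         total_spend=("order_value", "sum"))
    .query("total_spend > 25")
    .sort_values("total_spend", ascending=False)
)
```

Because each stage reads top to bottom in the order it executes, the pipeline is far easier to update and re-use than the nested code it replaces.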

Both Fuller and Stanley also cite machine learning techniques like gradient boosting as offering huge gains in model-building productivity, using iterative model optimisation processes that again exploit the vastly increased processing power available today.

“There have been big waves of change in model types and best practice,” Stanley says. “Most of the commercial modelling packages are at least five years behind the open-source community.”

So are these techniques good enough to completely replace statisticians, as other automated model-building tools have claimed to do in the past? By trialling multiple statistical techniques against test data, automated model-building packages aim to identify the most predictive variables, combine them into multivariate models and test them to find the most robust, best performing ones.

Unfortunately, automatic tools don’t offer a complete substitute for rigorous analytic thinking. Expert input is still needed at the outset to select the right variables that help translate a real-world business challenge into a data-only problem that a model can then help to solve.

“It’s all too easy to find correlations that don’t actually predict future actions by basing the model on data that is already dependent on what you are trying to predict,” notes Stanley. “It’s impossible to fully safeguard against this automatically. No algorithm can understand where the data came from.”

That means humans are still needed to pick the most influential or driving variables to analyse and test, and to specify a clear positive or negative result to focus on. The software can then go into action, trained to look for the patterns (or combination of weighted variables) in historical data that lead most often to the right result. Skilled input is also required to pick the right machine learning algorithms to use for each application.
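One common human safeguard against the leakage Stanley warns of is a strict time cutoff: features may only use behaviour observed before the prediction point, and the outcome is measured after it. A sketch with invented event data:

```python
import pandas as pd

# Illustrative customer events; we predict purchases after the cutoff
cutoff = pd.Timestamp("2015-03-01")
events = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "event_date": pd.to_datetime(
        ["2015-02-10", "2015-03-05", "2015-02-20", "2015-02-25", "2015-03-10"]),
    "event": ["visit", "purchase", "visit", "visit", "purchase"],
})

# Features: activity strictly BEFORE the cutoff, so nothing leaks in
features = (events[events["event_date"] < cutoff]
            .groupby("customer_id").size().rename("visits_before"))

# Label: did the customer purchase ON OR AFTER the cutoff?
label = (events[(events["event_date"] >= cutoff) &
                (events["event"] == "purchase")]
         .groupby("customer_id").size().gt(0).rename("purchased_after"))

# Customers with no pre-cutoff history drop out of the training set
training = (features.to_frame().join(label, how="left")
            .fillna({"purchased_after": False}))
```

No algorithm can enforce this split for you, because only a human knows which fields were recorded before the outcome and which were recorded as a consequence of it.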

“You still really need to know what you are doing, it takes a lot of effort to find the really predictive data,” says Fuller. “You need someone who can relate to the business issues and isn’t afraid of the maths side. Automated modelling tools do continue to get better and are very good for optimising models that are already working, but I don’t see them as a useful place to start.”

So to get analysts, software and systems working in harmony, recruit some real experts. But while large enterprises routinely have in-house analytics teams, mid-tier companies can rarely afford to fund these positions. Moreover, in certain geographic areas, it’s almost impossible to find people with the requisite skills. Engaging a marketing services provider (MSP) to analyse the data and build custom models is one common solution.

“There are significant benefits to working with a partner,” says Fuller. “We can provide these skills and help clients get up to speed so they don’t have to make a big investment immediately. We build the models and then help them optimise them – and they get to keep them.”

Another popular way to lower development costs is to work with pre-built models. Rather than starting from scratch, clients can adopt packaged versions of the most proven and powerful models in each vertical sector, covering common challenges such as likelihood to convert, time to next purchase or likelihood to attrite.

“Fortune 500 businesses will have insight teams that can build models from scratch,” says David Schweer, SDL’s Senior Product Marketing Manager for Campaigns and Analytics. “Smaller companies will have smaller teams and tend to use pre-packaged models.

“We have four pre-built models as part of our Digital Experience Platform,” he continues. “They are mostly aimed at the retail sector and cover applications like next best action and most likely to buy. As they are embedded within our applications, clients can apply them immediately.”

SDL offers consultancy to help tweak these “good enough” models for each client. “They will improve the business but won’t be perfect, so the next step is to further edit and optimise the model,” says Schweer.

Sailthru offers a novel approach to pre-packaged modelling, with a comprehensive SaaS platform spanning data collection, predictive model building and operational deployment. The company collects data based on the desired modelling application, using its own tagging system to identify and track customers in web, email and mobile channels as well as bringing in historical data via API and JSON feeds, loading data into a flexible MongoDB NoSQL database.

Sailthru’s predictive engine then generates probability scores, rankings, and estimated values for nine pre-specified actions, such as making a purchase within the next 24 hours, opting out of future contacts within the next week, and expected revenue within the next thirty days.

“We start building and applying models after one month, and typically have 90% of data required for full accuracy within three months,” says Stanley. “We will look at every purchase event and its nature to identify trends over time.”

Sailthru offers its users no choice at all over what they wish to predict: the whole process is completely automated. But why should they trust its pre-selected variables and behaviours to fit their business – and why is its automated software better than the tools Stanley criticises as flawed?

“You’re sacrificing a general purpose tool for powerful customised models that really work,” argues Stanley. “All the client has to do is use the APIs to get the data into the platform.”

The idea is that, by using standardised models based on common, well-defined and well-understood customer behaviours and their driving variables, the in-house insight team can instead concentrate on building models that are specific to that company and require different techniques and data sets. Extensive testing, including the use of control groups to measure the impact on long-term customer behaviour, helps check model accuracy and improve performance.

Sailthru even addresses another classic bottleneck – applying models within operational software – by supplying its own tools to handle segmentation and message selection, and to deliver messages across email, mobile apps, on websites and in online ads.

“Today you want to use a model in real time,” says Heide, “so you need to express it in rules at the touchpoint so you can recalculate the score in real time based on the current interaction and suggest the next best action to take.”

This is the approach that Pitney Bowes client Nationwide uses to deliver prompts to customer-facing staff in the call centre and in-branch, as well as driving content and offer selection on the web. Maximising cross-selling opportunities has delivered over 200% ROI on the Portrait system the building society uses. “They now sell more on inbound than any other channel,” notes Heide.

However, as marketing tech guru David Raab recently noted, predictive analytics are mostly “still done by specialised vendors rather than built into the marketing automation platform”. Here, tools like IBM’s SPSS Modeler, Analytic Server, and Collaboration and Deployment Services help build and maintain models operationally.

The company’s Decision Management suite brings these products together to form an automated closed-loop system, feeding the results of interactions back to rescore and rerun new models which can then be deployed to replace the old ones. One SPSS client saved 30 analyst days annually per model by adopting this workflow.

Despite the help automation like this provides, there’s still a lot of work to be done before predictive starts appearing everywhere. For a start, companies must fully understand what really influences the critical decisions their customers make.

“It’s a tough challenge to map out the whole customer and prospect journey and then try to optimise every interaction,” concludes SDL’s Schweer. “First, management has to become more expert in translating business problems into issues that a model can solve.”
