by James Lawson, contributing editor
Darren Cudlip | commercial director, GB Group
Jim Conning | managing director, data services, Royal Mail
Mark Webb | operations director, Alchemetrics
Mike Fox | director, UKChanges
Murdo Ross | head of solution design, DST Customer Communications
In today’s real-time marketing world, the need for speed has never been more pressing, which is why partnering with a Marketing Service Provider can deliver the fast, efficient solutions marketers require.
Speed is ever more vital in marketing, particularly in digital channels. Whether it’s delivering an intelligent offer or verifying an entire UK customer database in an hour, extremely high-performance systems are needed. Marketing Service Providers (MSPs) are second to none in coping with these demands.
Batch data processing is a seriously intensive job, potentially comparing customer files of several million records against hundreds of millions of reference records. Speed is of the essence as clients often schedule batch cleansing during short breaks in their operations. However, in most cases, overnight processing is perfectly acceptable.
“Real time can be something of a buzz word,” says Murdo Ross, Head of Solution Design at DST Customer Communications. “No-one expects a real-time response if you upload five million records for cleansing.”
High-speed batch cleansing is UKChanges’ speciality. For one client, the MSP applies all transactional changes from more than 70 websites.
“Every night, at a period of low activity, we provide a full refresh of changes from the past 24 hours, updating known records and adding new rows and re-segmenting the data,” says Mike Fox, Director at UKChanges, adding that this update cycle also triggers a wave of outbound messaging.
To keep all the websites available 24/7, the MSP runs two databases. One supports live website operations while the other, an offline copy, is updated. When the update is complete, the refreshed copy is swapped in as the live version.
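A minimal sketch of the two-database pattern, with Python dictionaries standing in for the live and offline databases. The class and method names are hypothetical, and the sketch ignores deletions, persistence and concurrent writers; in a real deployment the swap might instead repoint a connection string, alias or view.

```python
from typing import Dict, Optional


class DoubleBufferedStore:
    """Two copies of the data: one serves live reads while the other is refreshed offline."""

    def __init__(self, initial: Dict[str, str]) -> None:
        self.live = dict(initial)      # copy the websites read from
        self.standby = dict(initial)   # offline copy that receives the nightly refresh

    def read(self, key: str) -> Optional[str]:
        # Readers only ever touch the live copy, so the sites stay up during a refresh.
        return self.live.get(key)

    def refresh_and_swap(self, changes: Dict[str, str]) -> None:
        # Rebuild the offline copy from current live data plus the last 24 hours of changes.
        self.standby = {**self.live, **changes}
        # Swap roles: the refreshed copy goes live, the old live copy becomes the standby.
        self.live, self.standby = self.standby, self.live


store = DoubleBufferedStore({"cust-1": "old address"})
store.refresh_and_swap({"cust-1": "new address", "cust-2": "new customer"})
print(store.read("cust-1"))  # new address
```

Because reads only ever touch `live`, the sites keep serving the previous day's data until the swap, and the old copy remains available as `standby` for rollback.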
Maximising speed for this kind of business-critical batch processing demands optimised software and hardware. On the hardware side, that could mean the latest multi-core servers, or adding RAM for in-memory processing, which can boost speed by as much as a factor of a thousand. Where storage read/write times are the limiting factor, flash (solid-state) storage scores over spinning discs.
“Minimal disk access time is the name of the game at the moment,” says Fox. “Storing relevant data for matching in the RAM means the information can be accessed faster.”
Like most business IT today, MSPs are moving to private and public cloud infrastructures, which offer economical and flexible computing horsepower. For example, Royal Mail is currently building a private cloud system to distribute daily updates to users of PAF (the Postcode Address File).
“We need a scalable architecture that can burst if needed,” says Jim Conning, Managing Director, Data Services at Royal Mail. “The main consideration for Royal Mail is security so we won’t be using the public cloud.”
To take full advantage of server power and exploit that cloud scalability, the right software is just as important as the latest hardware. At UKChanges, bespoke in-house applications distribute and optimise job processing across the servers according to current demand, removing bottlenecks and delays.
One such optimisation is holding the correct, pre-keyed reference data in RAM, ready to be matched against. But MSPs must also balance the drive for pace against matching accuracy.
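Holding pre-keyed reference data in RAM can be sketched as a hash table built once from the reference file, then probed with the same key derived from each incoming record. The match-key recipe here (compacted postcode plus the first four letters of the surname), the function names and the data are all hypothetical; real MSP match keys are proprietary and considerably more sophisticated.

```python
import re
from typing import Dict, Iterable, Optional, Tuple


def match_key(name: str, postcode: str) -> str:
    # Hypothetical key: postcode without spaces plus the first four letters of the surname.
    compact_postcode = re.sub(r"\s+", "", postcode).upper()
    surname = re.sub(r"[^A-Z]", "", name.upper().split()[-1])
    return f"{compact_postcode}|{surname[:4]}"


def build_index(reference: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    # Key the reference data once, up front, and hold it in RAM as a hash table.
    return {match_key(name, postcode): ref_id for name, postcode, ref_id in reference}


def lookup(index: Dict[str, str], name: str, postcode: str) -> Optional[str]:
    # Each incoming record costs one key build and one in-memory hash lookup.
    return index.get(match_key(name, postcode))


index = build_index([("Jane Smith", "SW1A 1AA", "REF-001"), ("John Doe", "M1 1AE", "REF-002")])
print(lookup(index, "Mrs J. Smith", "sw1a1aa"))  # REF-001
```

Even this key treats every Smith at a postcode as the same person. A coarser key would make the table smaller still but merge even more genuinely different people, which is the speed-versus-accuracy balance at stake.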
“Compressing the data makes it smaller, lighter and quicker to load, but go too far and it impacts the accuracy of the matching,” says Fox, noting that multi-processor machines make a server farm faster and more robust but require careful coding to make full, optimal use of them.
“Incremental speed gains can be achieved by smoothing the loop of the process, but one inefficient element in code can easily double the time a record search requires,” he says. “Multiply that increase by several million records and all of a sudden you’ve really taken your foot off the gas.”
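To put rough numbers on Fox's point, the following sketch times a single-threaded pass over a file, assuming a hypothetical 5 million records and 2 ms per record search. Neither figure comes from UKChanges.

```python
def batch_hours(records: int, ms_per_record: float) -> float:
    # Total wall-clock time for a single-threaded pass, in hours.
    return records * ms_per_record / 1000 / 3600


baseline = batch_hours(5_000_000, 2.0)   # hypothetical 2 ms per record search
doubled = batch_hours(5_000_000, 4.0)    # one inefficiency doubles each search
print(f"{baseline:.1f} h -> {doubled:.1f} h")  # 2.8 h -> 5.6 h
```

Spreading the work across cores or servers divides both totals, but the ratio between them stays at two.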
Though working with industry-standard software can deliver excellent results and be deployed very quickly, UKChanges prefers to use its in-house development team for custom-built systems.
“The benefits in flexibility, adaptability and suitability for multiple purposes are quickly felt when complex projects come around,” says Fox.
Before the systems can get to work, however, the data first has to make its way across from the client’s own database. According to Mark Webb, Operations Director at Alchemetrics, the main limiting factor in batch processing is transferring data over the Internet. “For larger files, this can add substantial time,” he says.
Transferring and then processing an entire customer file unnecessarily is perhaps the most egregious waste of processing effort. As well as lengthening turnaround times, it also increases cost for the client.
“Organisations usually don’t know what’s changed in their data so we always have to process the whole file,” says Conning. “It would be far more efficient to only process changes.”
Clients sending 25 separate customer files in varying formats and degrees of cleanliness will likely wait even longer – and again pay more. “Rather than processing, preprocessing is what takes time – and that incurs more cost,” Conning says.
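Processing only changes first requires knowing what has changed. One common technique, sketched here on the assumption that the client keeps a fingerprint of each record from the previous run, is to hash every record and compare against the stored hashes, so that only added and changed records (plus a list of deletions) need to be transferred. The function names and sample data are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def row_hash(record: dict) -> str:
    # Canonical JSON (sorted keys) so field order does not change the hash.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_snapshot(
    previous: Dict[str, str], current: Dict[str, dict]
) -> Tuple[List[str], List[str], List[str], Dict[str, str]]:
    """previous maps customer_id -> hash from the last run; current maps customer_id -> record."""
    current_hashes = {cid: row_hash(rec) for cid, rec in current.items()}
    added = [cid for cid in current_hashes if cid not in previous]
    changed = [cid for cid, h in current_hashes.items() if cid in previous and previous[cid] != h]
    deleted = [cid for cid in previous if cid not in current_hashes]
    # Store current_hashes and pass it back in as `previous` on the next run.
    return added, changed, deleted, current_hashes


last_run = {"c1": row_hash({"name": "Jane Smith", "postcode": "SW1A 1AA"}),
            "c2": row_hash({"name": "John Doe", "postcode": "M1 1AE"})}
today = {"c1": {"name": "Jane Smith", "postcode": "SW1A 2AA"},  # moved house
         "c3": {"name": "Ann Lee", "postcode": "EH1 1YZ"}}      # new customer; c2 has gone
added, changed, deleted, state = diff_snapshot(last_run, today)
print(added, changed, deleted)  # ['c3'] ['c1'] ['c2']
```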
When a very fast response to a validation request is required, the solutions start to move away from batch. But where data volumes are low, it’s not such a tough ask.
“Checking log-in details is one example where you need an instant reply,” says Ross. “That involves small amounts of data and it’s not too hard using a highly-indexed database. Cleansing and fuzzy matching against all the reference data you hold is trickier.”
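The contrast Ross draws can be seen in a few lines: an exact log-in check is a single hashed lookup, while fuzzy matching has to score the input against reference entries. This sketch uses Python's standard `difflib` as a stand-in for a commercial fuzzy matcher; the account and name data are made up.

```python
import difflib
from typing import Dict, List

# Exact check: a hash-indexed lookup whose cost barely grows with the table size.
accounts: Dict[str, str] = {"jane@example.com": "user-001"}


def account_exists(email: str) -> bool:
    return email.strip().lower() in accounts


def fuzzy_candidates(name: str, reference_names: List[str], cutoff: float = 0.8) -> List[str]:
    # Fuzzy check: compares the input with every reference entry, so cost grows
    # with the size of the reference file.
    return difflib.get_close_matches(name, reference_names, n=3, cutoff=cutoff)


print(account_exists("Jane@Example.com"))                        # True
print(fuzzy_candidates("Jon Smith", ["John Smith", "Jane Doe"]))  # ['John Smith']
```

`get_close_matches` walks the whole list of possibilities, which hints at why fuzzy matching against hundreds of millions of reference records is far heavier than an indexed exact lookup.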
Validating single records – names, addresses, emails – in real time is well established. For example, UKChanges has helped its clients check against live suppression databases and the OSIS file for well over a decade. Marketing’s demand for real-time validation has not been huge thus far, but this is changing fast as digital takes over.
DST recently implemented a hybrid approach for one client in which the operational application checks the format of each new address in real time. Every 15 minutes, it then makes an API call to DST’s cleansing system to fully validate the latest batch of addresses against PAF.
“You can’t really run against 22 million addresses in real time unless you put it all in memory as a look-up table,” says Ross. “You could do that but it’s not really necessary in this case.”
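The hybrid pattern, an instant format check followed by periodic batch validation, might look like this sketch. The postcode pattern is a simplified shape check rather than the full UK rules, `batch_validate` is a stub standing in for DST's cleansing API (whose real interface is not described here), and the caller is assumed to invoke `maybe_flush` regularly, for example from a scheduler.

```python
import re
import time
from typing import Callable, Dict, List, Optional

# Simplified UK postcode shape check: catches typos, not whether the postcode exists.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


class HybridValidator:
    def __init__(self, batch_validate: Callable[[List[str]], Dict[str, bool]],
                 interval_s: float = 15 * 60) -> None:
        self.batch_validate = batch_validate  # hypothetical call to the full PAF cleansing service
        self.interval_s = interval_s
        self.pending: List[str] = []
        self.last_flush = time.monotonic()

    def capture(self, address: str, postcode: str) -> bool:
        # Real-time step: runs as each address is entered.
        postcode = postcode.strip().upper()
        if not POSTCODE_RE.match(postcode):
            return False
        self.pending.append(f"{address}, {postcode}")
        return True

    def maybe_flush(self, now: Optional[float] = None) -> Optional[Dict[str, bool]]:
        # Batch step: once the interval has passed, send everything queued in one call.
        now = time.monotonic() if now is None else now
        if now - self.last_flush < self.interval_s or not self.pending:
            return None
        batch, self.pending = self.pending, []
        self.last_flush = now
        return self.batch_validate(batch)


validator = HybridValidator(lambda batch: {addr: True for addr in batch})  # stub service
print(validator.capture("10 Downing Street, London", "sw1a 2aa"))  # True
print(validator.capture("1 High Street", "12345"))                 # False
```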
A hybrid updating approach is also common where real-time decisioning is driven by web behaviour. All the logic to drive next best action is built into the operational platform as business rules, which are then refreshed in batch.
DST specialises in this kind of work, periodically rebuilding one client’s SCV-based segmentation then using derived variables like lifestage to drive offers. With around a million customers, the processing requirements aren’t too onerous.
“We do the entire consolidation every two hours using a mirrored database as the database is unavailable while it’s being built,” says Ross. “Once the build is complete, we swap that over to become the live operational one.”
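The split described here, with heavy segmentation rebuilt in batch and light rule lookups at decision time, can be sketched as follows. The lifestage bands, offer names and rule table are all hypothetical; a real SCV-based segmentation would draw on far more variables.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Customer:
    customer_id: str
    age: int
    has_children: bool


def lifestage(c: Customer) -> str:
    # Hypothetical derived variable; real bands would come from the client's own model.
    if c.age < 30:
        return "young_adult"
    if c.has_children:
        return "family"
    if c.age >= 60:
        return "retired"
    return "established"


def rebuild_segments(customers: List[Customer]) -> Dict[str, str]:
    # Batch step, run periodically: derive lifestage for every customer.
    return {c.customer_id: lifestage(c) for c in customers}


# Business rules held by the operational platform and replaced wholesale on refresh.
offer_rules: Dict[str, str] = {
    "young_adult": "starter_offer",
    "family": "family_holiday_offer",
    "retired": "off_peak_offer",
}


def next_best_offer(customer_id: str, segments: Dict[str, str], rules: Dict[str, str]) -> str:
    # Real-time step: two dictionary lookups, no heavy processing in the web session.
    return rules.get(segments.get(customer_id, ""), "default_offer")


segments = rebuild_segments([Customer("c1", 42, True), Customer("c2", 45, False)])
print(next_best_offer("c1", segments, offer_rules))  # family_holiday_offer
print(next_best_offer("c2", segments, offer_rules))  # default_offer
```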
In another example of rapid updating, UKChanges generates a range of dashboards and visualisations that it refreshes every ten minutes.
“The partners in the project were concerned that the frequency might cause issues,” says Fox. “We weren’t. This is where the value of our hugely experienced development team comes into its own.”
But what about “real” real time? When matching, updating and providing responses to queries within seconds or milliseconds is needed, system specifications start to head skywards.
“Real time is appropriate when you need an answer within the same web session,” says Ross. “On the systems side, there’s nothing really new at the moment but NoSQL databases and cloud-based services like AWS can take any load you need to handle – though all our cloud systems use private data centres.”
At GBG, NoSQL databases, private clouds and very fast interfaces all form part of their current state-of-the-art systems upgrade. The solution is modelled on the one used by Transactis to deliver an anti-GLIT (Goods Lost in Transit) claims fraud system for client Shop Direct, now the model for GBG’s ClaimsID service. Adopting a common architecture will let the company tap any of its diverse data services to build client solutions.
“We’ve totally evolved our systems over the last couple of years,” says GBG’s Commercial Director Darren Cudlip. “Around 25% to 30% of our clients have genuine requirements for genuine real time.”
Shifting from relational to NoSQL graph databases increases speed, scalability and flexibility. Cudlip points to their ability to change data models without costly migrations, and to scale horizontally by adding server nodes or clusters more easily than relational technology allows.
“The normal speed constraint is around databases as they tend to scale vertically rather than horizontally,” says Cudlip. “With NoSQL, MSPs can scale out their database to meet their interfaces whilst still using a simple deployment architecture.”
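Scaling out means spreading records across more machines rather than buying a bigger one. A minimal sketch of the idea hashes each key to one of several nodes (plain dictionaries here). It is a generic key-value illustration, not how GBG's graph database partitions data. Simple modulo hashing is used for clarity, although adding a node then remaps most keys, which is why many distributed databases use schemes such as consistent hashing instead.

```python
import hashlib
from typing import Dict, List, Optional


def node_for(key: str, num_nodes: int) -> int:
    # Stable hash (unlike Python's built-in hash(), which is salted per process).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedStore:
    """Spreads records across several nodes; more nodes means more capacity."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes: List[Dict[str, dict]] = [{} for _ in range(num_nodes)]

    def put(self, key: str, record: dict) -> None:
        self.nodes[node_for(key, len(self.nodes))][key] = record

    def get(self, key: str) -> Optional[dict]:
        return self.nodes[node_for(key, len(self.nodes))].get(key)


store = ShardedStore(num_nodes=4)
for i in range(1000):
    store.put(f"cust-{i}", {"id": i})
print([len(node) for node in store.nodes])  # each node holds a share of the 1,000 records
```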
GBG chose MuleSoft’s Enterprise Service Bus (ESB) to provide the backbone of its service-oriented architecture. The ESB integrates data and applications across both legacy and cloud systems, using API-led connectivity along with many other options. Put simply, it can link almost any systems together in real time.
Client implementations based on this architecture are a few months away from going live. For example, Thomas Cook will use it to support real-time applications such as web personalisation, with the central NoSQL SCV taking a mix of batch and real-time feeds.
“At Thomas Cook, we can process updates in sub-200ms times across 50 different systems via the MuleSoft ESB,” says Cudlip. “Agents can query the SCV to get a live view of customer records alongside their current reservation system.”
To support near-real-time updates for clients like Haymarket, Alchemetrics relies on its REST-API-based DQS data platform, which runs on the open-source LAMP stack with MariaDB (a fork of MySQL) as its database. Using partial updates during operations, rather than creating new large tables and indexes, lets it refresh the main SCV every few minutes.
“Clients can collect data directly from websites in near-real time whilst also contacting customers via email or SMS,” says Webb. “Over 700,000 new or altered records can easily be added via partial DQS updates like this every day.”
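A partial update of this kind amounts to an upsert: insert new rows, update changed ones, and leave the rest of the table and its indexes alone. The sketch uses Python's built-in `sqlite3` module so it runs anywhere (SQLite 3.24 or later is needed for `ON CONFLICT ... DO UPDATE`); in MariaDB the equivalent statement is `INSERT ... ON DUPLICATE KEY UPDATE`. The table layout is hypothetical, not Alchemetrics' schema.

```python
import sqlite3
from typing import Iterable, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scv (customer_id TEXT PRIMARY KEY, email TEXT, updated_at TEXT)"
)


def apply_partial_update(conn: sqlite3.Connection,
                         rows: Iterable[Tuple[str, str, str]]) -> None:
    # Upsert only the new or altered rows; the table and its index stay in place.
    conn.executemany(
        """
        INSERT INTO scv (customer_id, email, updated_at) VALUES (?, ?, ?)
        ON CONFLICT(customer_id) DO UPDATE SET
            email = excluded.email,
            updated_at = excluded.updated_at
        """,
        rows,
    )
    conn.commit()


apply_partial_update(conn, [("c1", "jane@example.com", "2024-01-01T09:00"),
                            ("c2", "john@example.com", "2024-01-01T09:00")])
apply_partial_update(conn, [("c1", "jane.smith@example.com", "2024-01-01T09:05")])
print(conn.execute("SELECT email FROM scv WHERE customer_id = 'c1'").fetchone()[0])
# jane.smith@example.com
```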
Alchemetrics optimises performance by constantly upgrading both software and hardware. For example, a recent upgrade from PHP version 5.5 to version 7 within its application stack gave a 25% speed boost.
“As RAM is cheap, this is another route to keeping on top of processing speed,” adds Webb. “Ensuring that the tables and indexes are optimised is equally important. The R&D team keep an open mind on the technologies we use and aren’t shy in suggesting better routes to go down.”
As Alchemetrics employs massively parallel processing to handle updating and matching, new records and updates can arrive in the database faster than match keys can be created. This can generate duplicates but the company has this angle covered too.
“To counter this, we have dedupe processes also running in parallel, which continuously match and merge each new individual record,” says Webb.
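One way to stop parallel loaders from creating duplicates is to make the match-key lookup and the resulting create-or-merge a single atomic step. This in-process sketch uses a lock for that; Alchemetrics' actual dedupe process is not described, and the email-based match key, the merge policy and the sample data are assumptions. Across several servers the same guarantee would have to come from the shared store, for example a unique index on the match key.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class ContinuousDeduper:
    def __init__(self, key_fn: Callable[[dict], str]) -> None:
        self.key_fn = key_fn                      # hypothetical match-key function
        self.records: Dict[str, dict] = {}
        self._lock = threading.Lock()

    def ingest(self, record: dict) -> str:
        key = self.key_fn(record)
        # The lock makes lookup-then-create-or-merge one step, so two loaders
        # handling the same person at once cannot both create a record.
        with self._lock:
            existing = self.records.get(key)
            if existing is None:
                self.records[key] = dict(record)
                return "created"
            for field, value in record.items():
                if value not in (None, ""):       # non-empty incoming values overwrite
                    existing[field] = value
            return "merged"


deduper = ContinuousDeduper(lambda r: r["email"].strip().lower())
incoming = [{"email": "Jane@Example.com", "phone": ""},
            {"email": "jane@example.com ", "phone": "07700 900123"}]
with ThreadPoolExecutor(max_workers=2) as pool:
    outcomes = sorted(pool.map(deduper.ingest, incoming))
print(outcomes, len(deduper.records))  # ['created', 'merged'] 1
```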
As client data and end uses vary significantly, fixed match rules can deliver sub-par results. That’s where manual expertise has traditionally come in, with operators tweaking rules to suit each file and its intended application. Instead, GBG is planning to apply machine learning to adjust matching rules over time, finding the best balance between over- and under-matching.
“Over time, that will reduce the need for ad hoc manual intervention and replace it with constant monitoring and optimisation,” says Cudlip.
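GBG has not described its machine-learning approach, so this is a deliberately simple stand-in rather than machine learning proper: a grid search for the similarity threshold that maximises F1 on record pairs already labelled as same or different. Raising the threshold reduces over-matching (false merges) at the cost of more under-matching (missed duplicates), and F1 balances the two. A learned model would instead weight many comparison features; the `difflib` similarity and sample pairs here are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tune_threshold(pairs: List[Tuple[str, str, bool]],
                   candidates: Iterable[float]) -> Tuple[Optional[float], float]:
    """pairs holds (record_a, record_b, is_same_person) labels from past reviews."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        tp = fp = fn = 0
        for a, b, same in pairs:
            predicted = similarity(a, b) >= t
            if predicted and same:
                tp += 1
            elif predicted and not same:
                fp += 1          # over-matching: distinct people merged
            elif not predicted and same:
                fn += 1          # under-matching: duplicates missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


labelled = [("Jane Smith", "jane smith", True), ("Jane Smith", "Tom Brown", False)]
print(tune_threshold(labelled, [0.5, 0.7, 0.9]))
```

Re-running the search as operators label more pairs is what would let the threshold drift towards each client's data over time.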
As data volumes keep increasing and turnaround times shorten, most marketing functions will struggle to keep on top of their customer systems. With their historical data management expertise and constant upgrading, MSPs remain the top option for staying ahead of the curve.