In this best-practice article, John Platten of UK data migration consultancy Vivamex provides a simple strategy for managing the difficult challenge of deciding which data to cleanse first in a data migration project.
Follow his approach to data cleanse prioritisation and help re-focus your data quality improvement strategy.
Why do we need cleanse prioritisation?
Cleanse prioritisation categories can be extremely useful in managing and driving out issues on a limited timescale or budget. In this article I will describe a method I have used successfully on some of the most challenging and time-limited data cleanses I have undertaken so far.
Data issue categories (cleanse prioritisation classes) should be capable of all of the following:
- Engaging a business’s attention
- Being readily grasped by all
- Supporting the drive to solve high priority issues
- Deflecting expenditure of effort on lower-priority items
- Achieving buy-in and backing from senior sponsors
It may appear curious that I list no technical objectives. However, data cleansing is about people in the client business: their availability, their willingness to assist in the cleanse effort, and carrying their hearts and minds through to eventual success. It is in fact rather more about effective Change Management than it is about Data Management.
Desirable characteristics of such a categorisation would include:
- A logical structure that is clear to the business
- A clear indication of priority and urgency
- Well-defined classes with strong boundaries between each
Whereas examples of undesirable characteristics are:
- A scheme that only appeals to our inner data technician
- A scheme that relies on technical rather than business language descriptions
- Ambiguity in the business meaning of a class of issue
- Ambiguity in the consequences of failing to cleanse a specific class
- Overlaps between classes or vague class boundaries
The ABC Method
The ABC Method is my own issue classification scheme, deployed successfully on several major implementation projects. It has been field proven and found to serve cleanse projects particularly well in high pressure, high risk ERP migration environments.
However, like any such scheme it has its relative strengths and weaknesses and I will cover some of these towards the end of the article.
Let’s start with my definition for classes A, B and C:
- A: The target system cannot load or function with the problem data
- B: Data migrates but damages business processes
- C: Data is technically incorrect but will load and will not fully break a process
Or in tabular form:

| Class | Definition |
| ----- | ---------- |
| A | The target system cannot load or function with the problem data |
| B | Data migrates but damages business processes |
| C | Data is technically incorrect but will load and will not fully break a process |
Category A. Even small load issues can ripple across migration projects destructively. This is especially true of ERP implementations, where the stated aim of the target system itself may be to consolidate and link information business-wide in a manner that may not previously have been possible. This linkage presents its own challenges to the data migration team. For example: failure to load a key material record can cause a large number of bills of material to fail to load, which can then impact the load of outstanding production orders.
Sugar -> Medicine BOMs -> Medicines -> Pharma Orders
It is unsurprising that the objective in many ERP migration projects is to achieve a 100% successful upload. 98% isn’t nearly good enough for an object that may be first in a long dependent chain. The Type A issue – failure to load – echoes this concern by putting load failures first in priority.
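The arithmetic behind "98% isn't nearly good enough" is worth seeing. A minimal sketch (the function name and per-stage rates are illustrative, not from any real project) compounds the success rate along a dependency chain:

```python
def end_to_end_success(stage_rates):
    """A record only arrives intact if every stage in its dependency chain loads."""
    result = 1.0
    for rate in stage_rates:
        result *= rate
    return result

# A seemingly healthy 98% per object, compounded across a four-deep chain
# such as Sugar -> Medicine BOMs -> Medicines -> Pharma Orders:
print(round(end_to_end_success([0.98] * 4), 3))  # 0.922
```

Nearly 8% of records lost by the end of the chain, from an object that looked almost clean at the head of it.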
Category B. Let us say we have successfully dealt with all our A’s and are confident we can physically load the data onto the target system. Which issues are next highest in priority? I will use an example from an HR system upload to illustrate: Let us suppose that employees must undergo compulsory Health and Safety training every 12 months and that when we look at the legacy data we find LastCourseDate for several records is in the future because the year digit was entered incorrectly. The logic of the target system will not call the employee for retraining until it reaches this incorrect date plus an entire further year. The example issue would be entirely unacceptable in a safety critical business sector such as Nuclear Power or Pharmaceuticals.
Type B issues are business-impacting data faults: ones in which information has passed the initial upload criteria and reached the target system, only to interact with its logic to generate an undesirable business outcome.
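A Type B fault like the training-date example above can be hunted down with a simple profiling check before load. This is a sketch only; the record layout and field name `LastCourseDate` follow the example, but your legacy extract will differ:

```python
from datetime import date

def flag_future_course_dates(records, today):
    """Return records whose LastCourseDate lies in the future.
    These load cleanly (so they are not Type A), but the target's retraining
    logic will wait until the wrong date plus a further year: a Type B fault."""
    return [r for r in records if r["LastCourseDate"] > today]

employees = [
    {"id": 1, "LastCourseDate": date(2023, 5, 1)},   # plausible past date
    {"id": 2, "LastCourseDate": date(2032, 5, 1)},   # year digit mis-keyed
]
print(flag_future_course_dates(employees, today=date(2024, 1, 1)))
```

Running this against the sample data flags only employee 2, the mis-keyed record.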
Category C. Type C issues are then simply “the rest”. A particularly common example involves addresses that have the correct post code (zip) and business name but also a minor blemish such as the street name being spelt incorrectly. It’s always incredibly compelling to rush in and fix address data purely because the faults are obvious and easily comprehended. Technicians and business managers alike can fall into this trap and opt for giving the street names an additional polish only to discover too late that something far more important, such as their open invoices failing to load, was also at risk. Cue late nights, hand keying and a round of ulcer medicine for the Accounts Payable department. Don’t expect an invitation to their Go Live party!
On the other hand, if your zip codes won't fit the target system (and SAP is incredibly fussy on this point), that's a fine example of a Type A fault. This example underlines that it is the issues that should be individually classified, not the data object in its entirety.
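Classifying issue by issue, rather than object by object, can be sketched in a few lines. The flags on each issue are illustrative assumptions; in practice they would come from your profiling and trial-load results:

```python
from dataclasses import dataclass

@dataclass
class DataIssue:
    description: str
    blocks_load: bool     # assumed flag: target rejects the record outright
    breaks_process: bool  # assumed flag: record loads but a process misbehaves

def classify(issue: DataIssue) -> str:
    """Assign an ABC cleanse-priority class to a single issue."""
    if issue.blocks_load:
        return "A"
    if issue.breaks_process:
        return "B"
    return "C"  # technically incorrect, but loads and does not fully break a process

# Two issues on the same Customer object can land in different classes:
print(classify(DataIssue("zip code too long for target field", True, False)))   # A
print(classify(DataIssue("street name misspelt", False, False)))                # C
```

The same data object can therefore carry A, B and C issues simultaneously, which is exactly why the object as a whole is the wrong unit of prioritisation.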
A Second Layer
Sub-priorities 1, 2 and 3 can be used within the main classes to create a layer of sub-prioritisations, giving a single strict ordering:

A1 > A2 > A3 > B1 > B2 > B3 > C1 > C2 > C3
Note that it is a hierarchy rather than a grid. You may have to reinforce this point multiple times at first and bring out the diagram regularly to drive it home.
Should a mental picture of a 123 vs. ABC grid become stuck in the business’s consciousness you will encounter trouble as they will then ask questions such as whether a C1 is more important than an A3. It never will be, as we have already seen.
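The hierarchy-not-grid point can be made concrete with a tiny sort key (the helper name is my own, for illustration): the class letter always dominates, and the sub-priority only breaks ties within a class.

```python
def priority_key(code):
    """Map 'A1'..'C3' to a sortable key: class outranks sub-priority."""
    return ("ABC".index(code[0]), int(code[1]))

issues = ["C1", "A3", "B2", "A1"]
print(sorted(issues, key=priority_key))  # ['A1', 'A3', 'B2', 'C1']
```

Note that A3 sorts ahead of C1, as it always must.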
In practice I found that the levels reinforced each other somewhat over time, so that C1s and A3s were both rare beasts on the project. As a result, in later work I have tended to start with just A, B and C, introduce the lower level mid-project if there is an absolute forest of data problems to deal with, and finally drop it towards the end when far fewer remain.
The Importance of Language
I began by stating that the coding scheme should be comprehensible to business as much as it is to technical teams. It follows that I have a related vocabulary I use during these explanations.
I won't dwell on it, as that might prevent you from making the scheme fully your own should you wish to, so these are just a few suggestions:
- “Load Failure Issue”
- “Process Failure Issue”
- “Target of Opportunity”
My favourite shorthand is possibly “Target of Opportunity” for Class C.
It doesn’t say you will do something but then again it doesn’t say you won’t. Acknowledging and listing someone’s pet issue with a lowered priority instead of wishing it away completely often leads to business acceptance of the main priorities.
I should add while on the subject of language that it is amazing how quickly the scheme is accepted by the business and enters their vocabulary. I’ve seen managers in the client business wander by and ask for “an update on their A1’s” with barely a second thought.
How Time Constraints Fit the Model
Many ERP projects involve finance data migrations.
Typically a finance migration has to hit a quarter-end or a year-end for practical reasons. Missing that target means running on for at least an entire further quarter with most of the ERP delivery team still on board. Given that a major systems delivery by larger players such as IBM GS and Capgemini can involve up to 50 staff at anything from £600 per day for a junior consultant to £3K per day for the overall Programme Manager, such a slippage can easily cost in the region of £2 million.
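The back-of-envelope arithmetic is simple enough to write down. The team size and day rates come from the figures above; the average rate and the number of working days in a quarter are my own assumptions:

```python
team_size = 50                 # staff on the delivery, as quoted above
avg_day_rate = 700             # £/day; assumed average, most staff near the junior rate
working_days_in_quarter = 60   # assumed working days in one further quarter

slippage_cost = team_size * avg_day_rate * working_days_in_quarter
print(f"£{slippage_cost:,}")   # £2,100,000
```

Even with conservative assumptions, the answer lands comfortably in the £2 million region.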
£2 million is a useful figure to bear in mind if you are still considering (against my advice) slipping in an attractively straightforward street-name cleanse ahead of addressing the other fields in key data objects such as Customer, Supplier and Material.
There can be other reasons why a migration deadline cannot be missed. One common example occurs when the legacy vendor does not want to support the system and will pull out completely at a predetermined date. I have also seen cases where an independent regulator required a system change under threat of halting operations completely if the implementation was not delivered on time.
Cost of missing a quarter-end deadline in an ERP project ~ £2 million
There is also a natural link between the prioritisation classes and the absolute last day on which they can be fixed:
Category A – Must be overcome by Go Live day or manually keyed if not.
Category B – Process-impacting data issues should ideally be fixed by Go Live day, but can be fixed manually by the business post-load in extremis. Usually this should be planned to take place within no more than a month of migration, while elements of the ERP delivery and migration teams remain on board.
Health Warning: Going live with “the wrong” Type B's or too many Type B's can actually prove worse for the business than failing to migrate at all. Committing to a system that is fully loaded but inoperable is a well-known trap in which the business can neither continue with the new system nor go back to legacy. Complete organisational failure can occur. I am saying that you can fix some Type B issues after migration within certain bounds – not that it is a good idea.
Category C – On examination, many Type C issues can be fixed by Business As Usual processes, some of which can be set up as standing processes in the renewed business, for instance in a centre of data excellence. So the question is again: why take them on in a high-pressure migration environment?
Strength and Weakness in The ABC Method
I have alluded to the main weakness throughout: the categorisation I have described here is specifically designed to create a rather fierce, “no prisoners” prioritisation that the business can readily comprehend and use as a touchstone to drive appropriate behaviour and outcomes.
It would probably be inappropriate for projects such as a steady-state, business as usual data cleanse where the objective itself might be to clean addresses completely and earn a postal service discount.
However this article is for the benefit of the Data Migration Pro community and I make few apologies for providing a really heavy hammer for knocking data into shape within a time and cost limited data migration.
(Thanks to Henrik L. Sørensen, for the Data Migration Pro blog entry that prompted this article).