Data migration is not solely the domain of database-to-database transfers; it often involves complex content migrations of documents and unstructured data.
But how do these migrations differ from more conventional relational database data migrations? What is the right strategy? What techniques are involved? What are the pitfalls?
We recently caught up with one of our members, David Katzoff, Managing Director of Valiance Partners, a specialist data migration technology and service provider operating in this space. Dave has kindly answered a few burning questions on this topic.
Data Migration Pro: What are the phases that you typically adopt in a content migration project? Do you make any special provisions because of the content aspect of the migration?
David Katzoff: The steps in Valiance’s migration methodology are the same for content management applications as for any other migration, so we typically adopt the following data migration strategy:
- Analysis and Data Discovery
- Tool Selection
- Master Data Management
- Tool Configuration
- Data Cleansing
- Dry Runs
- Formal Testing
- Production Execution
- Post Production Support
Yes, you’re absolutely right, content management offers unique challenges along the way. Here are a few examples:
- Identify the legal “original”. Interestingly enough, not all companies can identify the legal softcopy!
- Migration of users, including handling of ownership for users who are no longer active in the source system
- Verify network throughput for large-scale migrations, or for migrations that move content over networks with latency (e.g. across the pond). When you are moving gigabytes or terabytes of content, this matters.
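The throughput point above lends itself to a quick back-of-the-envelope calculation before committing to a migration window. The sketch below is illustrative only; the efficiency factor is an assumption standing in for protocol overhead and latency losses that you would measure on your own network.

```python
# Rough wall-clock estimate for moving a large content set over a WAN link.
# The 0.7 efficiency derating is an assumed figure, not a measurement.

def transfer_hours(total_gb: float, throughput_mbps: float,
                   efficiency: float = 0.7) -> float:
    """Hours to move total_gb over a link of throughput_mbps (megabits/s),
    derated for protocol overhead and latency."""
    effective_mbps = throughput_mbps * efficiency
    total_megabits = total_gb * 8 * 1000  # 1 GB ~ 8000 megabits
    return total_megabits / effective_mbps / 3600

# 2 TB over a 100 Mbps link at 70% efficiency:
print(round(transfer_hours(2000, 100), 1))  # → 63.5 hours
```

Even this crude estimate makes the point: a two-terabyte repository over a typical WAN link is a multi-day transfer, which has to be planned for.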
Analysis and Data Discovery
- Determine whether the metadata in the source is sufficient for the target document processes.
- Determine what actions must be performed to “initialize” content in the new system
- Review any external integrations that need to be maintained or created.
- Review renditions, versions, document links, folder hierarchy and document assemblies
- Work with the users to verify the scope of content to be migrated.
- Review suitability of source data to target data dictionary
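A simple way to make the discovery steps above concrete is to sweep the source repository and flag documents whose metadata cannot satisfy the target's data dictionary. The required field names below are hypothetical examples, not taken from any particular system.

```python
# Hypothetical discovery sweep: which source documents are missing
# metadata the target data dictionary requires?

TARGET_REQUIRED = {"doc_type", "owner", "effective_date"}  # assumed target fields

def discovery_report(source_docs):
    """Return {doc_id: [missing fields]} for documents that cannot
    satisfy the target's required metadata."""
    gaps = {}
    for doc in source_docs:
        populated = {k for k, v in doc["metadata"].items() if v}
        missing = TARGET_REQUIRED - populated
        if missing:
            gaps[doc["id"]] = sorted(missing)
    return gaps

docs = [
    {"id": "DOC-1", "metadata": {"doc_type": "SOP", "owner": "jsmith",
                                 "effective_date": "2011-01-01"}},
    {"id": "DOC-2", "metadata": {"doc_type": "SOP", "owner": ""}},
]
print(discovery_report(docs))  # → {'DOC-2': ['effective_date', 'owner']}
```

Running a report like this early in analysis turns "is the source metadata sufficient?" from a debate into a countable list of gaps.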
Tool Selection
Content management migration requires tool support for the CM application, for example SharePoint or Documentum. The obvious issue is that it is not easy to evaluate prospective tools without performing some qualification against your own system; common challenges with CM migrations include performance or functional shortcomings of the tool.
Master Data Management
- Harmonize document templates and workflow processes
- Leverage existing MDM system to validate/populate content metadata
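The second MDM step can be sketched as a lookup against a master list: reject metadata values the MDM system does not recognise, and populate derived fields from the master record. The field and product names here are invented for illustration.

```python
# Illustrative sketch: validate a metadata field against master data
# and populate a derived field. Field names and codes are assumptions.

MDM_PRODUCTS = {"ACME-100": "Acme Injector 100",
                "ACME-200": "Acme Injector 200"}

def enrich(record):
    """Validate product_code against the MDM list and populate the
    full product name; raise on an unrecognised code."""
    code = record.get("product_code")
    if code not in MDM_PRODUCTS:
        raise ValueError(f"{record['id']}: unknown product code {code!r}")
    return {**record, "product_name": MDM_PRODUCTS[code]}

print(enrich({"id": "DOC-7", "product_code": "ACME-100"})["product_name"])
# → Acme Injector 100
```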
Recommendation for Verification
- Verify that the documents have not been modified in any way
- Ensure that each document is associated with the appropriate application record
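Both verification points above reduce to two mechanical checks per document: compare a cryptographic digest of the source and target content, and compare the record each copy is attached to. A minimal sketch (in practice you would stream files in chunks rather than hold bytes in memory):

```python
# Minimal byte-level verification: content unchanged, and associated
# with the right record in the new application.
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify(pairs):
    """pairs: iterable of (doc_id, source_bytes, target_bytes,
    source_record_id, target_record_id). Returns failures."""
    failures = []
    for doc_id, src, dst, src_rec, dst_rec in pairs:
        if digest(src) != digest(dst):
            failures.append((doc_id, "content changed"))
        if src_rec != dst_rec:
            failures.append((doc_id, "wrong record association"))
    return failures

print(verify([("DOC-1", b"abc", b"abc", "R1", "R1"),
              ("DOC-2", b"abc", b"abd", "R2", "R2")]))
# → [('DOC-2', 'content changed')]
```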
Data Migration Pro: How do you typically go about testing whether the content has successfully migrated?
David Katzoff: At Valiance, we always recommend that our clients use automated testing for any data migration engagement where there is any compliance or business risk. When testing content management we perform additional testing for the following:
- Verify that content files were migrated without change and are associated with the correct record in the new application,
- Verify that all versions are present and in the correct sequential order,
- Verify that every rendition is available,
- Verify that content is in the correct location or folder, and
- Test all of the metadata and associated application data using the appropriate transformation rules.
Additionally, there are functional tests that should be considered including verifying document overlays and other application behaviour.
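The version and rendition checks in the list above can be sketched as simple comparisons. Version labels and rendition formats here are assumed for illustration; a real harness would pull both sides from the source and target APIs.

```python
# Hedged sketch of two of the automated checks: version chain intact
# and in order, and every required rendition present.

def check_versions(source_versions, target_versions):
    """Both lists ordered oldest-first; they must match exactly."""
    return source_versions == target_versions

def check_renditions(required, migrated):
    """Return the renditions that did not arrive in the target."""
    return sorted(set(required) - set(migrated))

print(check_versions(["1.0", "2.0", "3.0"], ["1.0", "2.0", "3.0"]))  # → True
print(check_renditions({"pdf", "xml"}, {"pdf"}))  # → ['xml']
```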
Data Migration Pro: What are some of the common misconceptions you witness with content migration?
David Katzoff: The most common misconceptions we see with content management migrations include:
- The tool selection process is often “paper-based” or simply based on a salesman’s recommendation, and the resulting migration runs into unanticipated challenges. This is most evident in the number of tools we see at the same client. We strongly advocate creating a brief proof of concept before putting any tools into action. As a specialist technology provider in this space we obviously want clients to select our migration tools, but taking an agnostic viewpoint it’s obviously best practice to carry out a pilot and benchmark migration tools against your specific technical, business and commercial needs. It never ceases to amaze me how many companies fail to do this, and they invariably hit problems downstream.
- Scope – As with many migrations, defining the scope of data to be migrated is difficult. It is not unusual to engage with a client that wants to migrate terabytes of content just to find that there are only gigabytes of useful content.
- Data quality – Assuming data quality is fine because it is not an issue in the source system misses an important point: data quality needs to be tested against the requirements of the destination system. This is always a significant testing challenge.
- Content management applications are complex and the source and destination processes are typically not equivalent. This often leads to unanticipated functional differences and challenging post migration support. Valiance strongly recommends getting users involved often and early to avoid such challenges.
- Data mapping can be performed without knowledge of how the content will be used in the new system. Since there are typically significant functional and process differences between the source and destination applications, creating the data mappings between the two systems is not a straightforward field-to-field mapping. In many cases, the users tasked with creating the mapping have not been involved in the details of the new system and may not understand the usage of the data. This disconnect often results in mappings that are incorrect, or in the identification of new or revised system requirements that need to be addressed in the system build. At a minimum, users involved in the migration mapping should have training on the destination system, and it is preferable that these individuals be involved in the destination system requirements definition.
- The migration team is not included in the change control process for the new system. There have been several instances where a change impacting a migration is made to the destination system without the knowledge of the migration team. As a result, the migration testing identifies new issues that were not previously detected and require a subsequent run.
Data Migration Pro: Are there any compliance or regulatory controls required for content migrations, e.g. in the pharmaceutical sector?
David Katzoff: The pharmaceutical, biotech and medical device industries all need objective evidence to verify that a migration was completed “as designed”. This is true for any application considered to have compliance risk, including content management. Valiance developed TRUcompare, software for automated testing: it tests structured data using the required source-to-destination mappings, as well as the content itself, to verify that it has been migrated without change and is associated with the right record. We see these testing techniques being adopted in other industries as well.
Data Migration Pro: Finally, how does a content migration differ from a more traditional relational database style migration?
David Katzoff: There are a number of differences, here are some of the common ones:
- Storage volumes. We’re often dealing with extremely large document sets (terabytes)
- There is often a need to consider content conversions (authoring tools, links, publishing solutions all need to be taken into account)
- There are often processes such as approvals that may require weeks or months to complete – What do you do with the work in process?
- Testing needs to confirm that any legal document and audit trail have been migrated without change
- Scoping – One needs to consider how to parse out what’s valuable in large repositories
- Content management applications are often not transaction-oriented; users, authors or approvers only interact with the system on an ad hoc basis
- The structured information transformation rules are often quite complex (the metadata or application data)
- The migration source and destination will require interfaces and support at the application layer
- The migration tools market for content management is still at an early stage when compared to the traditional ETL vendors
- Content renditions may require the splitting or merging of objects
- The migration approach tends to be more “object” oriented than “row” oriented
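That last point is worth illustrating: rather than moving rows table by table, a content migration typically moves one document "object" at a time, with its metadata, version chain and renditions travelling together as a unit. The sketch below is a hypothetical shape for such a unit; the class and field names are assumptions, not any vendor's API.

```python
# Illustrative "object"-oriented migration unit: the document's
# metadata, versions and renditions are bundled and committed together.
from dataclasses import dataclass, field

@dataclass
class DocumentObject:
    doc_id: str
    metadata: dict
    versions: list = field(default_factory=list)    # oldest first
    renditions: dict = field(default_factory=dict)  # format -> bytes

def migrate_object(doc: DocumentObject, target: dict) -> None:
    """Stage every part of the document, then commit as one unit,
    so the target never holds a half-migrated document."""
    staged = {"metadata": dict(doc.metadata),
              "versions": list(doc.versions),
              "renditions": dict(doc.renditions)}
    target[doc.doc_id] = staged  # single commit point

target_repo = {}
doc = DocumentObject("DOC-9", {"title": "SOP-9"},
                     ["1.0", "2.0"], {"pdf": b"%PDF"})
migrate_object(doc, target_repo)
print(sorted(target_repo["DOC-9"]))  # → ['metadata', 'renditions', 'versions']
```

Contrast this with a row-oriented ETL flow, where each table would be loaded independently and nothing ties a document's parts together during the move.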
What is your experience of content data migration? What challenges have you faced? Why not add your comments in the section below?