Best Practices for Content Data Migration

Produced by David Katzoff of Valiance Partners

Data migration is not solely the domain of database-to-database transitions; it often involves complex content migrations of documents and unstructured data.

But how do these migrations differ from the more conventional relational-database-style data migrations? What is the right strategy? What techniques are involved? What are the pitfalls?

Read the following guide to find out.

What are the phases of a typical content migration project?

The steps in our classic data migration methodology are much the same for content management applications, so we typically adopt the following data migration strategy:

  1. Planning

  2. Analysis and Data Discovery

  3. Tool Selection

  4. Master Data Management

  5. Tool Configuration

  6. Data Cleansing

  7. Dry Runs

  8. Formal Testing

  9. Production Execution

  10. Post Production Support

However, content management presents unique challenges along the way compared with standard relational database migrations. Here are a few examples:

Planning

  • Identify the legal “original”. Interestingly enough, not all companies can identify the legal softcopy!

  • Plan the migration of users, including how to handle ownership for users who are no longer active in the source system.

  • Verify network throughput for large-scale migrations or migrations that distribute content over high-latency networks (e.g. across the Atlantic). When you are moving gigabytes or terabytes of content, this matters.
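
As a rough illustration of why throughput matters, the following sketch estimates transfer time from content volume and link speed. The function name and the 70% efficiency factor are assumptions for illustration only; real throughput depends on latency, protocol overhead and the migration tool, so it should be measured rather than assumed.

```python
def estimate_transfer_hours(total_bytes: float, link_mbps: float,
                            efficiency: float = 0.7) -> float:
    """Estimate hours needed to move total_bytes over a link of link_mbps.

    efficiency is an assumed fraction of nominal bandwidth actually achieved.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    effective_bits_per_second = link_mbps * 1_000_000 * efficiency
    seconds = total_bytes * 8 / effective_bits_per_second
    return seconds / 3600


TERABYTE = 10**12  # decimal terabyte

# 2 TB over a nominal 100 Mbit/s link at the assumed 70% efficiency
hours = estimate_transfer_hours(2 * TERABYTE, link_mbps=100)
print(f"{hours:.1f} hours")  # prints 63.5 hours
```

Even this simple arithmetic shows that a few terabytes over a modest link takes days rather than hours, which affects how dry runs and the production cutover are scheduled.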

Analysis and Data Discovery

  • Determine whether the metadata in the source is sufficient for the target document processes.

  • Determine what actions must be performed to “initialize” content in the new system.

  • Review any external integrations that need to be maintained or created.

  • Review renditions, versions, document links, folder hierarchy and document assemblies.

  • Work with the users to verify the scope of content to be migrated.

  • Review the suitability of the source data against the target data dictionary.
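
One way to review source data against the target data dictionary is to express the dictionary as rules and run every source record through them. The sketch below is a minimal example; the field names, allowed values and length limits are hypothetical, not taken from any particular system.

```python
# Hypothetical target data dictionary: one rule set per target field.
TARGET_DICTIONARY = {
    "title": {"required": True, "max_length": 80},
    "doc_type": {"required": True, "allowed": {"SOP", "Protocol", "Report"}},
    "owner": {"required": True},
    "comments": {"required": False, "max_length": 200},
}


def check_record(record: dict) -> list[str]:
    """Return the problems found when a source record is held against the target rules."""
    problems = []
    for field, rule in TARGET_DICTIONARY.items():
        value = record.get(field)
        if value is None or value == "":
            if rule.get("required"):
                problems.append(f"{field}: missing required value")
            continue
        allowed = rule.get("allowed")
        if allowed is not None and value not in allowed:
            problems.append(f"{field}: {value!r} not in target value list")
        max_length = rule.get("max_length")
        if max_length is not None and len(str(value)) > max_length:
            problems.append(f"{field}: longer than {max_length} characters")
    return problems


source_record = {"title": "Cleaning procedure", "doc_type": "Procedure", "owner": ""}
for problem in check_record(source_record):
    print(problem)
```

Running this over the full source extract during analysis gives an early count of records that need cleansing or a mapping rule before they can load into the target.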

Tool Selection

Content management migration requires a tool that supports the CM application, for example SharePoint or Documentum. The obvious issue is that it is not easy to evaluate prospective tools without performing some qualification against your own system. Common challenges with CM migrations include performance or functional shortcomings of the tool.

Master Data Management

  • Harmonize document templates and workflow processes

  • Leverage an existing MDM system to validate/populate content metadata

Recommendation for Verification

  • Verify that the documents have not been modified in any way

  • Ensure that each document is associated with the appropriate application record
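
A common way to check that documents have not been modified is to compare cryptographic checksums of the source and migrated files. The sketch below assumes both files can be exported to disk and that the target stores the file bytes unchanged. The `verify_documents` helper and the record IDs are hypothetical; in practice the pairs would come from a cross-reference of source and target record identifiers.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large documents do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_documents(pairs):
    """Check (record_id, source_path, target_path) triples.

    Returns a list of (record_id, reason) for every document that is
    missing in the target or whose content differs from the source.
    """
    failures = []
    for record_id, source_path, target_path in pairs:
        if not target_path.exists():
            failures.append((record_id, "missing in target"))
        elif sha256_of(source_path) != sha256_of(target_path):
            failures.append((record_id, "content differs"))
    return failures
```

Because each pair is keyed by the record ID, a file attached to the wrong record typically shows up as a content mismatch for that record, which covers both recommendations above in one pass.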

How can you test whether the content has successfully migrated?

We always recommend automated testing for any data migration engagement that carries compliance or business risk.

When testing content management migrations, we perform additional tests for the following:

  1. Content files were migrated without change and are associated with the correct record in the new application.

  2. All versions are present and in the correct sequential order.

  3. Every rendition is available.

  4. Content is in the correct location or folder.

  5. All metadata and associated application data conform to the appropriate transformation rules.
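
The version check in item 2 can be automated by comparing the ordered list of version labels per document in the source and the target. This is a minimal sketch; the `check_versions` function and the data layout (document ID mapped to version labels in stored order) are assumptions, and how the lists are extracted depends on the CM application.

```python
def check_versions(source_versions: dict, target_versions: dict) -> dict:
    """Compare version labels per document.

    Both arguments map a document ID to its version labels in stored order.
    Returns a dict of document ID -> description for every discrepancy.
    """
    issues = {}
    for doc_id, expected in source_versions.items():
        actual = target_versions.get(doc_id, [])
        missing = [v for v in expected if v not in actual]
        extra = [v for v in actual if v not in expected]
        if missing or extra:
            issues[doc_id] = f"missing={missing} unexpected={extra}"
        elif actual != expected:
            issues[doc_id] = f"sequence differs: {actual}"
    return issues


source = {"D1": ["1.0", "1.1", "2.0"], "D2": ["1.0"]}
target = {"D1": ["1.0", "2.0", "1.1"], "D2": []}
for doc_id, issue in check_versions(source, target).items():
    print(doc_id, issue)
```

The same pattern extends to renditions and folder paths: extract the expected set from the source, the actual set from the target, and report only the differences.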

Additionally, there are functional tests that should be considered, including verifying document overlays and other application behaviour.

What are some of the common misconceptions of a migration?

The most common misconceptions we see with content management system migrations include:

  1. Tool Selection: The tool selection process is often “paper-based” or simply based on a salesperson’s recommendation, and the resulting migration runs into unanticipated challenges. This is most evident from the number of tools we see at the same client. We strongly advocate creating a brief proof of concept before putting any tools into action. As a specialist technology provider in this space we obviously want clients to select our migration tools, but taking an agnostic viewpoint, it is best practice to carry out a pilot and benchmark migration tools against your specific technical, business and commercial needs. It never ceases to amaze me how many companies fail to do this, and they invariably hit problems downstream.

  2. Scope: As with many migrations, defining the scope of data to be migrated is difficult. It is not unusual to engage with a client that wants to migrate terabytes of content, only to find that there are just gigabytes of useful content.

  3. Data Quality: Establishing that data quality is not an issue in the source system misses an important point: data quality needs to be tested against the requirements of the destination system. This is always a significant testing challenge.

  4. Content Management Applications: These are complex and the source and destination processes are typically not equivalent.  This often leads to unanticipated functional differences and challenging post migration support. Valiance strongly recommends getting users involved often and early to avoid such challenges.

  5. Data Mapping: The misconception is that mapping can be performed without knowledge of how the content will be used in the new system. Since there are typically significant functional and process differences between the source and destination applications, creating the data mappings between the two systems is not a straightforward field-to-field mapping. In many cases, the users tasked with creating the mapping have not been involved in the details of the new system and may not understand the usage of the data. This disconnect often results in mappings that are incorrect, or in the identification of new or revised system requirements that need to be addressed in the system build. At a minimum, users involved in the migration mapping should have training on the destination system, and it is preferable that these individuals be involved in the destination system requirements definition.

  6. The Team: The migration team is not included in the change control process for the new system. There have been several instances where a change impacting a migration is made to the destination system without the knowledge of the migration team. As a result, the migration testing identifies new issues that were not previously detected and that require a subsequent run.
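
To illustrate why the data mapping in item 5 is rarely field-to-field, the sketch below shows a mapping in which target fields are derived by combining, translating and classifying source fields. All field names, status codes and rules here are hypothetical; real rules come from the destination system requirements.

```python
# Hypothetical code-to-state translation required by the destination system.
STATUS_MAP = {"A": "Approved", "D": "Draft", "O": "Obsolete"}


def map_record(source: dict) -> dict:
    """Apply the (hypothetical) transformation rules to one source record."""
    status_code = source["status_code"]
    if status_code not in STATUS_MAP:
        # Surface unmapped values instead of silently loading bad data.
        raise ValueError(f"no mapping rule for status code {status_code!r}")
    return {
        "title": source["doc_name"].strip(),
        # Two source fields merge into one target field.
        "author": f"{source['author_last']}, {source['author_first']}",
        "lifecycle_state": STATUS_MAP[status_code],
        # A target flag derived from a source classification.
        "is_controlled": source["doc_class"] in {"SOP", "SPEC"},
    }


print(map_record({
    "doc_name": " Cleaning SOP ",
    "author_first": "Ann",
    "author_last": "Lee",
    "status_code": "A",
    "doc_class": "SOP",
}))
```

Each rule encodes a decision about how the destination system will use the data, which is why the people writing the mapping need to understand that system.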

What compliance or regulatory controls are required for content migrations, e.g. in the pharma sector?

The pharmaceutical, biotech and medical device industries all need objective evidence to verify that a migration was completed “as designed”.

This is true for any application considered to have compliance risk, including content management.

Valiance developed software for automated testing. TRUcompare tests structured data using the required source-to-destination mappings, as well as the content, to verify that it has been migrated without change and is associated with the right record.

We see these testing techniques being adopted in other industries as well.

How does a content migration differ from a more traditional relational-database-style migration?

There are a number of differences. Here are some of the common ones:

  1. Storage volumes. We’re often dealing with extremely large document sets (terabytes).

  2. There is often a need to consider content conversions (authoring tools, links, publishing solutions all need to be taken into account)

  3. There are often processes such as approvals that may require weeks or months to complete – What do you do with the work in process?

  4. Testing needs to confirm that any legal document and audit trail have been migrated without change

  5. Scoping – One needs to consider how to parse out what’s valuable in large repositories

  6. Content management applications are often not transaction oriented, and users, authors or approvers only interact with the system on an ad hoc basis

  7. The structured information transformation rules are often quite complex (the metadata or application data)

  8. The migration source and destination will require interfaces and support at the application layer

  9. The migration tools market for content management is still at an early stage when compared to the traditional ETL vendors

  10. Content renditions may require the splitting or merging of objects

  11. The migration approach tends to be more “object” oriented than “row” oriented

About David Katzoff

David Katzoff is managing director of product development and chief architect for Valiance Partners.

David Katzoff brings more than 25 years of software product development, applications engineering, systems integration, project management and compliant business solutions design experience to his role. Much of that experience is concentrated in enterprise-class research, manufacturing and business and marketing systems for premier clients in the life sciences, financial services and manufacturing sectors.

As a co-founder of Valiance, David led the initial development of Valiance’s products TRUcompare and TRUmigrate. His thorough understanding of the supported platforms, user interface design and novel testing techniques directly resulted in Valiance’s unique product direction path.

https://www.valiancepartners.com