Best Practices for Content Data Migration

Data migration is not solely the domain of database-to-database data transition.

It can often involve complex content migrations of documents and unstructured data in conjunction with their more structured counterparts.

But how do these migrations differ from the more conventional relational database migrations? What is the right strategy? What techniques are involved? What are the pitfalls?

In this guide, you will learn the core steps to implementing a Content Data Migration Strategy, courtesy of David Katzoff, from Valiance Partners.

How does a Content Migration differ from a traditional Application Data Migration?

If your organisation is unfamiliar with content migrations, it helps to keep some key differences in mind.

Whilst the phasing is similar, the following differences between content and application data migrations are worth highlighting:

  1. Storage Volumes: Content migrations often deal with extremely large document sets (terabytes) compared to traditional databases

  2. Content Conversion: Content often needs to be converted, and because authoring tools, links and publishing solutions all need to be taken into account, the conversion process can be more complex

  3. Approval Processes: There are often processes such as approvals that may require weeks or months to complete. What do you do with the work in progress?

  4. Content Testing: Testing needs to confirm that legal documents and audit trails have been migrated without change

  5. Content Scoping: One needs to consider how to parse out what’s valuable in large repositories

  6. Ad hoc Transactions: Content management applications are often not transaction-oriented; users, authors and approvers interact with the system only on an ad hoc basis

  7. Complex Transformation: The transformation rules for the structured information (the metadata or application data) are often quite complex

  8. Application Interfaces: The migration source and destination will require interfaces and support at the application layer

  9. Maturity of Content Tools: The migration tools market for content management is still at an early stage when compared to the traditional ETL vendors

  10. Content Renditions: Content renditions may require the splitting or merging of objects

  11. Ideological Differences: The migration approach tends to be more “object” oriented than “row” oriented

What other considerations are important?

What considerations set content migrations apart from classic data migrations, where data is moved as a result of a change in application vendor?

Whilst the phases of a content migration do not differ substantially (see below for phases), there can be big differences in a number of areas including:

Planning / Discovery / Tooling / Mastering / Verification

Most of the issues relate to the unique characteristics of content and the considerable amount of infrastructure that must support it:

Planning

  • Identify the legal “original”. Interestingly enough, not all companies can identify the legal softcopy!

  • Plan the migration of users, including how to handle ownership for users who are no longer active in the source system

  • Verify network throughput for large-scale migrations, or for migrations that distribute content over networks with latency (e.g. across the Atlantic). When you are moving gigabytes or terabytes of content, this matters.
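
The throughput point can be checked with simple arithmetic before any tooling is chosen. The sketch below is a rough estimate, not a measurement: the function name `estimate_transfer_hours`, the `efficiency` factor and the example figures (2 TB, 100 Mbit/s, 50% efficiency) are illustrative assumptions, and the real efficiency over a high-latency link should be measured with a trial transfer.

```python
def estimate_transfer_hours(total_bytes: int, link_mbps: float,
                            efficiency: float = 0.7) -> float:
    """Rough wall-clock hours needed to move total_bytes over a network link.

    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of the nominal speed actually achieved
                  (protocol overhead, latency, contention); an assumption
                  that should be replaced by a measured value
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    effective_bits_per_second = link_mbps * 1_000_000 * efficiency
    seconds = (total_bytes * 8) / effective_bits_per_second
    return seconds / 3600


TB = 10**12  # decimal terabyte, in bytes

# Hypothetical example: 2 TB of content over a 100 Mbit/s link at 50% efficiency.
hours = estimate_transfer_hours(2 * TB, link_mbps=100, efficiency=0.5)
print(f"{hours:.1f} hours")
```

With these assumed figures the estimate comes to roughly 89 hours, nearly four days of continuous transfer, which is the kind of result that should feed into the cutover plan.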

Analysis and Data Discovery

  • Determine whether the metadata in the source is sufficient for the target document processes.

  • Determine what actions must be performed to “initialize” content in the new system

  • Review any external integrations that need to be maintained or created.

  • Review renditions, versions, document links, folder hierarchy and document assemblies

  • Work with the users to verify the scope of content to be migrated.

  • Review the suitability of the source data against the target data dictionary
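
The metadata and data dictionary checks above lend themselves to an automated profile of the source. The following is a minimal sketch under stated assumptions: `FieldRule`, `TARGET_DICTIONARY`, `check_metadata` and every field name and vocabulary value are hypothetical, and a real dictionary would be derived from the destination system's own configuration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FieldRule:
    """One entry in a (hypothetical) target data dictionary."""
    required: bool = False
    max_length: Optional[int] = None
    allowed: Optional[frozenset] = None  # controlled vocabulary, if any


# Hypothetical target dictionary; real rules come from the destination system.
TARGET_DICTIONARY: Dict[str, FieldRule] = {
    "title": FieldRule(required=True, max_length=255),
    "doc_type": FieldRule(required=True,
                          allowed=frozenset({"SOP", "Protocol", "Report"})),
    "owner": FieldRule(required=True),
    "keywords": FieldRule(max_length=1000),
}


def check_metadata(metadata: Dict[str, str]) -> List[str]:
    """Return the problems that would prevent a clean load into the target."""
    problems = []
    for name, rule in TARGET_DICTIONARY.items():
        value = (metadata.get(name) or "").strip()
        if not value:
            if rule.required:
                problems.append(f"{name}: required by target but missing in source")
            continue
        if rule.max_length is not None and len(value) > rule.max_length:
            problems.append(f"{name}: length {len(value)} exceeds {rule.max_length}")
        if rule.allowed is not None and value not in rule.allowed:
            problems.append(f"{name}: value {value!r} not in target vocabulary")
    return problems


# Hypothetical source record with an unmapped document type and no owner.
source_record = {"title": "Cleaning validation", "doc_type": "Procedure", "owner": ""}
for problem in check_metadata(source_record):
    print(problem)
```

Running a check like this across the whole source repository during discovery gives an early count of records that need cleansing or a mapping decision, rather than leaving those findings to the first dry run.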

Tool Selection

Content management (CM) migration requires a tool that supports the CM application in question, for example SharePoint or Documentum.

The obvious issue is that it is not easy to evaluate prospective tools without performing some qualification against your own system. Common challenges with CM migrations include performance or functional shortcomings of the tool.

Master Data Management

  • Harmonize document templates and workflow processes

  • Leverage any existing MDM system to validate or populate content metadata

Recommendation for Verification

  • Verify that the documents have not been modified in any way

  • Ensure that each document is associated with the appropriate application record
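
One common way to verify that documents have not been modified is to compare cryptographic hashes of the source and migrated files, keyed by the application record they belong to. The sketch below assumes both copies are reachable as files and that the content was meant to move unchanged; where the migration deliberately converts content, the hashes will differ by design and this check does not apply. The names `sha256_of` and `find_mismatches` and the shape of the input are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large documents need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(pairs):
    """Compare source and target files record by record.

    pairs -- iterable of (record_id, source_path, target_path)

    Returns the record ids whose target file is missing or whose
    content differs from the source file.
    """
    mismatched = []
    for record_id, source_path, target_path in pairs:
        source_path, target_path = Path(source_path), Path(target_path)
        if (not target_path.is_file()
                or sha256_of(source_path) != sha256_of(target_path)):
            mismatched.append(record_id)
    return mismatched
```

Feeding the function the record-to-file pairing exported from both systems covers both bullets at once: a wrong association shows up as a mismatch just as a modified file does.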

What are the Phases of a Content Migration?

Content migrations do not differ significantly from a standard data migration in terms of the methodology or sequence of steps undertaken.

For this reason, you can apply the standard migration cycle to content migrations:

  1. Planning / Setting up your project office, scoping and scaling your project, onboarding the right partners and team members

  2. Analysis and Data Discovery / Performing detailed assessments, understanding the content landscape

  3. Tool Selection / You will almost certainly require a specialist content migration tool

  4. Master Data Management (MDM) / Determining your ‘Mastering’ strategy for the content as it transitions to the target

  5. Tool Configuration / Once your migration tool is onboarded, it will require alignment to your specific content platform and architecture

  6. Data Cleansing / Resolving legacy broken links, missing metadata and many, many other types of data quality issues

  7. Dry Runs / Executing multiple ‘prototype’ go-live migration processes to validate the likelihood of success and to train everyone involved in the live strategy

  8. Formal Testing / Extensive testing across the entire migration architecture

  9. Production Execution / Final go-live launch and execution

  10. Post Production Support / Putting in place ongoing monitoring and any local support measures

How to test your content migration

We recommend that you use automated testing for any data migration engagement where there is any compliance or business risk. 

When testing a content management migration, it is advisable to perform additional tests that confirm the following:

  1. Content files were migrated without change and are associated with the correct record in the new application

  2. All versions are present and in the correct sequential order

  3. Every rendition is available

  4. Content is in the correct location or folder

  5. All metadata and associated application data conform to the appropriate transformation rules

Additionally, there are functional tests that should be considered, including verifying document overlays and other application behaviour.
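
The version, rendition and location checks in the list above can be automated once each system can export a per-document summary. This is a minimal sketch, not a description of any particular tool: `DocumentSnapshot`, `compare_document` and the `folder_map` of source-to-target folders are hypothetical, and the sketch assumes version labels are expected to carry over unchanged (if the target renumbers versions, compare the mapped labels instead).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentSnapshot:
    """Hypothetical per-document summary exported from one system."""
    folder_path: str
    versions: List[str]                                   # labels, oldest first
    renditions: List[str] = field(default_factory=list)   # e.g. ["pdf"]


def compare_document(doc_id: str, source: DocumentSnapshot,
                     target: DocumentSnapshot,
                     folder_map: Dict[str, str]) -> List[str]:
    """Return human-readable findings; an empty list means all checks passed."""
    findings = []
    # Versions: every version present, in the same sequential order.
    if target.versions != source.versions:
        findings.append(f"{doc_id}: versions differ or are out of order: "
                        f"{source.versions} -> {target.versions}")
    # Renditions: everything the source had must exist in the target.
    missing = sorted(set(source.renditions) - set(target.renditions))
    if missing:
        findings.append(f"{doc_id}: missing renditions {missing}")
    # Location: the target folder must match the agreed folder mapping.
    expected_folder = folder_map.get(source.folder_path)
    if expected_folder is None:
        findings.append(f"{doc_id}: no folder mapping for {source.folder_path}")
    elif target.folder_path != expected_folder:
        findings.append(f"{doc_id}: in {target.folder_path}, "
                        f"expected {expected_folder}")
    return findings


# Hypothetical example: versions out of order and a lost PDF rendition.
source = DocumentSnapshot("/legacy/qa", ["1.0", "1.1", "2.0"], ["pdf"])
target = DocumentSnapshot("/Quality/SOPs", ["1.0", "2.0", "1.1"], [])
for finding in compare_document("DOC-001", source, target,
                                {"/legacy/qa": "/Quality/SOPs"}):
    print(finding)
```

In this example the folder check passes, while the version order and the missing rendition are both reported. Returning findings rather than raising on the first failure lets one run produce a complete defect list per document.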

What are some of the common misconceptions of a Content Migration?

Many people misunderstand the true scope and complexity of a content migration, and it can lead them into problems.

Here is a round-up of the most common misconceptions we see on client projects and across the industry in general.

  1. “Tools will just work”: The tool selection process is often “paper-based” or simply based on a salesman’s recommendation, and the resulting migration runs into unanticipated challenges. This is most evident in the number of different tools we see at the same client. We strongly advocate creating a brief proof of concept before putting any tool into action. As a specialist technology provider in this space we obviously want clients to select our migration tools, but taking an agnostic viewpoint, it is best practice to carry out a pilot and benchmark migration tools against your specific technical, business and commercial needs. It never ceases to amaze me how many companies fail to do this, and they invariably hit problems downstream.

  2. “All of our content is valuable”: As with many migrations, defining the scope of data to be migrated is difficult. It is not unusual to engage with a client that wants to migrate terabytes of content, only to find that there are just gigabytes of useful content.

  3. “If it’s good in the source, it’s fine for the target”: Knowing that data quality is not an issue in the source system misses an important point: data quality needs to be tested against the requirements of the destination system. This is always a significant testing challenge.

  4. “The target functions will be the same as the source”: Content management applications are complex, and the source and destination processes are typically not equivalent. This often leads to unanticipated functional differences and challenging post-migration support. Valiance strongly recommends getting users involved early and often to avoid such challenges.

  5. “The mapping will be straightforward”: Many people believe that data mapping can be performed without knowledge of how the content will be used in the new system. Since there are typically significant functional and process differences between the source and destination applications, creating the data mappings between the two systems is not a straightforward field-to-field exercise. In many cases, the users tasked with creating the mapping have not been involved in the details of the new system and may not understand how the data will be used. This disconnect often results in incorrect mappings, or in the identification of new or revised system requirements that need to be addressed in the system build. At a minimum, users involved in the migration mapping should have training on the destination system, and it is preferable that these individuals be involved in defining the destination system requirements.

  6. “We don’t need to keep the migration team in the loop”: Often, the migration team is not included in the change control process for the new system. There have been several instances where a change impacting a migration was made to the destination system without the knowledge of the migration team. As a result, migration testing identifies new issues that were not previously detected, and a further migration run is required.

About the Author

This guide was prepared by David Katzoff, Vice President System Integration Services at Valiance Partners.

David has spent much of his career focusing on product and strategy for content management migrations. His work has been leveraged in some of the largest systems consolidation efforts completed to date, at such clients as Amgen, Pfizer, Wyeth, Celgene, Bayer-Schering, Covidien, BMW and others.