Sunday, June 05, 2011

Importing Content into Tridion

Something I get asked almost every time a new project starts - or a new consultant or partner starts working with Tridion - is how to import content into Tridion.

It kinda baffles me, since I always thought it was pretty easy to import content into Tridion, but apparently that's not the case. Here are a few things to consider. I'm pretty sure most of this applies to _any_ content management system and is not specific to Tridion.

Well formed content
Though it seems obvious, I see many content migrations coming from systems that do not enforce a strict XML schema the way Tridion does, and therefore a simple one-to-one migration will fail miserably. "Easy" workarounds for this one:
- Use XmlWriter when creating the content representation for Tridion, and the ItemFields collections to create your content in. If it fails validation, it will probably fail before you try to save it.
- When dealing with Rich Text fields, use Tidy.NET to ensure your content is valid Xhtml.
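
As a rough illustration of the first point, here is a minimal sketch of building a content document with XmlWriter from System.Xml. The schema namespace and the field names (Content, Title, Summary) are made up for the example, so substitute the ones from your own schema. The point is only that XmlWriter escapes text for you and refuses to produce malformed XML.

```csharp
using System;
using System.Text;
using System.Xml;

public static class ContentXmlSketch
{
    // Hypothetical schema namespace, for illustration only.
    private const string SchemaNamespace = "uuid:00000000-0000-0000-0000-000000000000";

    // Builds a small content document with two hypothetical text fields.
    public static string BuildContent(string title, string summary)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Content", SchemaNamespace);
            // WriteElementString escapes characters such as & and < in the value.
            writer.WriteElementString("Title", SchemaNamespace, title);
            writer.WriteElementString("Summary", SchemaNamespace, summary);
            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

Calling ContentXmlSketch.BuildContent("Fish & Chips", "Price < 10") returns a string that loads cleanly into an XmlDocument, with the ampersand and the angle bracket escaped.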

Consider if a content migration is really what you want
One of the main reasons to change WCM is that your current content format does not match the business requirements. Guess what happens if you migrate your content "as-is" into Tridion? Yup, the content format still does not match your business requirements. So why are you even contemplating migration? Sure, you can get some of the content in, but you really should think about what you're trying to achieve before spending weeks writing a content migration tool that will prove to be worthless in a very short time frame. Do not underestimate the power of manual content migration in some cases.

How easy is it to get the source content?
This obviously depends on many aspects of your current/old WCM. Not every system makes it easy to get the content out, and every one of them is different.

In other words, really think about what it is you're trying to achieve before embarking on a migration project that insists on changing mid-way through the migration.

Since you read this far, here's a couple of bonus code samples :)

Converting html to xhtml using Tidy.NET:
private const String XhtmlNamespace = "http://www.w3.org/1999/xhtml";

public static String ConvertHtmlToXhtml(String source)
{
    MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(source));
    MemoryStream output = new MemoryStream();

    TidyMessageCollection tmc = new TidyMessageCollection();
    Tidy tidy = new Tidy();

    // Ask Tidy for clean XHTML without a DOCTYPE or its own meta tag.
    tidy.Options.DocType = DocType.Omit;
    tidy.Options.DropFontTags = true;
    tidy.Options.LogicalEmphasis = true;
    tidy.Options.Xhtml = true;
    tidy.Options.XmlOut = true;
    tidy.Options.MakeClean = true;
    tidy.Options.TidyMark = false;
    tidy.Options.NumEntities = true;

    tidy.Parse(input, output, tmc);

    // Load Tidy's output (assumed to be UTF-8) so the body's children can be copied out.
    XmlDocument x = new XmlDocument();
    x.LoadXml(Encoding.UTF8.GetString(output.ToArray()));

    XmlDocument xhtml = new XmlDocument();
    xhtml.LoadXml("<body />");
    XmlNode xhtmlBody = xhtml.SelectSingleNode("/body");

    XmlNode body = x.SelectSingleNode("/html/body");
    if (body != null && xhtmlBody != null)
    {
        foreach (XmlNode node in body.ChildNodes)
        {
            if (node.NodeType == XmlNodeType.Element && node.Attributes != null)
            {
                // An XmlAttribute can only belong to one element, so create one per node.
                XmlAttribute ns = x.CreateAttribute("xmlns");
                ns.Value = XhtmlNamespace;
                node.Attributes.Append(ns);
            }
            // Copy the node (and its children) into the result document.
            xhtmlBody.AppendChild(xhtml.ImportNode(node, true));
        }
    }
    return xhtmlBody != null ? xhtmlBody.InnerXml : null;
}

Getting a new or existing component (for update vs creation, CoreService with a custom client library)
static Component GetNewOrExistingComponent
    (string componentName, Folder folder)
{
    Component returnObject = null;
    componentName = SecurityElement.Escape(componentName);
    XmlNamespaceManager nm = new XmlNamespaceManager
        (new NameTable());
    // The "tcm" prefix used in the XPath below must be registered.
    nm.AddNamespace("tcm", "http://www.tridion.com/ContentManager/5.0");
    CoreServiceSession session =
        new CoreServiceSession(CoreServiceEndpoint);
    OrganizationalItemItemsFilter filter =
        new OrganizationalItemItemsFilter(session)
            {ItemTypes = new[] {ItemType.Component}};

    // Look for a component with this title in the folder's item list.
    string xpath = String.Format
        ("tcm:Item[@Title='{0}']", componentName);
    XmlElement listItems = folder.GetListItems(filter);
    if (listItems != null)
    {
        XmlNode match = listItems.SelectSingleNode(xpath, nm);
        if (match != null)
        {
            string componentId = match.Attributes["ID"].Value;
            returnObject = session.GetObject
                (new TcmUri(componentId)) as Component;
        }
    }
    // Nothing found: create a new component in the folder.
    if (returnObject == null)
        returnObject = new Component(session, folder.Id);
    return returnObject;
}

Note: This sample uses a custom client Library I wrote on top of the CoreService, and this library is not available yet - and I'm not sure I will make it available at all due to how long it took me to write it, and the fact that it is a work-in-progress. Releasing it means supporting it, and I unfortunately don't have the time to support my plants at home, let alone a still-half-buggy library that someone may try to use in production systems. Anyway, the code just looks like TOM.NET, so you shouldn't have any trouble reverse-engineering what this code does.


Jeremy Simmons said...

any chance you'd be willing to share the code for people to learn from with the disclaimer that no questions would be allowed ?

Charles said...

I disagree with the approach on content migration. Companies using Tridion are most likely medium to large enterprises. They are likely to have tons of existing content. Recreating it will take years to complete. Migration should be the first approach.
Nothing stops you from improving your content afterwards.

Nuno said...

Hi Charles,

I certainly understand your point, but it still remains valid that if the content was wrong before Tridion, it won't become better just because you migrated it. It might become easier to fix :) but not better just because of migration - and that's why I provided a sample for it.

@Jeremy: working on it, I'll provide some samples soon from a few libraries I wrote to import content regularly from RSS sources.

Unknown said...

I've done numerous migrations using Kapow instead of coding the solution. You can transform and validate content during the migration so there is no post clean-up effort, total time and effort is much much less, 6-8 month projects are done in 1-2 months.

Paul said...

We are struggling to convert our 300 GB database from R5 to Tridion 2011; 10 years of data is proving a challenge. Is it feasible to consider an alternative approach of just migrating selected content?

I'm thinking of the following scenario: existing SGs are identified, and related pages, components and keywords are migrated to the clean T11 DB with all the localized content intact.

We have the kapow tool available if this was to play a role in this approach.