Sunday, June 05, 2011

Importing Content into Tridion

Something I get asked almost every time a new project starts - or a new consultant or partner starts working with Tridion - is how to import content into Tridion.

It kinda baffles me, since I always thought it was pretty easy to import content into Tridion, but apparently that's not the case. Here are a few things to consider. I'm pretty sure most of this applies to _any_ content management system and is not specific to Tridion.

Well-formed content
Though it seems obvious, I see many content migrations coming from systems that do not enforce a strict XML schema the way Tridion does, and therefore a simple one-to-one migration will fail miserably. "Easy" workarounds for this one:
- Use XmlWriter when creating the content representation for Tridion, and build your content through the ItemFields collections. If the content is invalid, it will most likely fail while you are building it, before you ever try to save it.
- When dealing with Rich Text fields, use Tidy.NET to ensure your content is valid XHTML.
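
As a rough illustration of the XmlWriter advice above, here is a minimal sketch that builds a small content document using only System.Xml. The namespace URI, the element names (Content, Title, Summary) and the ContentXmlSketch class are hypothetical placeholders, not a real Tridion schema; substitute whatever your own schema defines.

```csharp
using System;
using System.Text;
using System.Xml;

public static class ContentXmlSketch
{
    // Hypothetical schema namespace, for illustration only.
    private const string ContentNamespace =
        "http://example.com/schemas/article";

    public static string BuildArticleXml(string title, string summary)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            Indent = true
        };

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            // Root element carries the (hypothetical) schema namespace.
            writer.WriteStartElement("Content", ContentNamespace);
            // WriteElementString escapes characters such as & and <,
            // so the result is well-formed regardless of the input text.
            writer.WriteElementString("Title", ContentNamespace, title);
            writer.WriteElementString("Summary", ContentNamespace, summary);
            writer.WriteEndElement();
        }
        // The writer is flushed on dispose, so the builder is complete here.
        return sb.ToString();
    }
}
```

Calling ContentXmlSketch.BuildArticleXml("Fish & Chips", "Prices < 10") gives you a string that loads cleanly into an XmlDocument, which is the point: the escaping is the writer's problem, not yours.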

Consider if a content migration is really what you want
One of the main reasons to change WCM is that your current content format does not match the business requirements. Guess what happens if you migrate your content "as-is" into Tridion? Yup, the content format still does not match your business requirements. So why are you even contemplating migration? Sure, you can get some of the content in, but you really should think about what you're trying to achieve before spending weeks writing a content migration tool that will prove to be worthless in a very short time frame. Do not underestimate the power of manual content migration in some cases.

How easy is it to get the source content?
This obviously depends on many aspects of your current/old WCM. Not every system makes it equally easy to get the content out, and every system is different.

In other words, really think about what it is you're trying to achieve before embarking on a migration project whose requirements keep changing midway through the migration.

Since you read this far, here are a couple of bonus code samples :)

Converting HTML to XHTML using Tidy.NET:
// Requires: using System.IO; using System.Text; using System.Xml; using TidyNet;
private const String XhtmlNamespace = "http://www.w3.org/1999/xhtml";
public static String ConvertHtmlToXhtml(String source)
{
    MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(source));
    MemoryStream output = new MemoryStream();

    TidyMessageCollection tmc = new TidyMessageCollection();
    Tidy tidy = new Tidy();

    // Ask Tidy for clean XHTML without a doctype or generator meta tag.
    tidy.Options.DocType = DocType.Omit;
    tidy.Options.DropFontTags = true;
    tidy.Options.LogicalEmphasis = true;
    tidy.Options.Xhtml = true;
    tidy.Options.XmlOut = true;
    tidy.Options.MakeClean = true;
    tidy.Options.TidyMark = false;
    // Numeric entities, so XmlDocument can parse the output without a DTD.
    tidy.Options.NumEntities = true;

    tidy.Parse(input, output, tmc);

    // Tidy produces a full document; copy only the children of <body>
    // into a scratch document so they can be returned as a fragment.
    XmlDocument x = new XmlDocument();
    XmlDocument xhtml = new XmlDocument();
    xhtml.LoadXml("<body />");
    XmlNode xhtmlBody = xhtml.SelectSingleNode("/body");

    x.LoadXml(Encoding.UTF8.GetString(output.ToArray()));
    XmlAttribute ns = x.CreateAttribute("xmlns");
    ns.Value = XhtmlNamespace;
    XmlNode body = x.SelectSingleNode("/html/body");
    if (body != null)
        foreach (XmlNode node in body.ChildNodes)
        {
            // Add the XHTML namespace declaration to each top-level element.
            if (node.NodeType == XmlNodeType.Element)
                if (node.Attributes != null) 
                    node.Attributes.Append(ns);

            if (xhtmlBody != null) 
                xhtmlBody.AppendChild(xhtml.ImportNode
                    (node, true));
        }
    return xhtmlBody != null ? xhtmlBody.InnerXml : null;
}


Getting a new or existing component (for update vs creation, CoreService with a custom client library):
static Component GetNewOrExistingComponent
    (string componentName, Folder folder)
{
    // Escape the name so a quote in it cannot break the XPath literal below.
    componentName = SecurityElement.Escape(componentName);
    XmlNamespaceManager nm = new XmlNamespaceManager
        (new NameTable());
    nm.AddNamespace(Constants.TcmPrefix,
        Constants.TcmNamespace);
    CoreServiceSession session =
        new CoreServiceSession(CoreServiceEndpoint);
    // Only list the Components in the folder.
    OrganizationalItemItemsFilter filter = 
        new OrganizationalItemItemsFilter(session)
            {ItemTypes = new[] {ItemType.Component}};

    string xpath = String.Format
        ("tcm:Item[@Title='{0}']", componentName);
    XmlElement listItems = folder.GetListItems(filter);
    if (listItems == null)
        return null;

    // Run the query once and reuse the result.
    XmlNode match = listItems.SelectSingleNode(xpath, nm);
    if (match != null)
    {
        // Existing component: load it by ID so it can be updated.
        string componentId = match.Attributes["ID"].Value;
        return session.GetObject
            (new TcmUri(componentId)) as Component;
    }

    // No match: create a new component in the folder.
    return new Component(session, folder.Id);
}

Note: This sample uses a custom client library I wrote on top of the CoreService. The library is not available yet, and I'm not sure I will make it available at all, given how long it took me to write and the fact that it is a work in progress. Releasing it means supporting it, and I unfortunately don't have the time to support my plants at home, let alone a still-half-buggy library that someone may try to use in production systems. Anyway, the code looks just like TOM.NET, so you shouldn't have any trouble reverse-engineering what it does.