Component Content Management

Continuing the theme of the Future of ECM … trend #12 out of 13 …

We are all used to working with bulky, monolithic documents in which individual sections or paragraphs are often worked on by different people. Once such a document is published, some of its content can quickly become out of date. An example is where content is copied and pasted into a section of a document from another source that has its own independent ownership structure and life-cycle. When the content in the source document changes, the published document, as a whole, becomes stale.

This blog post looks at some of the trends in ECM over the next five years that will seek to address problems such as this, with a focus on Component Content Management.

General approach to solving this problem

Solving this problem requires a more component-based approach to content management, in which individual components of content are managed separately and assembled dynamically into a document for publication. When an individual source component changes, the owners of documents that use it can be alerted that their document may now be stale and might need to be reviewed and updated. It is also possible to automatically reassemble previously published documents using the new source content and pass the result to the document owner for review, approval and republishing as part of a workflow process.
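The approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Component` and `Document` classes, the in-memory `repo` dictionary and the `stale_report` helper are all hypothetical names invented for this example, and version numbers stand in for whatever change tracking a real ECM product would use.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A separately managed piece of content with its own version counter."""
    component_id: str
    text: str
    version: int = 1

    def update(self, new_text: str) -> None:
        # Changing the source content bumps the version.
        self.text = new_text
        self.version += 1


@dataclass
class Document:
    """A document defined as an ordered list of component references."""
    title: str
    owner: str
    component_ids: list[str]
    # Component versions recorded at the time of the last publication.
    published_versions: dict[str, int] = field(default_factory=dict)

    def publish(self, repo: dict[str, Component]) -> str:
        """Assemble the text from the current components and record versions."""
        self.published_versions = {
            cid: repo[cid].version for cid in self.component_ids
        }
        return "\n\n".join(repo[cid].text for cid in self.component_ids)

    def stale_components(self, repo: dict[str, Component]) -> list[str]:
        """Return the ids of components that changed since publication."""
        return [
            cid for cid in self.component_ids
            if repo[cid].version != self.published_versions.get(cid)
        ]


def stale_report(documents: list[Document], repo: dict[str, Component]):
    """List (owner, title, changed component ids) for every stale document."""
    report = []
    for doc in documents:
        changed = doc.stale_components(repo)
        if changed:
            report.append((doc.owner, doc.title, changed))
    return report
```

After a component is updated, `stale_report` gives the list of owners to alert, and calling `publish` again reassembles the document from the new source content, ready for review and approval.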

This approach should seem familiar to many readers, as it is the usual approach taken for web content management, where the content of a web page is dynamically assembled from many content sources and updated on the fly as necessary. The same principles could be applied to document management. I believe that a component-based approach to document management will become more popular over the coming five years. Indeed, Open Text recently acquired output management and document composition vendor StreamServe, whose products enable content to be assembled based on rules and distributed to multiple channels.

DITA

From a vendor independent perspective, Darwin Information Typing Architecture (DITA) is a good example of this trend. It defines a topic-based approach to modular authoring, enabling content to be flexibly reused, assembled and published across different formats and media. Introduced by IBM in 2001, DITA was then released to the open community and made an OASIS standard in 2005.
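To make topic-based reuse concrete, here is a minimal sketch in Python. The element names (`map`, `topicref`, `topic`, `title`, `body`, `p`) are standard DITA, but everything else is an assumption made for illustration: the topics live in a hypothetical in-memory dictionary rather than in files, the `assemble` function is invented here, and real publishing would normally go through a DITA processor rather than hand-written code.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic store: file name -> DITA topic source.
TOPICS = {
    "install.dita": (
        "<topic id='install'><title>Installing</title>"
        "<body><p>Run the installer.</p></body></topic>"
    ),
    "safety.dita": (
        "<topic id='safety'><title>Safety</title>"
        "<body><p>Unplug the device first.</p></body></topic>"
    ),
}

# Two maps that reuse the same topics in different deliverables.
USER_GUIDE_MAP = (
    "<map><title>User Guide</title>"
    "<topicref href='safety.dita'/><topicref href='install.dita'/></map>"
)
QUICK_START_MAP = (
    "<map><title>Quick Start</title><topicref href='install.dita'/></map>"
)


def assemble(map_xml: str, topics: dict[str, str]) -> str:
    """Flatten a map and the topics it references into plain text."""
    ditamap = ET.fromstring(map_xml)
    lines = [ditamap.findtext("title", default="")]
    for ref in ditamap.iter("topicref"):
        topic = ET.fromstring(topics[ref.get("href")])
        lines.append(topic.findtext("title", default=""))
        for para in topic.iter("p"):
            lines.append("".join(para.itertext()))
    return "\n".join(lines)
```

Both maps pull in the same `install.dita` topic, so a correction made to that one topic reaches the user guide and the quick start the next time each is assembled.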

The DITA standard quickly gained acceptance as an excellent content management approach for technical documentation. There are now efforts underway by OASIS to adapt DITA as a modular content framework for enterprise business documents that go beyond technical content. These efforts have come about as more and more organisations (such as pharmaceutical and medical device manufacturers, healthcare service providers and hospitals, high-tech companies, financial institutions and governments) are moving to utilise structured XML content. A growing number of these organisations have come to believe that DITA not only provides the best basis from which to start addressing their requirements for narrative business documents, but will also help them to achieve their goals faster and in a standardised manner. One of the business drivers behind this initiative is that organisations want to leverage the intellectual property that is currently locked within narrative documents. They want to share and personalise the content for different audiences and channels by enabling much more powerful search and retrieval services that work at the level of granular topics rather than whole books.

Going a step further with object-based storage models

In all likelihood, ECM will move towards an object-based storage model for documents (and indeed all content) instead of storing documents in a traditional file system. This might be the final catalyst in the switch to a component-based content management approach for the creation, management and distribution of all content. It will mean that there is no longer any real differentiation between structured and unstructured content: all content will be absorbed into an object-based storage model. For example, a typical blog post is made up of several objects: the blog text itself, attachments, comments, metadata and tags, all linked to each other and handled using a common approach.

A potential downside of traditional ECM systems is that the metadata is stored in a database. Usually there is complete separation between the metadata and the original content item, with programming logic required to link the content item to its metadata. In an object-based storage model, objects will be able to describe themselves (perhaps using RDF and OWL as described below in my blog post Semantonomics), with the associated semantic metadata URL-accessible. This will enable greater flexibility in how dynamic relationships are discovered between objects. For example, imagine you are reading a piece of content, such as a Word document, web page or blog. It could dynamically display links to other related objects, which could be within the ECM or even in external sources. It could also display links to people who have skills associated with the content or people who have shown an interest in it. I talk more about this in my blog post The Collaborative Office.
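As a rough illustration of the idea, the sketch below keeps descriptive metadata on the object itself and discovers related objects by comparing that metadata. It is deliberately simplistic and makes several assumptions: plain tags stand in for the RDF/OWL descriptions mentioned above, and the `ContentObject` class, the `related` function and the example URIs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentObject:
    """An object that carries its own descriptive metadata."""
    uri: str    # hypothetical URL at which the object is addressable
    kind: str   # e.g. "document", "blog", "person"
    tags: frozenset[str] = frozenset()


def related(target: ContentObject, store: list[ContentObject],
            min_shared: int = 1) -> list[str]:
    """Return URIs of other objects sharing tags with the target.

    Results are ordered by the number of shared tags (most first),
    then alphabetically by URI.
    """
    scored = []
    for obj in store:
        if obj.uri == target.uri:
            continue
        shared = target.tags & obj.tags
        if len(shared) >= min_shared:
            scored.append((len(shared), obj.uri))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [uri for _, uri in scored]
```

Because people can be modelled as objects too, the same lookup that surfaces related documents can also surface colleagues whose skills overlap with the content being read.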

Object-based storage models might require a move towards using XML databases as the underlying storage, described next.

XML Databases

XML database vendors, such as MarkLogic (www.marklogic.com), are gaining traction in the ECM marketplace with tools specifically optimised to deal with XML content. One of the primary reasons for this is that relational databases (the backbone of all leading ECM products) are poorly suited to the storage and management of XML content. When non-rectilinear XML data is forced into table structures, the result quickly becomes very difficult to optimise: performance suffers and storage requirements increase significantly. Many organisations, especially financial organisations, are increasingly using XML as the format of choice for much of their data. For example, Financial products Markup Language (FpML) is an XML-based information exchange standard that is heavily used in the Financial Services sector for electronic dealing and processing of financial derivative instruments. To address this increased usage of XML, database products that are tailored to XML content and optimised for its storage, manipulation and search have come into the marketplace.
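The appeal of querying XML in its native shape can be seen in a small Python sketch. Several assumptions apply: the trade document below is invented for illustration and is not real FpML, the `total_notional` function is hypothetical, and Python's in-memory ElementTree parser is only a stand-in for the path-based query languages that an XML database would provide.

```python
import xml.etree.ElementTree as ET

# Invented sample data: each trade holds a variable number of nested legs,
# which is the kind of structure that maps awkwardly onto flat tables.
TRADES_XML = (
    "<trades>"
    "<trade id='T1'><party>Bank A</party>"
    "<leg><currency>USD</currency><notional>1000000</notional></leg>"
    "<leg><currency>EUR</currency><notional>750000</notional></leg>"
    "</trade>"
    "<trade id='T2'><party>Bank B</party>"
    "<leg><currency>USD</currency><notional>250000</notional></leg>"
    "</trade>"
    "</trades>"
)


def total_notional(xml_text: str, currency: str) -> int:
    """Sum the notional of every leg in the given currency."""
    root = ET.fromstring(xml_text)
    total = 0
    # One path expression walks the hierarchy directly; no joins between
    # separate trade and leg tables are needed.
    for leg in root.findall(f".//leg[currency='{currency}']"):
        total += int(leg.findtext("notional"))
    return total
```

In a relational schema the same question would typically need the trades and legs split into separate tables and joined back together, which is where the optimisation and storage overheads described above start to appear.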

Some mainstream ECM vendors have also started to explore the use of XML databases. For example, in 2007 EMC Documentum acquired the XML database vendor X-Hive, rebranding its product as Documentum xDB. This was subsequently incorporated into Documentum as an optional XML storage and management component to complement (not replace) its traditional relational database storage functionality. Over the next five years, I expect most of the other leading ECM vendors to adopt a similar approach. Some of them might even take a bold step and move entirely to an XML database.
