All We Need Is A Magic Wand…
28 February 2008
It has been a fun week watching the goings-on in Geneva; some of the well-orchestrated activity outside of the BRM has been fascinating to follow. A little like a high-tech episode of The Bold and The Beautiful.
Probably by pure coincidence, Google’s Open Source Programs Manager, Zaheda Bhorat, added a post to The Official Google Blog expressing a “wish” for open document standards and setting out Google’s corporate position on Open XML standardization.
The call to action for the post appears to be a suggestion that Open XML needs to be “unified” or “harmonized” into ODF. A simple-sounding task with a much more complex reality behind it.
It was a topic that came up frequently during the technical evaluation period of the DIS29500 (Open XML) fast track process, and I thought it might be useful to share some personal thoughts around what I think would need to happen to achieve the (laudable) goal of harmonizing the ODF and Open XML file formats at this stage.
As I say, “Harmonize with ODF” (generally meaning unify) is a straightforward enough sounding task, but in my view the commentators on this topic often discount a lot of reality that would need to be dealt with along the way.
For the purpose of this post I need to take a simplistic approach to the issues involved. Ideally, any proposal to unify standards would also have to find a way to encompass the many document formats that exist in the market today, not just ODF and Open XML, but for the purpose of this text I will just address a potential path for IS26300 (ODF) and DIS29500 (Open XML).
There is probably one other expectation that needs to be set before we can begin to think about unification. The market today already has significant enough adoption of both file formats that we cannot dismiss the existing store of documents in either one, which means that we probably can’t take the approach of modifying one format or the other.
As a result, the simple integration of the two formats probably isn’t directly possible at this stage, and any effort to unify them would most likely lead to a third standard rather than the unified single standard that many workshop participants seemed to be looking for during the technical discussions around Open XML. Even if this were to become DIS26301, it would undoubtedly be significantly different from the IS26300 ODF standard, or any of the other revisions of ODF, that we know today.
Assuming that we have some clarity around which formats we are planning to unify, we next have to work out where we will manage the project from. Today Open XML is maintained by Ecma, and ODF is maintained by OASIS, two different but not entirely dissimilar organizations. As part of the fast track process Ecma has proposed a joint maintenance agreement for Open XML within ISO’s SC34 committee, so maintenance of Open XML will end up in SC34 at the end of the current process.
At this point in time OASIS does not have any such agreement in place, instead choosing to manage maintenance of the ODF format outside of the ISO process. We would need to find a common maintenance process and committee. Given that this is all about ISO standardization, it makes sense (to me at least) that SC34 would be the right place to do that, and giving up the current level of control that OASIS has over ODF would be a small step to take within the framework of delivering a more comprehensive single standard.
As a second step, the newly gathered committee would need to agree on design goals. Currently ODF is designed to be first and foremost an office automation document format, whereas Open XML is designed to allow the document format to be used both as a container for office automation documents and as a transport mechanism for other data that may be in use in an enterprise environment. The goal for Open XML in this context is that SMEs and enterprises will be able to build their documents and related management tools into end-to-end SOA-based business systems.
Next, now that we have the committee in place and decisions made around the major design goals, there would need to be a stage of technical unification. There are a couple of issues that would need to be considered here, nothing that would be impossible to deal with, but they would have to be cleared up.
The first is that the architectural structures of ODF and Open XML are significantly different once you open up the ZIP containers. ODF favours a simple structure with most of the XML held in a small number of files within a defined relationship.
Open XML offers a more complex structure that exists to allow for embedded data from non-office-automation applications. The XML is broken up into parts of a document, and the relationship between the files is created on a document-by-document basis. This design principle was originally selected for Open XML with the goal of offering more flexibility for scenarios such as server-based document assembly applications and document security.
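To make that structural difference a little more concrete, here is a minimal Python sketch that builds two tiny in-memory ZIP packages and inspects them. It is an illustration under stated assumptions, not a description of how either standard must be processed: the part names are typical of what you would find inside real packages, but the contents are placeholders rather than valid documents, the relationship `Type` values are made-up `urn:example:` strings, and the helper names (`build_package`, `list_parts`, `relationship_targets`) are my own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by relationship parts in Open XML packages.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# ODF-style layout: a small, fixed set of well-known files.
# Contents are placeholders, not valid ODF.
ODF_PARTS = {
    "mimetype": "application/vnd.oasis.opendocument.text",
    "content.xml": "<placeholder/>",
    "styles.xml": "<placeholder/>",
    "meta.xml": "<placeholder/>",
    "META-INF/manifest.xml": "<placeholder/>",
}

# Open XML-style layout: many parts, linked by per-document relationship files.
# The Type values below are hypothetical, for illustration only.
OOXML_PARTS = {
    "[Content_Types].xml": "<placeholder/>",
    "_rels/.rels": (
        f'<Relationships xmlns="{RELS_NS}">'
        '<Relationship Id="rId1" Type="urn:example:main-document" '
        'Target="word/document.xml"/>'
        "</Relationships>"
    ),
    "word/document.xml": "<placeholder/>",
    "word/_rels/document.xml.rels": (
        f'<Relationships xmlns="{RELS_NS}">'
        '<Relationship Id="rId1" Type="urn:example:styles" Target="styles.xml"/>'
        "</Relationships>"
    ),
    "word/styles.xml": "<placeholder/>",
}


def build_package(parts):
    """Write the given {name: text} mapping into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, body in parts.items():
            zf.writestr(name, body)
    return buf.getvalue()


def list_parts(package_bytes):
    """Return the sorted names of all entries in the ZIP container."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return sorted(zf.namelist())


def relationship_targets(package_bytes, rels_part):
    """Return the Target of every Relationship declared in one .rels part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        root = ET.fromstring(zf.read(rels_part))
    return [rel.get("Target") for rel in root.findall(f"{{{RELS_NS}}}Relationship")]


if __name__ == "__main__":
    odf = build_package(ODF_PARTS)
    ooxml = build_package(OOXML_PARTS)
    print("ODF-style parts:     ", list_parts(odf))
    print("Open XML-style parts:", list_parts(ooxml))
    # In the Open XML-style package the main document is found by following
    # a relationship rather than by relying on a fixed file name.
    print("Root relationships:  ", relationship_targets(ooxml, "_rels/.rels"))
```

The point of the sketch is only that a reader of the first package can go straight to a known file name, while a reader of the second has to follow relationships to find out how the parts fit together. Any unified format would have to pick one model or somehow accommodate both.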
The second is that the XML notations within the current structures are very different as they stand today. It was pointed out in many of the technical workshops that ODF places a very high weighting on “human readable XML”, whereas Open XML places a similar weighting on delivering the level of performance that is expected to be required in enterprise and data centre environments. Open XML achieves this by using notation that is not quite so pleasing to the human eye but is designed to be significantly more efficient for the applications that are expected to process it.
Once these two issues are worked out the committee would then need to look at the XML tags that are defined in each standard and ensure that all requirements from both standards were met. From the public discussions that I have read and participated in on this topic I think there is a general belief that this one step is the only one that would be necessary.
Finally the issue of backward compatibility with existing binary documents would need to be addressed. Open XML carries a design goal of allowing the full mapping in XML for the corpus of existing binary documents, created by earlier versions of the Microsoft Office applications, that are stored by individuals and enterprises today.
The new standard would also need to carry forward the tags for the functionality that exists in these documents, only some of which currently sits in the 650+ pages of the ODF specification at this stage. I’m guessing that, given the increased scope that comes with the single document format goal, there would also be a need to look at the binary OpenOffice.org formats, WordPerfect formats and maybe other file formats from applications that have not yet been discussed.
Assuming that partnerships and technical agreements can be reached on all of these points then the next step for the committee participants is to build this work into a document or set of documents and prepare to submit the text of the new third standard to ISO for approval.
In parallel, applications would need to start adopting the new format, developers would need to be trained to understand how it works, tools would need to be built to allow testing and format manipulation, and so on.
All this is achievable, and in the spirit of unification the whole industry would need to be prepared to step up and deliver on this work in a mode of complete partnership and transparency.
Finally, once this is all done we still need to look back at the other document formats that are left out of the simple scenario that I have sketched out above. This would include looking at the other existing document formats that are in use in the market today: SGML, HTML, PDF/A, PDF/X and so on. Some would be relevant for further unification work and obviously some would not, which is just another decision that needs to be made.
Again, I’ll state that these are personal views and not in any way an official position of my employer.
Over the coming years we will see how the industry and the standards processes answer this question, assuming it really needs to be answered.
The reality is that none of this can even begin until there is a really clear understanding of the relationship between some of these document formats. For Open XML and ODF there is a project underway in Germany to look at what these relationships are, which will get us all off to a good start by delivering some real data upon which decisions and activity can be based.
This project will give everybody involved a much stronger view of what interoperability between these two file formats looks like and how it can be delivered. Until then everything else is just supposition.

