Backward Compatibility And Open XML

24 September 2007 by oliver

One key piece of the value that Open XML brings as a standard is the ability for document based information, data and formatting to be carried forwards from the billions of documents that exist today in the Microsoft Office binary data file formats to the new Open XML format.

Once documents are converted to the Open XML file format they can be read, manipulated and rendered by any developer or application that chooses to support the format.

The Ecma 376 Open XML file format specification provides an extensive array of specified XML tags that allow for the mapping of existing data held in Microsoft’s binary file formats to the new Open XML based format. This means that documents that are migrated to the Open XML file format will not lose data, presentation styles or other important document attributes as they are migrated to the new format.

There has been some confusion in discussions on the internet about the role of the binary file formats in providing backward compatibility. To be clear, they are not needed for any developer on any platform to read or write the Open XML data structures.

Should any developer wish to add to the small number of one-time-use tools needed to convert documents from the binary formats to the Ecma Open XML format, full access to the Microsoft Office binary file format specifications has been available without fee (RAND-Z) to any developer for some years now, directly from Microsoft. Those specifications are not needed by any developer who is just working with Open XML based files.

The important role that the Open XML specification plays in backward compatibility is that it provides a clear mapping that allows the complete document to be represented after migration from the binary format. Once a file is created in, or converted to, the Open XML file format, the user is able to choose from a growing list of applications that implement the format to work with their document from that point forwards.

As many of you are aware the Ecma Open XML specification is currently in the final stages of ISO approval. During the technical evaluation phase, which completed on September 2nd, it was noted that a small number of the tags that exist to ensure backward compatibility still need additional detail added to their definitions.

Many standards bodies, including several here in Asia, have identified this list to the ISO process and in turn Ecma International have committed to looking at them prior to the Ballot Resolution Meeting in Spring of next year.

Of course, while backward compatibility is highly important to anybody with an existing store of documents it is only one part of the value that the Ecma Open XML specification brings and it will most likely not be a top priority for many developers. We are already starting to see the specification being used to build end point applications in larger business architectures, server based document construction applications and a wide range of other solutions.

4 comments to “Backward Compatibility And Open XML”

  1. Yoon Kit:

    Hi Oliver,

    > We are already starting to see the specification being used to build
    > end point applications in larger business architectures,

    BTW, your link does not indicate the Microsoft sponsorship of the Malaysian effort. Read this:

    http://www.openmalaysiablog.com/2007/08/define-customer.html

    “The system, to be called the Malaysian International Halal Hub Open XML System, will be based on the Open XML document standard … A project team of officials from Microsoft and HDC are defining the scope of the project and method of implementation before deciding on these matters”
    - May 2007 Yasmin Mahmood, GM for Microsoft Malaysia

    1) Can you describe how OpenXML plays a part in the “Malaysian International Halal Hub Open XML System”? Are forms submitted online via a web page, or submitted as OpenXML documents?

    2) Is it really architected around a file format, or is the Portal name just a marketing gimmick because Microsoft is “sponsoring” the development of the system? Could I suggest a better name: “Halal Sharepoint Portal”?

    3) Do you have any firm directions the “project team” have decided? Have they realised that file format based portals (which is exactly what OpenXML is … unless TC45’s scope has suddenly expanded) is oh so 1980’s?

    4) BTW, the “end point applications” was mooted in May 07, a few months after Malaysia raised concerns against MSOOXML at the February ballot, and 4 months before MSOOXML failed to get the 67% approval at ISO. This means that this “application” has to be delayed for at least 6 months after the BRM when MSOOXML stabilises. HDC should be very worried about basing its International Hub on such immature and uncertain architecture. Do you think it’s wise?

    Your response would be greatly appreciated!

    yk.

  2. oliver:

    Welcome back, that is more of a blog post of your own rather than a comment on the text that I wrote, good discussion point though.

    And to think, I put “billions” in the original text knowing that you liked the word so much.

    Anyway, you’re a little off topic with your comment, and you seem to have invented a lot of words and context that was not in the original text.

    Addressing what I’m guessing is your meta point… yes, absolutely! folks are already using Open XML as a component of their architectures. Not quite as the basis of their architecture, but certainly as a component of it.

    In reality, as I’m sure you’re aware, it seems a little offbeat to architect any business solution “around a file format”, but the Open XML format’s capability to encapsulate and segment data that needs to be round tripped through various systems is certainly important to a number of developers and architects.

  3. Yoon Kit:

    > I put “billions” in the original text knowing that you liked the word so much.

    Thanks! I find it humorous when Marketdroids use it. I wonder if we can also say that “billions” of legacy macro enabled documents will NOT be backward compatible with MSOOXML? Billions is a big word, and cuts both ways.

    > you’re a little off topic with your comment,

    Well, I’m sorry for spamming your blog post about MSOOXML’s “backward compatibility”, but it would be interesting to find out more about this Halal Hub Open XML System. There doesn’t seem to be much more information on why the name was chosen, nor the architectural decisions made after the 2 month “study”.

    I’m sure you, as an expert on OpenXML in this region, would have access to more information on this project to help educate us on how OpenXML is the superior choice as a file format or even as a name.

    > yes, absolutely! folks are already using Open XML as a component of their architectures.
    > Not quite as the basis of their architecture, but certainly as a component of it.

    For interoperability, file exchange formats would be the module for importing/exporting information. For “choice” then, shouldn’t it also need to support plain HTML POST, CSV, EDI, SOAP, BIFF and even (hopefully) ODF / XFORMS? So OpenXML, in perspective, should be a very minor part of the “Hub”.

    What’s mischievous about the name “The International Malaysian Halal Hub Open XML System” is that it implies that Malaysia supports MSOOXML. Microsoft Malaysia has already leveraged the press coverage and photo ops to insist that the Prime Minister of Malaysia endorses OpenXML, and to pressure certain groups.

    You can read the threats here:

    http://www.openmalaysiablog.com/2007/08/microsoft-turns.html

    This of course is far from the truth as both Technical Committees in the Standards Body of Malaysia voted “Disapprove with comments” against MSOOXML on technical grounds during the September ISO vote.

    > but the Open XML formats capability to encapsulate and segment data
    > that needs to be round tripped through various systems is certainly
    > important to a number of developers and architects.

    Can you describe the process of round-tripping which is unique to OpenXML? Can it apply to any other formats as described above?

    yk.

  4. oliver:

    yk, I need to be really blunt about something. Much as I would love to, it will never be appropriate for me to talk about the specifics of Microsoft’s customers on this blog if it is not already publicly referenced somewhere. The blog is a personal site, and while I work for Microsoft that is about as close as it gets to being an official site (i.e. not close at all). I’m sure HDC will release the information that you’re looking for in time.

    On the question about “round tripping” of custom data.

    First of all forget about the file format as an office automation document for a moment, and think about it as a container for data in its raw form. The Ecma Open XML spec defines a way to embed custom schemas into the document that can represent just about any data you like, then guarantee that it will remain intact as one application or another opens the document, works with it then saves it out.

    Now, bringing it back to being a document format again, Open XML allows you to bind elements from those custom schemas back to properties in the document if you choose, so the custom schemas can be manipulated not only by automated systems but also by a user through a form in their OA app.

    If we apply that to a healthcare scenario then you can imagine the Open XML document being used in a diagnosis process. A clinician opens up an office automation app, runs through a diagnosis process and documents the patient’s symptoms in custom fields in the document. When the document is saved, the patient data is stored independently as a custom schema in the docx file.

    As a next step, a billing system embeds a second custom schema into the document that includes invoice information to go back to the patient’s healthcare insurer. This addition matters to the scenario only insofar as it shows that a single document can have multiple embedded custom schemas. The billing system only needs code to work with the OPC; it does not need to deal with the document or the diagnosis information.

    As a final step, I’ll submit my encapsulated patient transaction to a web service somewhere that analyzes the custom XML document that describes the patient’s symptoms, and as a result drops a third custom schema into the OPC that details a possible diagnosis and some suggested medication. Again, no office automation involved, and no need for the web service to understand the document or the billing schema.

    The original clinician can then reopen the document in their original office automation app and work with all the new information that has been added by various systems.

    What is important about the way Open XML deals with this is the segmentation of the data and the ability for the developer to decide upon the structure of the custom embedded schema. This means that the Open XML spec is not dictating how this data is stored, and developers can embed any one of the thousands of business schema standards that exist out there today. In the example above, for the US, a developer might choose to embed the HL7 schemas into the Open XML file.

    This capability in itself is a lot more to do with being able to use Open XML in end point systems in a larger SoA environment, and less to do with what you might traditionally think of in terms of office automation apps, although of course the office automation app still has a key role to play. I use a healthcare example, but it could be any business process, and I’m already seeing examples of enterprise organizations doing this sort of work in scenarios such as supply chain or banking.

    Can you do this with other doc formats? Well, most of them just are not designed to do this. The majority of the plethora of document formats that are out there are designed with pure OA or document presentation in mind. For those that do allow the embedding of custom data elements, it isn’t clear to me that this data would be protected as it passes through different applications, or that it would be possible to implement in a form that allows the segmentation of the data and the conformance to existing business process schema standards.

    I hope that all makes sense. It has been a couple of days since I got any real sleep, so I’m not sure how coherent I’m being right now.

    Maybe I should turn this into a post in its own right.
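
For readers who want to see the round-tripping idea from the last comment in concrete terms, here is a rough sketch in Python using only the standard library. It treats the package as what an OPC container physically is (a ZIP archive) and shows a second system adding its own custom XML part without reading or changing the part another system wrote. Everything specific in it is an assumption made for illustration: the helper names (`new_package`, `add_part`, `read_part`), the `urn:example:*` namespaces and the sample payloads are hypothetical, and the part names `customXml/item1.xml` and `customXml/item2.xml` are simply a plausible layout. A real .docx would also need a main document part and relationship parts wiring the custom XML to it, which this sketch leaves out.

```python
import io
import zipfile

# Minimal content-types part; a real package would list more types.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '</Types>'
)


def new_package() -> bytes:
    """Create a bare ZIP container holding only the content-types part."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
    return out.getvalue()


def add_part(package: bytes, part_name: str, payload: bytes) -> bytes:
    """Return a copy of the package with one extra part added.

    Existing parts are copied across unchanged, so the caller never needs
    to understand what the other parts contain.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        if part_name in src.namelist():
            raise ValueError(f"part already exists: {part_name}")
        for info in src.infolist():
            dst.writestr(info, src.read(info.filename))
        dst.writestr(part_name, payload)
    return out.getvalue()


def read_part(package: bytes, part_name: str) -> bytes:
    """Read a single part back out of the package."""
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        return z.read(part_name)


# Hypothetical payloads: the clinician's app writes the first part,
# the billing system later adds the second.
DIAGNOSIS = b"<diagnosis xmlns='urn:example:clinic'><symptom>fever</symptom></diagnosis>"
INVOICE = b"<invoice xmlns='urn:example:billing'><amount>120.00</amount></invoice>"

pkg = new_package()
pkg = add_part(pkg, "customXml/item1.xml", DIAGNOSIS)
pkg = add_part(pkg, "customXml/item2.xml", INVOICE)

with zipfile.ZipFile(io.BytesIO(pkg)) as z:
    part_names = z.namelist()
```

The point of the design is that `add_part` never parses the parts it copies, which mirrors the claim in the comment that the billing system only needs code to work with the container, not with the document or the diagnosis data.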
