Archive for February, 2008

All We Need Is A Magic Wand…

28 February 2008

It has been a fun week watching the goings-on in Geneva, and some of the well-orchestrated activity outside of the BRM has been fascinating to follow. A little like a high-tech episode of The Bold and The Beautiful.

Probably by pure coincidence Google’s Open Source Programs Manager, Zaheda Bhorat, added a post to The Official Google Blog expressing a “wish” for open document standards, Google’s corporate position on OpenXML standardization.

The call to action for the post appears to be a suggestion that OpenXML needs to be “unified” or “harmonized” into ODF. A simple sounding task with a much more complex reality behind it.

It was a topic that came up frequently during the technical evaluation period of the DIS29500 (Open XML) fast track process, and I thought it might be useful to share some personal thoughts on what would need to happen to achieve the (laudable) goal of harmonizing the ODF and Open XML file formats at this stage.

As I say, “Harmonize with ODF” (generally meaning unify) is a straightforward enough sounding task, but in my view the commentators on this topic often discount a lot of reality that would need to be dealt with along the way.

For the purpose of this post I need to take a simplistic approach to the issues involved. Ideally any proposal to unify standards would also have to find a way to encompass the many document formats that exist in the market today, not just ODF and Open XML, but for the purpose of this text I will just address a potential path for IS26300 (ODF) and DIS29500 (Open XML).

There is probably one other expectation that needs to be set before we can begin to think about unification. The market today already has significant enough adoption of both file formats for us not to be able to dismiss any existing store of documents in either format, which means that we probably can’t take the approach of modifying one format or the other.

As a result the simple integration of the two formats probably isn’t directly possible at this stage and any effort to unify them would most likely lead to a third standard rather than the unified single standard that many workshop participants seemed to be looking for during the technical discussions around Open XML. Even if this was to become DIS26301, it would undoubtedly be significantly different from the IS26300 ODF standard, or any of the other revisions of ODF, that we know today.

Assuming that we have some clarity around which formats we are planning to unify, we next have to work out where we will manage the project from. Today Open XML is maintained by Ecma, and ODF is maintained by OASIS, two different but not entirely dissimilar organizations. As part of the fast track process Ecma has proposed a joint maintenance agreement for OpenXML within ISO’s SC34 committee, so maintenance of OpenXML will end up in SC34 at the end of the current process.

At this point in time OASIS does not have any such agreement in place, instead choosing to manage maintenance of the ODF format outside of the ISO process. We would need to find a common maintenance process and committee. Given that this is all about ISO standardization, it makes sense (to me at least) that SC34 would be the right place to do that, and giving up the current level of control that OASIS has over ODF would be a small step to take within the framework of delivering a more comprehensive single standard.

As a second step, the newly gathered committee would need to agree on design goals. Currently ODF is designed to be first and foremost an office automation document format, whereas Open XML is designed to allow the document format to be used both as a container for office automation documents and as a transport mechanism for other data that may be in use in an enterprise environment. The goal for Open XML in this context is that SMEs and Enterprises will be able to build their documents and related management tools into end-to-end SOA-based business systems.

Next, now that we have the committee in place and decisions made around the major design goals, there would need to be a stage of technical unification. There are a couple of issues that would need to be considered here, nothing that would be impossible to deal with, but they would have to be cleared up.

The first is that the architectural structures of ODF and Open XML are significantly different once you open up the ZIP containers. ODF favours a simple structure with most of the XML held in a small number of files within a defined relationship.

OpenXML offers a more complex structure that exists to allow for embedded data from non office automation applications. The XML is broken up into parts of a document and the relationship between the files is created on a document by document basis. This design principle was originally selected for Open XML with the goal of offering more flexibility for scenarios such as server based document assembly applications and document security.

The second is that the XML notations within the current structures are very different as they stand today. It was pointed out in many of the technical workshops that ODF carries a very high weighting on the importance of “human readable XML”, whereas Open XML places a similar level of weighting on delivering the level of performance that is expected to be required in enterprise and data centre environments. This is achieved in Open XML by using notation that is not quite so pleasing to the human eye but is designed to be significantly more efficient for the applications that are expected to process it.

Once these two issues are worked out the committee would then need to look at the XML tags that are defined in each standard and ensure that all requirements from both standards were met. From the public discussions that I have read and participated in on this topic I think there is a general belief that this one step is the only one that would be necessary.
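To make the structural difference a little more concrete, here is a minimal sketch in Python. It builds two toy ZIP packages in memory (they are not valid documents, just enough entries to show the layout) and then locates the main document part in each. The helper names `make_package`, `main_part_odf` and `main_part_opc` are my own inventions for illustration. The ODF lookup simply assumes the well-known `content.xml` name, while the Open XML lookup follows the package-level relationship file, `_rels/.rels`, to whatever target that particular document declares.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace and relationship type used by Open XML packages.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/officeDocument")


def make_package(parts):
    """Build an in-memory ZIP from a {name: text} dict (toy data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in parts.items():
            zf.writestr(name, text)
    return buf.getvalue()


def main_part_odf(data):
    """ODF: the main content lives at a fixed, well-known name."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return "content.xml" if "content.xml" in zf.namelist() else None


def main_part_opc(data):
    """Open XML: follow the package-level relationship to the main part."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOC:
            return rel.get("Target").lstrip("/")
    return None


# Toy packages: placeholder XML bodies, real entry names.
odf = make_package({
    "mimetype": "application/vnd.oasis.opendocument.text",
    "content.xml": "<doc/>",
    "styles.xml": "<styles/>",
    "META-INF/manifest.xml": "<manifest/>",
})
rels = (f'<Relationships xmlns="{RELS_NS}">'
        f'<Relationship Id="rId1" Type="{OFFICE_DOC}" '
        'Target="word/document.xml"/></Relationships>')
ooxml = make_package({
    "[Content_Types].xml": "<Types/>",
    "_rels/.rels": rels,
    "word/document.xml": "<document/>",
})

print(main_part_odf(odf))
print(main_part_opc(ooxml))
```

The point of the sketch is only the lookup style: a consumer of the first package can go straight to a known file, while a consumer of the second has to ask the package where its parts are.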

Finally the issue of backward compatibility with existing binary documents would need to be addressed. Open XML carries a design goal of allowing the full mapping in XML for the corpus of existing binary documents, created by earlier versions of the Microsoft Office applications, that are stored by individuals and enterprises today.

The new standard would need to also carry forward the tags for the functionality that exists in these documents, only some of which currently sits in the 650+ pages of the ODF specification at this stage. I’m guessing given the increased scope that comes with the single document format goal there would also be a need to look at the binary OpenOffice.org formats, WordPerfect formats and maybe other file formats from applications that have not yet been discussed.

Assuming that partnerships and technical agreements can be reached on all of these points then the next step for the committee participants is to build this work into a document or set of documents and prepare to submit the text of the new third standard to ISO for approval.

In parallel, applications would need to start adopting the new format, developers would need to be trained to understand how it works, tools would need to be built to allow testing and format manipulation, and so on.

All this is achievable, and in the spirit of unification the whole industry would need to be prepared to step up and deliver on this work in a mode of complete partnership and transparency.

Finally, once this is all done we still need to look back at the other document formats that are left out of the simple scenario that I have sketched out above. This would include looking at the other existing document formats that are in use in the market today: SGML, HTML, PDF/A, PDF/X and so on. Some would be relevant for further unification work and obviously some would not, which is just another decision that needs to be made.

Again, I’ll state that these are personal views and not in any way an official position of my employer.

Over the coming years we will see how the industry and the standards processes answer this question, assuming it really needs to be answered.

The reality is that none of this can even begin until there is a really clear understanding of the relationship between some of these document formats. For OpenXML and ODF there is a project underway in Germany to look at what these relationships are which will get us all off to a good start, delivering some real data upon which decisions and activity can be based.

This project will give everybody involved a much stronger view of what Interoperability between these two file formats looks like and how it can be delivered. Until then everything else is just supposition.

Where To Find The Microsoft Office Binary File Format Specifications

26 February 2008

A short while ago I mentioned that Microsoft had committed to releasing the file format specifications for the Microsoft Office Binary files under the Open Specification Promise and making them generally available, removing any of the complications that developers previously had to go through to get hold of these documents.

So, the only remaining question to answer is where you have to look for these documents. There are a few organizations stepping forward to host and archive these documents.

The first location is an obvious one, and it is Microsoft. The documents can be found on Microsoft.com by following this link.

There you will find;

  • Word 97-2007 Binary File Format (.doc) Specification PDF | XPS
  • PowerPoint 97-2007 Binary File Format (.ppt) Specification PDF | XPS
  • Excel 97-2007 Binary File Format (.xls) Specification PDF | XPS
  • Office Drawing 97-2007 Binary Format Specification PDF | XPS

Additionally, Microsoft made specifications for a number of supporting technologies available, also under the OSP. These include;

  • Windows Compound Binary File Format Specification PDF | XPS
  • Windows Metafile Format (.wmf) Specification PDF | XPS
  • Ink Serialized Format (ISF) Specification PDF | XPS

The other part of the announcement about the binary file formats was the creation of a translator project on Sourceforge that would look at the translation of older Microsoft Office documents from the binary file format to the new OpenXML format.

The project is now live, and can be found here.

At the same time there are two other organizations that have agreed to host these specifications. The first of these was the British Library. Below is a small excerpt from the page that they are hosted on;

The British Library believes that it is essential to archive and, where possible, provide access to the specifications of digital file formats.  These specifications are important today for people developing applications that work with digital file formats, but archived copies will be even more critical in the future when today’s applications are long obsolete.

You will find the specifications on this page on the British Library site.

The second third-party organization that will host the documents is the United States Library of Congress, and here is an excerpt from their site that again highlights the intention to preserve access to these documents for generations to come;

Listed here are selected specifications made available for downloading by the Library of Congress with the permission of their owners and the intention of ensuring permanent access to the specifications for the digital preservation community and other users. Also listed are URLs for sources of freely downloadable specifications for digital formats from standards organizations.

You will find the documents on the Library of Congress digital preservation site here.

All in all this means that the documents are available for developers who want access to them today, and are preserved for future generations by a combination of the perpetual nature of the OSP and the effort of the Library of Congress and the British Library to host this specification documentation on an equally perpetual basis…

Mauricio Ordoñez Weighs In…

25 February 2008

The work with OpenXML over the last year has, in my opinion, done a number of good things for Microsoft as a company. A highlight for me has been the number of people inside the company who have taken it upon themselves to blog and share their thoughts, their expertise and their experience externally.

Mauricio Ordoñez is the most recent voice to join the conversation, bringing with him a wealth of information about how OpenXML is being implemented, the tools that those implementations are using and the ISVs that are working with the specification.

His first couple of posts have got things off to a good start.

He started out by highlighting the VSTO Power Tools, specifically the inclusion of the Open XML Package Editor;

The Visual Studio team has just released a set of add-ons called VSTO Power Tools. Andrew Whitechapel wrote about it in his blog. The Power Tools have a treat for Open XML developers, the Open XML Package Editor.

Today he added more to the current debate by taking a look at one of the many documents that have recently been published by the ODF Alliance. This is a group that originally had a charter to promote the use and development of the OASIS Open Document Format, but they seem to be spending more time lobbying against OpenXML at this point and much less time supporting what appeared to be their original charter.

The ODF Alliance took some time to comb through the proposed dispositions that the DIS29500 project editor put forward in preparation for the Ballot Resolution Meeting in Geneva this week, then for one reason or another they chose to highlight 10 random dispositions. As a courtesy, Mauricio responded to that document and presented his own analysis of those same ten dispositions.

Keep an eye on his blog, I know he will have a lot to offer over the coming months…

DocX4All - A Java Based DOCX editor…

22 February 2008

Over the last twelve months I have met a number of developers who are working with the OpenXML specification to build a wide range of applications.

One of those developers is Jason Harrop, who has been working on a couple of projects using the spec. The first was a set of tools that allows users to simultaneously edit documents, using a plugin for Microsoft Office on one end of his application, and a set of Linux based backend tools to manage the communications.

His second project is a Java based application that will allow users to work with OpenXML documents regardless of their choice of platform. He calls the project Docx4All. Additionally, all of Jason’s code for these two projects is published under the GPL.

Docx4all is a WYSIWYG editor for docx files which runs on Vista, XP, and linux. Of course, it uses docx as its native file format. It is currently a proof of concept but the application already provides basic formatting and editing (including cut/paste, and styles) and so on.

Jason tells me the WordprocessingML in the docx file is unmarshalled directly into classes generated from the OOXML schemas, using plutext’s docx4j library. In principle this approach allows for 100% compatibility with other existing office automation applications, since there is no conversion to another internal file format.
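For anybody wondering what “unmarshalled directly into classes” looks like in practice, here is a rough sketch of the idea in Python. To be clear, this is not docx4j and it does not use classes generated from the real schemas. The `Run` and `Paragraph` classes, the `unmarshal` function and the sample XML are all hand-written assumptions for illustration, covering only paragraphs, runs and text.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


@dataclass
class Run:
    """Hypothetical stand-in for a schema-generated run class."""
    text: str


@dataclass
class Paragraph:
    """Hypothetical stand-in for a schema-generated paragraph class."""
    runs: List[Run] = field(default_factory=list)


def unmarshal(xml_text):
    """Map w:p / w:r / w:t elements straight onto Paragraph and Run objects."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iter(f"{{{W}}}p"):
        runs = [
            Run("".join(t.text or "" for t in r.findall(f"{{{W}}}t")))
            for r in p.findall(f"{{{W}}}r")
        ]
        paragraphs.append(Paragraph(runs))
    return paragraphs


# A tiny hand-written fragment in the shape of word/document.xml.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

doc = unmarshal(SAMPLE)
print([run.text for run in doc[0].runs])
```

The objects mirror the markup one for one, which is the property Jason is relying on: there is no separate internal model for the document to be converted into and back out of.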

[Image: helloworld]

There doesn’t seem to be any table support in the current release, but there is basic support for printing, via PDF.

[Image: helloworld-printpreview]

If you want to take a look for yourself then you can launch docx4all from your web browser by following this link. It will install locally, so you don’t have to be online to use it.

If you’re interested in taking a deeper look then a virtual appliance containing a full docx4all development environment is available from here. The appliance runs Ubuntu, a good example of OpenXML development and implementation taking place completely away from Microsoft’s stack.

Cutting Back On Expenses In Geneva… A Beginners Guide

18 February 2008

Things are getting more exciting as we get closer to the Ballot Resolution Meeting for DIS29500 (OpenXML) in Geneva.

I thought it might be useful to look at a few of the terms that have become part of the generic standards lexicon during this process and see if I can define them a little for the more curious among you.

This might be really important as Geneva approaches. Being able to speak the relevant lingo opens up access to a whole range of events that various third parties appear to be running alongside the BRM, mostly with the goal of influencing the outcome of the meeting to suit whatever agenda the hosting party has.

Regardless, each one of these events represents a free dinner, free glasses of whatever you fancy and some controversial yet predictable chit chat.

All of these terms appear to be important if you are thinking about blogging about OpenXML during the BRM, or if you’re just planning on mingling at some of the side events… I hear that there will be free boat trips, open bars, and “unbiased” technical experts who just happen to be in Geneva for the whole week to help out.

So, here goes, just a few things that you can drop into conversation to give yourself a little more credibility with this crowd as you enjoy the view of Lake Geneva while sipping on a free glass of wine, beer or fruit juice;

“Troll” - this is a term that is affectionately used to refer to somebody who wants to post a comment on your blog, or write a blog entry of their own, that you do not fully agree with.

Bob Sutor, an IBM Vice President, emphasized the term in a post of his this morning. I’m not sure exactly what his text means (I have not followed Bob’s blog all that closely of late) but I think he is implying that if you don’t agree with him then he would rather not hear from you.

If you do get stuck in conversation with an apparent stranger over a free cocktail then be sure to refer to just about anybody in the community who speaks in favour of OpenXML as a troll.

In Internet parlance trolls are only there to agitate a situation. You have to convince people that to you it is clear that anybody speaking in favour of OpenXML must only be doing so because they enjoy starting arguments with strangers on the interweb; it isn’t possible that they’re just expressing their own valid point of view.

“Corporate Shill” - This is a great way of writing off experts who refuse to toe the line laid down by the ODF Alliance and their members. These guys are easy to spot, they’re probably the ones offering constructive and positive comments during the BRM itself.

It is pretty clear that anybody with expertise on the topic of OpenXML who happens to have good things to say must be on the payroll of one vendor or another. You have to at least pretend that it would be impossible for any unbiased individual to come to any conclusion other than full support for a single standard, that has been developed mostly by your party hosts, without having been corrupted in one way or another.

You can also use this important term to refer to anybody who chooses to commit their own time and resources to turn up in Geneva and add their expertise into the debate who doesn’t just harp on about ODF all the time.

“Corrupt Government Official” - Another useful addition to your phrase book for any evenings spent floating around beautiful Lake Geneva.

This is a term generally used to describe any government employee or official who has arrived at their own conclusions around the current standardization process. Many of these officials decide that support for multiple standards in their market place is good for a healthy ICT environment, some choose to support OpenXML alongside ODF, and others just don’t want to mandate technology standards at all, leaving the market to make its own decisions.

Whichever way, you need to be ready to carry the view that none of these people could possibly have come to any of these conclusions without being corrupt or otherwise being pressured by somebody.

Free thought by senior and experienced government officials in this area is something that you have to pretend just can’t be tolerated; drop your guard for a moment and somebody might take your drink off you.

“Stuffed Committee” - Before you head down to the quay to board your boat you will need to be ready with some smart anecdotes on this topic. A quick search will present you with some pre-reading, usually a third hand blog entry on a site run by somebody who was not actually there and probably wasn’t part of the process that they’re commenting on.

A stuffed committee refers to any national body where Microsoft partners or other organizations who are already working with OpenXML have turned up to join in the technical debate.

You need to think carefully about these groups, and why they came to argue with the likes of IBM and Google. Mostly they represent their own business, their employees and their customers, many of whom already choose to use OpenXML as part of their development cycles and are already building strong expertise deploying solutions built on the existing draft standard. Who would have thought they would have anything useful to add to the conversation? Totally outrageous.

“Paid Off Reporter” - There are lots of these around, they are easy to spot, and they are basically any journalist who chooses to write a story that does not align with the views of the ODF Alliance. Usually they are reporters who have spent time researching the issues being debated and are “reporting” on what they have learned along the way.

So, there you have it. These five terms should be enough to give you a grounding in the language that you will need to get through the evenings. My only other word of advice is to ensure that you don’t actually express any opinions of your own during the evening, in particular any opinions supporting OpenXML.

It is easy to move from corporate guest into one of the five categories of folks above… and if that happens you’ll be spending evenings on your own in your hotel, or worse still, engaging in discussions with other more positive BRM participants over dinner somewhere.

What About This Guy, Does He Have Rights To OpenXML IPR?

13 February 2008

Last night I had the honour of sharing dinner with a couple of guys who have been very vocal throughout the current process to standardize OpenXML within ISO. They have added a lot to the debate, not always in a form that has been easy for myself and my colleagues to swallow, but they have pushed hard and for that we’re grateful.

As we come up towards the final hurdle of this particular part of the process I like to think that we can point to concrete examples of how their work has improved the OpenXML specification, and I hope they agree.

It was a pretty enlightening conversation (on several fronts) and somehow I walked away with a bunch of work that I need to do in response to some of the things we talked about. Sometimes I’m just not that smart when it comes to avoiding adding items to my task list.

[Image: saintignucius]

One of the questions that came up in conversation centred around the ability for the chap in the picture to the left to implement OpenXML, specifically whether he had the IP rights that he needed to implement and distribute an application or tools based upon the specification.

The answer is the same as the one for the 30,000 marathon runners yesterday. Under the Open Specification Promise he has been directly granted all the rights he needs to any intellectual property that Microsoft owns that he might need to implement or distribute his integrated OpenXML application.

Like some of yesterday’s marathon runners, I’m not sure this guy will do much with this particular grant of rights but if he ever decided he wanted to then he has all that he needs to go ahead and build out his application.

What Do All These Marathon Runners Have In Common?

12 February 2008

[Image: new-york-marathon-bridge]

Simple - through the Open Specification Promise every one of these runners has been independently and directly granted rights to use any intellectual property owned by Microsoft that they would need to implement or use OpenXML - irrevocably and in perpetuity.

If any one of these runners is in any way uncomfortable with the OSP for this then they also have the option to select the CNS, or to come to Microsoft for a separate RAND-Z license.

Some may use it, most probably won’t care. Either way they have all that they need.

I’m planning on conducting further detailed studies into this important issue over the coming days.

Please feel free to email me photos of individuals or groups who you think may not have the coverage that they need to freely implement OpenXML… I would be more than happy to dig into it for you.

Brian Jones Hits The Silver Screen

11 February 2008

My inbox this morning contained links to four videos that I thought would be worth sharing with you. They are of Brian Jones, Microsoft’s representative on TC45 in Ecma, discussing the work he has been doing along the path to standardize Open XML.

A colleague of mine in Redmond has started collecting OpenXML related videos and hosting them on YouTube. These four of Brian are in there along with several demonstrations of the file format specification in use across various platforms and operating systems. You will find them all linked from here.

08:40 - Part 1: How to deal with the comments

09:40 - Part 2: Details about “compatibility settings”

04:07 - Part 3: Custom XML schemas

08:12 - Part 4: What “Proposed Dispositions” means

In 2008 It Is Good To Recycle [fake news]

10 February 2008

One of the great mysteries during the process to standardize OpenXML is where some of the news stories come from, especially when they are clearly not substantiated by facts. As I have spent more time reading some of the blogs on this topic I have slowly begun to understand how some of these stories spin up, and I thought it would be fun to share one with you.

Two weeks ago one of the fringe sites that have sprung up during the process, NoOOXML.org, published a random list of patents held by Brian Jones, Microsoft’s member of Ecma TC45, pointing out that the list may or may not relate to Open XML. The article shared the same tone with similar articles and basically tried to spread a little more FUD around the topic of IPR and OpenXML. You will find it linked here.

Next step, Harish Pillay, Red Hat’s Chief Technical Architect in APAC, appears to have copied that same list into a blog entry implying that this is all part of Microsoft’s ongoing evil plan. Harish’s post is here.

And then a couple of days ago NoOOXML.org picked up the article from Harish’s blog, seemingly having no recollection that they already posted this same list in a different context or that it came from them in the first place. They quote Harish’s findings and this time they identify the list as patents covering OpenXML rather than being attributable to Brian, and posted them again with a new and more dramatic twist… because that is where the interim posts had led them.

BINGO, out pops an attempt at a free news story! Funny stuff.

As I have discussed before this is all moot, but at least I’m starting to understand where “news” comes from.

Round and round we go.

ODF Project Editor, An Open Letter On The OpenXML Standardization Process

8 February 2008

Patrick Durusau, the project editor for IS26300 and the Open Document Format TC in OASIS, has posted an open letter on his site discussing the process that Ecma has been through to standardize OpenXML.

He discusses the increasing openness of the specification at each stage in the process.

You will find Patrick’s open letter here;

The OpenXML project has made a large amount of progress in terms of the openness of its development. Objections that do not recognize that are focusing on what they want to see and not what is actually happening with OpenXML.

One of the footnotes of the letter in particular caught my eye. I think this highlights some of the contention in the current debate;

[footnote #1] Granted, I have a number of issues with the current OpenXML proposal but experts do disagree in good faith even within open standards development projects. If a proposal cannot progress until we all agree, then we risk proposals being held hostage to whim and caprice.