Where To Find The Microsoft Office Binary File Format Specifications

26 February 2008

A short while ago I mentioned that Microsoft had committed to releasing the specifications for the Microsoft Office binary file formats under the Open Specification Promise (OSP) and making them generally available. This removes the hoops that developers previously had to jump through to obtain these documents.

So the only remaining question is where to find these documents. A few organizations have stepped forward to host and archive them.

The first location is the obvious one: Microsoft. The documents can be found on Microsoft.com by following this link.

There you will find:

  • Word 97-2007 Binary File Format (.doc) Specification PDF | XPS
  • PowerPoint 97-2007 Binary File Format (.ppt) Specification PDF | XPS
  • Excel 97-2007 Binary File Format (.xls) Specification PDF | XPS
  • Office Drawing 97-2007 Binary Format Specification PDF | XPS

Microsoft also made the specifications for a number of supporting technologies available under the OSP. These include:

  • Windows Compound Binary File Format Specification PDF | XPS
  • Windows Metafile Format (.wmf) Specification PDF | XPS
  • Ink Serialized Format (ISF) Specification PDF | XPS
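As a small illustration of why the Compound Binary File Format specification matters alongside the Office specifications: Word, Excel and PowerPoint 97-2007 binary files (.doc, .xls, .ppt) are stored inside a compound file container, whereas OpenXML files (.docx, .xlsx, .pptx) are ZIP packages. The sketch below tells the two apart from a file's first bytes. The helper name `sniff_office_container` is hypothetical; the 8-byte compound file signature and the header field offsets (version, byte order mark and sector shift at offset 0x18) are as I understand the compound file specification, and a ZIP signature on its own does not prove the file is an OpenXML package.

```python
import struct

# Signature at offset 0 of every compound file (the container used by
# .doc, .xls and .ppt files).
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature at the start of a ZIP archive; OpenXML
# packages are ZIP archives, but so are many other files.
ZIP_SIGNATURE = b"PK\x03\x04"


def sniff_office_container(header: bytes) -> dict:
    """Classify a file from its leading bytes (hypothetical helper, a sketch only)."""
    if header.startswith(CFB_SIGNATURE):
        if len(header) < 0x20:
            raise ValueError("truncated compound file header")
        # Four little-endian 16-bit fields start at offset 0x18, after the
        # signature (8 bytes) and the CLSID (16 bytes): minor version,
        # major version, byte order mark and sector shift.
        minor, major, byte_order, sector_shift = struct.unpack_from("<4H", header, 0x18)
        if byte_order != 0xFFFE:
            raise ValueError("unexpected byte order mark in compound file header")
        return {
            "container": "cfb",
            "major_version": major,
            "sector_size": 1 << sector_shift,  # 9 -> 512 bytes, 12 -> 4096 bytes
        }
    if header.startswith(ZIP_SIGNATURE):
        # Possibly an OpenXML package; inspecting [Content_Types].xml would confirm it.
        return {"container": "zip"}
    return {"container": "unknown"}


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, sniff_office_container(f.read(512)))
```

Reading only the first 512 bytes is enough here, since the compound file header occupies the first 512 bytes of the file and the ZIP signature sits at offset 0.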

The other part of the announcement about the binary file formats was the creation of a translator project on SourceForge to work on converting older Microsoft Office documents from the binary file formats to the new OpenXML format.

The project is now live, and can be found here.

In addition, two other organizations have agreed to host these specifications. The first is the British Library; below is a short excerpt from the page that hosts them:

The British Library believes that it is essential to archive and, where possible, provide access to the specifications of digital file formats.  These specifications are important today for people developing applications that work with digital file formats, but archived copies will be even more critical in the future when today’s applications are long obsolete.

You will find the specifications on this page on the British Library site.

The second third-party organization that will host the documents is the Library of Congress in the United States. Here is an excerpt from its site that again highlights the intention to preserve access to these documents for generations to come:

Listed here are selected specifications made available for downloading by the Library of Congress with the permission of their owners and the intention of ensuring permanent access to the specifications for the digital preservation community and other users. Also listed are URLs for sources of freely downloadable specifications for digital formats from standards organizations.

You will find the documents on the Library of Congress digital preservation site here.

All in all, this means that the documents are available today to developers who want them, and that they are preserved for future generations through the combination of the perpetual nature of the OSP and the commitment of the Library of Congress and the British Library to host the specifications on an equally permanent basis.

Binary File Format Specifications Under the OSP And A New Open Source Converter Project

17 January 2008

Several of my Microsoft colleagues are blogging this morning about a post that Brian Jones published yesterday. Brian announces two steps the company is taking to support developers and organizations that want to understand the relationship between the binary file formats and OpenXML, or to manage conversions between the two.

1. The specifications for the binary file formats will be placed under the Open Specification Promise (OSP). This means that any developer can obtain these specifications without contacting Microsoft or signing any agreement. The file format documentation has been available since 2006 under a RAND-Z licence, through the process described in this knowledge base article; applying the OSP to the documentation simplifies access even further.

2. A project will be established on SourceForge to build a converter from the binary files (.doc, .xls, .ppt) to the new OpenXML formats (.docx, .xlsx, .pptx). This means that libraries will be available under the BSD license that demonstrate the mapping between the two sets of formats, and any developer can use them as a reference.

Brian’s post includes the text of one of TC45’s proposed dispositions that relates to this decision:

We believe that Interoperability between applications conforming to DIS 29500 is established at the Office Open XML-to- Office Open XML file construct level only.

Prescriptive guidance on, or tools to enable, transformation from Microsoft Office  “binary” file formats (i.e., .doc., .xls, and .ppt) (the “Binary Formats”) to Office Open XML formatted files is not the intention or in scope of DIS 29500.  As a result this request is outside the bounds of this process.

It is important to note that substantial use is being made of both the Binary Formats and Office Open XML in the marketplace today.  Many products (such as OpenOffice.org) support the Binary Formats. Microsoft has indicated that many companies and public institutions have received the documentation for the Binary Formats, and are working with it at this time, and can create mappings between the Binary Formats and Office Open XML. Translators from the Binary Formats  to XML formats such as ODF have already been developed and are in wide use. For example, the Sun ODF Plug-in for Microsoft Office (http://sun.systemnews.com/articles/112/3/sw/18208) states that  “The plug-in allows users the ability to seamlessly convert Microsoft Office documents to and from ODF. The ODF plug-in supports Microsoft Word, Excel and Powerpoint”.

Likewise, there is widespread use of Office Open XML in the marketplace today across platforms and applications.  A few examples include the implementations released by Apple (Mac OS X Leopard, iWork 08, iPhone), Adobe (InDesign), Microsoft (Office 2007, Office 2003, Office XP, Office 2000, Office 2008 Mac OS X), Novell (Suse Open Office), Google (Search / Preview), Mindjet (MindManager), Intergen, OpenXML/ODF Translator (Open Source project on Sourceforge), Dataviz (DocumentsToGo on Palm OS, MacLinkPlus on Mac OS X Leopard), NeoOffice, Altova (XMLSpy), MarkLogic (XML Content Server), Datawatch (Monarch Pro), QuickOffice  (QuickOffice Premier 5.0 on Symbian), Altsoft (XML2PDF Server 2007) and those under development by Corel (WordPerfect), AbiWord, Gnome (GNumeric),  Xandros, Linspire, Turbolinux and others.  These implementations are now available on many platforms, including Linux, the Macintosh, Windows, and handheld devices (PalmOS, Symbian, iPhone, and Windows Mobile).

The widespread use of both  Binary Formats and Office Open XML formats indicates that, at this time, 3rd party can use both formats and build mappings between them.

Nonetheless, Ecma International discussed this subject with Microsoft Corporation, the author of the Binary Formats.  To make it even easier for third party conversion of Binary Format-to-DIS 29500, Microsoft agreed to:

· Initiate a Binary Format-to-ISO/IEC JTC 1 DIS 29500 Translator Project on the open source software development web site SourceForge (http://sourceforge.net/ ) in collaboration with independent software vendors.  The Translator Project will create software tools, plus guidance, showing how a document written using the Binary Formats can be translated to DIS 29500.  The Translator will be available under the open source Berkeley Software Distribution (BSD) license, and anyone can use the mapping, submit bugs and feedback, or contribute to the Project.  The Translator Project will start on February 15, 2008.

· Make it even easier to get access to the  Binary Formats documentation by posting it and making it available for a direct download on the Microsoft web site no later than February 15, 2008.  The Binary Formats have been under a covenant not to sue and Microsoft will also make them available under its Open Specification Promise (see www.microsoft.com/interop/osp) by the time they are posted.

We will modify DIS 29500 to include an informative reference to the SourceForge project.

This is great news for a number of developers in the Asia Pacific region whom I have worked with over the last year. The request has been raised both as part of the ISO process and by developers who are already working with the file formats outside of that process.