ODA Publishes DWG 2010 File Format Specification

On 1st August 2006 I started this blog with a post titled “Should Autodesk keep the DWG format a secret?”. I believe that it is perfectly all right for a CAD vendor to safeguard its intellectual property and technology by means of proprietary file formats. However, I also believe that every responsible CAD vendor should appreciate the fact that we live in a multi-CAD environment and should offer its file I/O technology to all, so that any developer can write software to access the data (not the technology) described in those native file formats. If not, they should at least offer a library that converts between their proprietary format and a standard like STEP. Sadly, this is not the case, which has led to the birth of an entire data exchange industry where people end up reverse engineering proprietary file formats.

I must mention McNeel as a glowing exception here. By creating the OpenNURBS Initiative they have published the 3DM file format specification directly in source code form. As a result, each and every 3DM file on this planet has been written the way it was originally designed to be written.

Since Autodesk does not license its RealDWG SDK to everyone, the ODA has ended up reverse engineering the DWG format over the years and providing its members with libraries to read and write DWG files. Today the ODA updated its DWG file format specification document to support DWG 2010. This specification is available to everyone, not just its members. I have always wondered why the ODA makes the results of its reverse engineering activities available to the public. After all, it is mainly the people involved in reverse engineering native file formats who will find it interesting. The specification does not contain anything that users, or even many developers, would care about or want to understand. So it does not make much business sense for the ODA to make this information public. But then the ODA is a non-profit organization and it is probably a good thing that they are publishing this specification.

But there is a flip side to this openness, one that can be used against them. The introduction of the document contains this paragraph:

While our Open Design Specification for .dwg files is able to read and write .dwg files with excellent AutoCAD compatibility, we continue to work to improve our understanding of all the data in a .dwg file. If you find information which will help us to understand any unknown values, please contact us at http://www.opendesign.com/contact.

Do a search for the terms “unknown” and “undocumented” in the ODA DWG specification and you will get a pretty good idea of exactly how much of the DWG file format is still unknown to the ODA. In fact, there is a section in the specification called “Unknown Section”. It contains this scary paragraph:

This section is largely unknown. The total size of this section is 53. We simply patch in “known to be valid” data. We first write a 0L, then the number of entries in the objmap +3, as a long. Then 45 bytes of “known to be valid data”. Then we poke in the start address for objects at offset 16.
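To make that recipe concrete, here is a rough Python sketch of what such a writer might look like. It is my own illustration, not ODA code. I am assuming that a "long" is a 4-byte little-endian integer (which makes the numbers add up: 4 + 4 + 45 = 53), that offset 16 is counted from the start of the section, and that the start address is also a 4-byte value. The function name and the zero-filled placeholder bytes are hypothetical; the real "known to be valid" bytes are not reproduced here.

```python
import struct

SECTION_SIZE = 53   # total size stated in the specification
FILLER_SIZE = 45    # length of the "known to be valid" blob


def build_unknown_section(objmap_entries, objects_start, filler=None):
    """Assemble the 53-byte section following the quoted description.

    Assumptions (not confirmed by the quoted text): longs are 4-byte
    little-endian, and offset 16 is relative to the section start.
    """
    if filler is None:
        # Placeholder zeros only; NOT the real "known to be valid" data.
        filler = bytes(FILLER_SIZE)
    if len(filler) != FILLER_SIZE:
        raise ValueError("filler must be exactly 45 bytes")

    buf = bytearray()
    buf += struct.pack("<l", 0)                   # "first write a 0L"
    buf += struct.pack("<l", objmap_entries + 3)  # objmap entry count + 3
    buf += filler                                 # 45 opaque bytes
    assert len(buf) == SECTION_SIZE               # 4 + 4 + 45 = 53

    # "Poke in" the start address for objects at offset 16,
    # overwriting four of the opaque bytes.
    struct.pack_into("<l", buf, 16, objects_start)
    return bytes(buf)


section = build_unknown_section(objmap_entries=10, objects_start=0x1234)
```

Note that the address at offset 16 lands inside the 45-byte blob, so the only fields anyone actually understands here are two integers and one address. Everything else is copied blindly.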

Maybe Autodesk put in this section as a decoy or something. But as you can see, reverse engineering proprietary file formats is a very messy business and the ODA is being completely “open” about it.