ODA Publishes DWG 2010 File Format Specification

On 1st August 2006 I started this blog with a post titled “Should Autodesk keep the DWG format a secret?”. I believe that it is perfectly all right for a CAD vendor to safeguard its intellectual property and technology by means of proprietary file formats. However, I also believe that every responsible CAD vendor should appreciate the fact that we live in a multi-CAD environment and should offer its file I/O technology to all, so that any developer can write software to access the data (not the technology) described in those native file formats. Failing that, they should at least offer a library that converts between their proprietary format and a standard like STEP. Sadly, this is not the case, which has led to the birth of an entire data exchange industry where people end up reverse engineering proprietary file formats.

I must mention McNeel as a glowing exception here. By creating the OpenNURBS Initiative, they have published the 3DM file format specification directly in source code form. As a result, each and every 3DM file on this planet has been written the way it was originally designed to be written.

Since Autodesk does not license its RealDWG SDK to everyone, the ODA has ended up reverse engineering the DWG format over the years and providing its members with libraries to read and write DWG files. Today the ODA updated its DWG file format specification document to support DWG 2010. This specification is available to everyone, not just its members. I have always wondered why the ODA makes the results of its reverse engineering activities available to the public. After all, it is mainly the people involved in reverse engineering native file formats who will find it interesting. The specification does not contain anything that users, or even many developers, would care about or want to understand. So it does not make much business sense for the ODA to make this information public. But then the ODA is a non-profit organization, and it is probably a good thing that they are publishing this specification.

But there is a flip side to this, which is something that can be used against them. The introduction of the document contains this paragraph:

While our Open Design Specification for .dwg files is able to read and write .dwg files with excellent AutoCAD compatibility, we continue to work to improve our understanding of all the data in a .dwg file. If you find information which will help us to understand any unknown values, please contact us at http://www.opendesign.com/contact.

Do a search for the terms “unknown” and “undocumented” in the ODA DWG specification and you will get a pretty good idea of exactly how much of the DWG file format is still unknown to the ODA. In fact, there is a section in the specification called “Unknown Section”. It contains this scary paragraph:

This section is largely unknown. The total size of this section is 53. We simply patch in “known to be valid” data. We first write a 0L, then the number of entries in the objmap +3, as a long. Then 45 bytes of “known to be valid data”. Then we poke in the start address for objects at offset 16.
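
To see just how little that paragraph pins down, here is a minimal Python sketch of the recipe it describes. It is only an illustration under stated assumptions, not the ODA's code: the function name and parameters are hypothetical, I am assuming a "long" is a 4-byte little-endian integer, and the 45 bytes of “known to be valid” data are not given in the quoted text, so the sketch fills them with zeros as a placeholder.

```python
import struct

SECTION_SIZE = 53      # total size stated in the quoted paragraph
KNOWN_VALID_LEN = 45   # length of the "known to be valid" blob


def build_unknown_section(objmap_entries, objects_start, known_valid=None):
    """Assemble the 53-byte section following the quoted recipe.

    Hypothetical helper: the name, parameters and byte order are assumptions.
    """
    if known_valid is None:
        # Placeholder only. The real 45 bytes are not in the quoted text.
        known_valid = bytes(KNOWN_VALID_LEN)
    if len(known_valid) != KNOWN_VALID_LEN:
        raise ValueError("known_valid must be exactly 45 bytes")

    buf = bytearray()
    buf += struct.pack("<l", 0)                   # "We first write a 0L"
    buf += struct.pack("<l", objmap_entries + 3)  # objmap entry count + 3
    buf += known_valid                            # 45 bytes of patched-in data

    # "Poke in" the start address for objects at offset 16. This overwrites
    # four bytes that fall inside the 45-byte blob (which spans offsets 8-52).
    struct.pack_into("<l", buf, 16, objects_start)

    if len(buf) != SECTION_SIZE:
        raise AssertionError("section must be 53 bytes")
    return bytes(buf)


section = build_unknown_section(objmap_entries=10, objects_start=0x1234)
print(len(section))
```

Note that 4 + 4 + 45 does add up to the stated 53 bytes, so the arithmetic in the paragraph is at least self-consistent, even if the meaning of most of those bytes is not documented.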

Maybe Autodesk put in this section as a decoy or something. But as you can see, reverse engineering proprietary file formats is a very messy business and the ODA is being completely “open” about it.

  • John

    Have you actually read the specification? Look at page 64, on the file organization of v2010. It's a blank page. Also, there are bullet items in v2007 that are blank, yet the earlier released RTF has the bullet items intact. Throughout the document there are tables with missing headers, missing information, and some pages with table parts and colored lines scattered about.

    Some examples:

    On page 14 there is a table meant to describe handle reference codes. The table has a “code” column with the values entered. However, it also has a “description” column, spelled “desciption”, which is totally empty and provides no information.

    Pages 62, 63, 64: Look at 5.15 Reed-Solomon encoding, a thin blue line and two empty bullets on page 62. Pages 63 and 64 are just blank pages.

    It's as if the tool that was used to produce the PDF file, presumably from the RTF file, screwed the whole thing up.

    OR they purposely created pure garbage. This specification, and I use that term loosely, borders on totally useless.

    You cannot use this specification to build a DWG reader. There just isn’t enough info. In the early days and up through AutoCAD 2000 they did a real good job of providing the spec. For the 2004, 2007 and 2010 formats this specification will only serve to frustrate you, as it gives you a few details and then leaves out the most important things.

    It is obvious from their product that the ODA knows all the information that is missing from the specification. However, they are choosing not to divulge it.

    Don’t be fooled. They aren’t giving away anything. The provision of this specification is merely a gesture, compelling people like you to write good things about their “openness”.

    But people like me, who have taken the time to try to apply this specification to a product, quickly realize that it is merely a Cliff note. It's an overview with some good pieces that get you started. But you will spend many hours, days, years reverse engineering the file format yourself due to the incompleteness of this document.

    Rest assured, the ODA’s secrets and knowledge are still well guarded and are NOT made available to the public.

  • Gayaudeshani

    Hi John,
    I also need to study DWG files. Do you have any good references?