I spent quite a bit of time puzzling over some mysterious byte patterns in the headers of a few files, until I looked at a new file and saw that the same region contained random text. I now suspect it's either uninitialized memory (they forgot to zero that part of the struct) or perhaps some sort of cut'n'paste buffer -- either way, a big waste of time. Oh well.
For future reference, then, here's what I have:
The file consists of: a 52-byte header, the key, dashes, and then a table of strings. (These are all defined below.)
The header consists of:
|Offset||Contents|
|2||12 bytes: "ACROSS&DOWN\0"|
|44||1 byte: puzzle width|
|45||1 byte: puzzle height|
|46||1 byte: # of clues|
I don't know what the other bytes in the header mean, but the random junk I mentioned is around byte 30.
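Here is a minimal sketch of reading the known header fields, using only the offsets listed above. The function name `parse_header` and the dictionary keys are my own choices for illustration, not anything taken from the file format itself.

```python
HEADER_SIZE = 52                 # header length stated above
MAGIC = b"ACROSS&DOWN\x00"       # 12 bytes, expected at offset 2


def parse_header(data: bytes) -> dict:
    """Pull the known fields out of the 52-byte header."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is shorter than the 52-byte header")
    if data[2:2 + len(MAGIC)] != MAGIC:
        raise ValueError("magic string not found at offset 2")
    # Indexing a bytes object yields an int, so no unpacking is needed
    # for single-byte fields.
    return {
        "width": data[44],
        "height": data[45],
        "n_clues": data[46],
    }
```

Everything else in the header is skipped, since its meaning is unknown here.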
The key is width*height bytes long, and is the answer to the puzzle, one byte per cell. A period is used for a black cell.
The dashes are width*height bytes long, and appear to mirror the key with dashes in place of the key letters. I'm not sure what this is for -- perhaps it allows some puzzles to come with some answers filled in already?
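The two grids can be sliced out directly once the width and height are known. This is a sketch under the layout described above (key immediately after the 52-byte header, dashes immediately after the key); `read_grids` is a hypothetical helper name.

```python
HEADER_SIZE = 52


def read_grids(data: bytes, width: int, height: int):
    """Return (key_rows, dash_rows), each a list of `height` strings."""
    cells = width * height
    key_start = HEADER_SIZE
    dash_start = key_start + cells
    end = dash_start + cells
    if len(data) < end:
        raise ValueError("file is too short to hold both grids")

    def to_rows(chunk: bytes):
        # One byte per cell, stored row by row.
        text = chunk.decode("latin-1")
        return [text[r * width:(r + 1) * width] for r in range(height)]

    return to_rows(data[key_start:dash_start]), to_rows(data[dash_start:end])
```

In the key rows a `.` marks a black cell; the string table starts at offset `52 + 2 * width * height`.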
The string table is the rest of the file. It is just a sequence of strings, each terminated by nul. Strings appear to be ISO-8859-1 -- at least, they use its © symbol. I've also seen lines terminated with Windows-style line endings (CR LF).
The first string is the title, the second the authors, and the third is the copyright. The next n (using the field from the header) are the clues for the puzzle. Finally, there is sometimes one remaining string, which is a comment. (I have one file where this contains more binary data -- haven't figured that out yet.)
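Splitting the string table might look like the sketch below. It assumes the table is passed in as the bytes following the two grids, and that the strings are ISO-8859-1 as guessed above. The name `read_strings` and the returned keys are made up for this example. It does not try to handle a trailing comment that contains binary data: any embedded nul bytes would simply split it into extra strings, and only the first of those is kept.

```python
def read_strings(table: bytes, n_clues: int) -> dict:
    """Split the nul-terminated string table into its named parts."""
    parts = table.split(b"\x00")
    # A table that ends with a nul produces one empty trailing piece.
    if parts and parts[-1] == b"":
        parts.pop()
    strings = [p.decode("latin-1") for p in parts]
    if len(strings) < 3 + n_clues:
        raise ValueError("string table has fewer strings than expected")
    rest = strings[3 + n_clues:]
    return {
        "title": strings[0],
        "author": strings[1],
        "copyright": strings[2],
        "clues": strings[3:3 + n_clues],
        # The optional comment is whatever string follows the clues, if any.
        "comment": rest[0] if rest else None,
    }
```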
The clues can be mapped to cells like this: first, numbers are assigned in the standard crossword way. Then, the first clue is 1 across (if it exists), the next is 1 down (if it exists), the next 2 across, etc.
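Here is a sketch of the numbering step, working from the key rows (strings with `.` for black cells). The function name `clue_slots` and the tuple layout are illustrative only. It uses the usual crossword rule: a white cell gets a number if it starts an across entry (black or edge to its left, white to its right) or a down entry (black or edge above, white below). It also assumes that when one cell starts both an across and a down entry, the across clue is listed first; swap the two `append` lines if your files turn out to be ordered the other way.

```python
def clue_slots(rows):
    """Return (number, direction, row, col) tuples in clue order."""
    height = len(rows)
    width = len(rows[0]) if rows else 0

    def is_black(r, c):
        # Anything off the grid counts as black.
        if not (0 <= r < height and 0 <= c < width):
            return True
        return rows[r][c] == "."

    slots = []
    number = 0
    for r in range(height):
        for c in range(width):
            if is_black(r, c):
                continue
            starts_across = is_black(r, c - 1) and not is_black(r, c + 1)
            starts_down = is_black(r - 1, c) and not is_black(r + 1, c)
            if starts_across or starts_down:
                number += 1
                if starts_across:
                    slots.append((number, "across", r, c))
                if starts_down:
                    slots.append((number, "down", r, c))
    return slots
```

Zipping the result with the clue strings from the table pairs each clue with its number, direction, and starting cell, provided the slot count matches the clue count in the header.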
I have an implementation of a crossword puz decoder here.