December 30, 2003

Hidden data roundup

It is time for the 2003 roundup of information leakage through hidden data.

This summer I finally got around to writing up my large-scale exploitation of other people's hidden data in MS Word documents. This has brought me many emails from users and administrators, including some from large organizations with sensitive data. It seems that many people find this cause for concern.

There have been several other events in this domain this year, some old and some new. Some examples have impacted major news stories but some have been simple eye candy.

This year hidden data in the image domain made a minor splash. It seems that some image manipulation programs cache a thumbnail inside image files. Furthermore, that thumbnail is not always regenerated when the image is cropped, so it can still show the parts that were cut away.
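To make the thumbnail issue concrete, here is a minimal sketch of how one might check a JPEG file for a second, nested JPEG stream. It is only a byte-scanning heuristic and not a real metadata parser: it assumes the cached thumbnail is itself stored as a complete JPEG stream inside the file, and the function name `find_embedded_jpeg` is made up for this illustration.

```python
SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def find_embedded_jpeg(data: bytes):
    """Return the first JPEG stream nested inside `data`, or None.

    Heuristic only: it looks for a second start-of-image marker after the
    one that opens the file, then cuts at the next end-of-image marker.
    """
    if not data.startswith(SOI):
        return None  # not a JPEG at all, nothing to look for
    # Skip the outer start marker and search for a nested one.
    start = data.find(SOI, 2)
    if start == -1:
        return None  # no nested stream, so no cached thumbnail of this kind
    end = data.find(EOI, start + len(SOI))
    if end == -1:
        return None  # nested stream is truncated
    return data[start:end + len(EOI)]
```

One could read a file with `open("photo.jpg", "rb").read()` (the file name is a placeholder), pass the bytes to `find_embedded_jpeg`, write any result to a separate file, and compare it by eye with the visible image. If the thumbnail shows more than the image does, the crop leaked.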

Try this Google search for more information and a good example of hidden nudity as opposed to hidden text. (Beware, this might be mildly unsafe for work, depending on where you work.)

Falling back to more established media types in this domain, thememoryhole.org brings us a beautiful DOJ example. They did such a nice job that I have not bothered to run my scripts over it.

The DC sniper trial is still in the news, and of course that case yielded its own hidden text fiasco back when it was an unresolved panic, but that was last year. See an overview, the redacted and the unredacted versions.

Obviously these examples are related but differ in a key aspect. The image case is clueless software messing with its users; the PDF case is clueless users messing with software. Neither is desirable.

I think any normal person can reasonably insist on these two principles:

  • I want no unforeseen hidden data leakage through data formats I happen to use.
  • I want my public officials to know how to safeguard data properly. After all, I personally pay them to do exactly that in this growing climate of NatSec sensitivity.

Perhaps the larger issue is that a format with rich functionality for editing and maintaining data will probably not be suitable for publishing, and vice versa.

Posted by byers at December 30, 2003 11:09 AM