Dear Appy,

How committed are you?

Signed,

Lost and Forgotten Data

 

By Gordon Bell

Microsoft Research

 

Dear Appy,

I'm having trouble with long-term commitment -- not on my end, heaven knows, but from the apps that created me and with whom I like to associate. Over time, these pesky apps evolve and they simply don't recognize the data that they once helped create! But we data progeny -- and there are lots of us -- feel that as our creators, these apps should be responsible for eternal support.

 

But the little problem with recognition isn't the worst of it -- sometimes the apps even disappear altogether. I ask you, is it expecting too much for 20-something-year-old data like me to be interpretable by my app (e.g., Acrobat, DB2, Draw, Eudora, Office, Quicken, or RealNetworks), or am I just associating with irresponsible apps?

 

If things continue on their current path, it seems I will be completely uninterpretable within 20 to 50 years! My apps will move to other platforms, or evolve to be more Internet- or Next-Big-Thing-centric. Meanwhile, my owner is trying to store everything cyberizable in his life. He's stuffed his Cyberall full of every form of personal information, from directly created to digitally encoded, e.g., documents, photos, and videos. As part of his data, I want to be valid, so that I may be understood at some point in the indeterminate future!

 

Signed,

Lost and Forgotten Data

 

 

Dear Lost,

 

Well, you're right -- the problem of irresponsible apps is rampant in today's society, and it's going to have an impact on generations of data to come. In the past, old-fashioned data storage apps like writing and photography were built to last. For example, high-quality paper will hold information for a millennium, and film is sometimes rated at several hundred years. A CD is likely to be readable in 50 years, but finding the CD reader, computer, file system, and app to read it will clearly be impossible if history is a guide.

 

But does that mean that the only true form of long-term storage is paper? Are today's data committed to an inordinate conversion effort with each generation if they want to be eternal, because of three too-rapidly-evolving, finger-pointing levels: the media, the computer and its operating system, and the app? If so, this means storing 10-foot paper stacks of personal information instead of a single DVD holding a few GBytes, and simply giving up on audio and video!

 

Fortunately, to alleviate this situation, various computing forefathers have donated their personal archives to places like the Charles Babbage Institute, the Computer Museum History Center, or a university library in the hope that future scholars will find them useful. (The irony is that, with this much paper that computers helped create, computers are unlikely to be of much help in retaining and retrieving these personal archives.) Still, this is a start, and with any luck, such efforts will eventually bring some continuity to the intergenerational data problem.

 

Signed,

Appy

 

Dear Appy,

 

Your cheery outlook is much appreciated, but look at poor video data, for heaven's sake! An app that encoded video just two years ago has completely disappeared, leaving its data orphaned. This dysfunctional situation is a result of the evolving nature of proprietary formats coming out of the format wars. Any one of the MPEGs would have been a better choice. Are there a few basic data-types that will be forever interpretable, so that one doesn't have to print everything out and store it in large stacks of irretrievable paper, waiting to be encoded or otherwise found?

 

Signed,

Lost and Forgotten Data

 

Dear Lost,

 

Well, for one thing, data has learned that in order to be understood in the future, it cannot be subject to highly volatile apps that change every year, such that a particular version has to be executed in order for the data to be understood, e.g., Quicken 95...2000. This means that as apps evolve, either data must keep the creating version of the app, or all past data associated with a named app has to be converted forward -- this signals a new kind of custodial arrangement.

 

Alternatively, the simplest way to ensure interpretability, at least for simple data, is to transform an app's progeny, i.e., its data, into a generic form in which one has very long-term confidence. This option assumes there are a few golden, generic formats that will live indefinitely. ASCII text is probably the only proven long-term data type. It is too early to tell whether HTML will make it as a golden format. Data of the world surely have demonstrated their commitment to it. Unfortunately, an HTML document consists of a number of files, including images, e.g., GIF or JPEG, making it a less than ideal format. PDF looks like a potential bet for almost all paper documents, if it can prove it has a long-term commitment to the relationship!

 

Clearly, the best solution would be to have just a few data-types, sharing wide acceptance and standardization, into which data can be transformed, and that are not subject to the fickle whims of rapidly evolving apps. Forget about data held in complex structures like those of drawing programs or databases, e.g., DB2 or Outlook.

 

Signed,

Appy

 

Dear Appy,

 

Well, you certainly are the eternal optimist! What golden formats will exist in addition to ASCII? How long will data held in RTF, PDF, JPEG, various MPEGs, and MP3 be interpretable? Given the vast amount of data in Microsoft's Office apps, what commitment will these apps make to their data? In the future, what prenuptials will be required for data? What about apps' fiduciary responsibilities to data that may have cost hundreds of billions of dollars to create?

 

Signed,

Lost and Forgotten Data

 

Dear Lost,

 

You're right -- it's time to get out the wet noodle!

 

Signed,

Appy

 

Dear Appy,

Since I live near a number of those dynamic apps in Silicon Valley, maybe the solution is at hand: get some startup to recognize the problem and put out a Data Modernization product that keeps my data always readable, year in, year out. Will you pass along this idea to your readers as a problem looking for a resolution in the form of a product?

 

Signed,

Lost and Forgotten Data

 

Dear Lost,

Hopefully we'll see another dozen startups, all aimed at striking it rich and solving your problem.

 

Signed,

Appy