Dear Appy,
How committed are you?
Signed,
Lost and Forgotten Data
By Gordon Bell
Microsoft Research
Dear Appy,
I'm having trouble with long-term commitment --
not on my end, heaven knows, but from the apps that created me and with whom I
like to associate. Over time, these pesky apps evolve and they simply don't
recognize the data that
they once helped create! But we data progeny
-- and there are lots of us -- feel that as our creators, these apps should be
responsible for eternal support.
But the little problem with recognition isn't
the worst of it: sometimes the apps even disappear altogether. I ask you, is
it expecting too much for 20-something-year-old data like me to be
interpretable by my app (e.g., Acrobat, DB2, Draw, Eudora, Office, Quicken, or
RealNetworks), or am I just associating with irresponsible apps?
If things continue on their current path, it
seems I will be completely uninterpretable within 20 to 50 years! My apps will
move to other platforms, or evolve to be more Internet- or Next-Big-Thing-centric.
Meanwhile, my owner is trying to store everything cyberizable in his life. He's stuffed his Cyberall full of every form
of personal information, from the directly created to the digitally encoded, e.g.,
documents, photos, and videos. As part of his data, I want to remain valid, so that
I may be understood at some point in the indeterminate future!
Signed,
Lost and Forgotten Data
Dear Lost,
Well, you're right -- the problem of irresponsible
apps is rampant in today's society, and it's going to have an impact on
generations of data to come. In the past, old-fashioned data storage apps like
writing and photography were built to last. For example, high quality paper
will hold information for a millennium, and film is sometimes rated at several
hundred
years. A CD is likely to be readable in 50
years, but finding the CD reader/computer & file system/app to read it will
clearly be impossible if history is a guide.
But does that mean that the only true form of
long-term storage is paper? Are today's data committed to an inordinate
conversion effort with each generation if they want to be eternal, thanks to
three too-rapidly evolving, finger-pointing levels: the media, the computer
and its operating system, and the app? If so, it means storing 10-foot stacks
of paper instead of a single DVD holding a few GBytes of personal information,
and simply giving up on audio and video!
Fortunately, to alleviate this situation, various
computing forefathers have donated their personal archives to places like the
Charles Babbage Institute, the Computer Museum History Center, or a university
library in the hope that future scholars will find them useful. (The irony is
that although computers helped create so much of this paper, computers are
unlikely to be of much help in retaining and retrieving these personal archives.)
Still, this is a start, and with any luck, such efforts will eventually bring
some continuity to the intergenerational data problem.
Signed,
Appy
Dear Appy,
Your cheery outlook is much appreciated, but
look at poor video data, for heaven's sake! An app that encoded video just two
years ago has completely disappeared, leaving its data orphaned. This
dysfunctional situation is the result of the ever-evolving proprietary formats
that came out of the format wars. Any one of the MPEGs would have been a
better choice. Are there a few basic data types that will be forever
interpretable, so that one doesn't have to print everything out and store it in
large stacks of irretrievable paper, waiting to be re-encoded or otherwise found?
Signed,
Lost and Forgotten Data
Dear Lost,
Well, for one thing, data has learned that in
order to be understood in the future, it cannot be subject to highly volatile
apps that change every year, such that a particular version has to be executed
in order for the data to be understood, e.g., Quicken 95...2000. This means that
as apps evolve, either the data keeps the version of the app that created it, or
all past data associated with a named app has to be converted forward. Either
way, this signals a new kind of custodial arrangement.
Alternatively, the simplest way to ensure that data stays interpretable,
at least in a simple form, is to transform an app's progeny, i.e., its data,
into a generic form in which one has very long-term confidence. This option
assumes there are a few golden, generic formats that will live indefinitely.
ASCII text is probably the only proven long-term data type. It is too early to
tell whether HTML will become a golden format, although the world's data have
surely demonstrated their commitment to it. Unfortunately, an HTML document
often consists of a number of files, including images, e.g., GIF or JPEG,
making it less than ideal. PDF looks like a good bet for almost all paper
documents, if it can prove it has a long-term commitment to the
relationship!
Clearly, the best solution would be to have
just a few data-types, sharing wide acceptance and standardization, into which
data can be transformed, and that are not subject to the fickle whims of
rapidly evolving apps. Forget about
data held in complex structures, such as those of drawing programs or
databases, e.g., DB2 or Outlook.
Signed,
Appy
Dear Appy,
Well, you certainly are the eternal optimist!
What golden formats will exist in addition to ASCII? How long will data held in RTF, PDF, JPEG, various MPEGs, and MP3
be interpretable? Given the vast amount of data in Microsoft's Office apps,
what commitment will these apps make to their data? In the future, what
prenuptials will be required for data?
What about
apps' fiduciary responsibilities to data that
may have cost hundreds of billions of dollars to create?
Signed,
Lost and Forgotten Data
Dear Lost,
You're right -- it's time to get out the wet
noodle!
Signed,
Appy
Dear Appy,
Since I live near a number of those dynamic
apps in Silicon Valley, maybe the solution is at hand: get some startup
to recognize the problem and put out a Data Modernization product that keeps my
data readable year in, year out. Will
you pass this idea along to your readers as a problem in search of a solution
in the form of a product?
Signed,
Lost and Forgotten Data
Dear Lost,
Hopefully we'll see another dozen startups,
all aiming to get rich and solve your problem.
Signed,
Appy