Storage and Media in the Future When You Store Everything

Gordon Bell and Jim Gemmell

Bay Area Research Center

San Francisco, CA

Microsoft Research

 

Jim is “ripping” his CDs onto his hard drive in MP3 format. He’s at 2 gigabytes (GB), but he has an 8 GB drive, so space is not yet a worry. In the late ’80s, when we all had hundred-megabyte (MB) drives, it might have sounded a little far out to hear someone project this scenario of having all their music on their personal portable. But here we are. And what can we look forward to in another ten years for our personal stores? How about a terabyte (TB) -- 1000 gigabytes?

 

If it seems hard to imagine what you would do with that space, try to remember what you would have thought of an 8 GB drive in 1989. It’s true, you can throw a lot into a terabyte drive without making a dent. Everything you will read in your lifetime, stored as text, is unlikely to amount to more than a few GB. If you like photography, you could snap 100,000 JPEGs and still only use 10 GB. Toss in, say, a quarter-million faxes to consume another gigabyte. A music collection of several hundred CDs turns into about 20 GB using MP3, at a quality suitable for 99.99% of the world, with barely discernible degradation compared to the best 640-megabyte CD. To really start using space, why not have a 100-DVD collection storing 250 hours of HDTV on your hard drive – that would use nearly 400 GB, a little less than half your hard drive. You could also record many hours of television.
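The tally above can be checked with a few lines of arithmetic. The following is only a rough sketch in Python using the round figures quoted in the text; the 3 GB figure for a lifetime of reading is an assumed stand-in for "a few GB", and the function names are hypothetical.

```python
# Rough tally of the collection sizes quoted in the text against a 1 TB drive.
# 1 TB is taken as 1000 GB, as in the article.
DRIVE_GB = 1000

collection_gb = {
    "lifetime of reading (text)": 3,   # assumption: stand-in for "a few GB"
    "100,000 JPEG photos": 10,
    "250,000 faxes": 1,
    "several hundred CDs as MP3": 20,
    "250 hours of HDTV": 400,
}


def summarize(items, drive_gb=DRIVE_GB):
    """Return (GB used, GB free, fraction of the drive used)."""
    used = sum(items.values())
    return used, drive_gb - used, used / drive_gb


def implied_kb_per_item(total_gb, count):
    """Average size in kilobytes implied by a total in GB and an item count."""
    return total_gb * 1_000_000 / count


used, free, fraction = summarize(collection_gb)
print(f"Used {used} GB of {DRIVE_GB} GB ({fraction:.0%}); {free} GB still free")
print(f"Implied size per photo: {implied_kb_per_item(10, 100_000):.0f} KB")
print(f"Implied size per fax: {implied_kb_per_item(1, 250_000):.0f} KB")
```

Under these assumptions, everything except the video adds up to only a few percent of the drive, which is the point of the paragraph: the HDTV collection is the only item that makes a visible dent.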

 

Did we mention this would be on your laptop or portable personal device, and that the screen is certain to have a resolution high enough to let us substitute it for paper?

 

Video really consumes space, although audio uses lots too, if you record a lot of it. For instance, if you record everything you hear at compressed voice-grade (8 kilobits per second) you might need a terabyte if you live long enough. Get used to the idea that most of what you store is audio and video, and everything else is incidental. So if it isn’t A/V you can afford to keep it.  Then the problem will be to find it.
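To see how "live long enough" works out, here is a small back-of-the-envelope sketch in Python. It assumes 8 kilobits per second means 8,000 bits per second, a terabyte means 10^12 bytes, and a 365-day year; the function names and the 16-hours-a-day case are illustrative assumptions, not figures from the text.

```python
# Back-of-the-envelope: how long does voice-grade audio take to fill a terabyte?
TB = 10**12  # bytes, taking 1 TB as 1000 GB


def bytes_per_year(kilobits_per_second, hours_per_day=24.0):
    """Bytes recorded in a 365-day year at the given bit rate."""
    bytes_per_second = kilobits_per_second * 1000 / 8
    return bytes_per_second * 3600 * hours_per_day * 365


def years_to_fill(capacity_bytes, kilobits_per_second, hours_per_day=24.0):
    """Years of recording needed to fill the given capacity."""
    return capacity_bytes / bytes_per_year(kilobits_per_second, hours_per_day)


print(f"Per year, around the clock: {bytes_per_year(8) / 1e9:.1f} GB")
print(f"Years to fill 1 TB, around the clock: {years_to_fill(TB, 8):.1f}")
print(f"Years to fill 1 TB at 16 hours a day: {years_to_fill(TB, 8, 16):.1f}")
```

At these rates a terabyte holds roughly three decades of continuous recording, or closer to five decades of waking hours only, so a terabyte for a lifetime of audio is the right order of magnitude.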

 

Gordon is getting a start on capturing and encoding legacy physical material. He is trying to get every piece of paper he has or has ever produced onto his hard drive. This includes what he’s written -- books, articles, emails, notebooks and drawings. It also includes bills, legal documents and any other paper he has filed. Photos and slides are being scanned. All his public lectures, e.g. a videotaped lecture at MIT in 1972, About the Future, will be encoded as video (research.microsoft.com/~gbell). The space requirements are manageable, but nothing else is. Email is about 100 megabytes per year… or for someone with a 40-year professional life, 4 GB.

 

After the information is put into the computer, the next big problem is getting it back. Indexing and retrieval are the hard part. So you’ve got it, but can you find it? Searching text can be done; it’s just like the web searches you are used to (in fact, you can buy the same search engines, like AltaVista, for your personal use). As we all know from frustrated web searching, these engines can stand lots of improvement. And you won’t have real humans at Yahoo categorizing your life. But don’t get hung up on the text searches; that’s the good part. The scary part is trying to find a photograph, or a desired audio or video clip. We spoke with some leaders in this kind of technology, and for all that they had done, only searching on the closed-caption text of the video was consistently useful.
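For readers who want a feel for why caption text makes video searchable, here is a minimal sketch in Python of an inverted index over caption lines. It is an illustration only, not how any particular search engine works; the clip names, timestamps, and caption text are hypothetical sample data.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(captions):
    """Map each word to the set of (clip_id, start_seconds) where it is spoken.

    captions: iterable of (clip_id, start_seconds, caption_text) tuples.
    """
    index = defaultdict(set)
    for clip_id, start, text in captions:
        for word in tokenize(text):
            index[word].add((clip_id, start))
    return index


def search(index, query):
    """Return sorted (clip_id, start_seconds) whose caption has every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical caption data, for illustration only.
captions = [
    ("lecture_1972", 0, "Welcome to a talk about the future of computing"),
    ("lecture_1972", 95, "Storage will keep getting cheaper"),
    ("home_video", 12, "The future home of the family"),
]
index = build_index(captions)
print(search(index, "storage cheaper"))
print(search(index, "the future"))
```

The search returns a clip and a time offset rather than a document, so a hit can jump straight to the moment the words were spoken. Nothing comparable is available for a photograph or an uncaptioned clip, which is why those remain the scary part.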

 

It turns out that just having all your information in usable form is a hard problem. You will be keeping all your old files, but your new word processor might not support the legacy format any more. After battling with MacDraw, ancient versions of Quicken, and even some PDP-8 (the first minicomputer, c. 1965) word processing formats, Gordon has decided that paper is the easiest, the true universal format – you can always scan and recognize it. But to what format? What resolution? Should we do character recognition (and take the time to make corrections)? Scanning in color may be necessary for some materials, but may yield artifacts (like blue edges on some text). Keeping a copy of a scan at the highest possible resolution is a nice way to hedge your bets – if you believe that a few years from now these by-then legacy files will still be readable.

 

When it gets too frustrating to find something, you can always watch some of the classic movies you’ve captured from DVDs.