The Lack of PC-based Video-Telephony:
How I Lost a Technology Bet

Gordon Bell
9 May 2002


Technical Report






Microsoft Research

Microsoft Corporation

455 Market St. Suite 1690
San Francisco, CA, 94105

The payment of a June 1996 bet with Jim Gray is overdue – I bet that:

By April 1, 2001, 50% of the PCs that run a Microsoft OS will ship with:

·         1-10 frames per second,

·         videophone with

·         telephone quality voice. 

At least 20% of the video-enabled PCs (i.e. 10% of the total shipments) shipped during the preceding quarter will use a videophone at least once a day, or at least 5 times per week, to participate in a teleconference. The connection will be via any kind of telephony, cable modem, LAN, or WAN.

Not included in the platform count are: Network Computers[1] (NCs); telecomputers (combined phone and PC); game computers; and television computers (PCs embedded in television sets).

Loser (Bell) treats winner (Gray) to a dinner including $150/bottle wine for 2 persons.

Loser (Bell) has to write a paper on why he lost and post it on his web site.

The question remains: how and when will computer-assisted communication, including video, become as ubiquitous as email or surfing the web?  It is just a matter of time, but …

Why I lost the bet

Unlike my perfect record of winning bets against optimists (Table 1), I lost this one because I was the optimist.  Clearly, the bet was based not on analysis but on “wishful thinking” that video ubiquity could, and perhaps should, occur.  CU-SeeMe was in use by college students to keep in touch. Camera support in NetMeeting, PowerPoint, and NetShow would surely stimulate ubiquity within five years.  Pornographic sites demonstrated that people liked video contact.  $20 camera assemblies were being incorporated into cameras and portables.  Intel had introduced cards and cameras for video conferencing that would stimulate the market.  I was caught up in the enthusiasm of the .com build-up, with startups introducing video-mail that might “kick start” the bandwagon.  All these things happened. In 2002, a large number of camera products can be easily connected to three-to-five-year-old PCs. Still, the ubiquity has not happened.

It was also a defiant bet against naysayers, who argued that “videophones were tried between NY and Pittsburgh and demonstrated at the 1964 World’s Fair, but no one wanted them[2]”, or that AT&T had introduced $1200 videophones, aka “the Grandma phone”, and they weren’t selling.

Second Thoughts: The Four Heuristics that guarantee failure

Having started our work in 1995, Jim Gemmell and I observed that the first telepresence “killer app” was just “telepresentations[3]”.  In fact, more than 40,000 visitors attended the ACM97 conference telepresently, versus 2,000 who attended in person.  And they spent more aggregate hours attending the telepresence site in the first 6 months than attendees spent at the brick-and-mortar site.

To help establish our research, we observed why tele-enabled communication, meetings, and conferences were failing.  In 2001, Gemmell wrote “How to Fail at Video Conferencing[4]”, which gave what we believe is the recipe for failure. While we are certain that these heuristics almost guarantee failure, we’ll have to wait until computer-enhanced communication, with or without video, succeeds before we understand the heuristics for success.

1.      Voice quality must be at least competitive with telephony.  This implies low latency and negligible jitter. Furthermore, good microphones are required. The ITU latency guideline for IP telephony is 150 ms for voice. In 2002, we measured Instant Messenger’s voice latency on a LAN to be under 130 ms. However, Internet latencies are unpredictable, and software may add latency (we measured NetMeeting’s delay to be 290 ms)[5]. High latencies will drive users to the telephone for audio, meaning that two calls (video + audio) have to be made.

To avoid echoes, the user must have a headset, handset, or echo-canceling speakerphone. None of these are expensive or difficult to install, but they are not part of the typical PC package, and no network effect exists to induce customers to buy/install them.

Today, millions of teleconferences have poor audio quality, so the bar to surpass is low.  Still, IP-based products are inferior today.  Poor audio makes attending teleconferences much more fatiguing, as listeners strain to recognize and understand the speakers. We believe that computer-enhanced audio quality can be substantially better than a telephone or commonly used conference phones (it could be CD-quality with surround sound).

In essence, the telephone, which uses about 8 Kbps of its 64 Kbps channel, is just good enough for 2-way and conference communication. Until video provides some additional value at virtually zero cost, it is likely to remain a curiosity.

2.      Video technology should increase, not decrease “presence”.  Four attributes of current videoconferencing systems detract from presence: (1) postage stamp size, (2) low frame rates that produce jerky images, (3) low fidelity fuzzy or fragmented images, and (4) lack of eye-to-eye communication, called gaze awareness[6] (the camera is not looking at you and you do not appear to be looking at participants).

Improved cameras and displays, together with increased bandwidth and processing power, will ameliorate all four problems.  Image size, frame rate, and fidelity are a function of processing speeds, bandwidth, and displays. The “actual” image is limited by display size.  If “life size” is required, we’ll simply have to wait for larger or additional displays that provide more pixels.

Gemmell and others have shown[7] that gaze can be adjusted by tracking the head and correcting the eye location so that the participants “appear” to be looking at one another.  In 2002, gaze correction is still limited by the state of vision research.

3.      Setting up a videoconference has to be as easy as making a POTS telephone call.  The interface, directories, and keyboard have to be competitive with 1 button or 10+ digit phone dialing. After a fairly complex installation, current video conference systems depend on complex directory systems and have call setup times of 20 seconds to buffer and synchronize.

Broadband links to the home are also essential because computers must always be on and connected to minimize call setup time.  Messaging services (e.g. ICQ, AIM, Windows Messenger) facilitate call setup.  Again, video conferencing could be simpler and faster than telephones.  Setting up a Windows Messenger (WM) IP telephone call is faster than dialing when others are on line. When many different numbers are called, the computer can find and dial a number faster than a person can look it up.

4.      Video conferencing must be as ubiquitous as telephones. The network effect[8] needs to take hold so that every caller can assume that the callee is video enabled. While corporate users report a 70% use of telephonic conferencing, only 16% and 20% use video and web conferencing, respectively.

AT&T’s original Picturephone Meeting Service was introduced in 1979 in a dozen cities.  Although the service was discontinued, its $500K room installations and dedicated 384 Kbps links set a high-water mark for video conferencing that no other system attained.  In 2001, only 75K group video conferencing units and 40K desktop video units were sold, according to Wainhouse Research.
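
As a rough illustration of the first heuristic, the latency figures quoted above can be compared against the 150 ms ITU guideline. The Python sketch below does only that arithmetic. The names ITU_GUIDELINE_MS, measured_ms, within_guideline, and headroom_ms are illustrative and do not come from any measurement tool, and the Instant Messenger figure is taken as exactly 130 ms although the text says only "under 130 ms".

```python
# Latency figures quoted in the text, in milliseconds.
ITU_GUIDELINE_MS = 150

measured_ms = {
    # The text says "under 130 ms"; 130 is used here as an upper bound (assumption).
    "Instant Messenger on a LAN": 130,
    "NetMeeting": 290,
}


def within_guideline(latency_ms, budget_ms=ITU_GUIDELINE_MS):
    """True if the measured latency does not exceed the guideline."""
    return latency_ms <= budget_ms


def headroom_ms(latency_ms, budget_ms=ITU_GUIDELINE_MS):
    """Budget left over; a negative value means the guideline is exceeded."""
    return budget_ms - latency_ms


# Map each product to (meets guideline?, headroom in ms).
report = {
    name: (within_guideline(ms), headroom_ms(ms))
    for name, ms in measured_ms.items()
}

for name, (ok, headroom) in report.items():
    status = "within" if ok else "over"
    print(f"{name}: {status} the {ITU_GUIDELINE_MS} ms guideline by {abs(headroom)} ms")
```

On these figures the LAN measurement leaves at most 20 ms of headroom for the network, while NetMeeting's software delay alone exceeds the guideline by 140 ms before any Internet latency is added.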

The long path to computer-based video tele-communication

Increased processing speed and better connectivity reduce the impediments to both IP telephony and video-telephony.   Whether the lack of gaze awareness, larger screens, and better microphones, cameras, and displays is what impedes ubiquity is unclear.  Social issues may ultimately be the impediment.  However, it is clear that video telephony cannot take hold until IP telephony is commonplace.

Three technologies will open pathways to computer-based personal video communication:

1.      IP telephony must penetrate the voice market to some degree (>10%), providing proof that voice-over-IP quality of service and convenience rival circuit-switched telephony.

2.      Instant messaging with voice and video (e.g. AOL’s Instant Messenger, Windows Messenger), used in corporations and by children, must achieve wide-scale use. This implies more standardization than AOL is willing to provide. The facilities and services available via Windows XP and IP phone providers are being fought by SBC because they disintermediate both local and long-distance telephony.  Others believe such systems invade privacy.

3.      2.5G and 3G mobile services offering video phones could drive the need for both stationary and computer video telephony, provided adequate bandwidth is allocated.  Japan’s portable video phones are beginning to provide such capability. Having supplied the “early adopter” market, these more expensive services are stalled at least until prices fall. The U.S. is on an entirely different path using 802.11 wireless.

Will Social Issues Be the Impediment?

All of these technologies are certain to aid in closing the technology gap for computer-based voice and video telephony.  In addition to solving the technical problems, simplicity of installation and use has to be comparable to the phone while providing significant benefit. Still, social issues may overwhelm the technology. The “vain” argue that users have to be prepared to be always “on camera”.  The “over-committed” argue that users can run a parallel video process such as handling email or reading.  Indeed, the most useful benefits of video-assisted communication are probably: (1) a few sensory feedback bits from receiver to transmitter; and (2) forcing an attendee to be “present” and appear to be connected.

Now, I still believe that all the forms of personal video communication will be regularly in use within the next decade along the same lines as the bet – but don’t ask me to bet on it.

Table 1.  Bell Bets (and Prizes)





Gordon Bell Annual Prizes for Parallelism: These rewards for achieving performance, parallelism, and cost-effective computation illustrate the power of “betting” to achieve goals.

3 annual  teams


AT&T would either destroy or divest itself of NCR within 5 years



By 1996 supercomputing would be done predominately with >1000 processors.



In 1994 a MicroUnity multimedia processor would be delivered.



By 1996 video-on-demand would have 500,000 subscribers.

Reddy et al


In 2003 AI would be thought more important than the transistor.

Reddy et al


In 2003 cars that drive themselves would be available for sale.



By 2001, video cameras would be shipped and in use on most PCs.



By 2002, 10,000 workstations would communicate at Gbits/sec.

Reddy et al*


By 2001 and 2002, one billion Internet users worldwide.



By 2004, Light Emitting Polymer units would exceed Liquid Crystal Displays



In 2004, Electronic Ink will out-ship LCDs as measured by unit area.


*Yet to be paid.

[1] At the time of the bet, there was much industry discussion, most notably by Larry Ellison and Scott McNealy, that thin-client, fixed-function, browser-centric computers would soon dominate.  This has yet to happen.

[2] AT&T introduced PMS (PicturePhone Meeting Service) in 1979. Intel and Digital used it between Boston and San Francisco to first meet and agree on the Ethernet project.  Digital subsequently installed sites.

[3] Gemmell, J. and Bell, C.G., “Non-collaborative Telepresentations Come of Age”, Communications of the ACM, Vol. 40, No. 4, April 1997, pp. 79-89.

[4] Private communication.

[5] The results were obtained by measuring the delay between a sound at its source and its arrival at the receiver.

[6] Finn, K.E., Sellen, A.J., and Wilbur, S.B. (editors), Video-Mediated Communication, L. Erlbaum Associates, Publishers, Mahwah, NJ, 1997.

[7] Gemmell, Jim, Zitnick, Larry, Kang, Thomas, and Toyama, Kentaro, Software-enabled Gaze-aware Videoconferencing, IEEE Multimedia, Vol. 7, No. 4, Oct-Dec 2000, pp. 26-35

[8] Metcalfe’s Law says the value to a subscriber is equal to the number of subscribers, N; hence the network value to all subscribers is N**2.
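
The arithmetic in footnote 8 can be sketched in a few lines of Python. This is a minimal illustration that takes the footnote's assumption at face value (each subscriber's value is exactly N; in practice it would be at best proportional to N). The function names are illustrative only.

```python
def value_per_subscriber(n: int) -> int:
    """Value one subscriber sees: the number of subscribers, N (the footnote's assumption)."""
    return n


def network_value(n: int) -> int:
    """Total value summed over all N subscribers: N * N = N**2."""
    return n * value_per_subscriber(n)


# Total network value for a few network sizes.
sizes = [10, 100, 1000]
values = {n: network_value(n) for n in sizes}

for n, v in values.items():
    print(f"N = {n:>5}: network value = {v}")
```

Under this assumption, doubling the number of video-enabled subscribers quadruples the total value, which is one way to read the fourth heuristic: while few callees are video enabled, the value to any one caller stays small.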