Computer...Take A Letter... A Speech Recognition Update
Most of us are used to just shouting at our
computers when they misbehave. We wish that there were an axe
included as standard equipment along with our keyboard and mouse.
What if you had something nice to say to your computer? Imagine if
your computer actually listened to you and did what you told it to
do. If you have not heard about it, it's called speech recognition
software.
If you've never seen speech recognition
demonstrated by someone who is trained to use it - you're in for a
surprise. A person speaks into a microphone in front of their
computer and you see the words they are speaking pop up on the
screen in real time. You might be tempted to look around for someone
operating a keyboard around the corner. It is amazing to watch and
to use. You can speak at your normal speaking speed (about 120 words
per minute) and the computer "guesses," using mathematical algorithms,
what word you mean in that context from what it knows about the
English language. (There are some other language models also
available.)
Speech recognition has been useful for users
with training for the past seven years. Training still makes a
difference as it does for any major piece of software. We train
people to use Word®, Excel® or PowerPoint® and speech recognition
should have training as well. (Tip - Good training resources are
available at Crown International http://www.crown1.com including
training manuals and DVDs for ViaVoice and Dragon products...) The
high degree of speed and accuracy for trained users comes from a
combination of improved software and, most important, better
hardware. Faster, more powerful personal computers with operating
systems and multimedia that focus on high sound quality have made
all the difference.
It is like a Star Trek future when you think
that a person can speak and a computer program can listen, interpret
and respond with the correct words. One of our favorite sentences
to demonstrate this is "Mr. Wright will write you a letter right
now." The speech software guesses based on context which "write" is
the right one for that place. Homonyms are tricky for any of us.
You can also say, "I would like my next paycheck to be two-thousand,
one hundred and sixty-two dollars and eight cents." The computer
will write $2,162.08 on the screen. The same is true for dates and
times. We say it as we normally would and the software formats it
for us. When the software makes a mistake - you correct it and it
learns. It becomes more accurate as you continue to use it.
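The paycheck example above can be sketched in a few lines of Python.
This is only an illustration of the formatting step, assuming the words
have already been recognized correctly; the helper names
(words_to_int, spoken_to_currency) are made up for this sketch and do
not come from ViaVoice, Dragon or any other product.

```python
import re

# Hypothetical lookup tables for illustration only.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(phrase: str) -> int:
    """Convert spoken number words such as 'two thousand one hundred' to an int."""
    total, current = 0, 0
    # Hyphens and commas are dropped because only runs of letters are kept.
    for word in re.findall(r"[a-z]+", phrase.lower()):
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def spoken_to_currency(phrase: str) -> str:
    """Format '<number> dollars [and <number> cents]' as a dollar amount."""
    match = re.fullmatch(
        r"\s*(.+?)\s+dollars?(?:\s+and\s+(.+?)\s+cents?)?\s*", phrase.lower()
    )
    if match is None:
        raise ValueError("expected '<number> dollars [and <number> cents]'")
    dollars = words_to_int(match.group(1))
    cents = words_to_int(match.group(2)) if match.group(2) else 0
    return f"${dollars:,}.{cents:02d}"


print(spoken_to_currency(
    "two-thousand, one hundred and sixty-two dollars and eight cents"
))  # prints $2,162.08
```

A real product does far more than this (dates, times, and guessing
when "two" should stay a word), but the idea of turning what you say
into what you would have typed is the same.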
Speech recognition software for personal
computers has been around and improving since the early 1990s with
product names like Kurzweil, Lernout & Hauspie, Kolvox, Philips and
the dominant products, IBM's ViaVoice® and Nuance's Dragon
NaturallySpeaking®. Now, with Microsoft's Vista Speech® coming in
their next release of Windows®, speech recognition will change the
future. It is already changing the present.
Speech recognition began with what was called “discrete
speech” where you had to pause between each word or phrase.
“Today…is…a…beautiful…day…to…play…tennis…period.” It was slow at 40
words per minute but still an amazing scientific breakthrough.
In the late 1990s we finally had “continuous speech” where we could
speak at our normal speaking speed. We still include the
punctuation, just as people do when dictating a letter
to their assistant.
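Speaking the punctuation can be pictured as a simple substitution
step. The Python sketch below is an assumption about how such a step
might look, not a description of any product; the SPOKEN_PUNCTUATION
table and the apply_spoken_punctuation function are invented for the
example, and a naive version like this cannot tell the punctuation
mark "period" from the ordinary word "period".

```python
# Hypothetical mapping for illustration; real products support many more.
SPOKEN_PUNCTUATION = {
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
}


def apply_spoken_punctuation(words):
    """Join dictated words, turning spoken punctuation names into symbols."""
    out = []
    i = 0
    while i < len(words):
        two = " ".join(words[i:i + 2]).lower()  # e.g. "question mark"
        one = words[i].lower()                   # e.g. "period"
        if two in SPOKEN_PUNCTUATION:
            symbol, step = SPOKEN_PUNCTUATION[two], 2
        elif one in SPOKEN_PUNCTUATION:
            symbol, step = SPOKEN_PUNCTUATION[one], 1
        else:
            symbol, step = None, 1

        if symbol is None:
            out.append(words[i])       # an ordinary word
        elif out:
            out[-1] += symbol          # attach the mark to the previous word
        else:
            out.append(symbol)         # a mark with nothing before it
        i += step
    return " ".join(out)


dictated = "Today is a beautiful day to play tennis period".split()
print(apply_spoken_punctuation(dictated))
# prints: Today is a beautiful day to play tennis.
```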
For those of us who used typewriters with
whiteout or eraser ribbons - we can only dream of what our past
might have been! All those 30-page papers I had to do at college
would not have seemed so daunting if I had had speech
software back then. But I suspect that the long history of speech
recognition software is still news to most people (and professors)
today. I had an interesting discussion with an English teacher who
watched a demonstration of speech. Just as the calculator is said to
have done to arithmetic, this teacher was sure that speech
recognition would ruin the written language. Perhaps. Or maybe it is
just a return to a more ancient form - the oral tradition.
Surprisingly - it was only really Star Trek
that nailed how speech recognition would become natural. As someone
who uses speech, it is still surprising to see how many futuristic
commercials and movies still have people typing. Speech as the more
natural interface makes sense - we speak before we can learn to
type.
Of the many types of users of speech
recognition today - most are in the words business. They are people
who use extensive numbers of words in their profession. So it is
lawyers, physicians, judges and educators who tend to be the early
adopters. Most of them were already used to dictation so the idea
of speaking their thoughts was already comfortable. Another
category is executives who want to be able to dictate their own
personal email. People with disabilities that limit their
ability to use the computer keyboard or mouse have also found speech
to be their way to surf the web, play games, send email or do their
work. It is liberating.
One of the quiet epidemics is carpal
tunnel syndrome (CTS) and other repetitive strain injuries (RSI)
associated with typing. It is estimated that this currently costs
anywhere from hundreds of millions of dollars to billions of dollars
a year. The cost is difficult to calculate since the reporting and
diagnosis of these conditions are still uneven. But it is safe to say
that the accumulated keystrokes from many users over years of typing
are showing up with the classic symptoms of tingling and burning in
the fingers, hands and wrists as well as stiffness and pain up our
arms, shoulders and neck. RSIs can also be associated with
headaches, migraines and a number of other pain conditions.
Speech recognition offers an alternative to all
that typing. With the aging boomers and Gen-Xers who have
accumulated RSI and Carpal Tunnel injuries over years of typing and
playing - speech will be the only game in town.
The use of speech is divided into two types of
applications. One is command and control. The other is dictation.
Speech recognition is more accurate at command and control since it
is only recognizing a word or phrase at a time. Normal dictation is
far more complex, as anyone knows who has studied how language works
or tried to translate what someone is saying from one language to
another. You have to hear them well and guess correctly what they
meant based on what you know of the two languages. Most speech
products are still “speaker dependent” (trained to one user’s voice
at a time) for dictation but “speaker independent” for command and
control, which doesn’t need to know your voice to be accurate for
most users. The holy grail of speech recognition is speaker
independence for dictation – where it doesn’t matter who is
speaking – the computer will interpret you correctly. Just like on
Star Trek.
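One way to picture why command and control is the easier job: the
computer only has to pick the closest match from a short list of
phrases it already expects. The Python sketch below works on text
rather than sound, so it is only an analogy. The command list and the
match_command function are made up for this example, and
get_close_matches comes from Python's standard difflib module.

```python
from difflib import get_close_matches

# Hypothetical command list, loosely based on the car example in this article.
COMMANDS = ["heat up", "heat down", "seat forward", "seat back"]


def match_command(heard, commands=COMMANDS, cutoff=0.6):
    """Return the known command closest to what was heard, or None."""
    matches = get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_command("Seat Back"))  # prints seat back
print(match_command("xyzzy"))      # prints None
```

With only four phrases to choose from, a slightly garbled input can
still land on the right command. Dictation has no such short list,
since any word in the language may come next.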
But speech recognition for the PC is only a
small part of the speech story. Speech now also runs on servers and
is used, for example, when the computer answering attendant chats
with you when you call a company on the phone. Command and control
speech is also in things like automobiles where you can adjust the
heat and the position of your seat. From huge to small, you can find
speech in your PDA, cell phone and even toys. It has been predicted
that it is in these large and small applications that we will really
see and hear speech in the future as the desktop computer becomes a
thing of the past.
It will be interesting to see how the next
generation of speech recognition from Microsoft will change the
speech landscape. The next operating system to follow Windows XP®
was called Longhorn® and is now referred to as Vista®. Like XP (did
you know a basic speech recognition program was included in XP?) -
it is expected to include speech recognition software for command
and control and dictation. The early reports are that this new
release will be very accurate and require very little training. If
true - that's going to take us a step closer to talking to
our computers every day.
Mr. Scott on Star Trek once traveled back in
time and was confronted by a computer with a keyboard. He
commented, "A keyboard - how quaint." He knew that the keyboard and
mouse were a thing of the past. Like the telegraph, the keyboard was
a useful tool in its time but definitely part of the past. We may
someday have to explain what a keyboard is - just as we have to
explain what a typewriter is and how you used white-out to correct
your college papers. Those were the days...
COMPUTER-SAVE-THIS
P.S. If you're one of those already in pain
with RSIs and Carpal Tunnel Syndrome - you should check out some of
the information by Dr. Blair Lamb, MD - a pain specialist. He shows
that most carpal tunnel syndromes and RSIs are conditions that are
primarily due to injuries in the neck caused by our typing and poor
posture when typing. The pains we feel elsewhere are referred pains
or the result of shortening muscles in our arms and wrists. To
resolve these RSIs, Dr. Lamb has a number of treatments with
stretching exercises for RSI that stretch the neck as well as other
areas to lengthen those shortened muscles that are pulling and
pinching our nerves and causing us the pain.
To learn more about this, visit his website
http://www.drlamb.com
He also has DVDs available that explain
pain conditions like RSIs and he has a multi-level stretching
program also available on DVD
http://www.stretch-doctor.com
Grant D. Fairley is a graduate of Wheaton
College, Wheaton, IL. He is an IBM Business Partner and is a
principal presenter with Strategic Seminars
http://www.strategic-seminars.com
He is the author of several books available through
http://www.palantir-publishing.com