
Speech to text has been making a strong showing recently. It's been creeping more and more into my life as a faster and easier alternative to keyboard input. It does take some practice, though, so I encourage people not to dismiss it based on their first impressions.


Again, same trap. Talking all day is NOT appealing, and it doesn't work in a public setting. It works great on mobile phones precisely because we often have our phones out but can't type, like when I'm in a car or walking with my hands full.


The other issue with speech to text is discoverability, both of the commands and of the things they act on. This presumably becomes less of an issue as natural language processing gets better. Voice recognition is pretty good with the right microphones today, but using it is something like remembering the right wizard's incantation. Think of a command line :-)

That's fine if it's a variation on some frequently used command, but I find myself forgetting the triggers and skills I have set up on Amazon, or the names of playlists I might want it to play.
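
One partial workaround for forgetting exact names is to match what was said loosely against the names the system already knows and offer the closest few back. The sketch below uses Python's standard-library `difflib.get_close_matches`; the playlist names, the `resolve` helper and the 0.5 cutoff are made up for illustration, and this is not how any particular assistant actually resolves names (difflib compares spelling, not pronunciation).

```python
import difflib

# Hypothetical playlist names standing in for whatever the assistant knows about.
PLAYLISTS = ["Morning Commute", "Deep Focus", "Sunday Jazz Brunch", "Workout Mix"]

def resolve(spoken: str, options: list[str], cutoff: float = 0.5) -> list[str]:
    """Return up to three known names that roughly match what was said."""
    # Compare case-insensitively, but hand back the original capitalization.
    by_lower = {name.lower(): name for name in options}
    matches = difflib.get_close_matches(spoken.lower(), list(by_lower), n=3, cutoff=cutoff)
    return [by_lower[m] for m in matches]

# A partial, half-remembered name still finds the full playlist title.
print(resolve("jazz brunch", PLAYLISTS))
```

Offering a short list of candidates back to the user is one way to restore some of the two-way feedback that voice input otherwise lacks.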


This is actually a great point. I think this is a one-way vs two-way communication issue. With a lot of our current input devices, tactile feedback helps verify that the individual "packets" of information are registering, but visual feedback is key for knowing whether the high-level command worked and what other options are discoverable. The fact that visual feedback is extremely high bandwidth matters a lot here. Imagine having everything on the screen read out to you. Come to think of it, screen readers are hard.


If you're Eminem, maybe. I can type faster than I can talk.


Siri cuts me off mid-sentence because I talk too slowly.


Average conversational speed is 110 to 150 WPM. Maybe you type unusually fast? Professional typists might type at 80 WPM, and it takes experienced stenographers to keep up with normal conversations.

Again, speaking for speech to text takes some practice, so it doesn't quite make sense to compare a lifetime of typing against your first shot at speech recognition.


Stenographers can type 300+ wpm on phonetic chording keyboards with appropriate predefined shortcuts.

People can also write shorthand with a pen at 300+ wpm.

A big part of the problem with regular typing is that our phonetic alphabet is not optimized for writing efficiency. Sometimes a single syllable takes 5+ discrete characters to type.
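
To make the "5+ characters per syllable" point concrete, here is a small Python sketch that counts key presses per syllable for a few hand-picked words. The word list and syllable counts are illustrative assumptions entered by hand, not the output of any syllabification tool.

```python
# Keystrokes per syllable on a letter-at-a-time keyboard.
# Syllable counts are entered by hand (an assumption; no syllabifier is used).
WORDS = {
    "cat": 1,
    "through": 1,
    "thought": 1,
    "scratched": 1,
    "strengths": 1,
    "keyboard": 2,
}

def letters_per_syllable(word: str, syllables: int) -> float:
    """Each letter is one key press, so keys per syllable = letters / syllables."""
    return len(word) / syllables

for word, syllables in WORDS.items():
    rate = letters_per_syllable(word, syllables)
    print(f"{word:>10}: {len(word)} keys, {rate:.1f} per syllable")
```

A chording keyboard that writes roughly a syllable per stroke would need about one motion for each of the one-syllable words above, which is the gap the stenography numbers are pointing at.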

Standard QWERTY-style keyboards are also designed around single-finger motions in serial fashion, and they require various awkward combinations of finger motions and plenty of repositioning between keypresses; the layout was designed for learnability rather than efficiency, and it wastes much of the possible bandwidth of the hands.

If someone with deep knowledge of human hand physiology / neuroanatomy were to design a keyboard optimized for efficiency (at the expense of requiring years to properly master), we could probably push peak performance to 400–500 wpm.

At some point, though, thinking speed becomes the main bottleneck. Regular QWERTY keyboards are fast enough to keep up with most writing tasks.


If you're going to talk about stenographers (experts at typing fast, with specialized equipment and training) let's talk about auctioneers.

The world record for speaking is about 600 WPM. The world record for stenographers is about 360 WPM.

A typical speed for speech is 150 WPM, for someone who is practiced but by no means an expert. A typical (but somewhat fast) speed for typing is 80 WPM, for someone who is practiced but not an expert.

So if you're going to compare the fastest speeds or the typical speeds, speaking comes out ahead either way. It's almost as if our ancestors had evolutionary pressure to develop speech, but lacked the same kind of evolutionary pressure for typing.
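
The comparison above is simple arithmetic on the quoted rates; the Python sketch below just makes it explicit. The WPM figures are the ones quoted in this thread (not independently verified), the 600-word post is a hypothetical length, and the ratios ignore filler words and recognition errors.

```python
# Words-per-minute figures as quoted in this thread (not independently verified).
TYPICAL = {"typing": 80, "speaking": 150}
RECORD = {"stenography": 360, "speaking": 600}

def ratio(fast: float, slow: float) -> float:
    """How many times faster the first rate is than the second."""
    return fast / slow

def minutes_for(words: int, wpm: float) -> float:
    """Time to produce `words` words at a constant rate."""
    return words / wpm

typical_ratio = ratio(TYPICAL["speaking"], TYPICAL["typing"])
record_ratio = ratio(RECORD["speaking"], RECORD["stenography"])

print(f"typical: speaking is {typical_ratio:.2f}x typing")
print(f"records: speaking is {record_ratio:.2f}x stenography")

# Hypothetical 600-word post at each typical rate.
for mode, wpm in TYPICAL.items():
    print(f"{mode}: {minutes_for(600, wpm):.1f} min for 600 words")
```

On these raw numbers speech wins by a factor of roughly 1.7 to 1.9 in both the typical and the record case; whether that survives filler and corrections is exactly what the replies dispute.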


Regular people (or computers) listening are going to have a heck of a time understanding world-record speaking speeds: https://www.youtube.com/watch?v=l-o9vTk8Poo

As for auctioneers, most of what they say is repetition and filler. The fast talking is a gimmicky technique aimed at improving sales, not a practical tool for imparting information at maximum possible speed.

As I said before, the keyboard is not remotely optimized for the capabilities of human hands, and English orthography is not remotely optimized for information density as an encoding of the spoken language.

If someone speaking had to say a full syllable for every letter, speaking speed would be much slower.

> almost as if our ancestors had evolutionary pressure to develop speech, but lacked the same kind of evolutionary pressure for typing.

More precisely, speech developed over millennia as millions of people slowly mutated it, whereas modern typing hardware was designed by a single person, based on the mechanical characteristics of a particular prototype typewriter in the late 19th century (before any concept of “touch typing” even existed), and typists were forced to adapt to this mostly fixed hardware design as best they could.


So, the fastest rapper seems to talk at 280 wpm, and that's not filler.

http://fivethirtyeight.com/datalab/the-fastest-rapper-in-the...


150 wpm speaking is sort of irrelevant when half or more is verbal filler and rambling that doesn't provide any signal.


> If someone with deep knowledge of human hand physiology / neuroanatomy were to design a keyboard optimized for efficiency (at the expense of requiring years to properly master)

Per language.


Try reading out a hello world in C++: pressing ; is much faster and more accurate for me than verbalizing "Semicolon". What about variable names? You have to spell out the word each time to get camel case: u-s-e-r-Capital C-o-u-n-t, and that's if it works perfectly.

In Python, you have to say return space space space space on every damn line. Return space space space space space space space space while inside a function. It would nearly drive a man to switch to tabs.
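
A rough way to quantify this complaint: the Python sketch below counts how many separate utterances a literal, character-by-character dictation of a line would take. The spoken names ("space", "semicolon", "capital c"), the keyword list and the rule that keywords can be said as whole words are assumptions for illustration, not the vocabulary of any real speech-recognition tool.

```python
import re

# Assumed spoken vocabulary; not taken from any real dictation product.
KEYWORDS = {"return", "def", "int", "if", "else", "for", "while"}
SYMBOLS = {
    " ": "space", ";": "semicolon", ":": "colon", "=": "equals",
    "(": "open paren", ")": "close paren", "_": "underscore",
}

def literal_dictation(line: str) -> list[str]:
    """Return one entry per utterance needed to dictate `line` verbatim."""
    spoken: list[str] = []
    # Split into identifier-like words and single other characters.
    for tok in re.findall(r"[A-Za-z_]\w*|.", line):
        if tok in KEYWORDS:
            spoken.append(tok)  # dictionary words can be said whole
        elif tok[0].isalpha() or tok[0] == "_":
            # Identifiers are spelled out, with "capital x" for uppercase letters.
            for ch in tok:
                if ch.isupper():
                    spoken.append(f"capital {ch.lower()}")
                else:
                    spoken.append(SYMBOLS.get(ch, ch))
        else:
            spoken.append(SYMBOLS.get(tok, tok))  # spaces and punctuation
    return spoken

print(" ".join(literal_dictation("userCount")))
# u s e r capital c o u n t
print(len(literal_dictation("    return userCount;")))
# 16
```

Sixteen utterances for one short, C-style line indented one level, and every extra indentation level adds four more "space" utterances.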

Then there's navigation: "go 2 characters to the right" isn't pleasant.

To top it all off, picture the fun you can have in vim using speech to text. Dear lord.

Now imagine an open office full of people all doing this at the same time. That, my friend, is hell.


What about a programming language optimised for this purpose?


This is what I wanted to say as well. To shift to verbal programming, we'll also have to change to a language designed with that in mind.

An interesting convergence would be VR mixed with speech-to-text processing. You'd use VR and presence/hand tracking to physically interact, and words to refine and/or program the visual blocks.
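
As a rough illustration of the "language designed for dictation" idea raised above, here is a hypothetical Python sketch in which spoken formatter words (camel, snake, pascal) shape the words that follow into one identifier, and named symbols such as "semicolon" are emitted directly. The vocabulary, the `transcribe` function and its rules are invented for this example; they are not taken from any existing voice-programming tool.

```python
from typing import Optional

# Hypothetical formatter words: they join the words that follow into one identifier.
FORMATTERS = {
    "camel": lambda ws: ws[0] + "".join(w.capitalize() for w in ws[1:]),
    "snake": lambda ws: "_".join(ws),
    "pascal": lambda ws: "".join(w.capitalize() for w in ws),
}
SYMBOLS = {"semicolon": ";", "equals": "=", "colon": ":"}
NO_SPACE_BEFORE = {";", ":"}  # punctuation that attaches to the previous token

def transcribe(utterance: str) -> str:
    tokens: list[str] = []
    fmt: Optional[str] = None
    pending: list[str] = []

    def flush() -> None:
        """Emit pending words, formatted as one identifier if a formatter is active."""
        nonlocal fmt
        if fmt and pending:
            tokens.append(FORMATTERS[fmt](pending))
        else:
            tokens.extend(pending)
        pending.clear()
        fmt = None

    for word in utterance.lower().split():
        if word in FORMATTERS:
            flush()
            fmt = word
        elif word in SYMBOLS:
            flush()
            tokens.append(SYMBOLS[word])
        else:
            pending.append(word)
    flush()

    # Join tokens with single spaces, except before closing punctuation.
    text = ""
    for tok in tokens:
        if text and tok not in NO_SPACE_BEFORE:
            text += " "
        text += tok
    return text

print(transcribe("return camel user count semicolon"))
# return userCount;
```

In this toy grammar an identifier only ends at the next symbol or formatter word, so a real design would need an explicit terminator; the point is just that "camel user count" is three words instead of ten spelled-out letters.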



