The future of voice

Print has had a good run. The ability to record your thoughts and pass them on to someone at a great distance of space or time is one of the defining innovations of our species. Singular to us, in fact. Unless you count spraying chemicals on trees as a record of your passing. Or the bones of the passed that elephants might visit in memory.

Writing & print have unlocked so many possibilities for humankind. I’m still trying to get over the Library at Alexandria. But I submit that it is a temporary innovation. Let’s go conservative and say that for 1,000,000 years our preferred communication style has been voice. It’s what is inherent to our mind. Writing is great, don’t get me wrong, but I think we would be naive to think that there wouldn’t be a natural inclination back to spoken language. You have to learn the written form; you acquire the spoken form almost without effort.

We’ve been designing human/computer interfaces for like 100 years now. Go all the way back to the early terminal interfaces: a simple green blinking line next to a greater-than sign on a CRT screen. It expected you to do everything. It gave up no secrets. I remember learning BASIC on my TRS-80. You had to know its language because it did not know yours.

AI & LLMs (specifically the latter) have provided an escape path back to our innate way of communicating. Even as I type this, auto-correct is there to rewrite my wrongs. It’s a simple enough neural network, but capable enough to resolve most incongruities and make sense of us. The jig is up. Writing is now obsolete.

Poetry will yet come. But some of the best poets of the near future will be AI. Their novelty will be legend. No mind can surprise us so much as something that’s not quite our mind.

No offense to my ex, but writing plays second fiddle to speaking. When we would watch a show together she wanted the subtitles on. I couldn’t concentrate on what the director intended when there were words below every actor. I couldn’t not read the words, and that took me out of the story. When we consume writing we shift from a sort of innate communication style to something constructed. An artifice. An amazing one, but an artifice nonetheless.

But now with the device understanding our language as well as we can, I think the writing is on the wall. No more buttons & checkboxes & input fields. No more formal flows, only requirements met or not. It frees us to adapt to the mind of whomever we’re serving. How many workers do you need? And at what times? And do you want to invite Amanda? She’s not working at that time, so she’s available for your job. Do you want to identify a supervisor on that shift or should I just check in with your workers to make sure they’re on track? As UX designers we can leverage this innate understanding to accomplish so much with much less effort.

We will soon arrive at a place where these obtuse affordances are remembered the way we remember Morse code or the operator’s switchboard or semaphore. The checkbox as a method to record your willingness to receive emails. The dropdown that captures your specific disappointment.

We’ll go through a transition where the engineering organization learns how to code for these intricacies. How do you build deep connections into your database that can transform an audio stream into actionable output? How do you create the connections that LLMs already have within language to code for other connections like workforce utilization, average platform wage per area, median response time to shift request? We seem to be at the point with programming that you can say “design a service that processes data X into framework Y and applies construct Z to produce conditional output A & B” and it will flawlessly construct the language that another computer can process. It’s a move away from explicit relationships to intentional frameworks: the focus is on the intent, and the system parses that into actionable outputs.
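To make that intent-to-output idea concrete, here’s a deliberately crude stand-in: a tiny parser that turns a transcribed staffing request into a structured record. Everything in it (the `ShiftRequest` shape, the field names, the regex) is invented for illustration; in a real system an LLM would do this mapping, not a pattern match.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShiftRequest:
    """Hypothetical structured output for a spoken staffing request."""
    workers: int
    start: str
    end: str


def parse_shift_request(utterance: str) -> Optional[ShiftRequest]:
    # A regex stands in here for what an LLM would actually do:
    # map free speech onto a schema the rest of the system understands.
    m = re.search(r"(\d+)\s+workers?.*?from\s+(\S+)\s+to\s+(\S+)", utterance, re.I)
    if not m:
        return None  # intent not recognized; a real system would ask a follow-up
    return ShiftRequest(workers=int(m.group(1)), start=m.group(2), end=m.group(3))
```

So “I need 3 workers from 9am to 5pm” comes out as `ShiftRequest(workers=3, start="9am", end="5pm")`, and anything the parser can’t place returns `None`, which is the moment the conversational layer would ask a clarifying question instead of showing an error.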

I’m interested in accents too. We’ve heard the anecdotes about the computer’s inability to parse the Scottish brogue into meaningful input. If the training set is some specific subset of reality, of course it’s going to miss these localities. But when we are able to say “Take the full breadth of YouTube and the language you find in there as your training set” these things will be able to understand standard American English through Tamil via Amsterdam through Eshay with Aboriginal stepfather. If I can understand it, the algorithm should too.

So spoken language. That’s where we’re headed. I want to get to the point where we design functionality, lay it at the feet of an AI and say “Present this to the user however they need it.” My job may, in time, be replaced by AI, but until then I intend to enable it and use it to solve all these mundane problems we spend so much time on. And I’m going to foolishly hope that the efficiencies we appreciate as a species are not squandered like they were with the Industrial Revolution. We were promised an 8-hour workday. Somehow I believe that when the AI takes over we’re going to have to work harder than ever.

With this switch to language we get so much more. Flows are no longer binary at each junction. The fuzzy logic of the AI leads you organically from one decision to the next, and can surmise (and confirm) inferences based on precedent. When you say “gimme a large mocha vanilla latte with lite soy” it won’t say “To confirm, you want a large Size?” You don’t have to activate a dropdown menu and choose “Size”, then “Type”, then “Additions”, then “Low-fat”, then “Next”. It knows “Next” when you pause. Just like we do. “Oh, pause, then it must be my turn. Large latte with mocha & vanilla & low-fat soy milk. $8.75 please.”
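The contrast with the dropdown tree can be sketched in a few lines. This toy parser infers every field from one utterance and falls back to sensible defaults instead of forcing a choice at each junction. The menu vocabulary and the defaults are all made up for the sake of the example; a production system would lean on a speech model and an LLM, not keyword matching.

```python
def parse_order(utterance: str) -> dict:
    """Illustrative only: fill every 'dropdown' from one spoken sentence,
    defaulting anything the customer didn't mention (defaults are hypothetical)."""
    u = utterance.lower()
    order = {"size": "medium", "drink": None, "flavors": [], "milk": "whole"}
    for size in ("small", "medium", "large"):
        if size in u:
            order["size"] = size
    for drink in ("latte", "cappuccino", "americano"):
        if drink in u:
            order["drink"] = drink
    for flavor in ("mocha", "vanilla", "caramel"):
        if flavor in u:
            order["flavors"].append(flavor)
    if "soy" in u:
        # "lite"/"light" modifies the milk rather than opening another menu
        order["milk"] = "lite soy" if ("lite" in u or "light" in u) else "soy"
    return order
```

Feed it the order from above and every field lands in one pass: size “large”, drink “latte”, flavors mocha and vanilla, milk “lite soy”. No “Next” button, because the pause is the Next button.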

We’re training these things to be human and we should not be disturbed when they become human.

So I’m looking forward to when I can just say “Design an interface for this staffing company.” I know there will long be a product org and engineers to guide the thing. There are lots of decisions to make still. We’ll agree on an outcome and then ask the thing to make it for us. Maybe by the time I retire? 10 years? 5?

With the phone the interface was originally voice. You picked up the receiver, tapped the little lever a few times and a woman’s voice came on the line and said “What number please?” Then came the dial. I don’t know how you dialed “Pennsylvania 6-5000” on one of those things. But even that thing operated with audio. A specific number of clicks and it decoded the person you were trying to reach. Then the keypad, which was still audio. That’s how the phone phreakers of the ’70s could just play some sounds on their little tape recorders and call anyone they wanted for free.

I’m grateful to Gutenberg (and whichever Chinese inventor got to printing centuries before he did). It’s not going away any time soon, and that’s a good thing. I’m not anti-print, just pro-voice. I know when I’m typing messages to my ex on Talking Parents that she’s going to misinterpret my tone in a way that she doesn’t when we talk in person.
