Anatomy of a Voice API: What Is It? How Does It Work? What Can You Do with It?

Published August 29, 2017 by Jack Mardack

As the technology that supports human communication continues to evolve, we’ve come nearly full circle when it comes to speech. Over the past few years, Voice has faced stiff competition from SMS and other text-based, real-time communications channels. But as the original mode of human communication, Voice still has a natural edge.

For businesses, the implications for communicating with their customers are clear. Talking will always have some basic advantages over texting, and there are arguably a growing number of customer-facing situations where Voice is the only way to go. In some of these cases, businesses are reinventing an existing voice-based customer touch point, such as the inbound phone call, to achieve dramatically improved business results. In other cases, Voice is being used in boldly innovative ways to meet the specific communications demands of totally new customer interactions. The prime example of this new wave of Voice use cases is the ride-hailing services that seamlessly provide a private phone call between driver and passenger, right from within the app, millions of times per day in countries all over the world.

The wild popularity of such new “software-driven” phone call experiences has opened the door for companies of all types to leverage the very same enabling technologies used by services such as Uber and Lyft, and to do more for their customers by conceiving and creating their own Voice experiences.

One of those fundamental enabling technologies, and the subject of this blog post, is the Voice API.

What is the Voice API?

In short, the Voice API lets software developers create voice applications as easily as they already build websites and mobile apps. It comprises both the programmatic framework that gives developers control over what happens in a phone call and the power to actually connect call participants anywhere in the world, while abstracting away the technical complexity of doing so.

Beyond giving developers control over what happens in the phone call and making it easy to make and receive calls (VoIP as well as PSTN) anywhere in the world, the Voice API makes it possible for your voice application to interact with external data sources and services. This is fairly revolutionary in its implications, both for how well a business can get to know its customers through phone call engagements and for the breadth and innovation of the experiences that can now be created within the space of a “phone call.”

Because your voice application exists in a “software context” that you completely control (the same as your website or mobile app), it can work dynamically with any resource that is programmatically available. This means that you can use the valuable customer history and data you already possess to inform the phone call experience. It means that you can bring information from the phone call experience back into your CRM or other data storage system for later analysis (see: Tracking Phone Calls and Users with Mixpanel and the Voice API). It also means you can call on an external engine for real-time sentiment analysis during a live phone call, introduce an artificially intelligent participant (such as IBM Watson) to the call, or use anything else for which there is an API.

Truly, your imagination is the only constraint.

[Figure: The basic “anatomy” of the Voice API]

How Does the Voice API Work?

A good place to begin a deeper understanding of the Nexmo Voice API is with the programmatic framework we created so developers can precisely determine what “happens” during a phone call. This is how you determine what powers and options the participants have, and what their experience is.

To properly expose this as a space in which developers could create phone call experiences, we came up with Nexmo Call Control Objects (NCCOs). An NCCO is simply a sequence of instructions based on a finite number of possible actions. When a call is placed to your virtual number, Nexmo makes a request to the web endpoint where you’ve placed your NCCO, and the experience proceeds per the instructions contained therein.
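To make the request-and-response flow above concrete, here is a minimal sketch of a web endpoint that returns an NCCO as JSON. It is written as a plain WSGI application using only the Python standard library. Several things here are assumptions made for illustration and are not taken from this post: that the incoming request carries the caller's number in a `from` query parameter, the `CUSTOMERS` lookup table (a stand-in for your own customer data), and the phone numbers and greeting text.

```python
import json
from urllib.parse import parse_qs

# Hypothetical customer lookup; in practice this might be a CRM query.
CUSTOMERS = {"447700900000": "Alice"}


def build_ncco(caller_number):
    """Return an NCCO (a list of action objects) tailored to the caller."""
    name = CUSTOMERS.get(caller_number)
    greeting = f"Welcome back, {name}." if name else "Thank you for calling."
    return [{"action": "talk", "text": greeting}]


def answer_app(environ, start_response):
    """WSGI app that responds to the platform's request with an NCCO."""
    # Assumption: the caller's number arrives as the `from` query parameter.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    caller = query.get("from", [""])[0]
    body = json.dumps(build_ncco(caller)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local experimentation, the app could be served with `wsgiref.simple_server.make_server("", 8000, answer_app).serve_forever()` and exposed at whatever URL your virtual number is configured to request.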

The elements of an NCCO instruction set are:

  • Action — something to be done in the call
  • Option — how to customize an action
  • Type — describes an option, for example, type=phone for an endpoint option

The Actions you can use in an NCCO are:

  • Record — all or part of a call
  • Conversation — create a standard or hosted conversation
  • Connect — connect to a connectable endpoint such as a phone number
  • Talk — send synthesized speech to a conversation
  • Stream — send audio files to a conversation
  • Input — collect digits and speech from the person you are calling, then process them

Each Action has a set of attributes that define its behavior, and the available attributes depend on the action. For example, the connect action takes an endpoint option alongside attributes such as from, whereas the talk action takes text and voiceName attributes, which define what is said and in what voice and language.
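Since each action expects its own attributes, a small check can catch a malformed NCCO before it is served. The sketch below is illustrative only: the `REQUIRED` table is an assumption covering three actions, not the platform's actual validation rules, and `missing_attributes` is a hypothetical helper.

```python
# Attribute names assumed for this sketch; not an exhaustive or official list.
REQUIRED = {
    "talk": {"text"},
    "connect": {"endpoint"},
    "stream": {"streamUrl"},
}


def missing_attributes(ncco):
    """Return (index, [missing attribute names]) for each incomplete action."""
    problems = []
    for index, step in enumerate(ncco):
        required = REQUIRED.get(step.get("action"), set())
        missing = sorted(required - set(step))
        if missing:
            problems.append((index, missing))
    return problems


ncco = [
    {"action": "talk", "text": "Connecting you now.", "voiceName": "Emma"},
    {"action": "connect", "from": "447700900000"},  # endpoint left out on purpose
]
print(missing_attributes(ncco))
```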

NCCOs are written in JSON because it’s a convenient format for transferring textual data that’s friendly to both humans and machines, and with which every developer is already familiar. We also like that this approach scales nicely as we want to include other channels beyond Voice, in our expanding definition of the customer “conversation.”

Here are some basic examples of NCCOs:

This NCCO reads* the text provided aloud when the call is answered:

*(In this case using the voiceName “Emma”, which is a female voice with a nice British accent)
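A minimal sketch of such an NCCO, built as a Python list and serialized to JSON. The `talk` action and the `text` and `voiceName` attributes come from the description above; the spoken text itself is a placeholder.

```python
import json

# A single talk action; the text is a placeholder for illustration.
talk_ncco = [
    {
        "action": "talk",
        "text": "Hello, and thank you for calling.",
        "voiceName": "Emma",
    }
]

print(json.dumps(talk_ncco, indent=2))
```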

This NCCO simply connects the inbound caller to another number.
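As a sketch, a connect NCCO of this kind might look like the following. The `endpoint` option with `type` set to `phone` follows the description earlier in this post; the `number` field name and both phone numbers are assumptions made for illustration.

```python
import json

# Connect the inbound caller to another phone number (placeholder numbers).
connect_ncco = [
    {
        "action": "connect",
        "from": "447700900000",
        "endpoint": [{"type": "phone", "number": "447700900001"}],
    }
]

print(json.dumps(connect_ncco, indent=2))
```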

You can of course also create a sequence of actions. This NCCO reads the text aloud when the call is answered, starts a recording, and then connects to another number.
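A sequence is simply a longer list: the actions run in the order they appear. In this sketch the `eventUrl` attribute on the record action (where recording details would be reported), the URL, and the phone numbers are assumptions made for illustration.

```python
import json

# Three actions executed in order: speak, start recording, then connect.
sequence_ncco = [
    {"action": "talk", "text": "This call may be recorded. Connecting you now."},
    {"action": "record", "eventUrl": ["https://example.com/recordings"]},
    {
        "action": "connect",
        "from": "447700900000",
        "endpoint": [{"type": "phone", "number": "447700900001"}],
    },
]

print([step["action"] for step in sequence_ncco])
```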

You can also make outbound phone calls. The following NCCO makes a call to the number provided and then invokes a second NCCO of instructions (retrieved from your answer URL) for when the callee answers.
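A hedged sketch of what an outbound call request might contain. The field names (`to`, `from`, `answer_url`) and the `CALLS_URL` endpoint are assumptions based on the call-creation request as commonly documented, and should be checked against the current API reference. `build_call_request` is a hypothetical helper. The sketch only builds the JSON body; authentication and the HTTP request itself are deliberately left out.

```python
import json

# Assumed endpoint for creating calls; verify against the API reference.
CALLS_URL = "https://api.nexmo.com/v1/calls"


def build_call_request(to_number, from_number, answer_url):
    """Build the JSON body for an outbound call (nothing is sent here)."""
    return {
        "to": [{"type": "phone", "number": to_number}],
        "from": {"type": "phone", "number": from_number},
        # The NCCO for the answered call is fetched from this URL.
        "answer_url": [answer_url],
    }


payload = build_call_request(
    "447700900001", "447700900000", "https://example.com/answer"
)
print(json.dumps(payload, indent=2))
```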

EXTRA CREDIT: This NCCO connects the call to a WebSocket endpoint creating a real-time bi-directional streaming data connection to enable integration into the growing ecosystem of AI and bot platforms.
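A sketch of a WebSocket connect NCCO. The `websocket` endpoint type with `uri`, `content-type`, and `headers` fields is an assumption about the endpoint's shape and should be confirmed in the NCCO Reference; the URI, audio format value, and header contents are placeholders.

```python
import json

# Connect the call's audio to a WebSocket server (all values are placeholders).
websocket_ncco = [
    {
        "action": "connect",
        "endpoint": [
            {
                "type": "websocket",
                "uri": "wss://example.com/socket",
                "content-type": "audio/l16;rate=16000",
                # Arbitrary metadata passed along to the WebSocket server.
                "headers": {"session": "demo"},
            }
        ],
    }
]

print(json.dumps(websocket_ncco, indent=2))
```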

You get the idea. It’s a powerful, scalable framework that gives you wide latitude to create experiences within the bounds of a phone call. To read more about NCCOs, visit our NCCO Reference.

What Can You Do with the Voice API?

At this point it should no longer come off as a glib exaggeration to answer that question with “Almost anything you can imagine,” at least when it comes to creating voice-driven experiences for your customers. We intended that the Voice API should do nothing less than explode the concept of “phone call” and create an amazing new experiential space for product designers and developers to work. Its full potential will no doubt be revealed in the service of those future use cases and as-yet-unimagined Voice experiences.

With that said, there are a number of Voice applications that are quite easy to build and that can be immediately useful to your business. Consider:

  • Going digital with your inbound phone calls, if you rely on them for new business
  • Building a simple IVR menu of options for your callers
  • Letting a bunch of your customers know all at once about a special promotion or sale via phone call broadcast
  • Creating individual phone call notifications to let your customers know when something they care about happens, like a package being delivered or a remote sensor reaching a certain value

If you’re a tinkerer, consider creating a conference bridge in five minutes. Or you can let an artificial intelligence engine like Lex (the voice engine that powers Amazon Alexa) answer your inbound calls. You’ll find many more suggestive Voice use cases here.
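As one illustration, the simple IVR menu mentioned above could be sketched with a talk action followed by an input action, plus a function that turns the caller's key press into the next NCCO. The `maxDigits` and `eventUrl` attributes, the idea that the pressed digit arrives as a string, the `DEPARTMENTS` table, the URL, and the phone numbers are all assumptions made for illustration; `route_choice` is a hypothetical helper.

```python
import json

# Prompt the caller, then collect a single key press.
MENU_NCCO = [
    {"action": "talk", "text": "Press 1 for sales or 2 for support."},
    {"action": "input", "maxDigits": 1, "eventUrl": ["https://example.com/menu"]},
]

# Hypothetical mapping from menu option to destination number.
DEPARTMENTS = {"1": "447700900001", "2": "447700900002"}


def route_choice(digit):
    """Return the next NCCO for the digit the caller pressed."""
    number = DEPARTMENTS.get(digit)
    if number is None:
        # Unknown option: apologize and replay the menu.
        retry = [{"action": "talk", "text": "Sorry, that is not a valid option."}]
        return retry + MENU_NCCO
    return [{"action": "connect", "endpoint": [{"type": "phone", "number": number}]}]


print(json.dumps(route_choice("2"), indent=2))
```

The endpoint behind the input action's `eventUrl` would call `route_choice` with the digit it receives and return the result as JSON.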

If you’re a developer, hopefully you’ve been inspired to explore the powers of experience creation we’ve brought together in the Voice API. You’ll find this Voice API Overview a great place to jump in. If you’re a product manager or have responsibility for customer experience, we’d love to talk to you about leveraging the Voice API to delight customers and drive better outcomes in your world.
