Creating a Complex IVR System with Ease with XState

Published June 20, 2019 by Yonatan Mevorach

Nexmo Developer Spotlight

Even if you didn’t know that’s what they’re called, you probably use IVR (Interactive Voice Response) systems all the time. An IVR system lets you call a phone number, listen to the audio cues, and navigate your way through the call to get the info you need. Nexmo makes creating a full-fledged IVR system as simple as spinning up a Web server. In this post we’ll see how to create complex and elaborate IVR systems while keeping the code simple and easy to maintain. To accomplish this, we’ll use XState, a popular State Machine library for JavaScript.

An IVR System with Less Than 35 Lines of Code

The key to implementing an IVR System with Nexmo is to create a Web server that will instruct Nexmo how to handle each step of the call. Typically, this means that as soon as a user calls your virtual incoming number, Nexmo will send an HTTP request to your /answer endpoint and expect you to respond with a JSON payload composed of NCCO (Nexmo Call Control Object) objects that specify what the user should hear. Similarly, when the user uses their keypad to choose what they want to listen to next, Nexmo makes a request to a different endpoint, typically called /dtmf. The /dtmf endpoint will be called with a request payload that includes the digit the user pressed, which your server should use to decide which set of NCCO objects to respond with.

Let’s see what this looks like in code when using Express to power our Web server.
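A minimal sketch of such a server might look like the following. It assumes an Express app that already has a JSON body parser applied. The function name `registerIvrRoutes`, the `baseUrl` parameter, and the spoken texts are placeholders for illustration. The NCCO `talk` and `input` actions and the `dtmf` field of the webhook body follow Nexmo's Voice API documentation, so check the current reference before relying on the exact field names.

```javascript
// Registers the two IVR webhooks on an Express-style app.
// `baseUrl` is the public URL of this server (for example an Ngrok URL).
function registerIvrRoutes(app, baseUrl) {
  // Nexmo requests /answer as soon as a call comes in.
  app.get("/answer", (req, res) => {
    res.json([
      {
        action: "talk",
        text: "Thanks for calling. Press 1 for Main Street or 2 for Broadway.",
        bargeIn: true, // the caller may press a key while the text is playing
      },
      // Collect a single digit and have Nexmo send it to /dtmf.
      { action: "input", eventUrl: [`${baseUrl}/dtmf`], maxDigits: 1 },
    ]);
  });

  // Nexmo posts the pressed digit as JSON, for example { "dtmf": "1", ... }.
  app.post("/dtmf", (req, res) => {
    const text =
      req.body.dtmf === "1"
        ? "You chose the Main Street location."
        : req.body.dtmf === "2"
        ? "You chose the Broadway location."
        : "Sorry, that is not a valid choice.";
    res.json([{ action: "talk", text }]);
  });
}
```

To run it, one would create the app with `express()`, add `express.json()` so that `req.body` is populated, call `registerIvrRoutes(app, publicUrl)`, and then start listening with `app.listen(...)`.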

Trying It For Yourself

You can start writing your app code right away. But in order to be able to call in and test for yourself that everything is working, you’ll need to complete the following:

  1. Make sure your Web server is accessible on the Web. You can do this by exposing your local development machine using Ngrok or by developing using Glitch.
  2. Sign up for Nexmo if you haven’t already done so.
  3. Create a Voice application. You can do this via the Nexmo Website, or using the Nexmo CLI. You’ll need to enter the public URL of your /answer endpoint when you set up your application.
  4. Obtain a virtual incoming number and connect it to your app using the Website or CLI.

When all this is in place you’ll be able to call your number and you’ll hear the audio response that’s based on the data you return from your Web server.

Going Beyond the “Hello World” of IVR Systems

The example shown above works as expected, but a real-world IVR System will yield to the user for input many times, and will interpret the user’s numeric input based on the state of the user in the call. To illustrate this, let’s assume that in our example the user will be asked to choose the restaurant location that they’re interested in, and then to choose whether they want to listen to the open hours or make a reservation. In both of these cases, the user may press 1 on their keypad, but how we interpret this depends on the previous audio-cue and the state of the user in the call.

To support this use-case we’ll need to change the code we just wrote. Ideally, we’ll change it in a way so that as we add functionality and make our IVR System more complex over time, our code will stay simple and we won’t have to rethink how to structure it. To achieve this, we’ll model our call structure as a Finite-State Machine using XState, a State Machine library for JavaScript.

A Primer on State Machines

A State Machine is simply a model for a “machine” that can be in only one state at any given time, and can only transition from one state to another given specific inputs. XState and other State Machine libraries let you model and instantiate a machine in code, in a way that the “rules” of the State Machine are guaranteed to be enforced.
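To make the idea concrete before bringing in a library, here is a tiny hand-rolled sketch of a State Machine. It does not use XState; the state names, event names, and the `transition` helper are invented for this illustration.

```javascript
// Transition table: for each state, which event leads to which next state.
const transitions = {
  idle: { START: "running" },
  running: { PAUSE: "paused", STOP: "idle" },
  paused: { RESUME: "running", STOP: "idle" },
};

// Returns the next state, or the current state if the event is not
// allowed from it. The machine is always in exactly one state.
function transition(state, event) {
  const next = transitions[state] && transitions[state][event];
  return next || state;
}

let current = "idle";
current = transition(current, "PAUSE"); // not allowed from "idle": stays "idle"
current = transition(current, "START"); // allowed: moves to "running"
```

A library such as XState takes over the bookkeeping shown here and adds tooling on top of the same model.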

Modeling our Call Structure as a State Machine

To model our call structure as a State Machine, we’ll use the Machine function that XState exposes:
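A minimal sketch of such a definition, assuming XState v4, is shown here as a plain object so that it can be read on its own. The variable name `callMachineConfig` and the machine id `"call"` are assumptions made for this sketch; the state and event names are the ones used in this post.

```javascript
// Plain definition object. With XState v4 it would be passed to Machine():
//   const { Machine } = require("xstate");
//   const callMachine = Machine(callMachineConfig);
const callMachineConfig = {
  id: "call",
  initial: "intro", // every call starts by hearing the introduction
  states: {
    intro: {
      on: {
        "DTMF-1": "mainStLocation", // caller pressed 1
        "DTMF-2": "broadwayLocation", // caller pressed 2
      },
    },
    mainStLocation: {}, // no outgoing transitions yet
    broadwayLocation: {},
  },
};
```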

In this definition, our call can only be in one of three states:

  • The intro state, where the user is listening to the introduction and is instructed to choose the location they’re interested in.
  • The mainStLocation state, where they’re listening to information about the Main St. location of our hypothetical restaurant chain.
  • The broadwayLocation state, where they’re listening to information about the Broadway location.

You can also see that:

  • The only way to transition to the mainStLocation state is to be in the intro state and send the DTMF-1 event.
  • The only way to transition to the broadwayLocation state is to be in the intro state and send the DTMF-2 event.

We can choose to colocate the NCCO objects related to each state inside the state definition using XState’s meta property.
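As a sketch of what this colocation could look like, the intro state below carries its NCCO array under `meta`. The spoken text, the example.com URL, and the helper name `nccoFromMeta` are placeholders. The helper relies on XState v4 exposing `state.meta` as an object keyed by state node id (for example `"call.intro"`), which is worth confirming against the version you use.

```javascript
// One state node with its NCCO attached under `meta`.
const introState = {
  on: { "DTMF-1": "mainStLocation", "DTMF-2": "broadwayLocation" },
  meta: {
    ncco: [
      {
        action: "talk",
        text: "Press 1 for Main Street or 2 for Broadway.",
        bargeIn: true,
      },
      { action: "input", eventUrl: ["https://example.com/dtmf"], maxDigits: 1 },
    ],
  },
};

// Collects the NCCO arrays from every entry of a `state.meta` object,
// which is assumed to look like { "call.intro": { ncco: [...] } }.
function nccoFromMeta(meta) {
  return Object.values(meta).flatMap((entry) => entry.ncco || []);
}
```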

Utilizing our Machine

The object that the Machine function returns should be treated as an immutable, stateless object that defines the structure of our machine. To actually create an instance of our machine that we can use as a source of truth for the state of a call, we’ll use XState’s interpret function. The interpret function returns an object referred to as a “Service”. You can access the current state of each machine instance using the state property of the service, and you can send an event to change that state using the service’s send() method. We’ll create a callManager module to be in charge of creating a machine instance for every incoming call, sending the appropriate events as the call progresses, and removing each machine instance when the call ends.
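One possible shape for such a module is sketched below. The method names `createCall`, `updateCall`, and `endCall` are assumptions made for this sketch. To keep the block free of dependencies, the service factory is passed in; with XState v4 it could be `() => interpret(callMachine).start()`. The sketch uses only the service's `state`, `send()`, and `stop()` members.

```javascript
// `createService` is assumed to return a started XState service.
function createCallManager(createService) {
  const services = new Map(); // call uuid -> service

  // Reads the NCCO arrays stored under `meta` for the current state.
  function currentNcco(service) {
    return Object.values(service.state.meta).flatMap((m) => m.ncco || []);
  }

  return {
    // A new call came in: create its machine instance.
    createCall(uuid) {
      const service = createService();
      services.set(uuid, service);
      return currentNcco(service);
    },
    // The caller pressed a key: advance the machine for that call.
    updateCall(uuid, event) {
      const service = services.get(uuid);
      if (!service) throw new Error(`Unknown call: ${uuid}`);
      service.send(event);
      return currentNcco(service);
    },
    // The call ended: stop and forget its machine instance.
    endCall(uuid) {
      const service = services.get(uuid);
      if (service) {
        service.stop();
        services.delete(uuid);
      }
    },
  };
}
```

Keeping the services in a `Map` keyed by call identifier means the state of each call lives in exactly one place, and nothing lingers in memory once `endCall` has run.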

As you can see, each call is identified by its uuid, which Nexmo takes care of assigning to each call.

Putting It All Together

Now we can modify our Web server code to defer to the callManager whenever the Nexmo backend calls our endpoints.
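A sketch of the modified routes follows. It assumes an Express-style app with JSON body parsing and a callManager object exposing `createCall`, `updateCall`, and `endCall`; those method names and the function name `registerCallRoutes` are assumptions of this sketch. The `uuid`, `dtmf`, and `status` fields follow Nexmo's webhook documentation and should be checked against the current reference.

```javascript
function registerCallRoutes(app, callManager) {
  // A call came in: create its machine and reply with the intro NCCO.
  app.get("/answer", (req, res) => {
    res.json(callManager.createCall(req.query.uuid));
  });

  // A key was pressed: turn the digit into an event such as "DTMF-1".
  app.post("/dtmf", (req, res) => {
    const { uuid, dtmf } = req.body;
    res.json(callManager.updateCall(uuid, `DTMF-${dtmf}`));
  });

  // The overall call status changed: clean up once the call is completed.
  app.post("/event", (req, res) => {
    const { uuid, status } = req.body;
    if (status === "completed") {
      callManager.endCall(uuid);
    }
    res.sendStatus(204); // nothing to return for event webhooks
  });
}
```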

As you can see, in order to know when the call has ended we added an /event endpoint. If you associate it with your Nexmo Application as the “Event URL” webhook, Nexmo will make a request to it asynchronously whenever the overall call state changes (e.g. the user hangs up). Unlike the /answer or /dtmf endpoints, you cannot respond to this request with NCCO objects and influence what the user hears.

Changing the Call Structure with Ease

We just completed a refactor of our app code, and it behaves exactly the same as before. The difference is that modifying the call structure is now as simple as changing the JSON object that we pass to the Machine function.

So if, as mentioned earlier, we want to let the user decide if they want to listen to the location’s opening hours or make a reservation, we just have to add a few more states, transitions, and NCCO arrays to our Machine’s definition.
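As a sketch, the extended definition could look like this. The new state names are invented for illustration, and the NCCO arrays under `meta` are left out for brevity. The small `findUnknownTargets` helper is not part of XState; it is a hypothetical sanity check that every transition points at a state that exists.

```javascript
const extendedCallMachineConfig = {
  id: "call",
  initial: "intro",
  states: {
    intro: {
      on: { "DTMF-1": "mainStLocation", "DTMF-2": "broadwayLocation" },
    },
    // Each location now offers a second menu: 1 for hours, 2 to reserve.
    mainStLocation: {
      on: { "DTMF-1": "mainStHours", "DTMF-2": "mainStReservation" },
    },
    broadwayLocation: {
      on: { "DTMF-1": "broadwayHours", "DTMF-2": "broadwayReservation" },
    },
    mainStHours: {},
    mainStReservation: {},
    broadwayHours: {},
    broadwayReservation: {},
  },
};

// Returns every transition whose target is not a defined state key.
function findUnknownTargets(config) {
  const known = new Set(Object.keys(config.states));
  const unknown = [];
  for (const [name, node] of Object.entries(config.states)) {
    for (const target of Object.values(node.on || {})) {
      if (!known.has(target)) unknown.push(`${name} -> ${target}`);
    }
  }
  return unknown;
}
```

Note that pressing 1 now means something different in the intro state than in a location state, and the server code does not need to change to support that.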

More XState Goodness

XState has more useful features that can help us out as our call model becomes more intricate.

XState Visualizer

The XState Visualizer is an online tool that generates Statechart diagrams from your existing XState Machine definitions. All you have to do to generate a Statechart is paste in your call to the Machine function. This is particularly handy for sharing with non-developer stakeholders when discussing the call structure.


Self-Referencing Transitions

A state can transition into itself. This can be useful for cases where you want to allow the user to replay the latest piece of information given.
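In a machine definition, a self-referencing transition is simply a transition whose target is the state's own key. In the fragment below, the state name `mainStHours`, the choice of 9 as the replay key, and the NCCO contents are all assumptions made for this sketch.

```javascript
const statesWithReplay = {
  mainStHours: {
    on: {
      // The target is this state's own key, so the state re-enters itself.
      "DTMF-9": "mainStHours",
    },
    meta: {
      ncco: [
        {
          action: "talk",
          text: "Opening hours go here. Press 9 to hear this again.",
          bargeIn: true,
        },
        { action: "input", eventUrl: ["https://example.com/dtmf"], maxDigits: 1 },
      ],
    },
  },
};
```

Because the state after the transition is the same state, looking up its NCCO again returns the same message, which the caller then hears a second time.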

Persistence

You can register a function to be called whenever the machine transitions from one state to another using the service’s onTransition method. This can be useful for logging the steps the user takes and sending them to a remote database for future reference and analysis.

In general, XState supports serializing a machine instance’s data so you can persist it.
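A sketch of transition logging is shown below. The function name `logTransitions` and the shape of the record are assumptions, and `record` stands in for whatever writes to your database. The sketch relies on the XState v4 state object exposing `value` and `event`, so verify those against the version you use.

```javascript
// Calls `record` with a small, JSON-friendly entry on every transition.
function logTransitions(service, uuid, record) {
  service.onTransition((state) => {
    record({
      uuid,
      state: state.value, // for example "intro" or "mainStLocation"
      event: state.event ? state.event.type : undefined, // for example "DTMF-1"
      at: new Date().toISOString(),
    });
  });
  return service;
}
```

For full persistence, the XState v4 documentation describes serializing `service.state` with `JSON.stringify` and restoring it when a service is started; that part is not shown here.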

Strict Mode

When prompting the user for keypad input at any point in the call, it’s possible for the user to enter a value you don’t expect. For example, the user may be in a state in the call where you expect them to press 1 if they would like to make a reservation or 2 to listen to the opening hours. But if the user presses 9, the event sent will be DTMF-9, and that’s not a possible transition given the current state. Ideally we’d like to find a generic way of detecting when the user has entered an invalid input and instruct them to make the selection again.

By defining our machine with strict: true we can cause the send() method to throw an exception if it’s passed an event that’s not possible given the current state. We can then catch that error further up and reply with an appropriate NCCO response that tells the user to make the selection again.
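The catch-and-retry step could be sketched as follows, assuming `send()` throws for the rejected event as described above and leaves the service in its previous state. The function name `sendWithRetryPrompt`, the `nccoFor` parameter (a function returning the NCCO for the service's current state), and the apology text are assumptions of this sketch.

```javascript
function sendWithRetryPrompt(service, event, nccoFor) {
  let prefix = [];
  try {
    service.send(event);
  } catch (err) {
    // The event was rejected, so the machine is still in the same state.
    // This catches every error; a real app may want to inspect `err`.
    prefix = [{ action: "talk", text: "Sorry, that is not a valid choice." }];
  }
  // On success this is the new state's NCCO. On failure it is the
  // apology followed by the current state's prompt, played again.
  return [...prefix, ...nccoFor(service)];
}
```

Because the current state's own NCCO is reused for the retry, no state needs a dedicated "invalid input" message.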

Wrapping Up

In this post we introduced the XState library and how it can be used to control the progress of a call in an IVR System powered by Nexmo, in a way that scales well for a real-world use-case. The complete code covered in this post can be found here. If you’re looking for more info, both Nexmo and XState have excellent documentation.
