Speech-To-Text with Nexmo and Microsoft Azure

Published March 18, 2019 by Martyn Davies

If you’ve ever needed to receive inbound phone calls and transcribe them automatically in real time, you’re in luck: you can do exactly that with our newly updated Nexmo-to-Azure Speech Service connector.

We’ve recently updated the code and deployment options for this connector, so if it matches a problem you’re trying to solve, it’s now even easier to deploy, modify or extend.

If that has already sold you on it and you’re eager to get going, you can find more details in our nexmo-community GitHub repository.

How the App Works With Azure’s Speech Service

Microsoft’s Azure platform provides a great set of Cognitive Services APIs for working with speech, vision, language and more. This app uses the Speech-To-Text API to recognise audio that is streamed in real time over a WebSocket from a phone call, which is controlled by a Nexmo Call Control Object (NCCO).
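To make the call-to-WebSocket hop concrete, here is a sketch of an NCCO that connects an inbound call to a WebSocket, using the connect action with a websocket endpoint from Nexmo's Voice API. The hostname and the /socket path are hypothetical placeholders, and the connector's own NCCO may differ. The helper also works out the size of each audio frame, assuming 16 kHz, 16-bit linear PCM delivered in 20 ms chunks: 16000 × 2 × 0.02 = 640 bytes.

```python
import json

SAMPLE_RATE = 16000     # samples per second, matching audio/l16;rate=16000
BYTES_PER_SAMPLE = 2    # 16-bit linear PCM
FRAME_MS = 20           # assumed frame length of each WebSocket audio message


def build_ncco(hostname, socket_path="/socket"):
    """Return an NCCO that streams the caller's audio to a WebSocket.

    hostname and socket_path are hypothetical; use your own deployment's values.
    """
    return [
        {
            "action": "connect",
            "endpoint": [
                {
                    "type": "websocket",
                    "uri": f"wss://{hostname}{socket_path}",
                    "content-type": f"audio/l16;rate={SAMPLE_RATE}",
                    "headers": {},  # optional metadata sent with the first message
                }
            ],
        }
    ]


def frame_size_bytes(rate=SAMPLE_RATE, width=BYTES_PER_SAMPLE, ms=FRAME_MS):
    """Bytes per audio frame: samples/s * bytes/sample * seconds."""
    return rate * width * ms // 1000
```

For example, json.dumps(build_ncco("example.com")) produces the body an answer URL could return, and frame_size_bytes() gives 640.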

Put simply, you literally call the API and talk to it. Azure Speech performs recognition on the audio, and the recognised phrases are written to the console.
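Picking the recognised phrases out of Azure's replies is the other half of the loop. The sketch below assumes Azure's Speech WebSocket protocol, in which each text message is a block of headers, a blank line and a JSON body, and final results carry a Path: speech.phrase header with RecognitionStatus and DisplayText fields. The extract_phrase name is hypothetical; the connector's own on_return_message may parse messages differently.

```python
import json
from typing import Optional


def extract_phrase(message: str) -> Optional[str]:
    """Return the recognised text from a speech.phrase message, else None.

    Assumes headers and body are separated by a blank line (CRLF CRLF).
    """
    head, _, body = message.partition("\r\n\r\n")
    headers = {}
    for line in head.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    # Ignore other message types such as turn.start, speech.hypothesis, turn.end.
    if headers.get("path") != "speech.phrase":
        return None
    result = json.loads(body)
    # Only successful recognitions carry a usable DisplayText.
    if result.get("RecognitionStatus") != "Success":
        return None
    return result.get("DisplayText")
```

Returning None for everything that is not a final, successful phrase keeps the caller simple: it only has to print (or forward) non-None results.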

[Diagram: Nexmo & Azure Speech Service]

Running Your Own Instance

This app falls under our Nexmo Extend programme, where we create useful and reusable applications to help you get up and running using Nexmo with other great service providers like Microsoft Azure, Google Cloud and Amazon Web Services.

We’ve made it easy to deploy and immediately use your own instance of this application in as little as one click.

One-Click Deploy Options

You can deploy the app to Heroku or Azure using the buttons at the top of the README in the GitHub repository.

However, if you’d like to deploy it and have a safe (breakable!) way to work with the code directly in your browser, try remixing the app on Glitch instead and start extending the codebase straight away.

Deploy/Run With Docker

This app can also be run or deployed with Docker. The quickest way to do this is to clone the repository and, from within its root directory, use Docker Compose to set things in motion by running:

    docker-compose up

Whichever deployment option you choose, you’ll end up with a new hostname where the app is running, so you’ll need to link your Nexmo virtual number to it to complete the setup.

Linking the App to Nexmo

Next, create a Nexmo application for this app, either in the Dashboard or with the command line interface (CLI):

Using the Dashboard

  1. Sign in or create a Nexmo account
  2. Buy a new virtual number
  3. Create a voice application
  4. Add the answer URL – https://<your_new_hostname>/ncco
  5. Add the event URL – https://<your_new_hostname>/event
  6. Click Create Application
  7. Click Numbers and link the virtual number you just bought.
  8. Copy the virtual number so you can call it when you try the app out.
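Those two URLs are the app's webhooks: by default Nexmo makes a GET request to the answer URL when a call comes in and expects an NCCO in response, and it POSTs call status updates to the event URL. The standard-library sketch below shows that split. The WebhookHandler class, the port and the /socket WebSocket path are illustrative assumptions, not the connector's actual code, which also has to accept the WebSocket audio itself.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical WebSocket path; the real connector may use a different one.
SOCKET_PATH = "/socket"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The answer URL is requested with query parameters (from, to, uuid, ...),
        # so match on the path component only.
        if self.path.split("?", 1)[0] == "/ncco":
            host = self.headers.get("Host", "localhost")
            ncco = [{
                "action": "connect",
                "endpoint": [{
                    "type": "websocket",
                    "uri": f"wss://{host}{SOCKET_PATH}",
                    "content-type": "audio/l16;rate=16000",
                    "headers": {},
                }],
            }]
            self._reply(200, json.dumps(ncco).encode("utf-8"))
        else:
            self._reply(404)

    def do_POST(self):
        if self.path.split("?", 1)[0] == "/event":
            length = int(self.headers.get("Content-Length", 0))
            event = json.loads(self.rfile.read(length) or b"{}")
            # Status values include ringing, answered and completed.
            print("Call event:", event.get("status"))
            self._reply(204)
        else:
            self._reply(404)

    def _reply(self, status, body=b""):
        self.send_response(status)
        if body:
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the webhook server until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve(8000) starts it locally; in production the hostname in front of it must be reachable by Nexmo over HTTPS, which is what the one-click and Docker deployments provide.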

Using the Command Line Interface

You can install the Nexmo CLI by following the installation instructions in its documentation. Then create a new Nexmo application that also sets up the answer_url and event_url for the app running locally on your machine.

This will return an application ID. Make a note of it.

Rent a New Virtual Number

If you don’t have a number already in place, you will need to rent one. You can achieve this using the CLI:

Link the Virtual Number to the Application

Finally, link your new number to the application you created by running:

Try It Out

Now, with your app running wherever you deployed it, call the number you linked to it and start speaking. After a brief pause, you will see whatever you say written out to the console in real time. Below is an example of this:

Example GIF

How To Extend This

The next logical step would be to start pushing the phrases returned by Azure Speech Service out to another service that will consume them and act on what is received.

You can do this by modifying the on_return_message function, which currently ends like this:

Using the Requests library (which is already a dependency, so there’s no need to install it again), you could POST each phrase as a JSON object to another API, where it would be consumed and acted upon. To add this functionality, change the final if statement in on_return_message to something like this:
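A minimal sketch of that change follows. To stay runnable without third-party packages it uses the standard library's urllib.request rather than Requests; with Requests, the POST itself becomes requests.post(url, json={"phrase": phrase}). The endpoint URL and the forward_phrase and on_phrase names are hypothetical.

```python
import json
import urllib.request

# Hypothetical URL of the service that will consume the phrases.
PHRASE_ENDPOINT = "https://example.com/phrases"


def forward_phrase(phrase, url=PHRASE_ENDPOINT, timeout=5.0):
    """POST {"phrase": phrase} as JSON to url; return the HTTP status, or None on failure."""
    data = json.dumps({"phrase": phrase}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except OSError as err:
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # a failing downstream service shouldn't break transcription of the call.
        print("Could not forward phrase:", err)
        return None


def on_phrase(phrase):
    """Hypothetical stand-in for the final if statement in on_return_message."""
    if phrase:
        print(phrase)  # keep the existing console output
        return forward_phrase(phrase)
    return None
```

Swallowing the error and returning None is a deliberate choice here: the phone call keeps being transcribed even if the consuming service is down.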

Each time a new phrase is returned by the Azure Speech Service, a {"phrase":"Words returned by the app."} object will be sent.
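On the receiving side, a service only needs to accept that JSON body. The sketch below is a hypothetical consumer built on Python's http.server: it validates the payload, prints the phrase where your own logic would go, and answers 204 No Content, or 400 Bad Request for anything malformed. The port, parse_phrase and PhraseHandler are placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_phrase(body: bytes) -> str:
    """Validate a {"phrase": "..."} payload and return the phrase."""
    # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict) or not isinstance(payload.get("phrase"), str):
        raise ValueError("payload has no string 'phrase' field")
    return payload["phrase"]


class PhraseHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            phrase = parse_phrase(self.rfile.read(length))
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        print("Received phrase:", phrase)  # act on the phrase here
        self.send_response(204)
        self.end_headers()


def serve(port=5000):
    """Run the consumer until interrupted."""
    HTTPServer(("", port), PhraseHandler).serve_forever()
```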

What you do with it next is up to you!

If you do extend this application, or have questions about how it works, head over to the Nexmo Community Slack channel, where we’ll be more than happy to help with any queries and hear any suggestions.
