Dual Channel Transcription with Split Recording

Published April 17, 2018 by Michael Heap

As part of our Voice API offering, Nexmo allows you to record parts (or all) of a call and fetch the audio once the call has completed. Today, we’re happy to announce a new enhancement to this functionality: split recording. Split recording makes common tasks such as call transcription even easier.

When split recording is enabled, the downloaded recording will contain participant A (let’s call her Alice) in the left channel, and participant B (let’s call him Bob) in the right channel. This allows you to work with the audio from a single participant easily.

In this post, we’re going to walk through a simple use case. Alice calls the bank to find out information about her account, and Bob is the customer support agent who answers the call.

Record the Call in Stereo

When Alice calls the number provided by the bank, Nexmo answers the call, plays an introductory message, and connects it to the bank’s real phone number, recording all of the audio in the call. To accomplish this, you’d use the following Nexmo Call Control Object (NCCO):
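
A minimal sketch of such an NCCO, built here as a Python list and serialized to JSON so that an answer URL could return it. The greeting text and both phone numbers are placeholder assumptions; only the recording webhook URL comes from this post.

```python
import json

# Placeholder values (assumptions for illustration only).
BANK_NUMBER = "447700900001"   # the bank's real phone number
NEXMO_NUMBER = "447700900000"  # the Nexmo number that Alice dialled


def build_ncco(recording_url="https://example.com/recording"):
    """Return an NCCO that records the call, greets the caller, then connects."""
    return [
        # Start recording; Nexmo notifies eventUrl once the recording is ready.
        {"action": "record", "eventUrl": [recording_url]},
        # Introductory message read out with text-to-speech.
        {"action": "talk", "text": "Thank you for calling. Connecting you now."},
        # Bridge the caller to the bank's real phone number.
        {
            "action": "connect",
            "from": NEXMO_NUMBER,
            "endpoint": [{"type": "phone", "number": BANK_NUMBER}],
        },
    ]


if __name__ == "__main__":
    print(json.dumps(build_ncco(), indent=2))
```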

The important part of this NCCO is the record action, which records the audio and, once the call is complete, sends the recording’s URL to https://example.com/recording:

To enable dual-channel recording, we need to update this action to contain "split" : "conversation" like so:
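
As a sketch, the updated record action could be produced like this, again in Python and serialized to JSON. The helper name enable_split is hypothetical; "split" and its "conversation" value are the setting described above.

```python
import json


def enable_split(ncco):
    """Return a copy of the NCCO with split recording set on every record action."""
    updated = []
    for action in ncco:
        action = dict(action)  # shallow copy so the caller's NCCO is left untouched
        if action.get("action") == "record":
            action["split"] = "conversation"
        updated.append(action)
    return updated


record_only = [{"action": "record", "eventUrl": ["https://example.com/recording"]}]
print(json.dumps(enable_split(record_only), indent=2))
```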

That’s all there is to it! When you fetch the call recording from Nexmo, you’ll have Alice’s audio in the left channel and Bob’s in the right.

Call Transcription with IBM Watson

Once you have the audio file, it’s time to transcribe it. Few providers accept dual-channel audio and transcribe each channel separately, so for this post we’ll use ffmpeg to split the recording into two mono tracks and transcribe each one with IBM’s speech-to-text API.

To split your audio file into two files, run the following command in a terminal (you may need to install ffmpeg first):
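
One way to do the split is ffmpeg’s channelsplit filter, wrapped below in a small Python helper so the step can be scripted. The file names recording.mp3, alice.mp3 and bob.mp3 are assumptions, and the helper expects ffmpeg to be on your PATH.

```python
import subprocess


def split_command(source, left_out, right_out):
    """Build the ffmpeg argv that writes each stereo channel to its own mono file."""
    return [
        "ffmpeg",
        "-i", source,
        # channelsplit separates a stereo stream into its left and right channels.
        "-filter_complex", "[0:a]channelsplit=channel_layout=stereo[left][right]",
        "-map", "[left]", left_out,    # left channel: Alice
        "-map", "[right]", right_out,  # right channel: Bob
    ]


def split_recording(source="recording.mp3", left_out="alice.mp3", right_out="bob.mp3"):
    """Run ffmpeg; raises CalledProcessError if the split fails."""
    subprocess.run(split_command(source, left_out, right_out), check=True)
```

Calling split_recording() with the defaults runs the same arguments you would otherwise type as a single ffmpeg command in a terminal.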

Now that we have two audio files, we can send each one to Watson and get the text back as JSON. You can use your language of choice to do this, but the quickest way to get things working is by using curl:
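
As a rough Python equivalent of that request, the sketch below posts one audio file to the speech-to-text recognize endpoint with timestamps enabled and saves the JSON response. The service URL and API key are placeholders you would copy from your own IBM Cloud credentials, and the API-key basic auth scheme is an assumption about how your instance is set up.

```python
import base64
import urllib.request


def recognize_request(service_url, api_key, audio, content_type="audio/mp3"):
    """Build (but do not send) a recognize request that asks for word timestamps."""
    token = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    return urllib.request.Request(
        f"{service_url}/v1/recognize?timestamps=true",
        data=audio,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Authorization": f"Basic {token}",
        },
    )


def transcribe(path, out_path, service_url, api_key):
    """Send one audio file to the service and write the JSON response to disk."""
    with open(path, "rb") as f:
        request = recognize_request(service_url, api_key, f.read())
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

You would call transcribe once per track, for example with alice.mp3 writing to alice.json and bob.mp3 writing to bob.json (file names assumed).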

This will give us two JSON files that look similar to the following:
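
For orientation, the sketch below shows the general shape of a response when timestamps are requested, plus a helper that pulls out each utterance with its start time. The sample words, times and confidence are made up for illustration and are not real output.

```python
# Illustrative response only: the words, times and confidence are invented.
SAMPLE = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "hello how can I help ",
                    "confidence": 0.9,
                    # Each timestamp is [word, start_seconds, end_seconds].
                    "timestamps": [
                        ["hello", 1.0, 1.4],
                        ["how", 1.5, 1.7],
                        ["can", 1.7, 1.9],
                        ["I", 1.9, 2.0],
                        ["help", 2.0, 2.4],
                    ],
                }
            ],
            "final": True,
        }
    ],
    "result_index": 0,
}


def utterances(response):
    """Yield (start_seconds, transcript) for each result's top alternative."""
    for result in response.get("results", []):
        best = result["alternatives"][0]
        if best.get("timestamps"):
            # The first word's start time marks where the utterance begins.
            yield best["timestamps"][0][1], best["transcript"].strip()
```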

Build the Conversation

As we requested timestamps, we can rebuild a timeline of the conversation as it happened. Once again, you can use your favourite language for this (I’ll be using PHP). The steps we have to follow are:

  1. Loop through both JSON files and merge all of the entries into a single list.
  2. Order the entries based on the start timestamp.
  3. Output the conversation in order, with the timestamp, name, and text shown.
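
The three steps above can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, each response is expected to have the results/alternatives/timestamps layout that the speech-to-text API returns when timestamps are requested, and an utterance is placed on the timeline by the start time of its first word.

```python
import json


def load(path):
    """Read one saved JSON response from disk."""
    with open(path) as f:
        return json.load(f)


def entries_from(response, speaker):
    """Turn one response into (start_seconds, speaker, text) tuples."""
    entries = []
    for result in response.get("results", []):
        best = result["alternatives"][0]
        if not best.get("timestamps"):
            continue  # nothing to place on the timeline
        start = best["timestamps"][0][1]  # start time of the first word
        entries.append((start, speaker, best["transcript"].strip()))
    return entries


def build_conversation(responses):
    """responses maps a speaker's name to that speaker's parsed response."""
    merged = []
    for speaker, response in responses.items():
        merged.extend(entries_from(response, speaker))  # step 1: merge
    merged.sort(key=lambda entry: entry[0])             # step 2: order by start
    # Step 3: one line per utterance with timestamp, name and text.
    return [f"[{start:.2f}s] {speaker}: {text}" for start, speaker, text in merged]
```

For example, build_conversation({"Alice": load("alice.json"), "Bob": load("bob.json")}) returns the lines in chronological order, ready to print (file names assumed).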

The PHP code to do this looks like the following:

When we run this code we see our conversation as it happened:

Transcription Made Easy with Split Recording

Nexmo’s new split recording feature records each participant in their own audio channel, making transcription a breeze. To enable the feature, all you have to do is add "split" : "conversation" to your record action.

To learn more about split recording, you can read our product blog post on the release or check out the documentation.
