It never ceases to amaze me what builders can create when given tools and the space to collaborate. Case in point: the team of computer science students from Princeton who won the Vonage-sponsored “Best Use of a Vonage/Nexmo API” prize at the Fall 2017 PennApps collegiate hackathon. Their hack (Lyff), a great combination of innovation and practicality, used APIs from Lyft and Nexmo to allow users to book a Lyft with a phone call when network connectivity is too weak to use the Lyft app.
We invited the Lyff team of David Fan, Akash Levy, Zachary Liu, and Selina Wang to visit Vonage headquarters in Holmdel, New Jersey. During their visit, we recorded a video of a wide-ranging technical discussion they had with Nexmo Developer Advocate and Alexa Champion Sam Machin and Software Engineer on the Vonage Garage team Kevin Alwell. Watch the video to find out how Lyff works, why open source is a powerful thing, how the Nexmo Amazon Lex Connector was used, and much more. (You can read the full transcript below the video).
What Is the Lyff Hack?
Glen Kunene (Editor in Chief at Nexmo, the Vonage API Platform): So, I’d like to kick off the discussion by just having the team, anyone can jump in, sort of describe what Lyff is. What is this project that you built? What does it do? What problem does it solve?
Akash Levy (Electrical Engineering student at Princeton University): So, Lyff is basically a phone service that allows you to book a Lyft without access to the internet. Essentially, you call a phone number and from there it walks you through the process of ordering a Lyft to a street address which you specify. It makes use of the Nexmo API, many Amazon services, and also the Lyft API in order to do this.
Kevin Alwell (Software Engineer at Vonage Garage): So I think in addition to what you mentioned, it’s not only special because it books a ride for you. It’s not just an application that books a ride for you, but it was built off of a real use case, which is that Selina didn’t have an internet connection when she was looking for a ride after she got off the bus 45 minutes away from PennApps. So, you don’t actually need an internet connection and you can still fulfill this ride request. So, maybe you want to talk a little bit about that piece and how you specifically ended on why the voice is such an important piece of your Hack?
Akash: It tends to emulate more of the traditional ways that we would order taxis. So, with taxi services in the past you call a phone number, you order a taxi to a certain location and then you can specify where you want to go. And this kind of enables that kind of service for legacy devices, devices which aren’t connected to the internet, devices where you don’t necessarily have access to the internet at all times. Potentially, in places where you have low coverage, for example, this might be an extremely useful service for getting from point A to B.
Kevin: What might be hard to understand for some people, the significance there, is that we’re taking the request through voice. We’re mapping it to an online third-party service, or a stream of them, like Amazon Lex; we’re mapping it to the Lyft API; and the Nexmo API is kind of passing it along and facilitating that connection there. And that’s really an important piece that I definitely want to make sure we emphasize, because that was huge for us, obviously, as Nexmo: why you used that voice API.
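The hand-off Kevin describes, where the Nexmo Voice API passes call audio along to a bot service, can be sketched as a Nexmo Call Control Object (NCCO) whose `connect` action streams the call to a WebSocket. This is a minimal illustration and not the Lyff team's code: the function name, the WebSocket URI, and the `caller` header are assumptions made for the example.

```python
from typing import Any, Dict, List


def build_answer_ncco(connector_ws_uri: str, caller: str) -> List[Dict[str, Any]]:
    """Build an NCCO that streams an inbound call's audio to a WebSocket.

    connector_ws_uri is a hypothetical endpoint standing in for whatever
    service bridges the call to the bot (for example, a Lex connector).
    """
    return [
        {
            "action": "connect",
            "endpoint": [
                {
                    "type": "websocket",
                    "uri": connector_ws_uri,
                    # 16-bit linear PCM at 16 kHz, the rate discussed
                    # later in the conversation.
                    "content-type": "audio/l16;rate=16000",
                    # Illustrative metadata forwarded to the WebSocket server.
                    "headers": {"caller": caller},
                }
            ],
        }
    ]
```

An answer webhook would return this list as JSON when a call arrives, for instance `build_answer_ncco("wss://connector.example.com/bot", "15551230000")`.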
David Fan (Computer Science student at Princeton University): There’s a surprising number of areas, even in really densely populated places, where you don’t have coverage. From some experience in the summer, a lot of the time you didn’t have coverage. It would help if I could just call and book a ride, instead of walking somewhere else to get better coverage.
Akash: So, from our own personal experience, we’ve seen that even in highly populated areas you tend to have these potential problems with buildings blocking cell phone signals and things like that. And so, in situations like that, it’s easier to just look at your surroundings, find an address where you want to be picked up and then communicate that in a different way.
Sam Machin (Nexmo Developer Advocate and Alexa Champion): I think one of the interesting things from my point of view with voice, as compared to data, as you say about coverage, is that usually once you’ve got a voice call up, it will stay up, because when you make a voice call you’re kind of reserving a channel in the network for that call. So, it’s not like if you’ve got a weak data signal, you go, “Oh, yeah, I’ve managed to find a way.” And then you can’t, like, complete the process, which is even more frustrating when it dies. At least, generally, with a voice call, as long as you stay where you are, the capacity is guaranteed for the duration of that call once you’ve got through.
So, especially, even when we say about coverage, but in busy environments, like you’re coming out of a concert or something and everybody’s hitting the data network and you’re just trying to spread it thinner and thinner. Whereas if you can get your voice call through, you can complete the transaction and then get off the network and give it up. Sometimes it just doesn’t really scale out well enough to cope with lots and lots of people all trying to make the same data requests.
Kevin: And so I guess that poses an interesting question to the team which is that, “What if I am coming out of a concert and trying to make that call and book a ride? How effective?” I guess, obviously you didn’t build Lex, the NLP engine, but how much interference can it actually handle so that it books you a ride accurately?
Zachary Liu (Computer Science student at Princeton University): Yeah, we had some challenges with the accuracy of Lex, especially because we’re going over a phone connection, which is not as high quality as, say, speaking directly to an app on your phone, but we did see that we’re able to make some improvements to the Lex Connector to improve the reliability of the recognition. We did see that it is possible with some tweaks to have fairly reliable recognition and we can also tweak the interface to confirm your location and to improve the experience.
Akash: Yeah, so that’s a really great point that Zach made. A big component of our project was actually in building confirmations for this fulfillment task. Basically, a lot of the steps through the pipeline were confirming, “Hey, is this the address you want to go to?” Because we did notice that a lot of times the address that we would put down would not be the same one that the person was saying. So, yes, I think that was a big piece.
Kevin: The Google Maps API was really helpful with that, I’m sure.
Akash: Absolutely it was.
Using the Lex Connector
Sam: So you guys are doing a geocoding on the input string by selecting, you’re capturing. In the Lex bot where you’re doing…what kind of slot were you using? Is there an address slot I think within Lex, isn’t there?
Akash: Yes, there is an address slot. So we were using an address slot and then mapping that with the Google Maps geocoder.
Sam: Yeah, to confirm, because, obviously, the slot just looks at the structure of an address, so then you give that to Google Maps and it will actually make sure it’s a real address.
Akash: Exactly, yup.
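The confirm-and-geocode loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the team's implementation: `confirm_address`, the three callbacks, and the fake geocoder table are all hypothetical stand-ins, and a real version would call a geocoding service and a speech prompt instead.

```python
from typing import Callable, Optional


def confirm_address(
    heard: str,
    geocode: Callable[[str], Optional[str]],
    ask_yes_no: Callable[[str], bool],
    ask_again: Callable[[], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Resolve a spoken address and read it back until the caller agrees.

    Returns the confirmed, geocoded address, or None if no attempt
    was both resolvable and confirmed.
    """
    for attempt in range(max_attempts):
        resolved = geocode(heard)
        # Only ask for confirmation when the geocoder found a real address.
        if resolved is not None and ask_yes_no(f"I heard {resolved}. Is that right?"):
            return resolved
        # Re-prompt unless this was the final attempt.
        if attempt < max_attempts - 1:
            heard = ask_again()
    return None


# Stand-in data so the sketch runs without a network or a phone call.
FAKE_GEOCODER = {"1 main street": "1 Main St, Springfield"}


def fake_geocode(text: str) -> Optional[str]:
    """Hypothetical geocoder: exact lookup in a tiny in-memory table."""
    return FAKE_GEOCODER.get(text.strip().lower())
```

The point of the design is that a mis-heard address fails in one of two visible ways, either the geocoder rejects it or the caller says no, rather than silently booking a ride to the wrong place.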
Sam: So, you guys were saying about some of the tweaks you made to the Lex Connector. I’m looking at your pull request, which is really interesting because I wrote the Lex Connector originally. It started off as a bit of a Hack project.
So, what’s quite interesting is I think a lot of the stuff that you guys were playing with was the timers. So, some of those values, just the short and long things, which kind of makes sense. And I do wonder whether with some of the timers it’s almost like you set the timers for a specific scenario. So, different interactions have maybe shorter or longer phrases, so you want shorter cut-offs, longer cut-offs and tuning there.
So, one of the reasons I’ve been kind of looking at the stuff you guys are proposing and thinking, “Should we just sort of take these times and make these the default? So, should we be kind of making them more of a configurable parameter on a per?” We’ll have some defaults and then make it easy for you guys to just set them for what works for your app.
So, that’s part of the reason why I’ve merged it. Yeah, actually, we’re kind of looking at testing on different cases. But the other one, you got some other phone calls. Quite interesting. So, there is actually now an improvement to the Lex API to take more of a narrow-band speech. So, right now, we kind of upscale it to 16 kHz and feed that in, which is more like the audio you get off your phone, but phone call audio is only 8 kHz.
So, now, Amazon have models for narrow-band, for 8 kHz so, hopefully, as well, with the next update we’ll be able to support that and it should give you better accuracy because right now it’s trying to sort of do high-def accuracy on narrow-band audio. Whereas, if you’re now doing narrow-band accuracy on narrow-band audio it should be a bit more accurate.
Akash: That’s really cool. We were actually looking for a feature like that when we were building the Hack and we didn’t see it. So that’s really great to see that these improvements are being made.
Sam: Yeah, so, now, we’ll just down-sample it before we go off to. It’s quite easy because actually, in the Nexmo API we’re up-sampling. So the call comes in off the phone network at 8 kHz, typically. We up-sample everything to 16 kHz because our core network and sort of best potential because some stuff is 16 kHz.
So, internally, everything runs at 16, but then right now, so we’re passing out. So, in some of the tests I’ve been doing recently, we’re just going from 16 back down to 8. So, really, actually, undoing the up-sampling that’s being done in the network, so you’re not losing any information because the original source was only 8 anyway. So you’re just stripping out the stuff you’re creating. It’s looking pretty promising. It’s definitely more accurate, and some of the APIs I’ve been playing with, like the Google API, are a lot better, work at 8 kHz. So, it was kind of interesting to see how that works.
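The "going from 16 back down to 8" step Sam mentions can be illustrated with a deliberately naive sketch that keeps every second 16-bit sample. It assumes mono 16-bit PCM that was originally 8 kHz audio up-sampled to 16 kHz, so nothing above 4 kHz is expected in the signal; a general-purpose resampler would low-pass filter before decimating. The function name is made up for this example and is not part of the Lex Connector.

```python
from array import array


def downsample_16k_to_8k(pcm16: bytes) -> bytes:
    """Halve the sample rate of mono 16-bit PCM by dropping every other sample.

    Assumes the input was up-sampled from an 8 kHz source, so plain
    decimation is acceptable. No anti-aliasing filter is applied.
    """
    samples = array("h")  # signed 16-bit integers, two bytes each
    samples.frombytes(pcm16)  # raises ValueError if the length is odd
    # Sample values are never interpreted here, so byte order is irrelevant.
    return samples[::2].tobytes()
```

Each 20 ms frame of 16 kHz audio is 640 bytes (320 samples), so the output frame is 320 bytes (160 samples).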
Akash: Yeah, I think Zach can talk more about this, but another big component of the thing we changed in the Lex Connector was we added a more parallel model, in that we added, like, multithreading. And I think… yeah, Zach, do you want to…?
Zachary: Yeah, I think one of the bugs we encountered was a delay introduced by the cell network. There’s maybe a half-second to one-second delay between when you say something and when the server receives it, and the same in the reverse direction. And during that time, static and other background noise sometimes get picked up as speech. So, that’s something we had to deal with, detecting when speech starts and stops, and also having that work well with the delay in the network. It’s unique compared to, say, an app.
Sam: That’s one of the things with all these voice APIs: most of the APIs are designed to work with, like, a mobile app or a hardware appliance, like an Echo or whatever. And those devices are doing the kind of voice activity detection very locally, to work out when to capture audio or not. So, you’ve got the initial capture and then the further exchanges.
So, obviously, when you first invoke it, you capture audio until you get silence and then you send it, but it’s those follow-up questions. The problem with a phone call is that the audio is permanently open. So, yeah, it’s trying to figure out ways we can control the listening so that we say actually, “Yes, we’re always listening,” but just because there’s audio and you wait till we’re kind of prompting for input, for a follow-up question.
Akash: Exactly, so that was kind of the big modification we made to the Lex Connector.
Zachary: Yeah, that’s definitely one of the big challenges with voice recognition in general. I know that one of the big areas of improvement is speech start and stop detection.
Sam: Yeah, and everyone thinks it’s always about the wake word stuff and all that, like, “Oh, great, yeah, we can wake it up and we can make it start listening,” but actually, it’s easy enough to know when to start listening. It’s harder to know when to stop listening, especially if people put unnatural pauses in, because you’re doing fairly crude sort of audio processing at that point. There’s none of the machine learning and natural language processing really coming into play. You’re just listening for sound and at that point, you’re thinking, “Well, is this the end of the speech or am I processing a strange phrase?” or that kind of stuff.
Zachary: None of the cues you get from speech. The computer’s brute-forcing it, knowing it’s either on or off, either speech or not speech.
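The "crude" on/off detection discussed above, together with the tunable cut-off timers, can be sketched as an energy threshold plus a silence timeout. This is a generic illustration rather than the Lex Connector's actual logic: the function names, the threshold of 500, and the 800 ms default are assumptions, and frames are taken to be mono 16-bit PCM in the host's byte order.

```python
from array import array
from typing import Iterable


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of one frame of 16-bit PCM (host byte order)."""
    samples = array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def capture_utterance(
    frames: Iterable[bytes],
    threshold: float = 500.0,
    end_silence_ms: int = 800,
    frame_ms: int = 20,
) -> bytes:
    """Collect audio from the first loud frame until a long enough quiet run.

    end_silence_ms is the cut-off timer: shorter values suit short answers
    such as yes or no, longer values tolerate pauses inside an address.
    """
    quiet_frames_needed = end_silence_ms // frame_ms
    captured = []
    quiet_run = 0
    started = False
    for frame in frames:
        loud = frame_rms(frame) >= threshold
        if not started:
            if not loud:
                continue  # still waiting for speech to begin
            started = True
        captured.append(frame)
        quiet_run = 0 if loud else quiet_run + 1
        if quiet_run >= quiet_frames_needed:
            break  # enough trailing silence: treat the utterance as finished
    # Drop the trailing quiet frames so only the speech span is returned.
    if quiet_run:
        captured = captured[:-quiet_run]
    return b"".join(captured)
```

A fixed energy threshold has exactly the weakness raised in the conversation: static on the line can exceed it, and a natural pause longer than the timer ends the capture early.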
Sam: Yeah, it goes back to the whole thing that I’ve been hearing for years, and kind of reading stuff about, how much communication is nonverbal and body language, and gestures. Now, it’s all coming back into thinking, “Oh, I’m talking to these assistants with speech.” Obviously, all that body language and gesturing that we do as humans is completely lost, so we’re trying to have natural human conversations, but with purely the verbal audio piece. Yeah, there are some really interesting challenges, I think, for the future of this stuff.
The Hackathon Experience
Sam: One thing I’d be quite interested to know is, like, I’ve done a lot of Hackathons and things myself and it’s kind of my hobby as well as my job now. But did you guys kind of find the technologies and the APIs, and then kind of fit the use case? Or did you think of the thing you wanted to build and then figure out, “Oh, how can we do this? What can we use? Oh, we could use Lex, we could use the Lyft API, we could use Nexmo,” that kind of stuff?
Akash: It was a combination, actually. So, I think we were looking at both, like in terms of the APIs that were available at the Hackathon. We were looking at those and thinking, you know, “Which ones could we build a Hack with, which would utilize these in a new, novel way that’s useful?”
So, we were kind of looking at that and then we were looking specifically at kind of the… so, I’ve never seen Nexmo at a Hackathon before so I was looking into what their product did. And when I saw that, I was like, “Oh, there are so many cool use cases here.” And then we started looking through the other APIs and thinking about how we could combine them. And then Selina’s experience came in.
Selina Wang (Computer Science student at Princeton University): Yeah, when Akash mentioned this project, I was like, “Hey, that would’ve been pretty helpful just now.” So, yeah.
Akash: Yeah, so, I guess, maybe we could walk you through the timeline of like how our project progressed. So, yeah, I would say, when we got there, we got there around…
Zachary: Say like late Friday night.
Akash: Yeah, late Friday and I’d say we’d spent most of the time at the beginning just thinking about different project ideas, brainstorming. I’d say we spent most of the first night, Friday night, thinking about ideas and just exploring like we kind of went through a lot of the API websites and looked at those.
Zachary: And even on Saturday we spent a lot of time experimenting with the APIs and learning how to use them. It was like the first time most of us had used Lex, the first time that most of us had used Nexmo and Amazon services. So, spending a lot of time figuring out how to use these services and learning what their capabilities were and what their limitations were was also a huge chunk of our time.
David: Yeah, around like really late on Saturday, 3:00 a.m. on Sunday, we were trying to fix the race condition in the Nexmo, in the connector, and then we fixed it like maybe an hour before the submission deadline.
Akash: Yeah, so I think it helped a little bit that I’d had some experience with Amazon Alexa ahead of time, which isn’t the same as Amazon Lex, but it generally relies on similar technologies. So, we used Lex basically to do most of the voice processing and taking it through the process of booking a Lyft. I don’t think any of us had used the Lyft API before, but we actually found it very easy to use. And Zach actually took some time to reverse engineer the process of logging into Lyft in order to book it. Zach, do you want to talk about that a little?
Zachary: Yeah, so the Lyft API was primarily designed for the use case of a mobile app where you can launch a browser and have the user log themselves in, but our use case was pretty unique in that we can only recognize things by voice. So, we realized that the Lyft website allows you to log in using a code that’s sent to you by text message, but this was not exposed through the API.
So, we did some reverse engineering to see what requests the website made while it was sending these codes and receiving the codes in the backend, and integrated that into our Hack. So, we did a little bit of reverse engineering to figure that out.
Akash: Yeah, and so that was a pretty big piece of the Hack. In terms of productionizing it, we’d potentially want to work with Lyft in order to get correct access to this type of API, potentially through other means. Yeah, and I think that was a really big piece. It was just kind of figuring out how we wanted to make this work.
I think our Hack was more complete than I was expecting it to be by the end. We had essentially all the pieces that we needed for it to be productionized there. The final step that I think would be needed to take this project from just being a Hackathon project to production is that we would want to make it better at being stateful.
Another thing that we implemented was a stateful kind of… the call is basically stateful. So, you can hang up and call back, and then you can go back to where you left off. And I think that’s a really useful feature in the case where you have kind of spotty coverage. This will help you pick up the process where you left off and I think that’s really important for the kind of purpose that we built this for.
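The resumable call Akash describes can be sketched as booking progress stored against the caller's phone number, so a dropped call picks up at the next unanswered step. This is a guess at the general shape and not the team's code: the class names, fields, and step labels are hypothetical, and a production version would persist sessions outside process memory and expire stale ones.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BookingSession:
    """Progress of one caller's ride request (fields are illustrative)."""

    pickup: Optional[str] = None
    destination: Optional[str] = None
    confirmed: bool = False

    def next_step(self) -> str:
        """Name the first step that has not been completed yet."""
        if self.pickup is None:
            return "ask_pickup"
        if self.destination is None:
            return "ask_destination"
        if not self.confirmed:
            return "confirm"
        return "book"


class SessionStore:
    """In-memory sessions keyed by the caller's phone number."""

    def __init__(self) -> None:
        self._sessions: Dict[str, BookingSession] = {}

    def resume(self, caller: str) -> BookingSession:
        """Return the caller's existing session, or start a new one."""
        return self._sessions.setdefault(caller, BookingSession())

    def finish(self, caller: str) -> None:
        """Forget the session once the ride is booked or abandoned."""
        self._sessions.pop(caller, None)
```

If the first call drops after the pickup address is captured, the next call from the same number gets the same session back and `next_step()` returns `"ask_destination"` instead of starting over.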
David: And also we’d have to maybe give the user more information about the cars. So, maybe like the license plate number or something.
Akash: Yeah, and other than that, I think towards the end, so we actually never tested our project on an actual Lyft. We were using the sandbox API which is basically you can see what… like it basically allows you to book a Lyft in the development mode, so it doesn’t actually contact their servers and make requests that you don’t want.
So, we didn’t actually test that, but there would generally be more information we’d want to give to a user about like where they’re being picked up, how many minutes away the ride is, what the license plate number is, what the car looks like, something like that. That kind of information we want to communicate to the user in a better way. We didn’t have much time at the end. This was something we would have added on later. When productionizing, this would definitely be an important thing to add in.
Lyff-inspired Use Cases
Sam: The way I see Lyff, one of the big use cases to it, so Lyft users, the customer rides right now because all these phone calls… obviously, you’ve got the piece of booking. Typically, you book your ride via the app, but then the communication with the driver is still done over a phone call or a text message normally. And I think there’s some really interesting kind of scope there for integrating the two.
So, actually, if you’re in the phone call, you can phone that phone call. It can then transfer you to a driver maybe or something. So you’re already on a call and it books the Lyft, and with deeper integration, because people right now are switching between an app experience and a phone call experience anyway. So if they’re going to start off in a phone call experience then let’s make it a single one.
Akash: Yeah, 100% agree. So, I think with deeper Lyft integration, this would be a fairly simple feature to add and definitely making use of some of the more intricate Nexmo APIs with doing voice proxying, the call proxying. We actually looked into that while we were doing the project, we explored that API a little bit. We couldn’t really figure out how to integrate that with our project, but that definitely would be, if we had cooperation of Lyft, that could definitely be a further…
Sam: But right now you need to kind of access to the real driver’s phone number, not the number that we give the app which is a proxy number because you…
Akash: That would be on the Lyft side a little bit more, but it would be very cool.
Sam: So, interestingly, what I was thinking about is, like, this use case: everything today people are doing in apps, where we’re ordering our dry cleaning, ordering pizza. And do you think that sort of same model would work for other things than Lyft as well? So, you know, actually we could build more of these bots on phone calls to access services as a complete alternative to mobile apps or services. Anything you’ve thought about that?
Akash: Absolutely. I think, you know, it’s very good to have a backup in case something goes wrong with our internet connections. There’s no guarantee of reliability 100% of the time, and in cases like that, I think our phone services tend to be more reliable generally than our internet services do, even today.
And hopefully, we should see that these kind of services are used as a backup and not as the main service because it generally tends to be quicker and easier to use an app in most contexts, but as you said, I think it would be very nice to have kind of a phone service in the backend as a backup solution for a lot of these things.
Sam: But, I think, especially now with things like this sort of AI bot platforms being able to sit there as a backend, it means that companies that were running a phone service, actually answering a phone and taking orders was really expensive for a restaurant or something traditionally because it was quite time consuming. Whereas you know somebody can place their order slowly online and browse, and surf, and things, then boom, it just sends a little print out, or a little message to the restaurant that says, “Here’s your food order,” or whatever.
So I guess using bots now we can get all the benefits of doing something kind of automatically for the business, but the consumer can still have that voice interaction if that’s what they want. You know, you don’t want to install an app for everything, like I order a pizza maybe once a month. I don’t want an app on my phone I’m going to use once a month. I just want to have a number saved in my phone book.
Zachary: And I think optimizing the voice experience also unlocks a few new markets and new potential applications. You mentioned voice bots, but also for the blind, who can’t use an app in the traditional way, this also unlocks new potential, and you can use voice to request a ride and interact with other apps in a purely voice way. So along with that I’ve also seen ways so you can…
Kevin: I can see. That was a good point.
Zachary: Yeah, so streamlining the voice experience definitely.
Sam: It’s accessibility, isn’t it?
Judging the Lyff Hack
Glen: So actually I have a question for Kevin. So you were on the other side of this at the actual PennApps Hackathon and saw this as a submission, and the Vonage Garage team had actually already built kind of an app that uses SMS to accomplish similar things. So, I’m just wondering about the day of reviewing their submission and what kind of set them apart, and then maybe talk a little bit about the similarities between what they built and the Cab Bot.
Kevin: Sure. So, there are lots of similarities. From a high level, it’s actually very much the same as what we built for the Cab Bot. We ended up using different services because what we actually built, just to kind of fill in the team here, is text-based, so, using the Nexmo SMS API, a text-based way to fulfill ride requests.
And rather than using Lex, we used something called API.AI, which is Google’s NLP platform. And something that stood out to me when I spoke with you at the Hackathon, obviously, I thought it was a genius idea because I had the same idea myself, right? No, I’m just kidding, but I thought it was a really great idea.
But what really stood out to me was the engineering potential of your team because not only did you identify a problem and developed a solution, but you did that without, you know, even though you faced challenges and what you tried to pull into your application, the tools. So what you did was you actually created a new tool or you modified the tools that you had at your disposal to make your app work. And so, I know for myself and the other judges, that was a really huge thing because, obviously, there were other Hacks that used the same API, the voice API, the SMS API, but that really stood out to me and, yeah, I think it was a Hack well-done.
The Power of Open Source
Zachary: So I think that goes to show the power of open source, even with a corporation like Vonage. By having their tools be open source with the community, people working on projects using their APIs can contribute back and find new ways to use products that the company may not even encounter.
Akash: Yeah, absolutely. I think the fact that the Lex Connector was open source was what enabled us to really make a working product in the end. Like, we were able to tune it to our own needs for the product and actually that was what enabled the kind of call quality that allowed us to make our product work.
Kevin: Yeah, absolutely. And I guess that kind of speaks in at least in part to the value of having open source projects for companies because, while we do have great resources internally, obviously, Sam is exceptional and his team is exceptional. You know, sometimes it just helps to have other people working on completely separate use cases and that pulls in that other thought of, “Hey, maybe we can expand in this direction.” And ultimately, the company benefits and the consumer benefits because it allows them to have the foundation which is what Sam built, the Lex Connector there, but also to build on top of it for their specific use case.
Sam: I think it’s all new stuff, this. You know, this idea of interactive, these voice platforms. So we played around, we built it, but we didn’t know we’d got it right necessarily. So, having this open source means you guys can improve it and tweak it. It’s still really in development. I mean, we call the Lex Connector a beta product anyway in the docs and things. So we have the hosted version really to get people started quickly, but all the different models of how people are going to interact and how they’re going to use it. Also, it’s one thing being able to improve it, but also being able to extend on this stuff. We will develop a platform. We give you stuff to build with.
Kevin: And is it possible as the developer to think of every possible use case?
Sam: Absolutely, yeah. We don’t know how people are going to want to use things. So the more flexibility, and it really is a balancing act between providing enough utility and value and giving you something that you can get working with quickly, but also that you can then mold and shape and tweak and change the bit you want to change without having to kind of start from a completely blank slate. And, yeah, open source is really the model of that. But, of course, I think a lot of companies see it as either commercial or open source, whereas we can provide a commercial service and we can provide a managed service for you and we’ll run it, or you can just take it and use it and you’re still using it with our service. So, we kind of win both ways, but I think that is really useful.
Discovering the Lex Connector
Kevin: Yeah, I guess that kind of brings up really another question, which is, “How did the team come across the Lex Connector?” How did you come across it during your Hackathon? Was it in the documentation?
Akash: Yeah, it was. We were looking through the Nexmo API and we were looking specifically for chatbot functionality. And we saw the Lex Connector and we thought, “Oh, this is like exactly what we want.” And it was especially good for me, having used Amazon Alexa before; Lex kind of provides a similar framework to the Amazon Alexa framework. And so it was kind of easy to transfer my knowledge from one area to the other. I worked on mostly the Lex aspect, most of it, yeah.
Kevin: And how did you feel about the documentation for Nexmo overall as far as launching your account, creating a number, finding the number, hooking up the webhooks and everything? How was that overall experience?
Akash: All of that was documented very well. I think the Lex Connector in particular was a little bit tricky to set up and get connected, but I mean, it’s a new product.
Kevin: I mean, sometimes it just happens to be the nature of the functionality, right? That it’s not so trivial. If you’re sending an SMS, all you have to do is input the number and the message, but here there are other considerations.
Akash: Yeah, but setting up the number, getting all that stuff set up was fairly straightforward. We were pretty impressed.
Sam: Did you guys try the hosted version? So, I say, we run it in two modes. We have the hosted and then we have it as open source. So, did you initially kind of try the hosted one to do this, well, “Hello, world” level stuff and then fork it?
Akash: Yeah, we actually did use the hosted version originally, the one that’s hosted on the Nexmo side, and that’s where we started seeing some of the issues that were specific to our use case. And then Zach started looking into that; he spent a couple of hours diving into what the problems might be and going and making an actual fix for that.
Zachary: Yeah, it was very helpful having the hosted version and especially having the copy-and-paste samples available in the documentation, because it let us try out the service and immediately verify that this was going to work for our use case. And only once we hit some limitations did we branch off into our own custom version, but having the hosted version available to use was very helpful.
Sam: That’s a really good thing because it’s a bit of a new model for us, doing these things. So, we certainly need these middleware connectors as we’re expanding out the platform. And I think it is the right kind of approach, and the end goal would be to have as fully productized and deep an integration as possible, but still we want to keep the… probably keep the open source option there.
So it’s trying to understand that developer journey of, say, you want that quick, easy discoverability and testing because you don’t want to spend hours spinning up servers and building code to find that it isn’t suitable, but you need a test drive, don’t you, really, to know that it’s in the right space and it’s worth pursuing.
Akash: Absolutely. Also, we built it incrementally. So, kind of our initial solution was just saying, you know, just doing data input and then sending it back to us. So that was kind of our sanity check and once we got that working, that was like a big, we’re like, “Woo.” And that’s how we knew that we could actually take this forward and build it into a full product. So, it’s good to see that initial, you know, the first time we called the number that we bought and it actually said, “Hello. Welcome to Amazon Lex.” And then we were like, “Oh, this is really great.” So, yeah.
Glen: So, guys, I’m actually interested in any kind of burning questions you may have specifically about the Connector or maybe more general for Sam, seeing that he’s kind of the architect of the tool, just based on your experience.
Akash: Yeah, how commonly is it used right now? And do you know what the specific use cases, or other use cases, might be for it?
Sam: Okay. So, yes, how much use, and who’s using Lex. The short answer is we don’t necessarily know exactly who’s using it because we’re a platform and it’s self-service. So we’re seeing traffic on it, certainly on the hosted version, and then when we get out to the “run your own version,” you know, it’s open source, and this is one of the trade-offs: you don’t really know exactly what people are doing with it. But it’s predominantly in the sort of customer service use cases. Mainly what I see is when people ask me questions on the platform, and most people are using it to sit in front of their call center or their existing telephony service and offload some of that traffic.
So, you call a number and the first person you speak to is the Lex bot, and it kind of handles the basic questions like, “What are your opening hours?” or, “I want to check something’s in stock,” and then it transfers the call over. So one of the big use cases, really, that I think maybe we haven’t properly supported, or that we can make even easier (right now, you have to do the work yourself), is the idea of talking to a bot and then the bot deciding where to transfer your call, to a sort of real human operator. It’s this hybrid approach rather than the pure, “I just want to speak to a bot.” I think, in that way, your hack is probably more unique. It’s maybe 20% of the scenarios that are pure bot and 80% that are a hybrid bot-human way of working.
I guess the other one, there are other interesting startups. There’s a couple in New York, actually, that I have spoken to, who are doing AI assistants. So, it’s not so much that you call a bot, but the bot makes calls on your behalf to the services, you know, “I want to cancel my gym membership.” You have the assistant call the gym and negotiate with the gym to cancel your membership for you so you don’t have to make those kinds of calls. People are actually offloading tasks they don’t want to do and calls they don’t want to make, getting the bots to make the call for them, which is, yeah, very interesting.
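As a rough illustration of the hybrid pattern Sam describes, the sketch below shows how a webhook might turn the intent a bot recognized into a Nexmo Call Control Object (NCCO): either answer directly with a `talk` action or hand the caller to a human with a `connect` action. The intent names, phone numbers, and the `build_ncco` helper are assumptions made up for this example; they are not taken from the Lex Connector or from the Lyff code.

```python
import json

# Hypothetical intents the bot can answer on its own (the "pure bot" cases).
BOT_ANSWERS = {
    "OpeningHours": "We are open from nine to five, Monday to Friday.",
}

# Hypothetical intents that need a human, mapped to placeholder desk numbers.
HUMAN_DESKS = {
    "CheckStock": "15550100001",
    "BillingQuestion": "15550100002",
}

DEFAULT_DESK = "15550100009"  # placeholder: general operator line


def build_ncco(intent, from_number="15550100000"):
    """Return an NCCO (a list of actions) for the recognized intent."""
    if intent in BOT_ANSWERS:
        # Pure-bot case: speak the answer and end the call flow.
        return [{"action": "talk", "text": BOT_ANSWERS[intent]}]

    # Hybrid case: tell the caller, then connect them to a human operator.
    desk = HUMAN_DESKS.get(intent, DEFAULT_DESK)
    return [
        {"action": "talk", "text": "Transferring you to an operator."},
        {
            "action": "connect",
            "from": from_number,
            "endpoint": [{"type": "phone", "number": desk}],
        },
    ]


if __name__ == "__main__":
    print(json.dumps(build_ncco("CheckStock"), indent=2))
```

In a real deployment this list would be returned as JSON from the answer webhook; how the intent reaches that webhook depends on how the bot and connector are wired together, which this sketch leaves out.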
Akash: That is very interesting. We never even considered that possibility.
Zachary: That’s right.
Sam: Yeah, I mean, the whole model works both inbound and outbound. It’s a little bit odd I guess if you’re a regular kind of person on a front desk in a company or something, and you got a phone call and then it’s from a bot and you’re going to think it’s marketing, but yeah, it’s getting there. There’s a few scenarios like that and eventually, we’ll just get to a world where it’s bots calling bots and talking to each other properly.
Selina: Like, does it actually work though, like that company’s product?
Sam: Yeah, there are a couple of companies out there that are doing virtual PAs, not massively in voice. Most of them are experimenting with voice. There’ve been a couple of them around with email recently, where you can CC this virtual assistant into your emails and say, “Oh, yeah, set up a meeting with Fred on Monday,” and then the assistant takes over the email conversation, and companies like that are now experimenting with adding voice.
So, what we’re seeing is they’re going, “Oh, okay, we know people,” and people are paying hundreds of dollars a month for these assistants, these bots, to just organize their meetings for them. A couple of hundred dollars a month seems a lot for a bot, but you look at the cost of having a full-time PA or an assistant, and it’s bringing it down from there. So, I’m not quite yet senior enough to afford or employ a virtual assistant myself.
Zachary: I had a question about development of the Lex Connector. Did you work directly with Amazon in building this connector? And what was the level of collaboration between Nexmo and Amazon?
Sam: Yeah, so, I guess mostly through Vonage, I got access to the Lex beta or whatever it was. It was announced at re:Invent last year, and it was in a private developer preview, or whatever they call it exactly. So I got onto that preview, so I had access to the docs, but predominantly I built it in isolation just from reverse engineering their docs. Again, like you guys, I’ve done a lot of stuff with Alexa, and a lot of the hacking I’ve done with Alexa has been with the Alexa Voice Service. So I built a Raspberry Pi-based Alexa device and a web-based one and some stuff there.
So, I was, again, familiar with voice processing from what I’d done with Alexa, and I’m also familiar with the Nexmo WebSocket, so I was like, “Oh, I could put these two together.” Their use case was really just assuming you were always going to embed your bot into an iOS or an Android app as your access channel. There was some documentation for how you talk to the audio API on its own, the PostContent endpoint. The authentication took a lot of guesswork because that really wasn’t documented. It was just, “Oh, yeah, just sign your response,” but you can’t sign a binary payload.
So that was a couple of days of just changing things and poking and seeing what worked. But the first version was pretty much totally reverse engineered from some preview docs, and since then we’ve been having conversations with them, and they seem to like the use case; they’ve got customers that are asking to be able to put Lex on phone calls. So, hopefully, I will have a couple of meetings with the team in Seattle and stuff about it.
Next Steps for Lyff
Glen: Maybe we can close on this question, unless anyone has any other items they’d like to discuss with each other, but I’m interested in thinking about Lyff as a product, looking at the product roadmap and version 2.0 and so on. Where would you imagine taking this, given time, given resources? I’m interested in the team’s answer, and in Kevin’s or Sam’s also, thinking about this as a starting point. Where could you take this?
Akash: Absolutely, so I think we’ve been talking about this a little bit. In terms of Lyff itself, I think there are a few things we could do to improve the ease of use for the user. And we definitely need to complete it, make sure that the stateful events are working properly, things like that. And also, we could upgrade the call quality now that I know that that is fixed.
So, those are the big things that we could do, mostly cosmetic fixes. But in terms of moving beyond that, I think we’d actually want to work with Lyft, possibly. If we were to take this to the next level, we could talk with Lyft and potentially integrate this with their own service. As Sam was saying earlier, we could directly do call proxying with the driver, which would be really nice. I think that would be a really great use of Nexmo’s API. And potentially, in the future, we could extend this kind of idea to other types of services, automating other things, like perhaps Uber, for example, or other things where the APIs enable this type of service. Zach, do you have anything?
Selina: Yeah, actually, I think integration with Lyft would be the best way for our Hack to make the most impact because Lyft already has so many users. I think it’d be really cool if like the Lyft app could detect if you have like a weak connection and they could be like, “Hey, like, do you want to book through a phone call instead?”
Akash: Yeah, that’s a really good point. So, Selina was saying that, basically, in the app, actually, I think it already does do sort of weak signal detection and in that case, it could actually recommend calling a phone number.
David: Yeah, exactly.
Sam: Yeah, so it’s basic. It’s the error pages; it’s like the 404 page or whatever, where it goes, “Sorry, there’s a problem because the request has timed out,” or something. So, yeah, you now have that kind of process, and going from that error-handling scenario to offering the voice option as a fallback is really good. I think you’re right. It’s a really good Hack, but you could never productize it outside of Lyft necessarily, because what’s in it for you guys? You’re going to pay money to run a server so that Lyft can make more money, so really it would have to be the sort of thing you would sell, or whatever, provide to them, and it’s in their interest to run it.
I think the idea, though, is that this is not a use case unique to the Lyft API. There are hundreds of companies out there, even the really small ones, particularly food, restaurants, pizza places, the single shops that don’t have the resources or the market to have an app. So, if you can build a repeatable framework and turn out these bots for them, then that works for you guys as a standalone team, not part of a bigger company.
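The fallback that Selina and Sam describe can be sketched in a few lines. This is only an illustration under assumptions: `book_with_fallback`, the `send_request` callable, and the phone number are hypothetical, and nothing here reflects how the Lyft app actually handles errors. The point is simply that the same error path that would show a “request has timed out” page could offer a number to call instead.

```python
def book_with_fallback(send_request, voice_number="+15550100000"):
    """Try the normal booking request; on a network failure, suggest the phone line.

    send_request is any callable that performs the booking and returns ride data.
    voice_number is a placeholder for the number of the voice booking service.
    """
    try:
        return {"status": "booked", "ride": send_request()}
    except (TimeoutError, ConnectionError):
        # Same place an app would normally render its "request timed out" page.
        return {
            "status": "offline",
            "message": f"Weak connection. Call {voice_number} to book by phone instead.",
        }


# Two stand-in requests to exercise both paths (hypothetical, for demonstration).
def weak_network():
    raise TimeoutError("request timed out")


def good_network():
    return {"ride_id": "demo-123"}


offline_result = book_with_fallback(weak_network)
booked_result = book_with_fallback(good_network)
```

Only network-style failures trigger the fallback here; other errors are left to propagate so that real bugs are not hidden behind the "call us" message.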
I think there’s an interesting opportunity there to produce more and more of these bots directly for people who don’t have an app, or don’t want an app, or just want to have that kind of simple scenario and say, “Oh, yeah. Hey, we just deliver the order to you at your tablet and it’s behind the counter on a kind of email type message,” or something.
Kevin: And to build on that point, Sam, oftentimes when you do a Hackathon, for everyone who’s done a Hackathon here, you know that you have a propensity to kind of just throw out your Hack and move forward from there, right? But what I would do is I would definitely encourage you to kind of pay it forward to people who are going to be hacking on something similar in the future. And if you could open source what you have developed or maybe share a blog post or something that shares the information on the architecture and how you built what you built, in this way, someone who starts in the future can start from where you left off, and that’s how we kind of build up from there. So, in the future, we might be having a conversation on the other side of the table where you’re talking about your Connector and how someone built on that.
Sam: Yeah, I think you guys, certainly, your fork of the Connector’s open source. I didn’t see, did you actually publish the Lex bot code as well? Or are you keeping that one there?
Akash: Yes, actually. So on our GitHub repository, we have the entire code base available and in the ReadMe we have a section on how to set up. So, I don’t know exactly how to export a Lex project. It’s all on the website. So, I’ve taken screenshots of everything that you need in order to set it up and gone through detailed instructions. Basically, I recreated the entire Lex project and made a step-by-step procedure for doing that since I wasn’t sure how to export the project.
Kevin: That were C, huh?
Kevin: That’s good.
Sam: That’s one of the problems, I think, with a lot of these tools right now. Traditionally, we wrote code and we had this big bundled zip file with all our code and our resources and everything in it, and we could just deploy it. Whereas, with these sorts of platform assistants, you have a lot more models and configuration that’s all put in via a kind of GUI where you’re setting up different things. It’s not just a bundle of data that you can upload yet at this time. I mean, some of them have got sort of APIs.
Akash: I was a little surprised. I thought that there might be some sort of Lex configuration file that you could download and upload and things like that. It looks like they haven’t really gone that far with the service yet. I mean, Lex is also a relatively new service. I’m sure that they’d be able to get to that at some point.
Sam: I think there is now. I don’t know if it was launched. For Lex, I think they’ve got a skill creation API or bot creation API now. And there certainly is for Alexa. So, one of the things released about a month ago for Alexa was the skill management API and the command line tool. So now you can do it all from a command line, which means you can have a single build folder where you define a JSON model, the interaction model JSON, and your utterances and all that, and you sort of type “ask deploy” or whatever and it pushes it out there.
I think I have seen some stuff for Lex. It’s actually a while since I’ve looked at the bot side of Lex, so I’d have to go back and see, but yeah. It’s getting there. The tooling around all of this stuff is, you know, the companies are shipping the product first, because obviously they want to get the thing out there, and then when they’re figuring out how people are using it and what the stumbling blocks are, that’s when the tooling tends to come along and improve things. I mean, Lex has really only been out of developer preview six months, I think.
So, it was announced at re:Invent last year. So it’s just after 12 months old, and I think it was about April-May time that it went public. So, yeah, it’s a very new technology, I think. It’s a very new model. It’s not like anything that’s come before at all.
Akash: Yeah, I agree. Ultimately, I could see the goal being, as Zach was saying earlier, this could definitely be very good for vision-impaired people where it might be difficult to interact with an app directly. And I could see a whole suite of these kinds of services being built around different apps that we have on our phones.
The Value of Hackathons to Companies Like Vonage
Zachary: I had a question for Vonage and Nexmo. What excites you about Hackathons and what motivates you to go to Hackathons? Like, what’s the biggest value you see coming out of these events?
Kevin: So, we partner generally, the HR team kind of leads the charge, on the Vonage side. I know Nexmo has a DevRel team and that’s where Sam comes from. But from the Vonage side, it’s all about driving awareness for talent in the area. So, for talented people like you, talented engineers like you, we love to have you experimenting with our tools, improving our tools such as you did with the open source repo. We love to get out there and see what, I guess, the newest technology is that people are looking to expose themselves to. Because at these Hackathons, a lot of times, I know, myself, I’ll go out there and not realize some of the different use cases and how important machine learning, NLP, AI is becoming and all the possibilities.
I mean, just some of the use cases that come out of these Hackathons are just phenomenal. So, yes, so we love to get exposed to the talent, we love to hear what it is that you’re working on, what you think is important, kind of pull those ideas back into the organization. And if we can, obviously, invite people like you to come back and join us, and just kind of driving Vonage and driving our mission which is to be at the forefront and be innovative always, and challenging the status quo.
Sam: So, yeah, totally. First, the thing with Hackathons, the bit where the Nexmo side of the business differs from the traditional Vonage side, is that Vonage tends to sell products to businesses and is more of a finished, end-product kind of company, and we sell building blocks to developers. We sell raw material that people would use to build products, and potentially Vonage can use the Nexmo products.
So, mainly, for us, I guess one of the things when we do Hackathons is we don’t normally go to a Hackathon expecting to recruit our next big customer. It’s great, and you never know, you may get somebody who starts a project at a Hackathon, builds stuff off our API, gets a bunch of VC funding the next day, and suddenly is the next Uber or whatever, but that’s really quite a unique scenario.
Really, for us, one of the great things about Hackathons though is the product feedback. So, as a developer relations team, we actually sit within product in Vonage rather than sales or marketing or even engineering, and it’s because we’re really pushing forward the product. And we know that at a Hackathon, you’re going to get the best testing, pushing, and stressing in a very intense product feedback and testing scenario that you wouldn’t get in a sort of normal sales, marketing, development cycle with customers.
So, you guys are kind of showing us what things to do with the product. And, like I said, you guys already fed back improvements, and that’s fabulous, and we get some great stories out of it like this. But we also get the use of the product and the education and the stuff you guys are trying to do, and the limitations, maybe, that you hit up against in a Hackathon where the pressure’s on. We can go back and fix those or improve the experience there so that when the big customers come along at a later date it’s a better experience. So really that’s kind of our main focus at Hackathons.
This post was written by Glen Kunene