Building a Phone-Powered Photo Booth with Nexmo’s Voice and SMS APIs

Published September 01, 2015 by Tim Lytle

I’m always looking for something interesting to build at developer events. Earlier this year at php[tek], I decided to create a phone-powered photobooth.

Turning a user’s phone into the interface for this photobooth meant I could skip all the physical user interface you’d find in a traditional photo booth. No need for buttons to start the photo taking process, and no need to print the photos; my goal was to deliver the images straight to the user’s phone.

But that introduces its own set of interesting constraints.

Setting the Stage

At a hackathon a few years ago, my team had built a similar photobooth concept, and used a simple, mobile-friendly website to start the photobooth and deliver the photos. But each photo ‘session’ needed to be tied to a unique URL, and getting the user to that URL proved to be very difficult.

QR codes may seem like a great solution, but in practice the five minutes it took people to find the QR code app they had installed years ago, remember how to use it, and finally scan the code made for quite an inefficient process.

Relying on a mobile browser and a data connection was also an issue. A solid connection was needed to control the booth, and that could be problematic over conference WiFi or a data connection deep inside a venue.

So when I decided to build this for php[tek], I needed a ubiquitous way to give the user control of the photobooth and provide access to the photos once they were taken. I also needed to consider where this all would reside on the internet. I wanted the photos to be archived someplace that wouldn’t change, long after I took the application down.

Finally, there was the actual taking of the photos. I needed a camera I could control remotely. Fortunately I had one, a GoPro. I just needed to reverse engineer how the GoPro app communicated with the camera, and luckily, someone else had already done that for me.

The Building Blocks

To solve the issue of connecting the user to the photobooth, I immediately thought of Nexmo’s Voice and SMS APIs. To facilitate the real-time communication between various systems, I enlisted the support of PubNub. And to solve the tricky situation of making sure the photos stayed online and accessible with as little effort as possible, I used the favorite content host of developers everywhere, GitHub.

To connect the user to the photobooth, let them know what to do (‘make a funny face’), and let them know their session was over, I built a really simple IVR using PHP and Nexmo’s Voice API. I set all this up in the Nexmo dashboard, purchasing a phone number and then pointing that number to the URL of my application. When a user dialed the photo booth’s number, Nexmo would make a request to my PHP application.

That request included a few parameters like the user’s phone number (their caller ID) and the Nexmo call ID (a unique identifier for that specific call). Since Nexmo’s Voice API acts like a browser, I stored those for later in the simplest way possible, right in the $_SESSION variable.

session_start();

//keep the caller's number and the call ID for later requests in this call
$_SESSION['number'] = $_GET['nexmo_caller_id'];
$_SESSION['callid'] = $_GET['nexmo_call_id'];

With the important data stored, the application then rendered a VoiceXML document from a simple template. Nexmo’s Voice API uses VXML to define the interaction between the user on the phone and the voice application. If you want to know more about VXML, check out this post about how it all works, or take a look at our VXML Quickstarts.

For any inbound call to the photobooth, Nexmo would fetch the VXML from my PHP application. In this case, the VXML document was just a series of prompts for the user that looked like this:

<prompt>
   Hello. This is the Nex mo photo booth. If you can, put me on speaker.
   Then press one when you’re ready to have your photo taken.
</prompt>
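The template that wraps this prompt is not shown here, so the following is only a minimal sketch of how such a response could be rendered from PHP. It assumes a bare VoiceXML 2.1 document with a single form and block; the function name `renderPrompt` and that layout are hypothetical, not the booth’s actual template.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: wraps prompt text in a minimal VoiceXML 2.1 document.
 * The form/block layout is an assumption, not the booth's real template.
 */
function renderPrompt(string $text): string
{
    // Escape the text so characters like & or < cannot break the XML.
    $safe = htmlspecialchars($text, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');

    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">' . "\n"
        . "  <form>\n"
        . "    <block>\n"
        . '      <prompt>' . $safe . "</prompt>\n"
        . "    </block>\n"
        . "  </form>\n"
        . "</vxml>\n";
}
```

The application would echo the returned string as the body of its response to the inbound call request.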

Once we got started taking photos, things got interesting. I wanted the IVR to coordinate with the photobooth process, prompting users to smile or make faces, letting them know the last photo was taken, and making sure they didn’t leave the booth early.

To do this, I’d have to initiate the camera’s ‘shutter’ (using that term loosely when it comes to a GoPro) from VXML. The <data> element allows a VXML document to load data, so I abused that a bit, and ‘fetched’ data from my PHP application.

<data fetchhint="safe" name="dummy" src="ivr.php?<?php echo str_replace('&', '&amp;', http_build_query(['action' => 'photo', 'delay' => 3, 'number' => $number, 'session' => $callid])) ?>"/>
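Because the URL sits inside an XML attribute, every `&` in the query string has to be written as `&amp;`. As a rough sketch, the same value could be built with a small helper that escapes the whole URL rather than only the ampersands. The name `dataSrc` and the sample phone number and call ID are placeholders for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds an XML-safe src attribute value for a VXML <data> element.
 */
function dataSrc(string $script, array $params): string
{
    // Pass '&' explicitly so the result does not depend on the arg_separator.output ini setting.
    $query = http_build_query($params, '', '&');

    // Escape for use inside a double-quoted XML attribute (& becomes &amp;).
    return htmlspecialchars($script . '?' . $query, ENT_XML1 | ENT_COMPAT, 'UTF-8');
}

$src = dataSrc('ivr.php', [
    'action'  => 'photo',
    'delay'   => 3,
    'number'  => '15551234567', // placeholder caller ID
    'session' => 'abc123',      // placeholder call ID
]);
```

Using `htmlspecialchars()` instead of a single `str_replace()` also covers `<` and `"`, should a parameter ever contain them.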

I didn’t do anything with the returned data, but by making the request, it triggered a message to a PubNub channel. PubNub provides an easy to use real-time messaging API that works on pretty much any platform. I could send a message from the PHP application running on a server someplace, and have the system connected to the GoPro listen for messages on the same channel.

The message was simply whatever query string was sent from the VXML <data> request – basically an action and a delay – with the user’s phone number and the ID of the call (remember, those were stored in the session). Because VXML expects the response to be valid XML, I had to send back dummy data.

$data = array_merge($_GET, [
    'session' => $callid,
    'number'  => $number,
]);

$pubnub->publish('tekbooth', $data);

echo "<?xml version='1.0' encoding='UTF-8' ?><dummy></dummy>";
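One detail worth spelling out is the order of the arguments to `array_merge()`: for string keys the later array wins, so the values kept in the session replace anything with the same name in the query string. A minimal sketch of that behavior, with the hypothetical helper names `buildMessage` and `dummyResponse` and made-up sample values:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: combines the <data> query string with values kept in the session.
 * Later arrays win in array_merge(), so the session values override the query string.
 */
function buildMessage(array $query, array $session): array
{
    return array_merge($query, [
        'session' => $session['callid'],
        'number'  => $session['number'],
    ]);
}

/** Smallest well-formed XML body to hand back to the <data> request. */
function dummyResponse(): string
{
    return "<?xml version='1.0' encoding='UTF-8'?><dummy></dummy>";
}

$message = buildMessage(
    ['action' => 'photo', 'delay' => '3', 'number' => '0000'], // as received in $_GET
    ['callid' => 'abc123', 'number' => '15551234567']          // as stored in $_SESSION
);
```

Here `$message['number']` ends up as the session value, not the `0000` passed in the query string.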

Since most callers don’t want a call to last longer than necessary, Nexmo’s Voice API tends to fetch any <data> as early as possible. To avoid fetching all the <data> at the start of the call, or even before the prompts were heard, I used a rather convoluted technique of throwing VXML events, abusing the <noinput> element to keep ‘prompting’ the user with different activities, and taking the last photo at the start of the next prompt to make the flow seem natural to the user. Take a look for yourself.

Now that a PubNub message was published every time I wanted to take a photo, I just needed to hook that message up to the GoPro, and actually snap an image. PubNub has client libraries for pretty much anything you’d consider a computer, but for this version I decided to keep it simple and run it from my laptop.

Since I was already using PHP, I created a simple class that would subscribe to the PubNub channel. When a new message arrived, it would send the shutter command to the GoPro. I created a simple wrapper around the already documented GoPro Wi-Fi commands to control the camera’s shutter. Using the GoPro’s built-in web server, the wrapper also provided access to the camera’s files, that is, the photos that had been taken.

$pubnub->subscribe('tekbooth', function($data) use ($camera, $darkroom, $daemon){
    //setup default values
    $data['message'] = array_merge([
        'count' => 1,
        'mode'  => 'photo',
        'delay' => 0
    ], $data['message']);

    $session = $data['message']['session'];
    $number  = $data['message']['number'];
    $delay   = $data['message']['delay'];

    //wait for the requested delay before triggering the camera
    if($delay){
        sleep($delay);
    }

    //take photo
    error_log('taking photo');
    $camera->shutter();
    sleep(2);

    //look up the file the camera just wrote
    $last = $camera->getLastFile($camera::FILTER_PHOTO);
    error_log('got last file: ' . $last);

    //add the photo to the page for this call
    error_log('adding photo to session: ' . $last);
    $darkroom->addPhoto($session, $last, $number);
});
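The defaults in that callback rely on the same `array_merge()` rule, this time with the incoming message as the later argument so that it overrides the defaults. A small sketch of that step in isolation follows. The helper name `normalizeMessage`, the integer casts, and the 10 second cap on the delay are assumptions added for illustration and are not part of the booth’s code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: fills in defaults and coerces types for an incoming message.
 */
function normalizeMessage(array $message): array
{
    $defaults = ['count' => 1, 'mode' => 'photo', 'delay' => 0];

    // Keys present in $message override the defaults; missing keys fall back to them.
    $merged = array_merge($defaults, $message);

    // Values that started life in a query string arrive as strings, so cast them.
    // The 10 second cap is an arbitrary assumption to keep a bad value from stalling the booth.
    $merged['delay'] = max(0, min(10, (int) $merged['delay']));
    $merged['count'] = max(1, (int) $merged['count']);

    return $merged;
}

$normalized = normalizeMessage(['session' => 'abc123', 'number' => '15551234567', 'delay' => '3']);
```

Casting up front means the later `sleep()` call always receives an integer, whatever the sender put in the message.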

Once the photo was taken, I needed to store it somewhere. I doubt this was a use case discussed during the design of their API, but GitHub makes it pretty easy to add or update a repository’s files through the API. By updating the gh-pages branch, I was able to publish a simple web page with the photo booth’s images as the photos were taken.

Because each photo was added as it was taken, the most complex part of the process was determining if the page needed to be created. If it did, the HTML was loaded from a template. If the file already existed, it was loaded from the repository. Once the content was located, the DOM was modified to add an <img> element for the new photo. Then the file was either created or updated.

$update = $this->client->api('repo')->contents()->exists($this->user, $this->repo, $path, $this->branch);

if($update){
    //page already exists: load the current HTML, plus its metadata for the update call
    $set = new \PHPHtmlParser\Dom();
    $set->load($this->client->api('repo')->contents()->download($this->user, $this->repo, $path, $this->branch));
    $info = $this->client->api('repo')->contents()->show($this->user, $this->repo, $path, $this->branch);
} else {
    //first photo of the session: start from the template
    $set = new \PHPHtmlParser\Dom();
    $set->loadFromFile($this->page);
}

//do some DOM manipulation
$div = $set->find('#photos')[0];
//---//

//format the HTML and collapse blank lines
$content = \Mihaeu\HtmlFormatter::format((string) $set);
$content = preg_replace("#\n\s*\n#", "\n", $content);

if($update){
    $response = $this->client->api('repo')->contents()->update($this->user, $this->repo, $path, $content, 'Adding Photo ' . PHP_EOL . $file, $info['sha'], $this->branch);
} else {
    $response = $this->client->api('repo')->contents()->create($this->user, $this->repo, $path, $content, 'Adding Page and Photo ' . PHP_EOL . $file, $this->branch);
    $this->addSession($session, $number);
}
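The DOM step elided above can be illustrated without the GitHub client or the HTML parser library. The sketch below swaps in PHP’s bundled DOM extension instead of PHPHtmlParser, so it is not the booth’s code. The function name `addPhotoToPage`, the template markup, and the file names are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns page HTML with one more <img> inside the #photos element.
 * Pass null for $existing when the page is not in the repository yet.
 */
function addPhotoToPage(?string $existing, string $template, string $src): string
{
    $dom = new DOMDocument();
    // Start from the stored page if there is one, otherwise from the template.
    $dom->loadHTML($existing ?? $template);

    $xpath = new DOMXPath($dom);
    $container = $xpath->query('//*[@id="photos"]')->item(0);
    if ($container === null) {
        throw new RuntimeException('No element with id "photos" found');
    }

    // Append the new photo after any images already on the page.
    $img = $dom->createElement('img');
    $img->setAttribute('src', $src);
    $container->appendChild($img);

    return (string) $dom->saveHTML();
}

$template = '<!DOCTYPE html><html><body><div id="photos"></div></body></html>';
$first  = addPhotoToPage(null, $template, 'abc123-1.jpg');   // page is created
$second = addPhotoToPage($first, $template, 'abc123-2.jpg'); // page is updated
```

The first call models the create path and the second models the update path, where the HTML from the previous step plays the role of the file downloaded from the repository.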

That process only updated the HTML, not the actual photo, because it happened in real time as messages arrived from the IVR over the PubNub channel. Uploading the actual photo took some time, so it happened in a separate process. For this, I created another class that simply watched the GoPro’s filesystem and uploaded any new files it discovered.

Before the files were uploaded, I added a watermark with the conference logo and Nexmo’s logo (after all, the photobooth was at our table).

error_log('checking gopro files');
$imagine = $this->imagine;
$new = [];

foreach($this->gopro->getFiles(GoPro::FILTER_PHOTO) as $file){
    //check if we've already seen the file
    if(in_array($file->getSequence(), $this->list)){
        continue;
    }
    $new[] = $file;
}

error_log('found files: ' . count($new));

foreach($new as $file){
    //read and load the file
    $content = $this->gopro->download($file);
    $photo = $imagine->load($content);

    //---watermark the photo---//

    //send to github
    $this->darkroom->upload($file, $photo->get('jpg'));

    //mark as done
    $this->list[] = $file->getSequence();
}

if(!count($new)){
    error_log('no new files, waiting a bit');
    sleep(60);
}
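The bookkeeping in that loop boils down to remembering which sequence numbers have already been uploaded. As a hedged sketch, the same idea can be written with the seen list keyed by sequence number, which replaces the `in_array()` scan with an `isset()` lookup. The function name `findUnseen` and the sample sequence numbers are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the sequence numbers that are not yet in $seen.
 * $seen is keyed by sequence number, so each check is a single isset() lookup.
 */
function findUnseen(array $sequences, array $seen): array
{
    return array_values(array_filter(
        $sequences,
        static fn ($sequence): bool => !isset($seen[$sequence])
    ));
}

$seen = [];
foreach (findUnseen([101, 102], $seen) as $sequence) {
    // ... download, watermark, and upload would happen here ...
    $seen[$sequence] = true; // mark as done only after the upload succeeds
}

// On the next pass only the file that appeared in the meantime is returned.
$next = findUnseen([101, 102, 103], $seen);
```

Marking a file as seen only after the upload step means a failed upload is picked up again on the next pass, which matches the order used in the loop above.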

The initial VXML request from Nexmo triggered a call to Nexmo’s SMS API, which sent the user a link to the GitHub page where they could see their photos. Once the call ended, they’d have the link, and could view their photos. Since the filenames were based on the call ID, the SMS could be sent before anything was uploaded.

$this->sms->send([
    'to' => $number,
    'from' => $from,
    'text' => 'Get your photos here: ' . $baseUrl . '/' . $callid . '.html'
]);
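The link can be sent this early because it depends only on values known when the call starts. A minimal sketch of building it as a pure function follows. The names `galleryUrl` and `galleryMessage` are hypothetical, and the `example.github.io` address is a placeholder rather than the booth’s real URL.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds the gallery link for a call before any photo exists.
 */
function galleryUrl(string $baseUrl, string $callId): string
{
    // rtrim() avoids a double slash; rawurlencode() keeps unexpected characters out of the path.
    return rtrim($baseUrl, '/') . '/' . rawurlencode($callId) . '.html';
}

/** Hypothetical helper: the SMS text containing that link. */
function galleryMessage(string $baseUrl, string $callId): string
{
    return 'Get your photos here: ' . galleryUrl($baseUrl, $callId);
}

$text = galleryMessage('https://example.github.io/tekbooth/', 'abc123');
```

Nothing in the link depends on the photos themselves, so the page it points to simply fills in as the uploads finish.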

The Result


Once everything was assembled, I had a working phone-controlled photobooth. I hosted the PHP application on AWS, and only needed to keep it running for the conference itself. Once the first user started the process, the same repository that contained my application’s code also had a gh-pages branch with the output of the application.

You can find all the code over in the tekbooth repository. Note that in this post I’ve abbreviated or modified some of the code examples to express the concepts more simply.

If you want to see some of the photos, you can reverse engineer the GitHub Pages links, but to make it a bit easier, here’s an example of the results with a fellow sponsor, and a hot dog.

What’s Missing

Since this was a prototype of sorts, there were certainly things to improve. The biggest trouble spot was handling two Wi-Fi connections from my laptop – one to the venue for internet access allowing my laptop to subscribe to the PubNub messages, and one to the GoPro to control it.

Either the noisy environment (all the attendees’ devices) or the rather cheap USB to Wi-Fi adaptor I had made it difficult to keep both connections live at the same time. If I polish this in the future, I plan to replace my laptop with a Raspberry Pi and use some higher quality Wi-Fi adaptors.

I’d also allow users to log in using their phone number. While Nexmo’s Verify API makes adding second-factor authentication simple, in cases like this it can also be used to enable passwordless login. The user’s phone number is collected when they call the booth, so authenticating with their number could allow them to caption their own photos, or easily share them on social networks. If you take a closer look at the code, you’ll see the groundwork for this concept already there.

Of course an index page of all the photos is an obvious feature that was missing in the prototype.

These are just a few of the features that could be added. A screen with a preview, and maybe some visual cues that match the IVR would be great. A real-time feed as photos are taken is another possibility. Maybe the booth itself could tweet – but hey, it’s open source, so even if I never get a chance to polish it up, you can fork the project and have your own photobooth.
If you’re inspired by this, head over to our latest developer contest and build something else with Nexmo and PubNub.
