Building an Image Classifier in TensorFlow

Published October 02, 2018 by Tony Hung

In this post, you are going to build a basic image classification model for processing images sent by members of a conversation in an iOS app integrated with Nexmo In-App Messaging. After a user uploads an image, a caption describing the image will be displayed.

We’re going to use Python to build our image classification model. Don’t worry if you haven’t worked with Python or have no prior knowledge of machine learning.

What is Image Classification?

Image classification is the task of having a machine learning model identify the subject of a photo. For example, if you take a picture of a dog, the model will be able to say “This is a dog”.

First, in order to build a machine learning model, we need data to train it.

A machine learning model learns from training data, so we’ll need to choose a training set. For this post, we’ll use the CIFAR-10 dataset.

This dataset contains 60,000 images in 10 classes, with 6,000 images per class. It’s a widely used dataset for machine learning, and it will be a good start for our project. Since the dataset is fairly small, we can train the model quickly.

Running this Notebook

This notebook is hosted on Google Colab. Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

Note, you will need to have a Google account to run the notebook.

Running the notebook is super easy. In every cell that contains code, there is a run button to the left of the cell. Tap the run button to run the code. You can also use the keyboard shortcut Shift+Enter.

Building out the Model

The first thing we need to do is import our packages.
These packages are pre-installed on Google Colab so we don’t need to install them.

Notice that we’re using TensorFlow, with Keras as a frontend to TensorFlow.
Keras is a great framework that lets you build models more easily, without having to use the more verbose low-level TensorFlow APIs.
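The original import cell isn’t reproduced here; a minimal version, assuming the standard Colab environment, might look like this:

```python
# TensorFlow, with Keras as the high-level frontend
import tensorflow as tf
from tensorflow import keras

# NumPy for array handling, matplotlib for plotting sample images
import numpy as np
import matplotlib.pyplot as plt
```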

Next, we’ll load the CIFAR data set. Using Keras, we’re able to download the dataset very easily.

We split the dataset into 2 groups, one for training (x_train, y_train), the other for testing (x_test, y_test).

Splitting the dataset allows the model to learn from the training set. Then, when we test the model, we see how well it learned by using the test set. This gives us our accuracy: how well the model did.
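The loading cell isn’t shown above; with tf.keras it is a one-liner (the first call downloads the dataset, roughly 170 MB):

```python
import tensorflow as tf

# Download (on first run) and split CIFAR-10 into training and test sets
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# 50,000 training images and 10,000 test images, each 32x32 RGB
print(x_train.shape)  # (50000, 32, 32, 3)
print(x_test.shape)   # (10000, 32, 32, 3)
```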

Next, we’ll declare some constants.

  • batch_size is the number of samples that are going to be propagated through the network at once.
  • epochs is how many times we train on the full dataset.
  • class_names is a list of all the possible labels in the CIFAR-10 dataset.

We’ll use these constants later when converting our model into CoreML.
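The constants cell might look like the following (the values match those described in the text):

```python
batch_size = 32   # samples propagated through the network at a time
epochs = 100      # passes over the full training dataset

# The 10 possible labels in CIFAR-10, in index order
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]
```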

First, let’s have a look at a few images.
We have a function that plots 4 random images with their corresponding labels.
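The plotting function itself isn’t shown above; a sketch with matplotlib might look like this (plot_samples is a hypothetical name, not the original function):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Colab
import matplotlib.pyplot as plt

def plot_samples(images, labels, class_names, n=4):
    """Plot n random images with their labels and return the figure."""
    fig, axes = plt.subplots(1, n, figsize=(n * 2, 2))
    idx = np.random.choice(len(images), n, replace=False)
    for ax, i in zip(axes, idx):
        ax.imshow(images[i].astype("uint8"))
        ax.set_title(class_names[int(labels[i])])
        ax.axis("off")
    return fig
```

Calling plot_samples(x_train, y_train, class_names) would display four random training images with their labels.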

Building the Model

Now, we’ll set up a simple model. We are creating a deep neural network using convolutions, dropout, and max pooling.

In the end, we’ll flatten the network and use a
ReLU layer, followed by a softmax.

This will give us a vector (a 1-dimensional matrix), filled with mostly 0’s.

It will look like this: [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

This vector corresponds to the label of the image.
So in this example, the 1 in the seventh position would be a frog, since ‘frog’ is the seventh entry in the class_names list.
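To make that correspondence concrete, here is the decoding step in plain Python (the vector below is a hypothetical model output):

```python
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

# A one-hot output vector: the 1 sits in the seventh position (index 6)
prediction = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

# The index of the largest value gives the predicted class
label = class_names[prediction.index(max(prediction))]
print(label)  # frog
```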


The following shows the entire network.
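The model cell isn’t reproduced above; a sketch of a network with these ingredients in tf.keras might look like the following (the exact layer sizes are assumptions, not the original architecture):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    # Convolution + pooling blocks extract image features
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),          # dropout helps fight overfitting
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    # Flatten, then a ReLU layer followed by a softmax over the 10 classes
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.summary()
```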

That’s it!

Training the Model

First, we compile the model, which configures its loss and optimizer.
The loss is a measure of how far off the model’s predictions are. A high loss means that the model did poorly.

Here we use the Adam optimizer, an extension of stochastic gradient descent that is widely used in machine learning, to minimize the loss.

Then we’ll call .fit, which will train the model for 100 epochs. This means the full training dataset passes through the network 100 times.
The batch_size of 32 means 32 samples are propagated through the network at a time.

We see how well the model did after every epoch, and finally with model.evaluate, which gives us an accuracy score for the model (higher numbers are better) and the loss (lower numbers are better).
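The compile/train/evaluate cells follow the pattern below. This sketch uses a tiny random stand-in dataset, a much smaller model, and 1 epoch so it runs in seconds; for the real run you would pass x_train/y_train, the full model, and epochs=100:

```python
import numpy as np
import tensorflow as tf

# Tiny random stand-in data so the sketch runs quickly
x_train = np.random.rand(64, 32, 32, 3).astype("float32")
y_train = np.random.randint(0, 10, size=(64,))
x_test = np.random.rand(16, 32, 32, 3).astype("float32")
y_test = np.random.randint(0, 10, size=(16,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# The Adam optimizer minimizes the cross-entropy loss
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=32, epochs=1, verbose=0)

# Evaluate on the held-out test set: returns loss and accuracy
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print(loss, accuracy)
```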

Note: this took about 15 minutes running on Colab. If you want to see results quicker, set the epochs parameter to 1 or 2. Its accuracy won’t be as good, however.

Our final accuracy was 81%, and our loss was 0.7, which is pretty good.

To reiterate, accuracy is how well the model was able to classify each image, while loss indicates how bad the model’s predictions were.

For more information, check out this definition of loss and accuracy on Google’s Machine Learning crash course.

Converting the model to Core ML

After we have trained the model, we can save it, then convert into the Core ML format.

Released at WWDC 2017, Core ML enables iOS developers to integrate a broad variety of machine learning model types into an iOS app. Here you use this technology with Nexmo In-App Messaging to facilitate your own deep learning for processing images.

First, we need to save the trained model.
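Saving a Keras model is a single call; the sketch below builds a trivial stand-in model so it is self-contained, and the filename is an assumption:

```python
import os
import tensorflow as tf

# Stand-in model; in the notebook you would save the trained model instead
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3072,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Save the architecture and weights to a single HDF5 file
model.save("cifar10_model.h5")
print(os.path.exists("cifar10_model.h5"))  # True
```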

We’ll use coremltools, which will convert the model into a format that our Stitch app can use.


Note: the coremltools package is not pre-installed on Colab, so we need to install it with pip.

From above, you can see that the package was installed in our notebook.

Next, we’ll convert the saved model into Core ML.

Since we used Keras to train our model, it’s really easy to convert to Core ML. However, this varies based on how you built your model. coremltools has converters for other machine learning packages as well, including TensorFlow and scikit-learn. See the coremltools repo for more info.

The output above shows all the layers inside the model. These directly correlate to how we created the model in this cell.

Take a look at the parameters for the convert function.
Here, we set the input to be an image via both the input_names and image_input_names parameters. This tells the Core ML model what type of input to expect: an image.

Then, the image_scale parameter scales the pixel values down to numbers between 0 and 1.
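That scaling is just a division by 255, so 8-bit pixel values land between 0 and 1:

```python
import numpy as np

# Three sample 8-bit pixel values
pixels = np.array([0, 128, 255], dtype=np.uint8)

# image_scale=1/255 applies this same division inside the Core ML model
scaled = pixels / 255.0
print(scaled.min(), scaled.max())  # 0.0 1.0
```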

Next, we set the class_labels parameter to the class_names constant that we created previously.

When we use this model in Xcode, the result will be a String, corresponding to the predicted label of the image.

Now, we can have a look at the Core ML model.

You can see that our input is a 32×32 pixel image, and our output is a String called classLabel.

Next, we save the mlmodel locally using a Google Colab package to download the file to our machine.

Incorporating the Model into our Stitch App

Once our model is saved, we can now import it into our app. To do this, just drag the model that was just saved into Xcode.

Make sure the model is included in the target by verifying that Target Membership is selected.


Next, we’ll write the code in our iOS application that will use this model.

In our Stitch Demo Application, users are able to upload a photo into an existing Conversation.

Nexmo’s In-App Messaging enables users, as members of a conversation, to trigger not just TextEvents but ImageEvents by uploading a photo into an existing conversation. For this sample, we’ll try to predict the contents of the photo that a user uploaded.

You integrate the functionality for observing ImageEvents for Core ML directly into your ViewController. An example of how this can be done can be found in the source code for this sample.

In our ViewController, we will instantiate the model.

Now, inside the cellForRowAt method, we’ll check if the event is an ImageEvent, and if so, display the photo from the ImageEvent.
Then, we take the image, convert it to a PixelBuffer at a size of 32×32 pixels, and feed it into the model.


The reason we have to resample the image is that the model was trained on 32×32 pixel images. If we don’t resize the images, the model won’t be able to give a prediction (we’ll see an error in Xcode saying that the image size is incorrect).

The model will then return a classLabel. This will be the name of the image that the model predicted, which could be one of the following labels: “airplane”, “automobile”, “bird”, “cat”, “deer”, “dog”, “frog”, “horse”, “ship” or “truck”.

Conclusion

After looking at our predictions, we can tell that the model is only able to recognize 10 labels.
The full notebook is available on GitHub.

This is good for a demo, but not for a production application. In a future post, we’ll look at building an image recognition model with more data. We’ll look into the popular ImageNet database, which contains 14,197,122 labeled images.

It’s roughly a 150 GB download, so we’ll look at how to download it, train on it, and integrate the result into our Stitch demo app.
