Building a Dog Breed Detector Using Machine Learning

Published December 04, 2018 by Tony Hung

Here, at Nexmo, we use Facebook Workplace as one of our many channels for communication. If you haven’t used it or heard about it, it’s just like Facebook, but for companies. All of us here at Nexmo have an account, and we are able to view and join different groups throughout the organization.

A few months ago, one of our coworkers created a group for showing our pets, and it was a great idea, and a lot of members on the team post photos of their pets. I check the group almost everyday, and it’s a good way to enjoy the finer things in life (puppies!).

So after looking at everyone’s photo of their dog, cat, and even bunnies, some people asked, “What breed is that?”. Once I saw that, I had an idea, to build a machine learning algorithm to figure out what dog breed was in the photo.

In this post, we’re going to learn how to build a dog breed detector using Keras, which is a very popular framework for building machine learning models.


This post assumed you know some Python as well as having a very basic understanding of machine learning. You should know what Keras is and how to train a basic machine learning model.

Where do I start?

In order to tackle many machine learning problems, you need data, and lots of it. Specifically, we need photos of a lot of dogs, and what kind of breeds there are. For this project, we are going to use the dataset from the Dog Breed Identification Challenge on Kaggle. This dataset contains over 10,000 images of dogs, categorized by breed.

Building the Model

First, let’s start with building the model. I’ll be using Google Colab to build my Jupyter Notebook, in Python. A Jupyter Notebook is a open sourced web app that lets you write code, as well as text and images. It’s a great way to get started. Google Colab is a free service that will host your Jupyter Notebooks.

Note: If you want to see how the model is built, you can view my notebook here.

Before building the model, we need to get the data, which is hosted on Kaggle. To load the data, we need to use a package to download the data to our notebook, using the Kaggle API.
This will allow us to download the dataset for the Dog Breed Competition. Before we can download the dataset, we need to create an account on Kaggle, and get your Kaggle API key and secret.

Go to “Create New API Token”, and save the file to your machine.
To download the data, we’ll run this cell.

If you don’t understand every line in this code and any other section of code, don’t worry. You will be able to copy and paste the source to run everything yourself, without having to worry about the details.

When you run this cell, it will prompt you to select a file. Find the JSON file that was downloaded from Kaggle, and upload to the cell. You will then be able to run the Kaggle API and download the dataset into the notebook. Once the files are downloaded, we’ll unzip the files using !unzip. The ! before the command allows you to run a command line action inside Google Colab. The !unzip command just unzip’s each file.

The downloaded files from Kaggle contain the following:
* Training Images, located the \train folder
* Test Images, located in the \test folder
* A CSV file called labels.csv, containing the breed name and the filename, which points to the image in the training folder.

Now, we can load our data into a Dataframe, using Pandas.
A DataFrame, is a simple data structure that contains rows and columns, kind of like a CSV.
Pandas is a Python package that provides high-performance, easy-to-use data structures and data analysis tools. It’s used in a lot of machine learning applications. If you do any machine learning applications, one of the first packages you’ll be using is Pandas.
If you want to learn more about Pandas, check out their own tutorial, 10 minutes tutorial to pandas.

Using Pandas, we can import the csv from the Kaggle dataset into a Pandas Dataframe.


This takes the csv, loads it into a DataFrame pd.read_csv('labels.csv'), then sorts the DataFrame by breed alphabetically. Then, we print out the first 10 rows using df.head()

Next, we need to write a function that will resize all the images to the size we need, which is 299x299px. It will be clear why we need to resize the image later.


The read_img() function will load the image at the size we need (299x299px) and convert it to a multi-dimensional numpy array. Numpy is another Python package that is used in machine learning very frequently. It makes it easier to work with these types of arrays in Python.

Next, we need to convert the breed names (basenji, scottish_deerhound) into vectors (1 dimensional array of numbers), since our machine learning model can only deal with numbers. To do this, we’ll use Scikit Learn’s LabelEncoder. A LabelEncoder takes all the breed names and converts the name of the breed into a integer. Each number will be different for each breed (0 for basenji, 1 for scottish_deerhound etc). Scikit-Learn is another another open source package that makes getting into machine learning easier.

Next, we’ll split the dataset in two, one for training and the other for testing. When we train our model, we’ll use the data from the training set to train the model, then, when we need to see how well it did, we’ll test the model on the test set.


Finally, we’ll take all the images in the training set, and resize them using the read_img function we created earlier. Then we need to process each image to put it in the correct format that our model is expecting .


In this function, we loop through every item in our DataFrame (for i, img_id in tqdm(enumerate(df['id'])): and call the read_image function, which takes the img_id, which is ID of the image, which correlates to the file name of the image from the \train folder, and resize to 299x299px. We then call the xception.preprocess_input function.

Before we get to what this function does, we’ll need to understand what xception is.

Training models from scratch requires a lot more images than the amount we have (10k), as well as a lot of computing time and resources. In order to accelerate this process, we can use a technique called Transfer Learning. Than means, we can use a model than was pre trained, on another dataset, such as Imagenet dataset. Xception is one of those pretrained models. We can count on the pretrained model with extracting features from the image. We will then just train model for our specific use case: determining the breed.

I’ve experimented with a few models, and found that Xception gives the best results for the breed detection use case, with the images dataset we’re using.

For other datasets, there may be other models that are more suited to your needs for better results. So make sure to make a few tests before deciding on which model to use. To learn more about this and other pretrained models, check out this post. This post also goes over what ImageNet is and how it related to this pretrained models.

OK, now let’s get back to what the xception.preprocess_input() function does. This takes the image, which is now a numpy array, and converts it into a format that the Xception model is expecting, in which all the values in the array are between -1 and 1, which is known as normalization.

Now, we can build our model.

Since we are using Xception as our base model, our custom model is very simple. For our own model, we’ll load the output from Xception, which is all the layers that have already been trained on images from Imagenet, then build a Sequential model.

From the Keras blog:
“The Sequential model is a linear stack of layers.” – Getting started with the Keras Sequential model

All this means is that we can stack other layers on top of the Xception model. This will allow our model to train on our dog images.

Here’s our model.

First we take our base_model, which is Xception, then, Freeze the layers. This means that we won’t do any training on those layers, since they have already been trained.
Next, we’ll take the output from the base_model and add the following layers:
* BatchNormalization – applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1.
* GlobalAveragePooling2D layer – reduces the number of parameters to learn.
* Dropout – randomly turns off inputs to prevent overfitting.
* Dense layer – which connects every neuron in the network.
* followed by another Dropout layer.
* Finally, we create another Dense layer, and set it to the number of breeds we are training for.

Selecting these layers are based on trial and error. There is no known way to determine a good network structure when building your model. When building your own models, Stack Overflow is your best friend.

Training The Model

Now, we’ll start building out actually training the model. The way I am training the model is very basic. You’ll see this type of code when going through other models in Keras.

We first add some callback functions, which are functions that we run after each round of training, also known as a epoch.
We have 2 callbacks:
* early_stopping with the patience parameter of 5: This will stop the training if the model does not improve after 5 epochs.
* model_checkpoint: This saves the model to a file for later use.

Next we set the optimizer to RMSprop. An optimizer is how the model ‘learns’. For each epoch, the model calculates the loss function, which is how bad the model did, as compared the test set. The goal is to make this loss as low as possible, which is called Gradient Descent.
Keras supports many optimizers, and in my experiments, RMSProp, which performs Gradient Descent, seemed to work best.

Next we’ll build the model using model.compile function, which accepts the optimizer, what loss function we want to calculate (sparse_categorical_crossentropy), and setting the metrics parameter to accuracy, which will tell us how accurate the model is after each epoch.

After that, we’ll do our training by calling model.fit_generator(). This function’s parameters are: ImageDataGenerator‘s for both our training set and test set, how many steps we will run, number of epochs, what to validate on, and the number of steps to validate. We’ll train this model for 10 epoch’s for now, just to see how we did.

So, we have a model that is 99% accurate when predicting 12 breeds!

Now, we can test our model on some images of dogs and see if we are able to get the correct breed from that image. We’ll write a function that takes an image from the internet, format it to what the model expects (299x299px image) and make the prediction using model.predict(). This function takes in a image, as a numpy array, and returns the output as a list of probabilities for each breed. We use np.argmax() to find the index of the highest probability from the output of model.predict(). To return the name of the breed, we use the labels.csv that we loaded from the Kaggle dataset which contains the 12 breed names. We’ll then sort the list alphabetically and return the breed name.

Now let’s just test this function, just to be sure it’s working. We’ll download a photo of a Scottish Deerhound, using wget, which is command line utility to download files, and see how the model does.

Nice! The model predicts that the dog in the photo is a Scottish Deerhound, which it is!


From this post, we’ve learned how to build our own machine learning model using Keras, train our model by using Transfer Learning and learned how to make predictions using our model.

In a future post, we’ll go over how to deploy this model to a server as a simple API for others to use. Then, we’ll go about building a Workplace Bot that allows anyone in our Workplace` group to ask what breed of dog is in a photo from a Workplace post.

Leave a Reply

Your email address will not be published.

Get the latest posts from Nexmo’s next-generation communications blog delivered to your inbox.

By signing up to our communications blog, you accept our privacy policy , which sets out how we use your data and the rights you have in respect of your data. You can opt out of receiving our updates by clicking the unsubscribe link in the email or by emailing us at [email protected].