Behind Neural Networks – AI for Dummies (2/4)
Welcome to part two of our 4-part series on the latest trends and applications of Computer Vision. In case you missed the previous episode, click here for Part 1!
Last week, we introduced Computer Vision, the art of making computers understand images, and briefly went over how we used to do it. Today, we'll focus on the technology that changed everything: deep learning, a branch of machine learning. So, let's get started!
The idea of machine learning is to map some kind of input onto an output. In other words: we ask a question — input — and the algorithm provides us with an answer to our question — output. Sounds simple, doesn't it? As you might have guessed, it isn't actually that easy. When we want to build a machine learning algorithm, the first and most important step is to formulate the question accurately in order to get the desired answer.
Let's consider the image with Barry, the sunglasses-wearing dog from the previous post. We can now formulate different questions depending on the task that we want the algorithm to perform:
- “Is there a dog in this image?”
- “Is this a dog or a cat?”
- “Which of the following objects can be seen in this image: dog, cat, plane, or duck?”
- “Where and how many dogs are in this image?”
and so on…
To answer these questions, we use Artificial Neural Networks.
What is an Artificial Neural Network?
Inspired by Mother Nature, researchers have been trying to imitate the inner workings of a biological brain since the early 1940s. The resulting mathematical representation — the Artificial Neural Network — is a system of nodes, called neurons, that receive inputs and send outputs to each other. Although biology provided the initial inspiration, artificial neural networks have since diverged into a field of research and engineering in their own right.
I hope you were good at LEGO® when you were younger because we’re going to use a lot of basic building bricks to demonstrate the structure of Neural Networks:
- Input Layer: a list of your input features. For a single image, the input layer is a 3-dimensional array: the width and height of the image in pixels, and the number of color channels, e.g. 1 channel for black & white images and 3 (or more) for color images.
- Hidden Layer(s): the secret sauce of your network. These layers allow you to model complex data thanks to their nodes (neurons). They are called hidden because the true values of their nodes are unknown in the training dataset (see below), where we only know the input and the output. Every neural network has at least one hidden layer; without one, it is not a neural network. Networks with multiple hidden layers are called Deep Neural Networks. The most common type of hidden layer is the fully-connected layer, in which each neuron is connected to all the neurons in the two adjacent layers, but not to the neurons in its own layer. Convolutional layers are another type of hidden layer that is very prominent when dealing with images, but more on those in the next blog post.
- Neurons: the processing units of the network. Each neuron weighs and sums its inputs and passes the result through an activation function, which determines how strongly the neuron responds before the signal is fed to the next layer. Just like a light dimmer on your lamp, the activation function modulates the neuron's activity.
- Output Layer: the final layer of neurons. This is where the data comes out of your model, so the number of neurons needs to match exactly the number of outputs you want, i.e. the answers to your question. If we want to know whether Barry is a dog or a cat, we need exactly 2 output neurons: one for the probability of being a dog and the other for being a cat.
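Stacked together, these bricks can be sketched in a few lines of Python. This is a minimal illustration using NumPy; the image size, layer sizes, and random weights are all invented for the example, and a real network would of course learn its weights rather than draw them at random:

```python
import numpy as np

rng = np.random.default_rng(0)

# Input layer: a tiny 4x4 black & white "image" (1 color channel),
# flattened into a list of 16 input features.
image = rng.random((4, 4, 1))
x = image.reshape(-1)

# Hidden layer: fully-connected, 8 neurons. Each neuron weighs and
# sums all 16 inputs, then applies an activation function (here ReLU).
W1 = rng.normal(size=(8, 16))
b1 = np.zeros(8)
hidden = np.maximum(0.0, W1 @ x + b1)

# Output layer: 2 neurons, one per class ("dog" and "cat").
W2 = rng.normal(size=(2, 8))
b2 = np.zeros(2)
logits = W2 @ hidden + b2

# Softmax turns the two raw outputs into probabilities that sum to 1.
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()
print(probs)  # two numbers: the probabilities for "dog" and "cat"
```

With random weights the two probabilities are meaningless, which is exactly why the learning step below matters.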
OK, now we have nearly everything we need. What's missing? Well, the learning part, of course! The network can learn which inputs, i.e. which features, matter most for identifying Barry in the image as a dog. That is the beauty of artificial neural networks: no manual feature extraction (specific shapes, colors, edges, etc.) is required. The only question remaining is: how should the algorithm learn? The answer depends on you and your data. Do you want the algorithm to learn from your precious labelled data (Supervised Learning)? Or do you want the algorithm to figure out by itself what makes your data special, without any feedback from you (Unsupervised Learning)? In both cases you need to define a loss function, which is then minimized during the learning process. But more on this in our next post.
Back to Barry. With supervised learning, in order to correctly identify Barry as a dog and not as a cat, we need to give the machine learning algorithm examples of what a dog and a cat look like. These examples are called labelled training data.
The next thing to do is to define our loss function, which allows the algorithm to learn to distinguish a dog from a cat. Put simply, it measures the mismatch between the correct label, i.e. the ground truth, and the label predicted by the algorithm. The loss is minimal when the predicted label corresponds to the ground truth label.
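To make this concrete, here is one common choice of loss function for classification, the cross-entropy loss. This is just an illustrative sketch, and the probabilities below are invented for the example:

```python
import numpy as np

def cross_entropy(predicted_probs, true_class):
    """Mismatch between prediction and ground truth: the loss is small
    when the probability assigned to the correct class is high."""
    return -np.log(predicted_probs[true_class])

# Ground truth: Barry is a dog (class 0; cat is class 1).
says_dog = np.array([0.95, 0.05])  # a confident, correct prediction
says_cat = np.array([0.05, 0.95])  # a confident, wrong prediction

print(cross_entropy(says_dog, 0))  # small loss (about 0.05)
print(cross_entropy(says_cat, 0))  # large loss (about 3.0)
```

Minimizing this loss over many labelled examples pushes the network's predictions toward the ground truth.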
Supervised learning is the most common learning scheme used in Computer Vision due to its simplicity. But you might have already guessed the major issue with this approach: you need large numbers of well-annotated training examples. If you wish to train generic models, there are open-access databases such as ImageNet or OpenImages.
For unsupervised learning, the algorithm needs to figure out by itself what the most characteristic features of a dog and a cat are. Basically, we would give the algorithm the above pictures and it would group them by characteristics. This means that instead of a dog/cat classification, we could end up with categories like "animal in a sun chair" or "animal wearing sunglasses". To avoid this, the loss function needs to be formulated carefully according to the question you want to ask.
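One classic unsupervised technique for this kind of grouping is k-means clustering. The post doesn't prescribe a specific method, so treat this as a sketch; the two blobs of 2-D "features" below are toy data invented for the example:

```python
import numpy as np

def kmeans(points, k, iters=10):
    """Group points into k clusters: assign each point to its nearest
    centroid, then move each centroid to the mean of its points."""
    # Farthest-first initialisation: start from the first point, then
    # repeatedly pick the point farthest from all chosen centroids.
    centroids = [points[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(points[dists.argmax()])
    centroids = np.array(centroids)

    for _ in range(iters):
        # Distance from every point to every centroid, then assign.
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            centroids[j] = points[labels == j].mean(axis=0)
    return labels

# Toy 2-D "features" (say, ear shape and snout length): two clear groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(20, 2))
points = np.vstack([group_a, group_b])

labels = kmeans(points, k=2)
# Each group of 20 points ends up in its own cluster.
```

Note that no labels were given: the algorithm groups the points purely by how similar their features are, which is why the resulting categories may not be the ones you had in mind.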
Long story short, deep learning algorithms are always made up of the same elementary bricks: input, hidden, and output layers, as well as your computing units, the neurons. What makes your algorithm unique is the way you stack and train them according to the problem you want to solve.
That's it for part 2 of AI for Dummies! Next week we'll dive even further into the learning details and strategies of deep learning. We will look at how the loss function is minimized, how the parameters of each layer are adjusted, and what convolutional layers are. Stay tuned!
If you like what you read this week, why not sign up to receive part 3 of AI for Dummies directly in your inbox next week!