- The major difference between a neuron in your brain and a neuron in a neural network is that the latter is a function that outputs a number between 0 and 1;
- The more training data you can give a neural network, the more accurate it tends to become (the activation of the correct output neuron moves closer to 1);
- There are layers that recognise subcomponents of what the neural network is trying to understand;
- Weights are assigned to the connections from the previous layer and, together with the biases, determine how well the network solves its problem.
If I asked you to look at the image at the top of the article and tell me what it is, you’d have no issues. It’s a trivial task but we take our ability to recognise images for granted. Stop and think for a second about how crazy it is that we can do that without any effort.
Take dogs, for example. There are millions upon millions of variations of dogs but you can still recognise them all as a dog. You group them by their common traits and recognise them as the same thing, even though a German Shepherd and a Beagle look very different.
Every time you see a new dog, you're seeing a different animal but categorising it is almost instant. Your brain is able to represent all the different breeds of dog as the same idea. At the same time, we're able to see cats as a different species. That's pretty amazing.
But what if I told you to sit down and write a program that can look at any animal and categorise it? Where would you start? The task goes from trivial to excruciatingly difficult.
If you’re reading this article, and you’ve been paying attention to technology over the last few years, you would have seen heaps of news about the rise of machine learning, artificial intelligence and, more recently, deep learning. Whether that be Waymo's self-driving cars, or Uber's acquisition of Otto, it seems like machine learning is the thing on everyone's mind. The problem is machine learning is inherently complex and it takes some time to understand it. Let's give it a go.
Let's explore the underlying mechanics that drive machine learning, hopefully in an easy-to-understand way. So, to start, I want to show you what a neural network actually is and help you visualise what it is. It's maths but don't worry. It isn't boring.
(Neither is this video which visualises the AI journey.)
For the purposes of this post, and given the complexity of neural networks, I'm only going to be talking about the structure of a neural network and not the learning part.
I believe it’s enough to start to understand the more powerful variants.
But before we dive into the mechanics, let’s stop for a second and think about the words 'neural' and 'network'.
Neural comes from the Greek word neuron, meaning nerve, and, as some of you know, a neuron is an electrically excitable cell that receives, processes, and transmits information through electrical and chemical signals.
So, as you might have guessed, a neural network is inspired by the brain. 'Network' is just a combination of 'net' and 'work' – so I think of a network as a group doing work together. That could be computers connected through the internet to secure a blockchain or just a bunch of people – I’m using work in a very liberal sense here.
Let’s carry that analogy over to neural networks, which are a bunch of neurons linked together to achieve a specific task. Below is a neural network that is three layers deep.
The major difference between a neuron in your brain and a neuron in a neural network is that a neuron in a neural network is a function that outputs a number between 0 and 1.
The output of the neuron is called its “activation”. As you can see above, the first layer of the neural network has 3 neurons and the output layer has 2. That means the output is binary, like yes or no. Neural networks can be far more complex than this; below is a neural network that takes in a 28 x 28 pixel image with a hand drawn number between 0 and 9 and outputs what number it thinks it is.
In this case there are 784 neurons in the first layer of the neural network and the value of each neuron is dependent on how bright each pixel is in the 28 x 28 pixel image. For simplicity, the neuron would have a value of one if its pixel was completely coloured.
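To make that concrete, here is a rough sketch of how an image could become the first layer. I'm assuming each pixel is stored as a brightness from 0 to 255, which is a common convention but not something the network above requires. The function name `pixels_to_activations` and the example image are made up for illustration.

```python
def pixels_to_activations(image):
    """Flatten a 2D grid of 0-255 brightness values into a flat list of 0-1 values."""
    # Assumption: 255 means "completely coloured", 0 means "empty".
    return [pixel / 255 for row in image for pixel in row]

# A hypothetical 28 x 28 image that is empty except for one fully coloured pixel.
image = [[0] * 28 for _ in range(28)]
image[0][1] = 255

activations = pixels_to_activations(image)
print(len(activations))  # 784, one input neuron per pixel
print(activations[1])    # 1.0, the fully coloured pixel
```

Each entry in `activations` plays the role of one neuron in the first layer.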
The last layer of the above neural network has 10 neurons that represent 0 through 9, and the activation of each of these neurons, a number between 0 and 1, represents how much the neural network thinks the image is the corresponding number.
As an example, imagine you feed the image below into the neural network.
The neuron in the output layer corresponding to 5 may have an activation of 0.93, which means it’s pretty certain that the number is a 5.
The layer between the first and last layer is called the hidden layer or layers, if there is more than one. For now, it’s enough to know that the activations of the previous layer determine the activations in the next layer.
A good way to remember this is to think about how your own brain works. If you’ve ever studied neuroscience, you have probably heard the phrase “neurons that fire together wire together”. The idea is that groups of neurons fire together when you get certain stimuli, like when you see the number 5, and the more times these neurons fire together, the easier it is for you to see that a 5 is a 5. This idea is known as Hebbian theory.
Despite how easy it seems now, it wasn’t always easy for you to determine if a number was a 5 or a 6. It took your brain time to learn and for the neurons to fire together. A neural network is no different: it needs lots of exposure to training examples so it can learn to distinguish 5 from 6 or 1 from 9.
Training data for this problem may look like the image above, with additional data to tell the computer that each image corresponds to the drawn number. We’ll come back to this in another post.
The idea is that the more training data you can give a neural network, the more accurate it tends to become. Or in other words, the activation of the correct neuron in the final layer gets closer to 1 (and the others closer to 0) as it “learns” more.
The number 5 has a specific pattern of activations in the first layer that causes the next layer to have a specific set of activations which causes the next layer to activate the 5 neuron. In theory, the neuron with the highest activation in the last layer should be the corresponding number.
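Reading the answer off the last layer can be sketched in a couple of lines. The activations below are invented for illustration (only the 0.93 for the 5 neuron comes from the example earlier), and `predicted_digit` is a hypothetical helper name.

```python
def predicted_digit(output_activations):
    """Return the index of the output neuron with the highest activation."""
    return max(range(len(output_activations)), key=lambda i: output_activations[i])

# Made-up activations for the ten output neurons (digits 0 through 9).
output = [0.01, 0.02, 0.05, 0.10, 0.03, 0.93, 0.20, 0.01, 0.04, 0.02]

print(predicted_digit(output))  # 5
```

Because the neurons are ordered 0 through 9, the position of the largest activation is the network's guess.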
Before we get into what is actually happening, let’s think about why a layered approach like this works. Why is it that when you look at an image, you can recognise what it is? When you look at something, you piece together various parts of it until you get an idea of what it is.
If you couldn’t do that, you wouldn’t be able to tell the difference between four legged animals. A cat and a dog both have four legs but there are other parts of them that make them very different.
So if we take that analogy back to the neural network, you would hope the hidden layer in the image above would do the same sort of thing. It would break down the image into its component parts and change the output layer’s activation depending on what components it saw. Below is a deep neural network (more complex than what I’m writing about, but you get the idea).
As you can see above, the hidden layers break down the car into its component parts (edges) and then the activations in each layer cause different activations to occur in the next layer until the final layer outputs the neuron associated with “Audi A7”. Obviously it’s a little more complex than that but that’s the gist of it.
If we go back to our three layer deep neural network above that outputs from 0 to 9, we can think of the hidden layer as corresponding to components of a number. As an example, both 8 and 9 have a loop at the top, so one neuron may look for that loop and another may look at whether the bottom of the image has the second loop of an 8 or the tail of a 9.
The activations in that layer would then lead to activations in the output layer depending on whether the image is an 8 or a 9.
Of course recognising the top of an 8 is complex too, but that can be broken down into further subcomponents like recognising the various edges that make up a loop.
If the neural network was another layer deep, it might use the second layer to recognise subcomponents of the top of an 8 and then those activations would lead to the activation of the neuron in the third layer that corresponds to the loop at the top of an 8.
I know it's a little confusing but bear with me. Below is an example of different parts of a loop, so the second layer may have activations that correspond to the images below.
So when an 8 is passed through a neural network maybe the second layer's neurons break it down into small subcomponents (like the ones above) and then those activations light up the next layer that recognises the two loops, then the last layer's activation represents 8.
It’s pretty easy to imagine how being able to detect subcomponents and patterns like this could be extended far beyond just numbers (see the deep neural network recognising the Audi above).
It goes far beyond just image recognition.
Next time you use a voice assistant, try to imagine what it is doing. It’s taking in raw audio, finding certain sounds, combining those sounds into syllables which are then combined into words, and the words are combined into phrases and so on.
The question then becomes how do you design a system where one layer of activations leads to another layer of activations that ultimately leads to the right answer? Well, that involves setting weights and biases throughout the neural network.
What that means is you assign a number (weight) to each of the connections from the previous layer in the neural network, then you take all of those activations in the first layer and compute the weighted sum of that layer.
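As a rough sketch, the weighted sum for a single neuron looks like this. The three activations and three weights are made-up numbers, and `weighted_sum` is a name I've chosen for illustration.

```python
def weighted_sum(weights, activations):
    """Multiply each activation by the weight on its connection and add them up."""
    return sum(w * a for w, a in zip(weights, activations))

# Hypothetical activations of three neurons in the previous layer...
activations = [1.0, 0.0, 0.5]
# ...and the weights on the three connections into one neuron in the next layer.
weights = [2.0, -1.0, 4.0]

print(weighted_sum(weights, activations))  # 4.0
```

A positive weight means an active neuron pushes the sum up, a negative weight means it pushes the sum down, and a neuron with an activation of 0 contributes nothing.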
Then you pass that weighted sum through a function that turns it into a value somewhere between 0 and 1 (the activation). The sigmoid function does this, although it’s a little outdated these days.
Don’t worry if you don’t understand the graph above. All it is saying is that very negative inputs end up close to zero and very positive inputs end up close to one. The end result is a measure of how positive the relevant weighted sum was before it was passed through the sigmoid function.
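If it helps, here is the sigmoid function written out as a minimal sketch using the standard formula 1 / (1 + e^-z). The input values are arbitrary ones I picked to show the squashing behaviour.

```python
import math

def sigmoid(z):
    """Squash any real number into the range between 0 and 1."""
    # Note: this simple form can overflow for extremely negative z (around -710 and below).
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))    # 0.5, a weighted sum of zero lands exactly in the middle
print(sigmoid(-10))  # a tiny number, very close to 0
print(sigmoid(10))   # very close to 1
```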
Then you have to decide when the next layer is activated. That’s where a bias comes in. All a bias is is a number that you add (or take away) from the weighted sum prior to passing it through the sigmoid function.
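Here is a small sketch of what the bias does for one neuron. The weights, activations and bias values are made up, and `neuron_activation` is a hypothetical helper name; the only real ingredients are the weighted sum, the bias and the sigmoid.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def neuron_activation(weights, activations, bias):
    """Weighted sum of the previous layer, plus a bias, passed through the sigmoid."""
    z = sum(w * a for w, a in zip(weights, activations)) + bias
    return sigmoid(z)

activations = [1.0, 0.0, 0.5]   # made-up activations from the previous layer
weights = [2.0, -1.0, 4.0]      # made-up weights; the weighted sum is 4.0

no_bias = neuron_activation(weights, activations, bias=0.0)      # sigmoid(4.0)
with_bias = neuron_activation(weights, activations, bias=-10.0)  # sigmoid(-6.0)

print(no_bias > 0.9)    # True: without a bias the neuron is strongly active
print(with_bias < 0.1)  # True: a bias of -10 keeps it quiet unless the sum exceeds 10
```

In other words, the bias sets how large the weighted sum has to be before the neuron starts to become meaningfully active.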
To break that down: W is the weight, x is the activation and b is the bias. That’s for one neuron. Every other neuron in the layer is also going to be connected to every neuron in the first layer. Each one of those connections has its own weight, and each neuron has its own bias.
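Putting the pieces together, a whole layer can be sketched as below, using the small 3-neuron-in, 2-neuron-out shape from the first diagram. All the numbers are invented, `layer_forward` is a name chosen for illustration, and I'm assuming one list of weights per neuron in the next layer plus one bias per neuron.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer_forward(weights, biases, activations):
    """Compute the activations of the next layer from the previous layer.

    weights[j][i] is the weight on the connection from previous-layer
    neuron i to next-layer neuron j; biases[j] is the bias of neuron j.
    """
    return [
        sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
        for row, b in zip(weights, biases)
    ]

inputs = [1.0, 0.0, 0.5]       # 3 neurons in the first layer
weights = [[2.0, -1.0, 4.0],   # connections into next-layer neuron 0
           [-3.0, 0.5, 1.0]]   # connections into next-layer neuron 1
biases = [-10.0, 0.0]          # one bias per next-layer neuron

outputs = layer_forward(weights, biases, inputs)
print(len(outputs))  # 2, one activation per neuron in the next layer
```

A deeper network would simply call `layer_forward` again, feeding `outputs` in as the activations for the following layer.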
Don’t worry if you don’t understand the math. The most important thing is to think of each neuron as a dial that can be tweaked and turned to make the neural network behave in different ways. The weights and biases are the ways that you change the dial.
So when people talk about neural networks learning, what they are saying is that they are getting the computer to find a valid setting for each of the connections between neurons so it will actually be able to solve the problem at hand. Or in other words, they’re changing the weights and biases to get the right output.
Don't worry – if you build a neural network you won't be setting all of these weights and biases by hand. But we’ll get to that in the next post. For now, check out MarI/O.