An introduction to Neural Networks

What is Machine Learning and what is a Neural Network?

Traditional programming involves writing code that then gets executed. Everything that the program does has to be clearly written out in code. This can be used to make predictions. For example: say you want to predict what the price of a house should be. You could specify some inputs (e.g. number of bedrooms, location, etc.) and then predict a price. However, there would need to be some logic in the middle and you would need to come up with it. For example, you might come up with a simple formula that combines the number of bedrooms with the distance from the city centre. When you run the program it simply uses your logic. This is still useful: computer programs run incredibly quickly and so, if you need to make lots of predictions, it’s a great help, but it relies on your understanding of how inputs are related to an output. What if it could draw its own connections?
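To make the idea of hand-written logic concrete, here is a minimal sketch of such a formula. The function name and every number in it are made up for illustration; they are assumptions, not real housing data.

```python
# Hypothetical hand-written pricing rule. Every constant below is an
# assumption chosen for illustration, not a value from real data.
def predict_price(bedrooms: int, distance_km: float) -> float:
    base_price = 100_000.0       # assumed starting price
    per_bedroom = 50_000.0       # assumed value added per bedroom
    per_km_discount = 5_000.0    # assumed price drop per km from the city centre
    return base_price + per_bedroom * bedrooms - per_km_discount * distance_km

print(predict_price(bedrooms=3, distance_km=4.0))  # 230000.0
```

The program can only ever be as good as the constants and the formula that the programmer chose.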

Machine Learning aims to build a program that can teach itself the relationships between things. The program consists of a model, which is given some inputs and makes a prediction. It then compares that prediction to the real output and, if it is wrong, updates itself so that it is more likely to predict correctly in the future. This way you do not need to explicitly tell the model how to interpret the data or how to draw connections: it works that out itself.

Neural networks are just one machine learning method. Other methods, e.g. decision trees, not only exist but are indeed better for certain applications. However, neural networks have attracted significant attention because they seem to be the best solution to a large portion of the problems that machine learning can solve.

Neural Networks

In the following simple example we will follow all of the steps necessary to train a network:

  • Define the model
  • Make a prediction
  • Measure the accuracy of the prediction
  • Assess how each parameter affects the prediction
  • Update the parameters to make better predictions

The relationship we are going to learn is a very simple linear relationship. However, initially we do not know the relationship; we just have some data:

    [Figure: Raw Data]

    Define the model

    A neural network is essentially a set of numbers (called parameters) and a sequence of operations for how those numbers should interact with the input in order to produce the output. These operations are generally very simple but, when combined at scale, can have very powerful predictive properties.

    The neural network we will use is the simplest possible: it consists of just one parameter. The sequence of operations is just going to be to multiply the input by our parameter value to give the output.

    To start with we need to initialise the value of our parameter. We could choose pretty much anything but let’s choose 1.
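As a rough sketch, this one-parameter model could be written as follows. The names `parameter` and `model` are chosen here for illustration.

```python
# The model's single parameter, initialised to 1 as described above.
parameter = 1.0

def model(x: float, parameter: float) -> float:
    """Multiply the input by the parameter to produce the output."""
    return parameter * x
```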

    [Figure: Define the Model]

    Make a prediction

    This step is usually called the ‘forward pass’.

    We then pass our first data point to the model. In this case the input is x=1 and the target is y=3.

    Our model takes the x value, multiplies it by the parameter value and produces a predicted value. In this case: 1 × 1 = 1.
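In code, the forward pass for this first data point might look like the sketch below (the variable names are illustrative):

```python
def model(x, parameter):
    # The whole "network": multiply the input by the single parameter.
    return parameter * x

parameter = 1.0
x, y = 1.0, 3.0              # first data point: input and target
prediction = model(x, parameter)
print(prediction)            # 1.0
```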

    [Figure: Make a Prediction]

    Measure the accuracy

    We then need an objective function (sometimes called a loss function) to tell how closely our prediction matches the target. In this case our loss function would simply calculate the squared difference between the prediction and the target value. This result is called the ‘loss’. We want our loss to be as small as possible because, when the loss is zero, it means our prediction is the same as our target.
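A minimal sketch of this squared-difference loss, applied to the prediction (1) and target (3) from the example. The function name `squared_error` is chosen here for illustration.

```python
def squared_error(prediction, target):
    # Squared difference between what we predicted and what we wanted.
    return (prediction - target) ** 2

loss = squared_error(prediction=1.0, target=3.0)
print(loss)  # 4.0
```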

    [Figure: Measure the Accuracy]

    Assess how each parameter affects the prediction

    This step is usually called the ‘backward pass’.

    Here, we identify how the loss is affected by the parameter value. If we increase the parameter value slightly, does it increase the loss or decrease it? To answer that, we need to differentiate the loss with respect to the parameter. Our overall function and its derivative are as follows:

    [Figure: Backward Pass]
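Written out from the definitions above, with w standing for the parameter (a symbol chosen here, not one the post itself uses), the loss and its derivative would read:

```latex
L(w) = (w x - y)^2
\qquad
\frac{dL}{dw} = 2x\,(w x - y)

% At the first data point (w = 1, x = 1, y = 3):
\frac{dL}{dw} = 2 \cdot 1 \cdot (1 \cdot 1 - 3) = -4
```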

    In this case we just have one parameter so we only need to do this once. However, for a larger model you would need to calculate the partial derivative of the loss with respect to every parameter and then update each parameter accordingly.

    In this case the derivative of the loss with respect to the parameter is -4. Let’s take a step back and understand what that means. If we increase the value of the parameter very slightly then we would expect that to cause the loss to decrease. If we decrease the value of the parameter then the loss would increase.
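The sketch below computes that derivative for the example values and then checks the interpretation by nudging the parameter up slightly. The function names and the size of the nudge (`eps`) are illustrative choices.

```python
def loss(parameter, x, y):
    return (parameter * x - y) ** 2

def gradient(parameter, x, y):
    # Analytic derivative of the squared error with respect to the parameter.
    return 2 * x * (parameter * x - y)

parameter, x, y = 1.0, 1.0, 3.0
print(gradient(parameter, x, y))  # -4.0

# Nudge the parameter up a little: a negative gradient means the loss goes down.
eps = 1e-3
print(loss(parameter + eps, x, y) < loss(parameter, x, y))  # True
```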

    Update the parameters

    This process is usually called ‘gradient descent’.

    We now need to update the parameter so that it will make a better prediction next time. The gradient tells us which direction we should move the parameter in, but how much should we change it? This is determined by a constant called the ‘learning rate’ which is decided by the user. The size of this constant greatly affects the learning process and is something to tweak in order to improve the efficiency with which the model learns.

    We will use a learning rate of 0.001 in this example.

    Because we want the loss to decrease, we subtract the gradient, scaled by the learning rate, from the parameter value. We will therefore use the following equation to update the parameter:

    [Figure: Update Parameters]

    In this case 1 - 0.001 × (-4) = 1.004. Our new value for our parameter is 1.004. Notice that if we had chosen a larger learning rate then it would have made a larger jump. However, then we run the risk of overshooting.
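The same single update step, sketched in code with the values from the example:

```python
learning_rate = 0.001
parameter = 1.0
gradient = -4.0   # derivative of the loss from the backward pass

# Step against the gradient, scaled by the learning rate.
parameter = parameter - learning_rate * gradient
print(parameter)
```

Because the gradient is negative, subtracting it moves the parameter upwards, towards 3.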

    Repeat

    Our parameter has indeed got closer to the value we are hoping for (3) but it’s still not particularly close. We’ll need to repeat this process multiple times before we are able to make accurate predictions. See the table and graph below:
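Putting the steps together gives a loop like the sketch below. The full raw data is not reproduced here, so as a simplifying assumption this reuses only the single data point from the walkthrough (x=1, y=3). The number of steps is also an arbitrary choice.

```python
def train(x, y, parameter=1.0, learning_rate=0.001, steps=5000):
    for _ in range(steps):
        prediction = parameter * x               # forward pass
        gradient = 2 * x * (prediction - y)      # backward pass
        parameter -= learning_rate * gradient    # gradient descent step
    return parameter

print(train(x=1.0, y=3.0))
```

With such a small learning rate each step only closes a tiny fraction of the gap, which is why thousands of repetitions are needed before the parameter settles close to 3.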

    [Figure: Results Table]
    [Figure: Results Chart]

    Linear Network

    This is in essence what a neural network is doing. However, for our model to have more powerful predictive capacity we need more parameters. This is where some knowledge of linear algebra and matrix multiplication is useful. A linear network (still a relatively simple kind of network) consists of a number of layers. Each layer consists of two sets of parameters (weights and biases) and does the following:

    [Figure: Linear Layer]

    Where X is an N×1 input vector, W is an N×M matrix and B is an M×1 vector.

    The incoming values are multiplied (matrix multiplication) by a set of weights and then added to a set of biases. This produces a new set of values which can be passed to the next layer.
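Here is a small pure-Python sketch of a single linear layer using the shapes given above. The exact orientation of the multiplication is an assumption: each of the M outputs is taken to be the sum of the N inputs times one column of W, plus the matching bias. The example numbers are made up.

```python
def linear_layer(x, weights, biases):
    """x: N inputs; weights: N rows of M values; biases: M values."""
    n, m = len(weights), len(biases)
    assert len(x) == n and all(len(row) == m for row in weights)
    # Output j = sum over i of x[i] * weights[i][j], plus biases[j].
    return [sum(x[i] * weights[i][j] for i in range(n)) + biases[j]
            for j in range(m)]

x = [1.0, 2.0]                    # N = 2 inputs
weights = [[1.0, 0.0, -1.0],
           [0.5, 2.0, 1.0]]       # N x M = 2 x 3
biases = [0.0, 1.0, -1.0]         # M = 3
print(linear_layer(x, weights, biases))  # [2.0, 5.0, 0.0]
```

In practice a numerical library would do this multiplication far more efficiently; the loops are spelled out here only to show what is being computed.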

    Activation function

    Two successive matrix multiplications can always be represented by one multiplication. This is because matrix multiplications are linear transformations. As a result, stacking more purely linear layers does not make our model any more capable: the whole stack is equivalent to a single layer. We therefore need to introduce a ‘non-linearity’ or activation layer that breaks the linear relationship and allows the model to learn more complex functions. This function can be very simple, for example the ReLU function which keeps all positive values the same but replaces all negative values with zero. This is enough to break the linearity and improve the model’s performance.
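A scalar sketch of both points: two stacked linear layers collapse into one, and putting a ReLU between them breaks that equivalence. The weights and biases are arbitrary numbers chosen for illustration.

```python
def layer(x, weight, bias):
    return weight * x + bias

def relu(value):
    # Keep positive values, replace negative values with zero.
    return max(0, value)

w1, b1 = 2, 1     # first layer (illustrative values)
w2, b2 = 3, -4    # second layer (illustrative values)

x = -5
stacked = layer(layer(x, w1, b1), w2, b2)
single = layer(x, w2 * w1, w2 * b1 + b2)   # one equivalent layer
print(stacked, single)                     # -31 -31

with_relu = layer(relu(layer(x, w1, b1)), w2, b2)
print(with_relu)                           # -4
```

No single choice of weight and bias reproduces the ReLU version for all inputs, which is what gives the extra layer something new to contribute.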

    Backpropagation in linear networks

    When we have multiple layers in our network, working out how one parameter affects the loss is more difficult because most parameters do not affect the loss directly. However, we can use the chain rule to work backwards from the loss, multiplying together the derivatives of each step along the way.
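As a sketch of the chain rule at work, consider a hypothetical two-layer version of the earlier model, with one parameter per layer and no biases or activation. The parameter values below are made up; the data point is the same x=1, y=3.

```python
def forward(x, w1, w2):
    hidden = w1 * x            # first layer
    prediction = w2 * hidden   # second layer
    return hidden, prediction

def backward(x, y, w1, w2):
    hidden, prediction = forward(x, w1, w2)
    # Start at the loss and work backwards, one step at a time.
    d_loss_d_prediction = 2 * (prediction - y)
    d_loss_d_w2 = d_loss_d_prediction * hidden
    d_loss_d_hidden = d_loss_d_prediction * w2
    d_loss_d_w1 = d_loss_d_hidden * x
    return d_loss_d_w1, d_loss_d_w2

print(backward(x=1.0, y=3.0, w1=2.0, w2=0.5))  # (-2.0, -8.0)
```

Notice that w1 never touches the loss directly: its gradient is built by multiplying the derivative of the loss with respect to the prediction by the derivatives of each later step.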

    [Figure: The Backward Pass of a Linear Layer]

    Summary

    This was a very high-level introduction to Neural Networks and Machine Learning. Hopefully it has given you some idea of the essence of what a neural network is doing. There is lots more to learn, and I’ve tried to include key terms so that you can go away and read more about them. If you’re interested in learning more about Machine Learning and aren’t sure where to get started, head over to this other post.