Perceptron
Last updated
Over the past decade, as computing hardware has advanced, many algorithms that once existed only in theory have become practical to implement. Among them is the neural network, which is loosely modeled on how our brain works. It perceives inputs, processes them in a way we cannot directly observe, and outputs (or sends to another cell) a result that we can see.
These days it is common to hear about Artificial Intelligence, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Generative Adversarial Networks (GAN) and so on in the news. All of these are built from a smaller component called the 'Perceptron'.
Let's first look at and compare the images of a neuron cell and a perceptron below.
A neuron cell consists of three main parts: one for receiving input, one for processing it, and one for transmitting the result to another cell. A perceptron has the same structure: one part for input, one for processing the input into some value, and one for output.
The processing part is called the 'activation': it computes a value from the input in a particular way and transmits the result onward.
A perceptron takes in two or more inputs and outputs a numerical value. During training, the weight vectors are adjusted based on this value. Depending on the number of possible distinct output values, it acts as a binary or multi-class classifier.
The following is how it computes the label of each training point.
When wx + b = 0 for a given data point, it means the data point lies on the decision boundary (a line separating the data), which we will deal with later in the post.
It is easy to see that the first and second steps are just a dot product between an input vector and a weight vector.
Each weight vector is initialized with zeros here, but there has been, and still is, a lot of ongoing research into initializing weights with non-zero values to achieve faster convergence.
As explained above, a perceptron behaves as a linear classifier, and there are two cases when working with one: the data points are linearly separable, or they are not.
I will not deal with the second case. It requires gradient descent, which can be inefficient for a single perceptron, and a perceptron is just a building block for a much bigger and better model anyway.
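As a rough illustration of the multiply, sum, and sign computation, here is a minimal sketch in plain Python. The helper names `dot` and `sign`, the example numbers, and the choice to treat a value of exactly 0 as +1 are all assumptions made for this sketch, not something fixed by the post.

```python
def dot(w, x):
    """Multiply each input by its weight, then sum the products."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(value):
    """Map the summed value to a label (assumption: 0 counts as +1)."""
    return 1 if value >= 0 else -1


w = [2.0, -1.0]      # hypothetical weight vector
x = [1.0, 3.0]       # hypothetical input point
score = dot(w, x)    # 2*1 + (-1)*3 = -1.0
label = sign(score)  # negative score, so the label is -1
print(score, label)
```

The two multiply-and-sum steps collapse into the single `dot` call, which is why the computation is usually written as a dot product.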
Let's first look at how to implement a binary perceptron.
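The original code is not shown here, so what follows is only a minimal sketch of a binary perceptron without a bias term, written in plain Python. The class name `BinaryPerceptron`, the `max_epochs` parameter, and the four training points are hypothetical. It starts from zero weights and also updates when a point lies exactly on the boundary (score of 0).

```python
class BinaryPerceptron:
    """Sketch of a binary perceptron with labels +1 / -1 and no bias term."""

    def __init__(self, n_features):
        # Weights start at zero.
        self.w = [0.0] * n_features

    def score(self, x):
        # Dot product between the weight vector and the input.
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1

    def fit(self, X, y, max_epochs=100):
        for _ in range(max_epochs):
            mistakes = 0
            for x, label in zip(X, y):
                # "<= 0" also updates points lying exactly on the boundary.
                if label * self.score(x) <= 0:
                    self.w = [wi + label * xi for wi, xi in zip(self.w, x)]
                    mistakes += 1
            if mistakes == 0:  # every point is classified correctly
                break
        return self


# Hypothetical, linearly separable toy data.
X = [(2, 1), (1, 3), (-1, -2), (-2, -1)]
y = [1, 1, -1, -1]
model = BinaryPerceptron(n_features=2).fit(X, y)
print(model.w, [model.predict(x) for x in X])
```

Because there is no bias term, any boundary this sketch learns passes through the origin.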
Next we declare a perceptron and generate random points for training and testing.
The points will lie on the graph like this.
If we graph the points with decision boundary, it will look like the following.
Notice that when using a perceptron to classify data points, there are infinitely many possible solutions. For example, for the above data points, we could even use horizontal or vertical lines through the origin.
Now we check that the predictions for our data match the true labels.
Also let's try using other data values.
It works great! One problem is that all the decision boundaries go through the origin because there is no bias term.
Next we modify the class to add the bias term.
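One common way to add a bias, assumed in this sketch, is to append a constant 1 to every input so that the last weight acts as the bias. Each update then adds or subtracts 1 from that bias term. The class name `BiasedPerceptron` and the two training points are hypothetical. The points are chosen so that no line through the origin can separate them.

```python
class BiasedPerceptron:
    """Sketch of a binary perceptron whose last weight is the bias."""

    def __init__(self, n_features):
        # One extra weight for the bias term.
        self.w = [0.0] * (n_features + 1)

    @staticmethod
    def _augment(x):
        # Append the constant 1 that multiplies the bias weight.
        return list(x) + [1.0]

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, self._augment(x)))

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1

    def fit(self, X, y, max_epochs=1000):
        for _ in range(max_epochs):
            mistakes = 0
            for x, label in zip(X, y):
                if label * self.score(x) <= 0:
                    xa = self._augment(x)
                    # The bias (last entry) moves by +1 or -1 per update.
                    self.w = [wi + label * xi for wi, xi in zip(self.w, xa)]
                    mistakes += 1
            if mistakes == 0:
                break
        return self


# (3, 3) is a positive multiple of (1, 1), so without a bias both points
# would always receive the same sign.
X = [(1, 1), (3, 3)]
y = [-1, 1]
model = BiasedPerceptron(n_features=2).fit(X, y)
print(model.w, [model.predict(x) for x in X])
```

On this toy data the sketch settles on weights (1, 1) with bias -3, which corresponds to the line x1 + x2 = 3 and sits between the two points.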
As seen, it works nicely. Just to make sure, let's predict on the X values and check that the pred values are the same as the y labels.
In the following, I will show how to implement Multi-class Perceptron.
The only difference from the binary classifier is that, while the binary version has a single weight vector (with or without a bias term), the multi-class version has one weight vector for each label.
So if we want to build a model for 3 classes, we need three weight vectors, one for each class. To accommodate this, we only need to change how the weights are initialized and updated.
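A minimal sketch of this idea in plain Python might look like the following. The class name `MultiClassPerceptron` and the six training points are hypothetical, and the update rule is an assumption: it is the standard multi-class perceptron update (add the input to the true class's weights, subtract it from the wrongly predicted class's weights), which may differ from the original code.

```python
class MultiClassPerceptron:
    """Sketch: one weight vector (with bias) per class, labels 0..k-1."""

    def __init__(self, n_features, n_classes):
        # One zero-initialized weight vector per class, plus a bias slot.
        self.W = [[0.0] * (n_features + 1) for _ in range(n_classes)]

    @staticmethod
    def _augment(x):
        return list(x) + [1.0]

    def _best_class(self, xa):
        scores = [sum(wi * xi for wi, xi in zip(w, xa)) for w in self.W]
        # Ties resolve to the lowest class index.
        return max(range(len(scores)), key=scores.__getitem__)

    def predict(self, x):
        return self._best_class(self._augment(x))

    def fit(self, X, y, max_epochs=1000):
        for _ in range(max_epochs):
            mistakes = 0
            for x, label in zip(X, y):
                xa = self._augment(x)
                guess = self._best_class(xa)
                if guess != label:
                    # Pull the true class toward x, push the wrong guess away.
                    self.W[label] = [w + v for w, v in zip(self.W[label], xa)]
                    self.W[guess] = [w - v for w, v in zip(self.W[guess], xa)]
                    mistakes += 1
            if mistakes == 0:
                break
        return self


# Hypothetical toy data: three well-separated groups.
X = [(0, 4), (1, 5), (4, 0), (5, 1), (-4, -4), (-5, -3)]
y = [0, 0, 1, 1, 2, 2]
model = MultiClassPerceptron(n_features=2, n_classes=3).fit(X, y)
print([model.predict(x) for x in X])
```

Prediction picks the class whose weight vector gives the highest score, so the binary sign check is replaced by an argmax over the classes.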
Generated data looks like this.
If we draw a graph with distinct decision boundaries, it will be something like the following.
Now let's test the model with three points (picked by hand).
I did not generate random points here so that it is easy to see and check the true label against the predicted outcome. It is fine to pick random points and test them as well, so feel free to try.
This post is longer than the others, but it is not harder. Though a perceptron can work as a classifier, it is rarely used for that purpose on its own, as it is merely a building block for a bigger model: the neural network.
In later posts, I will talk more about neural networks with simple examples.
Thank you again for reading, and if you find any errors or typos, do let me know.
The whole process of a perceptron can be shown in three parts:
1. Each input value is multiplied by its corresponding weight.
2. The multiplied values are then summed together.
3. The sign of the final value is computed, returning either +1 or -1.
As you can see above, I am updating the weight vector even when wx + b = 0, which means the data point is on the decision boundary. This is because we don't want any data points lying on the boundary, where they could be classified as both +1 and -1.
To introduce a bias, we add the constant 1 to the weight vector, so every weight vector has one extra component. In every update, we either add 1 to or subtract 1 from the bias term. It is fine to use another value for the bias, but the speed of convergence can differ depending on it.