Modern-day Artificial Intelligence?

Artificial intelligence has been one of the biggest buzzwords since the deep learning revolution of the last decade. However, with tech sites and businesses starting to brand everything as AI this or machine learning that, have we ever asked ourselves what exactly AI is? For much of the population, AI started off as ‘Deep Blue’, the chess computer that beat the famous Garry Kasparov. It gained even more attention when the Go AI ‘AlphaGo’ beat the world champion Lee Sedol. But what about all the other applications of AI? Often going unnoticed today are the many practical solutions AI can deliver to real-world issues, ranging from predicting eye diseases 4 years ahead of time to predicting forest fires. With the fog of AI around us today, it is important for us to try to understand and harness such technology and use it for the better.

What problems can artificial intelligence solve?

Before understanding how to address problems with AI, one must first understand what problems can be addressed with AI. It is useful to think of an AI as a function that takes in features of a problem and produces an answer based on these features. For example, if we were predicting a student’s grade on a test, an input feature could be ‘hours studied’ or ‘average grade’. Most problems in AI can be split into two categories: regression and classification. Regression is the ‘task of approximating a mapping function (f) from input variables (X) to a continuous output variable (y)’. A continuous output is a real-valued quantity, e.g. a weight in kilograms. An example of regression would be predicting the weight of an adult in 10 years’ time, as the output is a continuous quantity (like a reading from a scale). Classification is ‘the task of predicting a discrete class label’. A discrete class label is one of a fixed set of distinct, separate categories. An example of a classification problem would be taking an image and predicting whether the image is of a cat or a dog. In this example, there are two clear groups, cat and dog. The output typically takes the form of a percentage likelihood for each class, where the percentages sum to 100%. Using these two problem types we can start to discuss possible solutions for each.
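To make the two output types concrete, here is a minimal Python sketch. The grade formula and the raw scores are invented for illustration, and turning scores into percentages with a softmax is an assumption on my part, just one common way of doing it.

```python
import math

def predict_grade(hours_studied: float) -> float:
    """Regression: map an input feature to a continuous value.
    The rule (40 + 5 per hour, capped at 100) is made up for illustration."""
    return min(100.0, 40.0 + 5.0 * hours_studied)

def class_percentages(scores: dict[str, float]) -> dict[str, float]:
    """Classification: turn one raw score per class into percentages
    that sum to 100, using a softmax (an assumed choice)."""
    exps = {label: math.exp(score) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: 100.0 * value / total for label, value in exps.items()}

grade = predict_grade(4.0)                                 # a single continuous number
likelihoods = class_percentages({"cat": 2.0, "dog": 0.5})  # one percentage per class
```

The regression function returns one number on a continuous scale, while the classifier returns one percentage per class label.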

What is Artificial Intelligence?

There are many different types of AI, and most can be placed into one of four categories: supervised, unsupervised, semi-supervised and reinforcement learning.

Supervised learning is the act of learning a function based on input and output pairs. The computer, in effect, figures out the rules behind going from $a \Rightarrow b$.

With unsupervised learning, the computer identifies the patterns in unlabelled data sets. This method of learning is very unpredictable, as the computer is only told to separate or summarise data but not how to. An example could be an algorithm to cluster pictures of cats and dogs. The algorithm may use the shape of the animal’s ears to determine whether the image is of a cat or a dog, so when a picture of a dog with short ears is shown, the algorithm may classify this animal as a cat.

Semi-supervised learning is, unsurprisingly, between supervised and unsupervised learning. This type of learning is typically used when handling large data sets in which only a small portion is labelled. First, a model (a type of AI structure) is trained via supervised learning on the labelled data, which gives a weak AI. This weak AI is then used to label the unlabelled data, generating a ‘pseudo label’ for each data point (input feature, e.g. a picture of a dog). Using the pseudo labels and original labels together, the AI has a larger training set and can become much more accurate.

Reinforcement learning is used for interaction with environments. An example could be an animatronic being used in the wilderness to spy on animal life. The AI in this case is called an ‘agent’. The agent takes actions in the environment (‘exploration’) and gets rewarded or punished for certain actions in certain states. It also makes use of what it has already learned (‘exploitation’), using past memories to decide whether or not to act on something. This AI is commonly used in games but has recently been introduced into prototype self-driving cars.

I will go on to explain each type of AI in greater depth as these ideas are hard to grasp through just words.

Supervised Learning | Linear Regression

Linear regression is used to show the relationship between input and output values. An example of linear regression is using age to predict height.

The goal of linear regression is to find a ‘regression line’ between these two features. The regression line is a line of correlation (shown in red). In order to test how accurate the AI is, the least squares method is used. Functions that are used to test the accuracy of AI models are known as ‘loss functions’.

The least squares method involves making predictions based on the graph for every data point. The black lines symbolise the error between predicted and actual values.

1. Figure out what the error is: Error = real value – predicted value

2. Square this error so that negative and positive differences do not cancel each other out: Squared error = Error²

3. Sum all the squared errors to get an error value for the AI: ∑ Error²
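The three steps above can be sketched in a few lines of Python. The function name and the sample numbers are illustrative assumptions, not values from the article.

```python
def sum_squared_errors(real_values, predicted_values):
    """Least squares loss: sum of (real value - predicted value) squared."""
    total = 0.0
    for real, predicted in zip(real_values, predicted_values):
        error = real - predicted      # step 1: the error
        total += error ** 2           # steps 2 and 3: square it, then sum
    return total

# Made-up heights (metres) and the heights a candidate line predicts.
loss = sum_squared_errors([1.0, 2.0, 3.0], [1.0, 3.0, 1.0])
```

A lower value means the candidate regression line sits closer to the data overall.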

Using this loss function, we can estimate the regression line as one that passes through the mean values of the Xs and Ys, here (16, 1.74). By changing the gradient of this line and recalculating the loss, we can tell whether the regression line is approaching a better fit. Changing the gradient efficiently can involve more complicated algorithms, e.g. stochastic gradient descent. There are variations of linear regression, including linear regression with multiple input features (multiple linear regression), e.g. instead of just measuring age, sex could also be measured. As the number of dimensions in the data increases, processing time increases significantly.
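As a rough sketch of fitting such a line, the snippet below uses the standard closed-form least squares solution rather than adjusting the gradient step by step. The ages and heights are made up, chosen only so that their means match the (16, 1.74) point mentioned above.

```python
def fit_line(xs, ys):
    """Least squares fit of y = gradient * x + intercept.
    The resulting line always passes through (mean of xs, mean of ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sum((x - mean_x) ** 2 for x in xs)
    gradient = covariance / spread
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

# Hypothetical data: age in years, height in metres.
ages = [14, 15, 16, 17, 18]
heights = [1.64, 1.70, 1.74, 1.78, 1.84]

gradient, intercept = fit_line(ages, heights)
predicted_height_at_16 = gradient * 16 + intercept
```

For a single input feature this gives the same line that repeatedly adjusting the gradient would converge towards. Iterative methods matter more once there are many features or very large data sets.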

Unsupervised Learning | Clustering

K-means clustering is a simple use of unsupervised learning. It is used to group data that is unlabelled. Many of us perform this kind of clustering from a young age. When toddlers explore the world, they may encounter animals. They recognise that dogs have 4 legs, which separates dogs from insects (for example). At such a young age, toddlers do not yet know what the name ‘dog’ refers to. However, they can still make this distinction between what they see. This is analogous to K-means clustering. This example concerns species of plants.

As shown above, two distinct groups are shown in red and blue. Based on the characteristics of the plant the AI is able to differentiate between plant species.

1. The algorithm first needs to be told how many groups are going to be formed from the input data. This variable is known as K, hence K-means clustering. For this example, K = 2.
2. Select K random points out of the data points. Assign these data points as ‘cluster points’.
3. For each data point, work out the distance from itself to each cluster point. Assign that data point to the cluster whose cluster point is closest.
4. After all data points are assigned a cluster, take the mean position of the data points in each cluster. This point becomes the centre of the cluster.
5. Repeat steps 3 and 4 using the centres of the clusters as the new ‘cluster points’ until there is no change when assigning clusters to points.
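The steps above can be sketched in plain Python as follows. The plant measurements are invented, and the optional `initial_centres` parameter is an addition of mine so that the example can be run with a fixed starting point instead of a random one.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, initial_centres=None):
    # Step 2: pick K data points as the starting cluster points.
    centres = list(initial_centres) if initial_centres else random.sample(points, k)
    assignment = None
    while True:
        # Step 3: assign every point to its closest cluster point.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        # Step 5: stop once the assignments no longer change.
        if new_assignment == assignment:
            return centres, assignment
        assignment = new_assignment
        # Step 4: move each centre to the mean position of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centre
                centres[c] = tuple(sum(axis) / len(members) for axis in zip(*members))

# Hypothetical plants described by (petal length, petal width).
plants = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, groups = k_means(plants, 2, initial_centres=[plants[0], plants[3]])
```

Here `groups` holds the cluster index of each plant, in the same order as `plants`. Leaving out `initial_centres` gives the random start described in step 2.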

Since there is no labelled data for the AI to train on and the AI doesn’t change the number of clusters it creates, the loss function for this AI is the ‘variance of each cluster’. The variance of a cluster measures how spread out its data points are: it is the average squared distance between each data point and the centre of the cluster. The lower the summed variance, the better the AI performs. Because the result depends on the random starting points, the AI cycles through different random starting cluster points and keeps the best clustering found within a certain period of time. Since clusters group data, they can be used for tasks such as filtering spam emails. More recently, they have also been used to summarise data. As a result, they are used in more complex AIs that feed input through an unsupervised AI to condense the input. Since the input is condensed, there are fewer features to analyse and the training process is quicker.

Semi-Supervised Learning | Pseudo Labelling

Semi-supervised learning allows for a better classification data set to train the AI on. When training AI, one of the biggest problems is finding labelled data.

Labelled data:

• Takes a lot of human effort to create
• Is specialist / hard to find
• Is expensive

Unlabelled data:

• Easy to find
• Plentiful
• Cheaper

Through semi-supervised learning, we can train a model using a small portion of labelled data. Using this model, we can label the unlabelled data with ‘pseudo labels’. As a result, we have created a larger set of data points. In general, the larger the set of data points, the more accurate the AI, provided that most of the pseudo labels are correct. Below is an example of a classifier taking in input features to classify whether something is a banana. The red data represents pseudo labels that can be derived from input features. As the red data is produced, it is used to train the AI alongside the original (black) labelled data.
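A minimal sketch of pseudo labelling, under some loud assumptions: the single feature (length in centimetres), the data and the very simple ‘nearest mean’ classifier are all invented for illustration and are not the classifier from the figure.

```python
def train(examples):
    """Learn the mean feature value for each label (a nearest-mean classifier)."""
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, value):
    """Return the label whose mean is closest to the given value."""
    return min(model, key=lambda label: abs(value - model[label]))

# A small labelled set and a larger unlabelled set (hypothetical lengths in cm).
labelled = [(18.0, "banana"), (20.0, "banana"), (7.0, "not banana"), (9.0, "not banana")]
unlabelled = [17.0, 21.0, 6.0, 10.0, 19.5]

weak_model = train(labelled)                                   # supervised step
pseudo_labelled = [(v, predict(weak_model, v)) for v in unlabelled]
final_model = train(labelled + pseudo_labelled)                # retrain on both sets
```

The final model is trained on nine data points instead of four. In practice, pseudo labels the weak model is unsure about are often discarded, since wrong pseudo labels can make the final model worse.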

Reinforcement Learning | Q Learning

Q-learning is an example of reinforcement learning. It is used whenever there is a need for an ‘agent’ to interact with an environment. An agent typically represents a person, as reinforcement learning tries to model human behaviour. An environment has a given number of states. In Q-learning, an agent takes an action in the environment in a given state and is either rewarded or punished. The agent has now ‘explored’: it has gained knowledge. The agent can also ‘exploit’ the knowledge it has gained for its own benefit. An example of Q-learning is an AI for a game where you have to reach some jewels. If the AI walks into a bomb, the AI loses points.

At each move, assume the AI can only move up, down, left or right. Soon the AI learns that when it steps onto a bomb tile, it will suffer. With Q-learning, the outcomes of visiting states are memorised and processed, and one simple way of achieving this is to keep a memory of past rewards. The AI can have a location in memory for each tile type. The memory may look like:

Tiles {
    Bomb: [-25, -30, -15, -10],
    Empty: [0, 0, 0, 0, 0],
    Jewel: [100, 80, 90, 80],
}

When this memory is transferred onto an environment, the environment looks like this. The AI’s choices are highlighted. Each tile has its memories averaged.
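As a sketch of how the averaged memories could drive a decision, the snippet below stores the memory from the text in a Python dictionary and picks the neighbouring tile with the highest average. The function names and the example neighbourhood are hypothetical.

```python
# Rewards remembered for each tile type, taken from the memory shown above.
tile_memory = {
    "bomb": [-25, -30, -15, -10],
    "empty": [0, 0, 0, 0, 0],
    "jewel": [100, 80, 90, 80],
}

def average_reward(tile_type):
    """Average of everything the AI remembers about this tile type."""
    memories = tile_memory[tile_type]
    return sum(memories) / len(memories)

def choose_move(neighbours):
    """Pick the direction whose tile has the highest averaged memory.
    `neighbours` maps a direction to the tile type found there."""
    return max(neighbours, key=lambda direction: average_reward(neighbours[direction]))

move = choose_move({"up": "bomb", "down": "empty", "left": "jewel", "right": "empty"})
```

With this memory the averages are -20 for a bomb, 0 for an empty tile and 87.5 for a jewel, so the AI heads for the jewel.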

In this example, the AI will move to the left to the jewel. However, from there, it moves to an empty square. On the next move, the AI would only see two empty squares and a bomb square. It is obvious from our view that the AI should backtrack to the jewel on the right, but how is the AI to know that? By using more sophisticated techniques found in Q-learning, this AI can be improved. By averaging out the values of neighbouring squares, the AI can learn to prefer squares surrounding jewels. By increasing its memory to more than one move, the AI could learn that by going through one bomb tile, it may be rewarded by many jewel tiles. Through the use of Q-learning, decision trees can be formed from certain behavioural patterns. An example of this is:

If tile is jewel:
    Move to tile
Else if tile is empty:
    Move to tile
Else if tile is neighbours_with_jewel:
    Move to tile
Else:
    Move back

This was a simple example. However, with longer sight, more memory and more processing power, this AI could develop into much more. This type of AI is typical in games due to the state-based design of most games. Nonetheless, reinforcement learning is also proving useful in physics-based projects, such as simulating humans learning to walk while overcoming certain deformities.
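The idea mentioned earlier, of valuing a move by what it leads to more than one step ahead, is what the standard Q-learning update rule captures. The article does not spell the rule out, so the sketch below is my own addition: the table layout, the state names and the values of `alpha` (learning rate) and `gamma` (discount factor) are all assumptions.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard Q-learning update:
    Q(s, a) += alpha * (reward + gamma * max Q(s', a') - Q(s, a))"""
    best_next = max(q_table[next_state].values())
    old_value = q_table[state][action]
    q_table[state][action] = old_value + alpha * (reward + gamma * best_next - old_value)

# Hypothetical two-state table: from tile "A", moving right leads to tile "B".
q_table = {
    "A": {"left": 0.0, "right": 0.0},
    "B": {"left": 0.0, "right": 10.0},
}
q_update(q_table, "A", "right", reward=5.0, next_state="B")
```

After the update, moving right from "A" is worth 0.5 × (5 + 0.9 × 10) = 7, so part of the value of "B" has flowed back to the move that leads there. This is how an agent can learn that crossing one bad tile is worthwhile if good tiles follow.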

Neural Networks

Neural networks also play a huge role in artificial intelligence. Neural networks themselves lie closer to machine learning and deep learning. The fundamental idea behind a neural network is the same as for any other AI: neural networks use functions to determine results. The simplest neural network would be a single node. Two nodes can plot a line y = mx + c on a grid.

The nodes gather input data, multiply it by a constant, and add another constant. The error of each node can be worked out as error = (real value – predicted value)². The multiplication and addition constants can then be adjusted such that the error is lowered, similar to linear regression.
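A minimal sketch of such a node, assuming a simple gradient-based adjustment: after each prediction, the multiplication constant `m` and the addition constant `c` are nudged in the direction that lowers the squared error. The learning rate, the number of passes and the data are made-up values.

```python
def train_node(xs, ys, learning_rate=0.01, passes=2000):
    """Fit a single node computing m * x + c by repeatedly lowering
    the squared error (real value - predicted value) ** 2."""
    m, c = 0.0, 0.0
    for _ in range(passes):
        for x, y in zip(xs, ys):
            error = y - (m * x + c)
            # Derivative of error ** 2 with respect to m is -2 * error * x,
            # and with respect to c is -2 * error; step against it.
            m += learning_rate * 2 * error * x
            c += learning_rate * 2 * error
    return m, c

# Hypothetical data generated from y = 2x + 1.
m, c = train_node([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On this data the node settles very close to m = 2 and c = 1, the line the data was generated from.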

The difficulty is that certain problems cannot be solved with just two nodes. Take for example an XOR gate. The outputs of the XOR function/mapping cannot be separated with a single straight line.

It requires a different type of maths to solve. By using what is known as a sigmoid function, we can solve this problem. The details of the maths are not important here, but by introducing a more complex function we achieve the result. By combining more complicated nodes, we can come up with even more complicated functions, and this, in high-level terms, is how a neural network works. Each layer contains certain functions, and these functions combine into more complex functions. The final layer of functions returns probabilities or numbers, depending on whether the problem is one of classification or regression.
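To illustrate how combining sigmoid nodes can handle XOR, here is a small sketch with three nodes. The weights are chosen by hand for clarity rather than learned by training, so this shows what such a network can represent, not how it is trained.

```python
import math

def sigmoid(z):
    """Squashes any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def xor_network(a, b):
    # Hidden layer (hand-picked weights):
    either = sigmoid(20 * a + 20 * b - 10)       # behaves like OR
    not_both = sigmoid(-20 * a - 20 * b + 30)    # behaves like NAND
    # Output node: fires only when both hidden nodes fire (AND).
    return sigmoid(20 * either + 20 * not_both - 30)

results = {(a, b): round(xor_network(a, b)) for a in (0, 1) for b in (0, 1)}
```

Each hidden node still draws a single line through the inputs. It is the output node combining the two that produces the XOR pattern, which no single line can separate.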

Even with such technology, there are still problems. The sheer amount of processing needed for neural networks is enormous, so most of their training is done in the cloud. The main problem, however, is that they are not as intuitive or easy to interpret as the other algorithms, and with that comes a dependency on the computer at the user’s end. As a result, there is uncertainty as to how the computer came to its answer, which is what worries most people about such cutting-edge technology. Having said that, there is still much potential for neural networks, given their current work in unmasking deep fakes and their superior pattern-recognition capabilities.

Conclusion

In conclusion, there is still much to learn about artificial intelligence, but with the tools available today a great deal can be achieved using AI, ML and deep learning. By recognising the problem and understanding the different learning models, we are halfway towards a solution. The more people that can understand these relatively basic concepts the better, as the process of coding AIs becomes easier and more accessible to the world.

References