Neural network (NN) models are the brains behind AI algorithms. These models are inspired by how the human brain processes information to identify things. They are composed of two types of elements: nodes (neurons) and the connections between them. Different models are used to solve specific problems, like image recognition or processing sequential patterns such as speech. There is a myriad of models available, and new ones are constantly being created. While the growing number of available models is staggering, there are several standard models which serve as starting points when designing more complex ones. It is also important to understand what types of nodes are available and how to train a neural network to process data accurately.
Neurons, the Building Blocks
There are several types of nodes that make up NNs, and each has a specific function. The two most important are input and output nodes. The data element to process is divided into its individual variables, which are provided to the input nodes. The output nodes represent the possible solutions. The goal of any NN is to take the variables from the input nodes and calculate which output node is most applicable. Other node types add complexity where more sophisticated problem-solving is required. While most models start with a form of input node and end with a form of output node, there are models composed purely of intermediary nodes, such as the Markov Chain.
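The basic behavior of a single node can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the weights, bias, and input values are made up for the example:

```python
import math

def sigmoid(x):
    # A common activation function: squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # A node computes the weighted sum of its incoming values,
    # offsets it by a bias, and filters it through an activation function.
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Example: a neuron with two input connections.
print(neuron([0.5, 0.8], [0.4, -0.2], 0.1))
```

The output is a value between 0 and 1 that the next layer of nodes can consume.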
Training a New Neural Network
A freshly-built neural network is instantiated with randomized weights and biases, so it is initially incapable of computing accurate solutions. These parameters must be tuned to produce accurate results for the data being processed. While the builder of the NN could adjust these parameters manually, it is a painstaking and tedious process that is better left automated. To improve the NN's ability to pick the correct output node for given input data, it must be "trained" with labeled data (datasets that have already been analyzed). In the handwritten number recognition example discussed below, the difference between the actual values of the output nodes and the expected values is used to compute a "cost": a single number measuring how far off the network's answer was. By averaging the cost over many training examples, the weights and biases can be subtly adjusted so the network better calculates what the input data represents.
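One common way to compute that cost is the mean squared error between the actual and expected outputs. A minimal sketch, using made-up output values for a digit labeled "3":

```python
def cost(actual_outputs, expected_outputs):
    # Mean squared error: average squared difference across all output nodes.
    return sum((a - e) ** 2
               for a, e in zip(actual_outputs, expected_outputs)) / len(actual_outputs)

# For an image labeled "3", the expected output is 1.0 at node 3
# and 0.0 everywhere else (a "one-hot" encoding).
expected = [0.0] * 10
expected[3] = 1.0

# Hypothetical raw outputs from an untrained-ish network.
actual = [0.1, 0.0, 0.2, 0.7, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0]
print(cost(actual, expected))
```

A perfect answer would yield a cost of 0; training aims to drive this number down across the whole labeled dataset.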
The NN training process incorporates a technique called "back-propagation", in which the adjustments are passed back through the network one layer at a time and the weights and biases are updated according to the learning rate, which throttles the changes to avoid over-adjustment. The adjustments are based on the gradient of the cost function, which represents the slope from the current error value toward a local minimum; repeatedly stepping down that slope is known as gradient descent. The goal of the training process is to reduce the output's error percentage.
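The core update rule of gradient descent can be shown on a single parameter. This sketch minimizes a toy cost function whose gradient is known in closed form; the learning rate and iteration count are illustrative:

```python
def gradient_descent_step(weight, gradient, learning_rate=0.1):
    # Move the parameter a small step in the direction that lowers the cost.
    # The learning rate throttles the step to avoid over-adjustment.
    return weight - learning_rate * gradient

# Toy cost: (w - 2)^2, whose gradient with respect to w is 2 * (w - 2).
w = 0.0
for _ in range(100):
    w = gradient_descent_step(w, 2 * (w - 2))
print(round(w, 4))  # converges to the minimum at w = 2.0
```

In a real network, back-propagation computes this gradient for every weight and bias simultaneously, layer by layer, starting from the output layer.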
Which Neural Network Model Should You Use?
Each model is well-suited to a specific type of problem-solving, but models can be mixed to create new hybrid models or combined to form a composite solution. The simplest and oldest NN model is the "Perceptron" (P), which consists of several input nodes and a single output node. This model is best suited to binary classification (is this a match?). While this model has its uses, its pattern is best leveraged as part of a more complex and capable NN model, such as the "Feed Forward" (FF) neural network, which stacks Perceptron-style layers (a layer of input nodes, a hidden layer, and a layer of output nodes) and can resolve to several output nodes. A popular example use of this model would be a handwritten number analyzer that interprets an image of a handwritten number and computes which digit was drawn.
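The Perceptron's binary decision rule is simple enough to sketch end to end. This example trains one on the logical AND function using the classic perceptron learning rule; the learning rate, epoch count, and two-input shape are choices for the illustration:

```python
def predict(inputs, weights, bias):
    # Fires (1) if the weighted sum crosses the threshold, otherwise 0.
    return 1 if sum(i * w for i, w in zip(inputs, weights)) + bias > 0 else 0

def train(samples, labels, epochs=20, lr=0.1):
    weights, bias = [0.0, 0.0], 0.0  # two inputs in this example
    for _ in range(epochs):
        for inputs, label in zip(samples, labels):
            error = label - predict(inputs, weights, bias)
            # Perceptron rule: nudge weights toward the correct output.
            weights = [w + lr * error * i for w, i in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]  # AND truth table: only (1, 1) is a match
w, b = train(samples, labels)
print([predict(s, w, b) for s in samples])  # prints [0, 0, 0, 1]
```

Because a lone Perceptron can only draw a single straight decision boundary, problems that are not linearly separable require the layered models described next.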
In the handwritten digit analyzer example using the FF model, each input node represents the brightness value of one pixel of the scanned image, and each output node represents a digit from 0 to 9. Each output node holds a percentage value expressing how certain the network is of that digit. Between the input and output layers is the hidden layer, which takes the weighted values from the input nodes, adds a bias (an offset applied to the weighted sum of the incoming values), and filters the result through an activation function. Each connection between nodes has an associated weight indicating how important its value is in the calculation. In the case of a digit recognition algorithm, the pixels along the edge will not be weighted as highly as those closer to the center. Each output node also has a bias, much like the hidden nodes.
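The forward pass of such a network can be sketched at a tiny scale. This toy uses 3 "pixels" and 2 output classes with made-up weights, rather than the 784 inputs and 10 outputs a real digit recognizer would use:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each node: weighted sum of all incoming values, plus its own bias,
    # filtered through the activation function.
    return [sigmoid(sum(i * w for i, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

pixels = [0.0, 0.9, 0.8]  # brightness values of three "pixels"
hidden = layer(pixels, [[0.5, -0.3, 0.8], [0.1, 0.4, -0.2]], [0.0, 0.1])
outputs = layer(hidden, [[0.7, -0.5], [-0.6, 0.9]], [0.05, -0.05])
print(outputs)  # each value is the network's confidence in that class
```

Training adjusts those weight and bias values; the layered structure itself stays fixed.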
The P and FF models, along with the "Deep Feed Forward" (DFF), which simply adds more hidden layers, are considered classification models because they take a set of parameters that represent an unknown and seek to identify it through their linearly-layered filtration processes. Other models address other common problems, like predictive analysis, where the input represents the current state of a system and the NN computes the expected outcome based on environmental variables and historical patterns. The basic model of this type is the "Recurrent Neural Network" (RNN).
One main characteristic sets the RNN apart from the classification models: the algorithm loops over the hidden layers to analyze the input rather than proceeding linearly through the process. This is beneficial when processing sequential information, like audio or text. It is achieved by replacing the hidden nodes with special recurrent nodes, which retain state from previous iterations and use it to calculate the current state. There are two notable variations of the RNN, the "Long Short-Term Memory" (LSTM) and "Gated Recurrent Unit" (GRU) models. These vary only in the type of recurrent node utilized: the LSTM implements the "Memory" variant, and the GRU uses the "Gated Memory" node. Each variation is useful for different applications. It is important to determine the problem you wish to solve and which NN, or combination of NNs, is appropriate to achieve the intended results.
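The recurrent node's key trick — carrying state from one step to the next — can be sketched with the plain RNN update; LSTM and GRU nodes add gating logic on top of this idea. The weights and the input sequence here are illustrative:

```python
import math

def rnn_step(x, prev_state, w_in=0.5, w_state=0.8, bias=0.0):
    # The new state mixes the fresh input with the remembered state,
    # squashed by tanh to keep it in the range (-1, 1).
    return math.tanh(w_in * x + w_state * prev_state + bias)

state = 0.0
for x in [1.0, 0.5, -0.3]:  # a short input sequence
    state = rnn_step(x, state)
print(state)  # the final state summarizes the whole sequence
```

Because each step's output depends on everything seen so far, the same node can process sequences of any length, which is exactly what linear classification models cannot do.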
We have assessed a handful of the many neural networks available. With proper training, they are capable of human-like problem-solving by imitating patterns found in the human brain. Using nodes, weights, biases, and automated calibration, a network can learn to achieve accurate results. There is no perfect model that can handle every problem, so it is important to clearly break down the steps that comprise your solution and use the model, hybrid model, or combination of models that best fits your process.
Liquid Analytics works with clients to deliver AI decisions that provide high ROI for business initiatives. If you’re looking for more information on building the right model for an AI solution, contact us to get started today.