What’s at stake: With all the claims and commentary about AI systems, you can’t tell the insightful from the frightful without a program. That means digging into the different things hiding under the AI umbrella.
Artificial Intelligence (AI) is everywhere. From medical research labs to your car, from police stations to your vacuum cleaner, there is no escaping it. This ubiquity begs for a good definition. But there is no simple definition — a myriad of different technologies huddle under the AI umbrella.
Unfortunately, the term itself can add perceived value to a product. So AI gets stretched beyond its natural bounds, to fit any situation where it might improve profit margins. But to understand what is really happening with AI today we need a more precise definition.
Let’s start with common sense: an algorithm shows artificial intelligence if it does a task we associate with human thought. We can roughly divide such tasks into three categories: pattern recognition, adaptive control, and pattern generation. There are many different algorithms that can fit this definition, but we can subdivide them further.
Let’s start with a binary distinction. George Mason University computer science professor Missy Cummings says there are algorithms that employ if-then-else rules, and there are algorithms that use neural networks. Her technical terms are symbolic and connectionist, respectively.
Cummings emphasizes that we are not talking about systems that think or have knowledge. We are talking about systems that use mathematical algorithms — usually very large matrix arithmetic operations — to transform input data into output data. And essentially all AI is focused on one or another variety of recognizing patterns.
Our first category is of algorithms based on rules. Such algorithms apply a list of rules to a set of input data in order to reach conclusions about that data. A trivial example from vision processing might be a rule that if all the pixels in a scene have the same value, then there is no object in the scene. More complex rules might identify edges, shapes, and features.
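To make the flavor of rule-based processing concrete, here is a toy version of that pixel rule, sketched in Python. The scene format (a list of rows of integer pixel values) is invented for illustration, not any real machine-vision API:

```python
# Trivial vision rule: if every pixel has the same value, there is no object.

def scene_has_object(scene):
    """Apply the rule: an object exists only if pixel values vary."""
    pixels = [p for row in scene for p in row]
    return len(set(pixels)) > 1        # any variation counts as an object

blank = [[0, 0], [0, 0]]
with_object = [[0, 0], [0, 9]]
print(scene_has_object(blank))         # False: a featureless scene
print(scene_has_object(with_object))   # True: something is there
```

More sophisticated rules for edges or shapes would be built up the same way, as explicit tests a human wrote down.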
Early machine-vision systems were usually rule-based, and many still are today. In a constrained environment such as parts inspection or motion detection, rule-based systems can be small, fast, exceedingly accurate, and transparently easy to understand. But as the range of input data gets richer and the desired output more discerning, the problem of devising the rules grows from daunting to just impossible. Imagine, for example, coming up with a list of rules to locate all the faces in a photo of a crowd, and then extract features from each face for identification.
The second category of systems under the AI umbrella is machine-learning systems. Conceptually, these are systems that can alter their rules on the basis of the data they encounter. Historically, this was the topic of adaptive control theory. But lately, the action has been mostly under the name of neural networks. Again conceptually, neural-network systems are like rule-based systems in that they take in data and decide what to do about it based on stored information. But instead of having fixed rules stored in if-then-else format, the rules — if we can call them that — in neural networks are implicit in a vast array of parameters stored in the network. The rules are applied not by looking them up and making decisions, but by performing large numbers of mathematical calculations — matrix arithmetic — to calculate an answer.
Neural networks provide an alternative that does not require humans to create rules. While the name suggests a mass of interconnected neuron cells, in practice neural networks are nothing of the sort. They are computer simulations made up of many small data structures interconnected in an orderly fashion. The individual data structures — the artificial neurons — take in data from all of the neurons connected to their inputs, multiply each input by a stored parameter — a weight — and perform some kind of summing process to bring together all the products. This usually gets implemented as a series of very large matrices stored in computer memory and manipulated with matrix arithmetic.
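In code, a single artificial neuron amounts to a few lines. The sketch below uses plain Python; the inputs, weights, and logistic squashing function are illustrative choices, not any framework's defaults:

```python
# One artificial neuron: multiply each input by its weight, sum the
# products, and squash the result into a bounded range.
import math

def neuron(inputs, weights):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-total))   # squash into the range (0, 1)

out = neuron([0.5, -1.0, 2.0], [0.8, 0.2, 0.1])
print(out)   # a single number, passed on to neurons in the next layer
```

A real network is just a great many of these, arranged in layers, which is why the whole thing reduces to matrix arithmetic.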
The importance of neural networks is that by training them — subjecting them to very large data sets and repetitive computations that derive patterns from the data — you can get the output side of the network to provide valuable information about the data on the input side. For instance, in vision-processing applications, neural networks can be trained to produce outputs identifying specific objects in the input image.
A key attribute of neural-network systems is that they learn through training — an unfortunate but handy anthropomorphism. They derive from their input data information that allows them to reflect key characteristics of the input data set. But just how that training is done varies greatly from system to system. This allows us to subdivide the category of machine-learning systems into three categories: supervised learning, reinforcement learning, and unsupervised learning.
These three categories influence what an AI system does well, its weaknesses, and how it is deployed, as well as its internal structure.
Supervised-learning neural networks were the first of the three forms to be widely discussed. This was due in part to their crushing victory over rule-based systems in a public challenge — the 2012 ImageNet competition — to classify a set of static images. If you want to know that a photo contains a Persian cat on a trail bike, a properly trained neural network was, and still is, your tool of choice.
Structurally, these networks are usually made up of many layers — from a few to hundreds — of artificial neurons. Deciding just how many layers to use, how each layer functions, and how the neurons in successive layers will be interconnected has become a mix of trial and error, experience, and art. Consequently, there are many varieties of these deep-learning networks, and rich sets of tools for constructing, training, and testing them.
The networks are trained by simultaneously applying an input — say, the pixels of a cat photo — to the input side of the network, and the desired output to the output side of the network. At each neuron you then use statistical calculations to adjust the weights in that neuron to increase the probability that, given this input, the network will calculate the desired output. These calculations ripple all the way through the network from output back to input. Repeat this process for a million or two well-chosen examples — each example labeled by a human with the correct output values — and you have a trained neural network. Because each input — each image, if we are talking about image classification — must be selected and labeled by hand by a human, and then fed individually into the network during training, this is called supervised learning.
Supervised-learning networks, if properly designed and trained, can be quite accurate at pattern recognition — telling whether a pattern on which they have been trained is present in the input data. Variations on these network designs can include memory of previous inputs and outputs, allowing the networks to spot gradually emerging patterns in streams of data such as audio for speech recognition, video, or financial-market data.
The relatively simple structure of these networks once they are trained often makes it possible to significantly shrink the network — eliminating individual neurons that rarely fire or that duplicate other neurons, or removing whole layers that don’t seem to do much — with only minor impact on accuracy. Often these reduced networks can be simulated not only in big datacenters, but on a smartphone or even a small embedded microcontroller.
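A toy sketch of that shrinking step, with made-up weights and a made-up threshold: connections whose weights are negligible are zeroed out, and the output drifts only slightly:

```python
# Prune weights below a magnitude threshold and measure the output drift.

def prune(weights, threshold=0.05):
    return [w if abs(w) >= threshold else 0.0 for w in weights]

weights = [0.9, -0.01, 0.4, 0.003, -0.7]
pruned = prune(weights)

x = [1.0, 1.0, 1.0, 1.0, 1.0]
full = sum(w * xi for w, xi in zip(weights, x))
small = sum(w * xi for w, xi in zip(pruned, x))
print(abs(full - small))   # a tiny accuracy cost for a smaller network
```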
The great weakness of supervised learning is that such systems can only recognize patterns on which they have already been trained. Presented with a novel object they are unpredictable. In practice this implies either they must be used in a very constrained environment, or they must be exhaustively trained on every possible object they can be expected to encounter, or users must accept an occasional wildly incorrect result. But there are other machine-learning approaches that try to avoid this uncomfortable choice.
One such is reinforcement learning. Here the system does not take in an input and classify its contents. Instead, the system continuously interacts with its environment — as far as the system is concerned, just a massive pool of data — to achieve some goal, based on a set of rewards or punishments. Instead of being hand-fed examples, the system explores its environment on its own, getting feedback on how it is doing and altering its behavior accordingly.
The structure of reinforcement-learning systems is quite different from that of supervised-learning systems. Basically, the reinforcement-learning system has three parts. An estimator looks at the environment and estimates its current state. A reward function compares that state to goals defined by the developers, and produces a reward value. And third, an agent takes in the current state from the estimator, the reward value, its own memory of its previous actions, and a policy table, and chooses another action to perform upon the environment. It’s a closed loop. The machine learning lies in the fact that the agent updates its policies based on mathematical computations using the reward value and state. Thus the agent gradually learns how to act so as to maximize its reward function. This learning function may be implemented in a neural network, or it may not.
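The closed loop can be sketched in a few dozen lines. Every detail below is invented for illustration: the environment is a five-position line, the estimator is trivial because the state is fully observed, the reward function pays off only at the goal, and the agent's policy table is updated with the classic Q-learning rule:

```python
# A closed-loop reinforcement-learning toy: explore, get rewarded, and
# update a table of action values until a good policy emerges.
import random

random.seed(0)
GOAL, N_STATES = 4, 5
ACTIONS = (-1, +1)                                # step left or right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # policy table

def reward(state):
    return 1.0 if state == GOAL else 0.0          # pays off only at the goal

for _ in range(500):                              # episodes of exploration
    s = 0
    while s != GOAL:
        if random.random() < 0.2:                 # sometimes try something new
            a = random.choice(ACTIONS)
        else:                                     # usually follow the table
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)     # act on the environment
        best_next = max(q[(s2, act)] for act in ACTIONS)
        q[(s, a)] += 0.5 * (reward(s2) + 0.9 * best_next - q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(GOAL)]
print(policy)
```

After enough episodes, the table tells the agent to step toward the goal from every position, with no human ever labeling an example.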
The system explores its environment, trying actions and learning policies as it goes. It requires no human supervision. But obviously it must learn in an environment that can tolerate errors. Reinforcement-learning systems have proven effective in areas as diverse as playing board games, controlling navigation or the movement of appendages in robots, industrial process control, and modeling financial transactions.
Recently, an idea called inverse reinforcement learning has become important too. This approach uses a similar architecture. But instead of the agent learning a policy by trial and error, the inverse reinforcement-learning system watches an already-trained agent — often a real human — interact with the environment, and tries to infer the agent’s policies. That data can then be used to help understand how the agent works, for instance in neuropsychological studies, or to generate policies for another reinforcement-learning system to use.
The third category of machine-learning AI is unsupervised learning. In these systems there are neither training sequences with labels nor human-devised reward functions. The system explores the data on its own, discovering whatever patterns are there by itself.
We should note that some systems placed in this category are not machine-learning systems at all, but merely big-data analysis programs that identify data clusters. These tools have been used since well before the reemergence of AI for market segmentation, customer-profile analysis, and other analytical purposes.
But many unsupervised-learning systems do use neural networks, although quite differently from the way supervised-learning systems use them. In these systems, the network explores a massive set of data and attempts to encode it internally in a highly compact form. The network then tries to regenerate the original data from its internal representation, in effect trying to mimic the original data.
The system then compares the original data with the mimicked data, and sends feedback back into the network based on the differences. This is similar to the way the correct answers are back-propagated into a deep-learning network. The network then does statistical calculations at each node to attempt to minimize the error — to make the mimic more accurate.
Through magic — unless you are a mathematician, in which case the process is probably quite obvious — repetition of this training process leads to an internal representation of the data that reflects the patterns and clusters in the original input data set. Thus at the end of training, accurate information about the patterns and structure of the input data set is sitting right there in the network, if you can figure out how to interpret it. This makes these networks valuable for both classification problems — although they tend to be larger and slower than supervised-learning networks at this — and for things like cluster analysis. Since these networks use their internal representation to generate data that mimics the original data set, they are called generative networks.
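Here is a toy of that mimic-and-compare loop: unlabeled two-dimensional points lying along a line are squeezed into a single internal number, regenerated from it, and the learned direction is nudged to shrink the difference between original and mimic. The data set and the update rule are illustrative assumptions; real generative networks do this with many layers of neurons:

```python
# Unsupervised mimicry: compress each point to one number, regenerate it,
# and adjust the internal representation to reduce the mismatch.
import random

random.seed(1)
# Unlabeled data clustered along the direction (1, 2), plus a little noise.
data = [(t + random.gauss(0, 0.05), 2 * t + random.gauss(0, 0.05))
        for t in (random.uniform(-1, 1) for _ in range(200))]

w = [1.0, 0.0]                                    # learned internal direction
lr = 0.05
for _ in range(200):
    for x in data:
        code = x[0] * w[0] + x[1] * w[1]          # compress to one number
        mimic = (code * w[0], code * w[1])        # regenerate the point
        err = (mimic[0] - x[0], mimic[1] - x[1])  # compare with the original
        for i in (0, 1):                          # nudge w to mimic better
            w[i] -= lr * err[i] * code

print(w[1] / w[0])   # the slope of the pattern the data actually follows
```

No one told the system the data lay along a line of slope 2; that structure ends up sitting in the learned parameters, which is exactly the point.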
Applications for generative networks have gone far beyond cluster analysis. Once the network has trained itself, you can get it to generate not just the original training data set, but also entirely new data based on the patterns it has learned. For instance, a network that has extracted patterns from images of cats would have no trouble generating an image of a cat that was purple and tailless, even though there were no purple or tailless cats in the training set.
One interesting variant — the generative adversarial network, or GAN — takes this idea further. It sets two networks against each other. One, the generative network, works as we have described. The other, a critic if you will, has been trained on real data to identify fake data. In use, the generative network takes in a random input and generates a set of fake data based on its pattern learning. The critic network then judges whether this data is fake or not. If it says fake, a signal back to the generative network trains it to improve its mimicry. If the critic says not fake, a signal to the critic network trains it to improve its ability to detect fakes. The two networks go on like this indefinitely, the generative network becoming a better fraud artist, and the critic network a better detective, until they have made as much progress as the original training data will allow.
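A heavily simplified version of that duel, arranged so it visibly converges: real data are numbers near 2.0, the generator forges numbers near a learned value, and the "critic" is refit each round as the midpoint threshold that currently best separates real samples from forgeries. All the numbers are assumptions for illustration; real GANs train both sides as neural networks by backpropagation:

```python
# A GAN-flavored duel: the generator shifts its forgeries toward whatever
# side of the critic's boundary currently counts as "real."
import random

random.seed(3)
REAL_MEAN = 2.0
b = -2.0                                          # the generator's parameter
for _ in range(200):
    real = [REAL_MEAN + random.gauss(0, 0.1) for _ in range(20)]
    fake = [b + random.gauss(0, 0.1) for _ in range(20)]
    # Critic: a decision boundary halfway between the two batches.
    threshold = (sum(real) / 20 + sum(fake) / 20) / 2
    # Generator: move its forgeries toward the "real" side of the boundary.
    b += 0.1 if threshold > b else -0.1

print(b)   # the forgeries have drifted to resemble the real data
```

Once the forgeries sit on top of the real data, the critic's boundary no longer separates them, and the duel stalls: the generator has learned as much as the training data allows.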
Another important variant on unsupervised learning is the transformer — GPT-4 and other chat engines, for example. Transformers are designed to work with streams of data, such as conversations, essays, music, or the like, transforming an input stream into an output stream, based on the patterns they have learned.
They are trained much as other generative networks are, by traversing immense volumes of data and extracting patterns. In this case, the patterns they build up internally predict, given a string of symbols, what the next symbol is likely to be. Thus they learn, given an input — a question, perhaps, or a conversational statement — to generate the closest response they can based on all the data on which they have been trained. If that data includes half the internet, their conversational abilities and skills at generating plausible-sounding essays can be quite impressive indeed. Transformers have also shown quite good success at language translation.
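A miniature of that core trick, with a one-word memory (a bigram table) standing in for the transformer's attention over long contexts, and a tiny invented training stream:

```python
# Next-symbol prediction in miniature: count which word follows which,
# then generate a continuation by emitting the likeliest next word.
from collections import Counter, defaultdict

stream = "the cat sat on the mat and the cat sat on the hat".split()

follows = defaultdict(Counter)         # word -> counts of what comes next
for cur, nxt in zip(stream, stream[1:]):
    follows[cur][nxt] += 1

def continue_text(word, n=4):
    out = [word]
    for _ in range(n):
        word = follows[word].most_common(1)[0][0]   # likeliest next word
        out.append(word)
    return " ".join(out)

print(continue_text("the"))
```

Scale the table up to billions of parameters and the context from one word to thousands, and the generated continuations start to read like essays.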
Often a transformer network will be paired with a critic network that attempts to recognize when a proposed word doesn’t make sense, or is heading down a blind alley. The critic can cause the transformer to go back and pick a different likely next word. The tumult of surprise and unease triggered by the public debut of transformers attests to how successful this architecture can be with sufficient training.
But it’s not thinking
We have seen a wide variety of algorithms hiding beneath the label of AI. Mostly, they depend on the ability of artificial neural networks — essentially huge arrays of parameters in computer memory — to adjust their internal parameters through statistical calculations in order to converge on some desired behavior, based on patterns in the training data. But it is important to remember that what these systems do is to converge on patterns. No model building or reasoning is involved.
We can use this pattern-finding capability to detect patterns in data. We can use it to identify policies that lead to rewards in performing a series of actions. Or we can use it to generate new data that conforms to the patterns the system has learned. At the end of the day, what AI does is widely valuable. But it is not intelligence. It is applied arithmetic.
There are different kinds of AI structures, each with its own abilities, limits, and applications. Before you can evaluate a claim or a speculation, it is important to have some idea what these tools actually are. But none of them are replacements for humans.
Contributor, The Ojo-Yoshida Report
This article was published by The Ojo-Yoshida Report. For more in-depth analysis, register today and get a free two-month all-access subscription.