The magic of neural networks

“Your problem, dear colleague, is that you want to understand neural networks. However, they cannot be understood; you have to accept that they are magical.” I took a good look at my guest. Was he a scientist, or did he want to be a wizard? He had invited me to join him on a neural network application. These were very popular at the time, in the early 1990s. Chances were we could find a scholarship. However, before I started using such a system, I wanted to understand how it could work while ignoring all the basic rules of a reliable recognition system. I showed my visitor the door and wished him luck with his sorcery.

More than 30 years later, I must admit he was right. Neural networks cannot be understood. Their many successes are magical. All the basic elements are simple, and they work in a transparent way. They were proposed and studied long before: neurons, similar to perceptrons (Rosenblatt, 1958), the logistic transfer function (D. Cox, 1958), and the mean squared error criterion (Fisher, 1936). Minsky and Papert highlighted the poor performance of single perceptrons in 1969, after which their use became unpopular for nearly two decades.
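To make the simplicity of these elements concrete, here is a minimal sketch in Python. It is an illustration only, not taken from any of the cited papers: the function names, the hand-picked weights and the coarse weight grid are all assumptions. It builds one neuron with a logistic transfer function, measures its mean squared error, and then searches the grid for a single threshold unit that reproduces the logical AND and XOR functions. XOR is the classic textbook example of a problem that a single perceptron cannot solve.

```python
import math
from itertools import product


def logistic(z):
    """Logistic transfer function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(weights, bias, inputs):
    """One neuron: a weighted sum of the inputs passed through the logistic function."""
    return logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)


def mean_squared_error(outputs, targets):
    """Average squared difference between outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_TARGETS = [0, 0, 0, 1]
XOR_TARGETS = [0, 1, 1, 0]

# Hand-picked (hypothetical) weights that make one neuron behave like AND.
and_outputs = [neuron([10.0, 10.0], -15.0, x) for x in INPUTS]


def single_unit_can_fit(targets, grid=None):
    """Search a coarse weight grid for one threshold unit that reproduces targets."""
    grid = grid or [k / 2 for k in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    for w1, w2, b in product(grid, repeat=3):
        predictions = [int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in INPUTS]
        if predictions == targets:
            return True
    return False


if __name__ == "__main__":
    print("AND outputs:", [round(o, 3) for o in and_outputs])
    print("AND error:", mean_squared_error(and_outputs, AND_TARGETS))
    print("single unit fits AND:", single_unit_can_fit(AND_TARGETS))
    print("single unit fits XOR:", single_unit_can_fit(XOR_TARGETS))
```

The grid search finds a unit for AND but none for XOR. A finite grid proves nothing by itself, but the result agrees with the known fact that XOR is not linearly separable.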

Neural networks combined these elements into large non-linear systems, risky from a traditional point of view, containing many neurons in a series of layers. Thanks to the smart back-propagation rule, the large set of parameters, the neuron weights, could be trained, albeit slowly. Trainable systems with many free parameters may have solutions that perfectly recognize the training set but do not generalize, due to the problem of overtraining. The solutions found by the ‘wizards’ who advocated these systems are various sets of regularization rules that slow down the adaptation of the network to the training examples. These rules are different, or have different settings, for every implementation. Researchers play around with them to get good results for the application under study. Off-the-shelf procedures that can be used for arbitrary problems without the need for a knowledgeable analyst hardly exist.
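As a rough illustration of what the back-propagation rule does, the sketch below trains a tiny network (two inputs, one hidden layer of logistic neurons, one output) on the XOR examples by gradient descent on the mean squared error. Everything in it is an assumption made for the example: the network size, the learning rate, the random initialization, and the optional weight_decay term. Weight decay is just one well-known regularization rule and is not necessarily one of those the advocates had in mind.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_hidden, rng):
    """Small random weights for a 2-input, one-hidden-layer, 1-output network."""
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": rng.uniform(-1, 1),
    }


def forward(p, x):
    """Return the hidden activations and the network output for one input."""
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(p["W1"], p["b1"])]
    output = logistic(sum(w * h for w, h in zip(p["W2"], hidden)) + p["b2"])
    return hidden, output


def loss(p, data):
    """Mean squared error over all (input, target) pairs."""
    return sum((forward(p, x)[1] - t) ** 2 for x, t in data) / len(data)


def backprop(p, data):
    """Gradient of the loss, obtained by passing the output error back through the layers."""
    n_hidden = len(p["W2"])
    g = {"W1": [[0.0, 0.0] for _ in range(n_hidden)], "b1": [0.0] * n_hidden,
         "W2": [0.0] * n_hidden, "b2": 0.0}
    for x, t in data:
        hidden, y = forward(p, x)
        # Error signal at the output: d(loss)/dy times the logistic derivative.
        delta_out = 2.0 * (y - t) / len(data) * y * (1.0 - y)
        g["b2"] += delta_out
        for j, h in enumerate(hidden):
            g["W2"][j] += delta_out * h
            # Error signal passed back to hidden neuron j.
            delta_hidden = delta_out * p["W2"][j] * h * (1.0 - h)
            g["b1"][j] += delta_hidden
            for i, xi in enumerate(x):
                g["W1"][j][i] += delta_hidden * xi
    return g


def train(p, data, epochs, learning_rate, weight_decay=0.0):
    """Plain gradient descent; weight_decay > 0 pulls the weights toward zero."""
    losses = [loss(p, data)]
    for _ in range(epochs):
        g = backprop(p, data)
        for j in range(len(p["W2"])):
            for i in range(2):
                p["W1"][j][i] -= learning_rate * (g["W1"][j][i] + weight_decay * p["W1"][j][i])
            p["b1"][j] -= learning_rate * g["b1"][j]
            p["W2"][j] -= learning_rate * (g["W2"][j] + weight_decay * p["W2"][j])
        p["b2"] -= learning_rate * g["b2"]
        losses.append(loss(p, data))
    return losses


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    params = init_params(3, random.Random(0))
    history = train(params, XOR, epochs=300, learning_rate=0.3)
    print("loss before:", history[0], "loss after:", history[-1])
```

Whether such a small network actually solves XOR depends on the initial weights and the settings. That dependence is a small-scale version of the tuning described above.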

After some years the popularity of neural networks slowly decreased as other, slightly more transparent systems showed up, e.g. support vector machines, AdaBoost and random forests. These systems are better understood, but still suffer from the need to set a number of parameters whose values do not follow directly from the prior knowledge available in the problem under study. In the meantime, neural network technology was developed further: many more layers with various settings depending on the data characteristics, and larger training sets, including unlabeled data. The performance of these ‘deep learning’ networks was boosted by a setup proposed by Hinton in 2012. A large step in the application of such systems is the possibility to make them generative, so that new objects, similar to those in the training set, can be created.

Recently, various tools for two application areas have been made publicly available: chatbots (e.g. ChatGPT) and image generators (e.g. DALL-E), both based on AI and deep learning neural networks. The systems are very large and partially based on pretrained networks. They thereby offer a general model for natural languages and natural images.

The results are surprisingly good for many users, including researchers in this field, and even for the designers of the systems in question. All their basic elements are deliberately designed and, taken individually, well understood. However, the complicated way in which enormous numbers of these elements are combined is difficult to understand as a whole. Here is what ChatGPT answered when I asked whether its own behavior is fully understood:

As an AI language model, ChatGPT’s behavior is determined by its training data and its underlying architecture. While the architecture and training methodology used to create GPT-3 are well understood by the developers who created it, the exact processes by which GPT-3 produces its output and the full scope of its capabilities are not fully understood.

That being said, extensive research has been conducted on GPT-3 and other language models, including studies on their performance, biases, and ethical implications. These studies provide valuable insights into the behavior of language models like ChatGPT, and ongoing research is continually expanding our understanding of these models.

Overall, while we may not fully understand the inner workings of ChatGPT or other language models, there is a wealth of knowledge and research available to help us understand and work with them effectively.

The overall behavior of a chatbot like ChatGPT has characteristics very close to reports by, or discussions with, knowledgeable experts: clear sentences with almost no errors in grammar or spelling, clear arguments and explanations, the willingness to accept mistakes in earlier parts of the discussion, and the ability to absorb new information. Often statements look convincing. However, sometimes obvious errors arise in the reported facts. Some events never happened; others are attributed to the wrong people, placed in the wrong locations, or set in inappropriate circumstances. It looks good, but fact-checking is a must before the answers can be applied.

Again we have a miracle. Should we accept it and study responsible applications, or do we want to understand it fully? Let us consider this issue from the point of view of emergence in large, complex systems. Emergent features are observations made on the behavior of such systems under specific circumstances. These features are not properties of the system as such, but of its outcomes. For chatbots based on deep learning, they depend on the training data and the user input.

The magic of the neural network is not its architecture, but its ability to reveal properties of language and images without the need to make these explicit by expert-defined rules. It might thereby be a tool to derive these properties from its results. However, it appears to be difficult to derive arguments and rules from neural network outputs that are useful for experts.

The redundancy and vividness of languages and images seem to be the source of the emergent behavior. Complex hardware and software are used to reveal them. We can try to understand how this is possible, but that brings us to the basics of languages and nature in general. Alternatively, we can focus on using it properly, in a way that is beneficial to society. The latter now seems much more urgent.