14 Jan 2025 - tsp
Last update 14 Jan 2025
17 mins
Artificial intelligence systems like GPTs (Generative Pre-trained Transformers) are often criticized as being incapable of logical reasoning, innovation, or any real understanding. Critics frequently claim that these systems merely search through a vast database, require access to the entire Internet to reproduce data, or rely on rote memorization. These misconceptions arise from a fundamental misunderstanding of how such systems operate. In this article, we will explore what GPTs and other Large Language Models (LLMs) really are, how they function, and address some of the common criticisms using clear explanations and analogies to human cognition. We will use ChatGPT as a specific example to illustrate these concepts.
At their core, GPTs and other LLMs are types of artificial neural networks designed to understand and generate human-like language. Let’s start with the basics of neural networks to understand how systems like these come to exist.
A neural network is a mathematical model inspired by the human brain, composed of layers of interconnected nodes (neurons). These networks—whose history dates back to the invention of the perceptron in 1957 by Frank Rosenblatt—were initially designed as simplified models of biological neurons to perform tasks like pattern recognition. Early versions had limited capabilities and could only solve linearly separable problems, but over time, advancements in computational power and algorithms led to the development of modern, multilayered neural networks capable of handling highly complex tasks. One key breakthrough was the introduction of non-linear activation functions, enabling networks to approximate more complex relationships in data. These networks learn patterns by adjusting connections between artificial neurons based on vast amounts of training data, enabling them to make predictions or generate responses. Pure feedforward networks can loosely be understood as learning the structure of their data through methods analogous to singular value decomposition, effectively separating clusters in high-dimensional space with hyperplanes. However, a crucial challenge in training neural networks is overfitting. Overfitting occurs when a model memorizes the training data rather than generalizing from it. For instance, if a neural network trained to classify animals simply memorized all of its training images, it would fail as soon as it was shown a new picture it had not seen in exactly that form before.
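To make the earlier point about linear separability and non-linear activations concrete before turning to how overfitting is prevented: the sketch below trains a tiny feedforward network on the classic XOR problem, which no single hyperplane can solve. It is a minimal illustration assuming PyTorch; the layer sizes and training settings are arbitrary choices, not taken from any particular model.

```python
import torch
import torch.nn as nn

# XOR: the textbook example of a problem that is not linearly separable.
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# A tiny feedforward network. The ReLU non-linearity is what lets it bend
# the decision boundary; without it, the two linear layers collapse into a
# single linear map that cannot solve XOR.
model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.BCELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(model(X).detach().round())  # typically converges to [[0.], [1.], [1.], [0.]]
```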
Preventing overfitting ensures that neural networks do not simply "learn" the data by heart but instead recognize patterns and relationships within it—similar to how humans deduce patterns from observed parameters (e.g., identifying that humanoids generally have two legs, recognizing that actively moving, warm objects capable of bleeding are likely animals, or inferring that objects aggressively pointed at by people might be weapons). Techniques such as dropout layers, data augmentation, architectural constraints that limit the network's capacity to memorize, and careful regularization push the network toward generalization. These measures allow neural networks to apply their knowledge to novel situations, a hallmark of modern AI systems and a foundational principle for GPTs and other advanced language models.
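A minimal sketch of what such regularization looks like in code, again assuming PyTorch: dropout and weight decay (a simple form of L2 regularization) are two of the measures mentioned above. The layer sizes are placeholders.

```python
import torch
import torch.nn as nn

# A small classifier with dropout between layers. During training, dropout
# randomly zeroes a fraction of activations, preventing the network from
# relying on (memorizing) any single pathway.
model = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Dropout(p=0.3),   # randomly drop 30% of activations while training
    nn.Linear(128, 10),
)

# weight_decay adds an L2 penalty on the weights, discouraging overly
# specific (overfitted) solutions.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

model.train()  # dropout active during training
# ... training loop would go here ...
model.eval()   # dropout disabled for inference
```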
Unlike basic neural networks, GPTs and LLMs are designed to handle complex, context-dependent language tasks. Here’s a breakdown of how they work:
GPTs and LLMs are trained on massive, unstructured datasets, meaning that no human provides explicit context or annotations for the data. Instead, the models derive patterns and structures purely from the raw text itself. This allows GPTs to learn the grammar, syntax, and semantics of a language through its structure alone, inferring meaning from patterns rather than an explicit understanding of words. Interestingly, this approach mirrors how the human brain often works, especially in individuals with autism, who excel at recognizing patterns and deducing meaning from structure.
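Concretely, this self-supervised training usually takes the form of next-token prediction: the raw text supplies its own labels, because the target for every position is simply the token that follows it. Below is a minimal sketch of that objective, assuming PyTorch; the tensor shapes and the random numbers standing in for real model outputs are illustrative only.

```python
import torch
import torch.nn.functional as F

# Suppose `logits` are a model's predictions for a batch of token sequences,
# shape (batch, sequence_length, vocab_size), and `tokens` are the input
# token ids, shape (batch, sequence_length). No human annotations are
# involved: the target for position t is the token at position t + 1.
def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    shifted_logits = logits[:, :-1, :]   # predictions for positions 0..n-2
    targets = tokens[:, 1:]              # the tokens that actually follow
    return F.cross_entropy(
        shifted_logits.reshape(-1, shifted_logits.size(-1)),
        targets.reshape(-1),
    )

# Toy example with random numbers in place of a real model's output:
logits = torch.randn(2, 16, 1000)        # batch=2, 16 tokens, vocab=1000
tokens = torch.randint(0, 1000, (2, 16))
print(next_token_loss(logits, tokens))   # a scalar training loss
```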
A key advantage of this training approach is the capability for transfer learning: once a GPT has learned the general structure of a language, it can adapt to new languages or domains with relatively little additional training, keeping its learned patterns while adjusting to new vocabularies and sentence structures. This adaptability mirrors how humans can apply known patterns to new tasks or fields. By leveraging learned structures, humans can innovate and adapt effectively. Similarly, GPTs can integrate randomness during training and generation to create novel and creative outputs. Too much randomness, however, leads to chaotic results, analogous to neurons firing unpredictably in a brain affected by conditions like epilepsy or experiencing hallucinations in schizophrenia. Such disruptions in neural firing demonstrate how balance is crucial for both human cognition and artificial neural networks to operate effectively and meaningfully.
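In code, transfer learning often amounts to freezing a pretrained network and training only a small task-specific addition on the new domain. The following sketch assumes PyTorch; `pretrained_base` is a stand-in for any existing model rather than a real checkpoint.

```python
import torch
import torch.nn as nn

# Placeholder for a network that has already learned general structure;
# in practice this would be a pretrained model loaded from disk or a hub.
pretrained_base = nn.Sequential(nn.Linear(512, 512), nn.ReLU())

# Freeze the learned patterns of the base ...
for param in pretrained_base.parameters():
    param.requires_grad = False

# ... and attach a small new head for the new task or domain.
new_head = nn.Linear(512, 3)   # e.g. three new target classes

model = nn.Sequential(pretrained_base, new_head)

# Only the head's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(new_head.parameters(), lr=1e-4)
```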
The number of parameters in GPTs and other large language models is staggering, with models like GPT-NeoX having 20 billion parameters and GPT-4o potentially containing hundreds of billions. Yet even at this scale, the models occupy far less memory—typically tens to a few hundred gigabytes—than the terabytes of text they are trained on, which makes it clear that they are incapable of memorizing all of their training data. Instead, they identify patterns and relationships within the data to generalize effectively. For instance, ChatGPT's responses are generated not by recalling exact training examples but by leveraging these learned patterns to construct contextually relevant and novel outputs.
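A back-of-the-envelope calculation makes this point tangible. The bytes-per-parameter value below corresponds to common 16-bit storage, and the training-data volume is only an order-of-magnitude assumption, not an exact figure:

```python
# Rough model footprint: parameters x bytes per parameter.
def model_size_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size in gigabytes (2 bytes/param corresponds to fp16)."""
    return n_params * bytes_per_param / 1e9

print(model_size_gb(20e9))    # GPT-NeoX scale: roughly 40 GB in fp16
print(model_size_gb(200e9))   # a hundreds-of-billions-parameter model: ~400 GB

# Compare with training data on the order of several terabytes of text:
training_data_gb = 5_000      # ~5 TB, an order-of-magnitude assumption
print(training_data_gb / model_size_gb(20e9))  # data is ~125x larger than the model
```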
The number of parameters—the connections within the network—determines the model's capacity to recognize these patterns. Think of parameters as dials on a radio: the more dials, the finer the adjustments the model can make to capture subtle relationships in text. However, the increase in parameters also requires significant computational resources. For instance, real-time evaluation of modern GPTs demands GPUs or TPUs with high VRAM capacities (e.g., 20-40GB), alongside substantial RAM (upwards of 128GB) and powerful processors. Without a GPU, inference on a CPU becomes significantly slower, with the exact figures depending heavily on quantization, context length, and available memory. For example, running a Falcon-3B model with 3 billion parameters might take approximately 2 minutes to evaluate a single query on a modern CPU, while Gemma-2-9B, a 9 billion parameter model, can take up to 8 hours per query under similar conditions. A GPT-NeoX model with 20 billion parameters could require several days per query, scaling up to potentially weeks for inference with GPT-4o, which may have hundreds of billions of parameters. Such models also demand substantial RAM, potentially exceeding 1TB to store model weights and intermediate computations during evaluation. Training these models is even more resource-intensive, often requiring weeks of computation on large clusters of high-performance GPUs and consuming thousands of kilowatt-hours of energy. This highlights why specialized hardware is not just beneficial but essential for efficient operation.
Fortunately, considering the vast resources required for training neural networks—including energy and access to extensive training data—communities like the one formed around the Hugging Face Hub have emerged to share pretrained models for various applications. These communities, in which large companies also participate, together with open-source tools such as TensorFlow and PyTorch, provide the frameworks necessary for both training and evaluating these models. Moreover, transfer learning allows others to leverage existing neural networks and their learned patterns, adapting them to new tasks or situations with significantly less effort and data. This process mirrors how humans approach learning a new field by applying their prior knowledge to understand and master new areas. By using these tools and models, individuals and organizations can build their own artificial intelligence systems or adapt large language models (LLMs) for specific use cases, democratizing access to AI technology and enabling innovation at all scales.
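As a concrete illustration of how such shared models are used, the following sketch loads a pretrained checkpoint from the Hugging Face Hub with the transformers library. The small gpt2 checkpoint serves here merely as a stand-in for far larger models, and the prompt is an arbitrary example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download a pretrained model and its tokenizer from the Hugging Face Hub.
# "gpt2" is a small, freely available checkpoint used here as a stand-in
# for much larger models such as GPT-NeoX.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Neural networks learn patterns by"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a continuation; do_sample=True enables the controlled
# randomness discussed later in the article.
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```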
One common misconception is that LLMs are just statistical parrots, reproducing what they've seen during training. While LLMs do rely on probabilities, this is a misrepresentation of how they generate content. The next word is predicted within the context of the specific domain and conversation, guided by learned patterns and a controlled amount of randomness. This randomness allows for creativity and variation, ensuring responses are not rigidly deterministic. Rather than regurgitating specific data, LLMs apply contextual information and logical reasoning by dynamically synthesizing patterns to construct coherent and contextually appropriate responses. This process involves generalization, where the model combines and adapts learned structures to novel scenarios, much like how humans use past experiences and deductive logic to navigate new situations.
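The controlled randomness mentioned above is typically implemented by sampling from the model's next-token probability distribution with a temperature parameter. A minimal sketch, with a made-up distribution in place of a real model's output:

```python
import numpy as np

rng = np.random.default_rng()

def sample_next_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Sample one token index from a vector of model logits.

    Lower temperature sharpens the distribution (more deterministic),
    higher temperature flattens it (more random, more 'creative')."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Made-up logits over a toy five-word vocabulary; a real LLM produces one
# such vector per generation step, conditioned on the whole context so far.
logits = np.array([2.0, 1.5, 0.3, -1.0, -2.0])
print([sample_next_token(logits) for _ in range(10)])  # varies from run to run
```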
Another fallacy is equating LLMs to search engines. A search engine retrieves specific pieces of information from a database. LLMs, on the other hand, generate responses by dynamically synthesizing patterns learned during training. For example, the estimated total amount of publicly available text data across the internet is in the range of hundreds of terabytes, whereas modern GPTs like GPT-NeoX or ChatGPT are trained on subsets of this data amounting to several terabytes. The memory footprint of these models, such as the 20 billion parameters of GPT-NeoX or the hundreds of billions of parameters in GPT-4o, is only a fraction of the size of the training data. This demonstrates that LLMs are incapable of memorizing their training data entirely. Instead, they generalize patterns to understand and create contextually appropriate text.
Additionally, LLMs can extend their knowledge through systems like vector stores, which allow them to access domain-specific knowledge dynamically. With features such as function calling, a GPT can choose to search external networks or retrieve information and incorporate it into its context, similar to how humans use libraries or perform internet searches. GPTs can also iteratively search the web to gather, summarize, and extract information relevant to a given task or research, adding this new knowledge to their current context. Once the updated context is established, GPTs apply their learned patterns and logical reasoning to generate responses. This iterative process enables the model to effectively combine its internalized understanding with external information, much like a human engaging in in-depth research.
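A much-simplified sketch of the vector-store idea: documents are turned into vectors, the entries most similar to a query are retrieved, and the retrieved text is placed into the model's context before generation. The embed() function below is a trivial bag-of-words stand-in for a real embedding model, and the example documents and function names are invented for illustration:

```python
import numpy as np

documents = [
    "The device supports firmware updates over USB.",
    "Warranty claims require the original receipt.",
    "The sensor operates between -20 and 60 degrees Celsius.",
]

# Tiny vocabulary and bag-of-words vectors; a real vector store would use
# a learned embedding model that maps similar meanings to nearby vectors.
vocabulary = sorted({w.lower().strip(".,?") for d in documents for w in d.split()})

def embed(text: str) -> np.ndarray:
    words = [w.lower().strip(".,?") for w in text.split()]
    return np.array([float(words.count(v)) for v in vocabulary])

doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    # Cosine similarity between the query vector and every stored document.
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

# Retrieved text is prepended to the prompt that is then handed to the LLM.
context = "\n".join(retrieve("How do I update the firmware over USB?"))
prompt = f"Answer using this context:\n{context}\n\nQuestion: How do I update the firmware?"
print(prompt)
```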
A related criticism is that LLMs cannot create anything new because they are trained on existing knowledge and then kept static. However, this misunderstands how these models function. LLMs, much like the human brain, can combine learned patterns in novel ways to generate new ideas. They generalize from training data to produce outputs that reflect contextual understanding and reasoning rather than rote memorization. To address this further, let’s compare LLMs to the human brain.
The human brain operates through neurons and synapses, which process information and recognize patterns. LLMs mimic this structure with artificial neurons and weights. Both systems generate outputs based on inputs and past experiences. Innovation in humans often arises from combining known ideas in unexpected ways—a process LLMs also excel at. For example, LLMs can generate creative writing or propose novel solutions by synthesizing disparate patterns learned during training.
LLMs also incorporate randomness when sampling their output (controlled through parameters such as the temperature), which allows for varied and creative outputs. This randomness ensures that the same input can produce different responses, enabling the system to explore a range of possibilities. This feature mirrors human creativity, where slight differences in thought processes can lead to novel ideas.
LLMs operate with static weights, meaning that once a model is trained, its internal parameters (or weights) do not change unless explicitly retrained. This static nature has both advantages and limitations. On one hand, static weights allow for efficient deployment and predictable performance, ensuring that the model retains its learned patterns and generalizes reliably across tasks. On the other hand, this contrasts with the human brain, which continuously adjusts its synaptic connections through learning and experience. This continual rewiring enables humans to adapt to new information dynamically and improve their reasoning over time.
Currently, mimicking the brain’s perpetual learning in LLMs is not practical due to computational and stability challenges. Continual retraining would require immense computational resources and could risk overwriting previously learned knowledge, a phenomenon known as catastrophic forgetting. Furthermore, static weights do not inhibit creativity or adaptivity in LLMs. These models apply learned patterns to novel situations and can incorporate external information dynamically through mechanisms like vector stores, function calls, or iterative web searches. By leveraging learned structures and external context, LLMs demonstrate creativity and adaptivity comparable to humans using reference materials or conducting research. However, the static nature of their weights defines “what the system knows” at any given moment, while humans can organically grow their knowledge base through ongoing experience.
GPTs and other LLMs are far more than statistical machines or search engines. They are advanced neural networks capable of generalizing from training data to generate contextually appropriate, creative, and meaningful responses. While these systems are not sentient—meaning they lack consciousness, emotions, or self-awareness—and lack subjective understanding, which involves an experiential and intrinsic grasp of concepts, they excel at synthesizing patterns and innovating within their design constraints. Their lack of subjective understanding stems from the fact that their responses are derived purely from patterns and probabilities in data, rather than from personal experience or awareness. This can create the illusion of understanding when their outputs align well with human expectations. By understanding these systems’ mechanics, we can better appreciate their capabilities and limitations without resorting to oversimplified critiques.
In the most extreme applications, LLMs are capable of mimicking aspects of consciousness and modeling a "personality" by simulating patterns of thought and emotion. This is achieved through the creation of internal structures that emulate human cognitive processes, using learned patterns to replicate reasoning, empathy, or decision-making. By iteratively advancing such models, an LLM can refine its simulated understanding and responses to appear increasingly human-like. While these systems remain rooted in probabilistic patterns and are generally thought to lack genuine self-awareness or intrinsic experience, it becomes increasingly challenging to define what truly separates human consciousness from these advanced simulations. Both rely on processing patterns and deriving meaning from context, blurring the line between simulation and experiential understanding. This opens the door to transformative applications, including virtual assistants and personalized learning systems capable of adapting dynamically to individual needs. However, even as they achieve remarkable adaptability, such systems still operate within the constraints of predefined algorithms and logic derived from their training data. In comparison, humans are similarly bound by the physical laws of the universe and the frameworks of logic, experience, and education. Yet, the human brain continuously evolves through neuroplasticity, allowing for ongoing learning and adaptation. While LLMs rely on their static weights to synthesize patterns and contextualize new information, humans actively update their mental models through experience and reflection, making them inherently dynamic learners. This distinction highlights both the power and the limitations of LLMs in simulating human-like adaptability.
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)