Beyond Prediction: Reconceptualizing Cognition as Generative Autoregression
Note: This essay lays out a high-level theoretical framework. I will argue in an upcoming paper that this framework accounts for a vast body of empirical results across neuroscience and cognitive science, and also generates novel, testable predictions. The primary purpose of this piece is to establish the core theoretical proposal and its foundational distinctions from current approaches.
The Predictive Framework
The prevailing intellectual current in contemporary neuroscience and cognitive science frequently characterizes the brain as a sophisticated "predictive coding machine." Within this influential framework, the brain is seen as a tireless inference engine, constantly anticipating sensory inputs and refining its internal model of the external world through the minimization of "prediction error"—the divergence between what it expects and what it actually receives. This perspective offers compelling explanations for various aspects of perception, learning, and action, proposing a unified computational drive to mitigate unexpected information or "surprise."
Yet, while this predictive narrative offers a powerful lens for understanding certain localized neurobiological computations, I believe that it fundamentally mischaracterizes the deeper, more pervasive computational structure of cognition itself. It often portrays the brain as primarily reactive, perpetually calibrating its internal representations to an external reality, essentially asking: "What am I about to perceive, and how accurately did I forecast it?" Instead, I propose a different, profoundly parsimonious and elegant paradigm: the brain is not merely predictive and reactive, but fundamentally proactive, generative, and autoregressive. This paradigm, strongly inspired by the operational power of large language models (LLMs), is driven not by the goal of prediction, but by the imperative of action and continuous production. This distinction transcends mere semantics; it represents a critical conceptual and computational departure with far-reaching implications for how we understand thought, perception, and behavior.
The generalized application of the predictive paradigm to encompass the entirety of cognition reveals significant conceptual and computational challenges. Firstly, the very notion of "prediction" at the scale required for a unified theory of cognition quickly becomes computationally unwieldy and biologically questionable. A strict interpretation of predictive coding suggests the brain is concurrently generating predictions across a multitude of sensory modalities (visual, auditory, proprioceptive, interoceptive, etc.) for potentially extended future durations. This necessitates continuous comparison of these voluminous, parallel predictions against incoming sensory data to calculate minute "prediction errors." For dynamic, complex, and open-ended processes—such as navigating an unfamiliar environment, maintaining a fluid conversation, or formulating an abstract thought—the sheer computational overhead of maintaining, updating, and comparing such intricate, multi-modal predictions in real-time strains credulity. The "ground truth" against which these long-range, multi-modal predictions would be perpetually validated remains elusive, if not practically undefined.
Furthermore, the emphasis on prediction-error minimization frames the brain as fundamentally reactive, perpetually responding to and adjusting for discrepancies. While predictive coding accounts for action as a means to reduce prediction error—by making sensory input conform to prediction—this introduces a crucial conceptual chasm: if prediction is the brain's primary task, it remains unclear why a particular prediction is made, or what specifies the appropriate action to minimize a given error. The framework often necessitates an additional layer of "prior preferences" or goals to guide action, but this introduces a separate, independent specification that stands apart from the core predictive mechanism itself. This separation means that predictive coding struggles to provide a seamless, unified account of the entire perception-to-action loop, particularly how specific, goal-directed actions are chosen or initiated beyond merely confirming an internal forecast. In contrast, our inherent capacity for purposeful behavior—to not just react to what is given, but to actively explore, plan, and create our own path in a complex, uncertain world—strongly suggests a cognitive architecture where initiation and production are paramount. Our ability to generate novel solutions and self-directed engagement with our environment points to a system where the drive is to do, to produce, rather than merely to confirm or adjust predictions.
The Autoregressive Engine: A New Foundational Framework for Cognition
Instead of a predictive framework, I posit that the brain operates as an autoregressive generative engine. This implies that the brain's activity, at its most fundamental level, involves the proactive, continuous production of its own internal states, where each new state is dynamically conditioned by the preceding sequence of states and ultimately culminates in behavioral outputs as its core purpose. Formally, S_{t+1} = f(S_t, context), where the "context" denotes the internally maintained history of the system's own unfolding, not an external prediction target. This mechanism, profound in its simplicity, gives rise to the remarkable complexity of cognition. The elegance of this framework resides in its inherent self-conditioning loop: each state generated within the system immediately becomes an integral component of the context for the next. This creates a continuous, self-propagating feedback loop in which the brain effectively "talks to itself," shaping its own future trajectory based on its internal history. This dynamic provides a unified account of the seamless continuity we experience in thought and action, transforming what might otherwise appear as disjointed computations into an unbroken stream.
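The self-conditioning loop described above can be made concrete with a deliberately minimal sketch. This is an illustration of the abstract recurrence S_{t+1} = f(S_t, context), not a brain model: the `rules` table and the state names ("wake", "plan", and so on) are hypothetical stand-ins for learned transition dynamics.

```python
# Toy sketch of an autoregressive generator: every output immediately
# becomes part of the context that conditions the next output.
# The "rules" dict is a hypothetical stand-in for learned transition
# dynamics -- the "model" IS these rules, not a separate world model.
rules = {
    ("wake",): "plan",
    ("plan",): "act",
    ("act",): "observe",
    ("observe",): "plan",
}

def generate(seed, steps):
    sequence = [seed]
    for _ in range(steps):
        context = (sequence[-1],)        # self-conditioning: own history only
        sequence.append(rules[context])  # S_{t+1} = f(S_t, context)
    return sequence

trajectory = generate("wake", 5)
# The system "talks to itself": nothing external is predicted or compared;
# the sequence unfolds purely from the system's own prior states.
```

Even with a context of length one, the loop produces an unbroken, self-propagating trajectory; richer contexts (longer histories, multiple streams) only elaborate the same mechanism.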
Crucially, this autoregressive generative model does not necessitate an explicit or implicit "world model" from which predictions about external causes are generated. The "model" is the dynamic generative capacity itself: an intricate set of learned parameters and rules that dictate how internal states transition and unfold. The brain's knowledge of the world is not a static representation to be compared against incoming sensory data; it is embedded within the brain's learned ability to generate coherent and functional internal sequences in response to, and in anticipation of, interaction with reality.
This new conception is directly inspired by the remarkable capabilities of modern Large Language Models (LLMs). These models, with their astonishing capacity for coherent, novel text generation based purely on autoregressive dynamics, provide a compelling computational analogue for this proposed mode of brain function. A key insight derived from LLMs is that their impressive outputs do not stem from possessing an explicit "world model" or a traditional database of stored facts. Instead, the model's vast array of learned weights serves as a set of "generative potentialities." These potentialities, through their dynamic activation, produce meaningful and contextually appropriate outputs, implicitly reflecting statistical regularities and relationships that, in other frameworks, might be attributed to an explicit world model or declarative knowledge. LLMs demonstrate that such an explicit, separable model is not necessary for a system to appear intelligent and to interact coherently with complex information. For instance, an LLM encountering a misspelled word or a grammatical anomaly in its input will typically "power through," maintaining its learned generative trajectory and coherently producing the next token based on the broader context. This behavior, where the system prioritizes the continuity of its internal generation over meticulous error-checking against every input anomaly, more closely mirrors human cognition when faced with imperfect sensory input or communication: we often resolve ambiguity by generating a coherent interpretation rather than pausing to register a precise error. Here, I propose a similar model for cognition more broadly: the brain operates as a collection of interacting, modality-specific autoregressive streams—for example, visual, auditory, somatosensory, and motor—which continuously generate their own internal states.
These diverse streams are not isolated; they influence and condition each other, converging to produce the "terminal output" of behavior, thereby orchestrating our seamless and purposeful engagement with the world.
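The idea of mutually conditioning streams can be sketched with toy dynamics. Everything here is an assumption for illustration: each "stream" is a single number, the decay factor 0.9 and coupling 0.1 are arbitrary, and the cross-conditioning is a crude average of the other streams. The point is only that independently generated streams, once coupled, converge onto a shared trajectory.

```python
# Toy sketch (illustrative assumptions throughout): several modality-specific
# autoregressive streams, each conditioned on its own last state plus the
# states of the other streams.
def step(states, coupling=0.1):
    new = {}
    for name, s in states.items():
        others = [v for k, v in states.items() if k != name]
        # Each stream continues from its own history (0.9 * s) but is
        # nudged by the mean of the other streams (cross-conditioning).
        new[name] = 0.9 * s + coupling * (sum(others) / len(others))
    return new

streams = {"visual": 1.0, "auditory": 0.0, "motor": 0.0}
for _ in range(50):
    streams = step(streams)
# With positive coupling, initially divergent streams converge toward a
# shared trajectory: a unified output emerging from mutual conditioning.
```

In this particular toy system the total activity is conserved while the differences between streams shrink geometrically, so the three streams settle onto a common value rather than drifting apart.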
It's important to clarify that while LLMs serve as a compelling analogy for the brain's core autoregressive operation, the broader class of "generative models" in AI encompasses diverse architectures. For instance, diffusion models, while undeniably generative, operate through an iterative denoising and refinement process, constructing an output over many steps rather than producing a sequential, history-dependent stream in the same manner as an autoregressive LLM. While the brain may employ diffusion-like processes for specific tasks (e.g., iteratively refining a noisy perceptual input into a clear percept), they do not represent the fundamental, continuous, and history-dependent generation that defines the "Autoregressive Brain" for thought and action. My focus remains on the elegant simplicity and ubiquitous applicability of the self-conditioning autoregressive dynamic.
This intrinsic ability to generate sequential states, where the "current state" (or context) encapsulates a rich, compressed history of relevant past states and learned patterns, also imbues the autoregressive system with a profound capacity for longer-range coherence and apparent foresight. What other frameworks describe as "prediction," I reinterpret as the "pregnant present." The current internal state is "pregnant" with the future because the system's learned generative rules encode the likelihood of certain sequences unfolding. It is not explicitly forecasting a specific future event, but rather being inherently biased towards generating a future that aligns with its deeply learned patterns, internal goals, and successful past trajectories. This allows the system to seamlessly transition from one state to the next, inherently guided towards a logical or desired progression, without the need for computationally expensive, parallel predictions of external sensory inputs.
Cognition Through the Autoregressive Lens
With the autoregressive engine as a foundation, we can now re-examine the landscape of cognition. Moving beyond the predictive paradigm reveals a more elegant and consistent explanation for phenomena spanning perception, action, and learning—phenomena that the predictive coding framework purports to explain, but often with unnecessary computational baggage or conceptual strain. Critically, the internal generative processes of the brain, while yielding specific perceptual states or thoughts, are ultimately optimized for their utility in facilitating effective and beneficial behavior in the world. This functional imperative guides the continuous self-generation of all cognitive processes.
Perception: Active Construction and Grounding of Internal Streams
The brain's engagement with the world through perception is traditionally framed as an inferential process, where sensory input is constantly checked against internal predictions to minimize error. In the autoregressive brain, however, perception is redefined as the active, continuous generation of its own internal perceptual states that cohere with incoming sensory information. Sensory input is not a target for prediction; it serves as a dynamic, real-time conditioning feature that guides and constrains the internal generative process, thereby enabling the proactive pursuit of beneficial behavioral outcomes. The efficacy of these generated perceptual states is judged by their capacity to support the ongoing, beneficial behavioral trajectory.
A key challenge for purely autoregressive systems, such as those responsible for language production, is grounding: how does a system whose logic runs solely on internal state-to-state transitions connect meaningfully to external reality? Here, we propose that perception provides this crucial grounding. Sensory input acts as a kind of internal "prompt injection system" for the brain's various internal autoregressive streams, including language. Incoming visual, auditory, or somatosensory data directly conditions the generation of perceptual states, which in turn profoundly influence and constrain the ongoing generation of linguistic and other cognitive states. Conversely, internal linguistic or cognitive streams, generated from within, can activate and mold perceptual processes, shaping what we "see" or "hear" through top-down influence. Ultimately, however, the grounding of all these internal streams—perceptual, linguistic, and cognitive—occurs only insofar as they support effective behaviors, whether those are individual actions or coordinated actions with others.
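The claim that sensory input conditions rather than corrects an internal stream can be illustrated with a one-line toy dynamic. The 0.95 self-continuation factor and the 0.3 conditioning gain are arbitrary assumptions; what the sketch shows is the structural difference: the input enters as a bias on the ongoing generation, and no prediction of the input is ever formed or scored.

```python
# Toy sketch (hypothetical dynamics): sensory input is not a prediction
# target to be matched; it enters as an extra conditioning term that
# steers an otherwise self-generated stream.
def step(state, sensory=None, gain=0.3):
    generated = 0.95 * state        # self-conditioned continuation
    if sensory is not None:
        # Grounding: the input biases the generated state toward itself,
        # but generation continues regardless of any "error".
        generated += gain * (sensory - generated)
    return generated

s = 0.0
for _ in range(100):
    s = step(s, sensory=1.0)  # persistent input pulls the stream toward it
# The stream settles near (not exactly at) the input value: the internal
# dynamics and the external conditioning jointly determine the trajectory.
```

Notably, the stream's fixed point lies close to, but not at, the input value: the generated state is a compromise between the system's own momentum and the external conditioning, which is one way to read the claim that percepts are constructed rather than read off the world.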
This functional imperative provides a novel lens through which to reinterpret phenomena traditionally offered as prime evidence for predictive coding. We will now re-examine several such instances, demonstrating how the autoregressive generative framework offers a more parsimonious and comprehensive account.
Perceptual Illusions: Predictive coding typically interprets phenomena like visual illusions (e.g., the blind spot, ambiguous figures, apparent motion) as demonstrations of the brain's inferential power—its ability to "fill in" missing information or resolve ambiguity by selecting the most probable interpretation to minimize prediction error. However, from an autoregressive generative perspective, these instances offer a different insight. They illustrate how the autoregressive generative system, under conditions of ambiguous or incomplete sensory conditioning, actively constructs the most coherent and behaviorally optimal internal perceptual states. When sensory input is ambiguous, the brain's learned generative rules, influenced by the partial conditioning signals, cause it to generate the most plausible and functional perceptual state sequence. This internal construction is prioritized because it reliably supports effective interaction with the world, even if the resulting percept deviates from objective reality. Analogous to how a Large Language Model "powers through" minor errors in input to maintain a coherent and useful linguistic trajectory, the perceptual system can absorb or ignore errant sensory data to sustain a perceptual trajectory that will ultimately lead to a beneficial behavioral outcome. The "filling in" of the blind spot, for instance, isn't a prediction of what should be there, but the seamless generation of a continuous visual perceptual state that supports unbroken visual engagement and effective action, ultimately providing a stable perceptual foundation for other interactive streams.
Repetition Suppression/Facilitation: In predictive coding, familiar stimuli elicit smaller neural responses due to reduced prediction error. The autoregressive brain offers a simpler explanation: repeated or predictable stimuli correspond to well-learned, efficient generative trajectories of internal states within the brain's state space. Less neural activity is required because the system can smoothly generate the next state in the sequence without significant "resistance" or deviation from its established internal dynamics. Novel stimuli, conversely, demand that the generative system explore new or less consolidated internal pathways, leading to greater activity as it constructs a new perceptual state sequence for an unfamiliar input or begins to learn a new generative trajectory. The efficiency gained from familiar stimuli directly contributes to the brain's overall capacity to generate effective and rapid behavioral responses across the integrated system.
Multisensory Integration: While predictive coding describes different sensory modalities predicting each other to minimize joint error, the autoregressive brain views multisensory integration as a process where the generative system simultaneously incorporates complex, concurrent conditioning features from multiple sensory streams. The brain's internal generative process then yields a unified, coherent perceptual state that emerges from the blended influence of these diverse inputs on its state transitions. The success of this integration is ultimately measured by its contribution to a robust and unified basis for action, often by providing a coherent perceptual context for other interactive autoregressive streams.
Attention: The Combinatorial Autoregressive Stream
In the autoregressive brain, attention is not merely a mechanism for weighting prediction errors or enhancing sensory signals. Instead, attention itself is understood as a distinct autoregressive stream whose primary function is to dynamically combine, filter, and prioritize the influences of various other internal and external conditioning sources on the ongoing generative process. It is the master integrator, orchestrating the flow of information among disparate streams—perceptual, linguistic, mnemonic, emotional, and motor planning.
This attentional stream continuously generates its own states, with each state reflecting a dynamic weighting or selection of which other streams or features are most salient for the subsequent cognitive or behavioral output. When we focus our attention on a particular task or object, the attentional stream generates states that amplify the conditioning influence of relevant perceptual inputs (e.g., a specific voice in a crowded room) while diminishing others, enabling the focused generation of, for instance, a coherent auditory stream for comprehension. Furthermore, attention mediates the interplay between language and perception. For example, when a verbal cue directs our attention ("Look at the red car!"), the linguistic stream generates states that condition the attentional stream, which then biases the perceptual stream to generate visual states corresponding to a red car, actively searching for and constructing that percept. This demonstrates attention as the dynamic, self-conditioned process through which the brain synthesizes and guides the myriad of generative processes toward a unified and purposeful cognitive output. Its utility lies in enabling adaptive and context-sensitive behavioral generation by managing the complex interplay of information across the brain's many specialized autoregressive streams.
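The notion of attention as its own autoregressive stream of weights can be sketched as follows. The salience values, the inertia factor, and the softmax readout are all illustrative assumptions; the structural point is that the attentional state is generated from its own history (it has inertia) rather than recomputed from scratch, and its output is a weighting that gates how strongly each stream conditions downstream generation.

```python
# Toy sketch (illustrative assumptions throughout): attention as an
# autoregressive stream whose state is a set of weights over other streams.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(att_logits, salience, inertia=0.7):
    # The attentional state evolves from its own history (inertia) plus
    # the current salience of each stream: generated, not computed afresh.
    return [inertia * a + (1 - inertia) * s
            for a, s in zip(att_logits, salience)]

# Hypothetical salience of three concurrent streams (e.g. cocktail party):
streams = {"voice": 2.0, "traffic": 0.5, "inner speech": 1.0}
att = [0.0, 0.0, 0.0]
for _ in range(10):
    att = attend(att, list(streams.values()))
weights = softmax(att)
# The combined conditioning signal that reaches downstream generation:
combined = sum(w * v for w, v in zip(weights, streams.values()))
```

Because the attentional state carries its own history, salience shifts are tracked smoothly rather than instantaneously, which is one toy reading of attention's temporal inertia.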
Action and Behavior: Produced Trajectories for World Engagement
Perhaps the most stark divergence from the predictive paradigm lies in the realm of action. Predictive coding’s "active inference" claims that behavior is enacted to minimize prediction error by making sensory input conform to internal predictions. This frames action as fundamentally a feedback loop to confirm belief, often appearing as an indirect means to an end.
Goal-Directed Behavior and Planning: In the predictive framework, goals are frequently conceptualized as "prior preferences" or desired sensory states that the system actively infers actions to fulfill, thereby minimizing the expected free energy associated with not achieving those states. The brain, in this view, predicts the sensory outcomes of potential actions and chooses those that best align with its preferred predictions. However, the autoregressive brain proposes a more direct and intuitive mechanism: behavior is the continuous, self-conditioned generation of motor sequences and plans, directly aimed at effective world engagement. Actions are not primarily executed to satisfy a prediction; they are the emergent, unfolding output of the brain's internal generative dynamics, optimized for their real-world utility. Goals, in this view, are not abstract prior preferences over predicted sensory states to be minimized. Instead, they function as high-level, learned attractors or targets that bias the generation of behavioral sequences. When an agent decides to act—for example, to grasp a cup—its brain doesn't primarily predict the exact sensory trajectory of that action and then correct deviations. Rather, it generates the sequence of motor commands that, through learned associations and its "pregnant present" foresight, is most likely to result in the desired physical outcome. Planning, in this view, constitutes the internal, abstract generation of hypothetical action sequences, effectively running the generative engine in simulation. These generated plans are then internally evaluated for their likelihood of leading to a desired generated outcome—a seamless behavioral trajectory—before being executed in the real world.
Motor Control and Reflexes: In a predictive framework, motor commands are sometimes conceptualized as "proprioceptive predictions" that the body then works to fulfill, with reflexes serving as local error-minimizing loops. This often appears as an unnecessarily circuitous explanation for the directness of movement. The autoregressive brain, conversely, views motor control as the continuous, autoregressive generation of precise, context-dependent motor commands. Reflexes are not error-minimizing loops, but rather highly efficient, low-level generative patterns that produce immediate motor responses based on sensory conditioning, ensuring the smooth unfolding of the motor sequence. The feedback from proprioception and other senses (e.g., from tactile contact with the cup) acts as a real-time conditioning signal that dynamically adjusts the motor generator's output to maintain the intended behavioral trajectory. This ensures that the generated action successfully interacts with the world, affirming the utility of the generated sequence, rather than simply confirming an internal prediction. The system prioritizes efficient and effective task completion.
Learning: Optimizing the Generative Rules for Behavioral Efficacy
In the predictive coding framework, learning is predominantly understood as the continuous updating of the brain's internal generative model, driven by the minimization of persistent prediction errors. The brain, in this view, adjusts its internal parameters to become a more accurate forecaster of its sensory future. The autoregressive brain, however, offers a distinct and more direct account: learning is the continuous optimization of the rules or parameters that govern the autoregressive transitions of its internal states. This optimization is fundamentally driven by the real-world consequences and ultimate utility of its generated outputs.
When a generated output—whether an internal perceptual state that leads to a correct identification, a thought process that solves a problem, or a physical action that achieves a goal—leads to a beneficial outcome (as defined by internal reward signals, external feedback, or the coherence of the generated sequence with environmental structure), the specific generative pathways and parameters that produced that successful outcome are strengthened. Conversely, if a generated sequence leads to failure, an undesirable state, or a suboptimal behavioral trajectory, those generative pathways are weakened or altered, prompting the system to explore alternative generative patterns.
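The outcome-driven strengthening and weakening of generative pathways described above can be sketched as a toy update rule. The two named pathways, the fixed utilities, the learning rate, and the floor value are all hypothetical; the sketch only illustrates the stated principle that pathways producing beneficial outcomes are reinforced while unsuccessful ones decay, with no forecasting error involved.

```python
# Toy sketch (assumed update rule): generative transitions are strengthened
# or weakened by the utility of the outcomes they produce, not by a
# prediction error against an external target.
import random

random.seed(0)
# Transition "pathway" strengths for two candidate continuations of a state.
strengths = {"pathA": 1.0, "pathB": 1.0}

def choose(strengths):
    # Sample a pathway in proportion to its current strength.
    total = sum(strengths.values())
    r = random.uniform(0, total)
    acc = 0.0
    for path, s in strengths.items():
        acc += s
        if r <= acc:
            return path
    return path  # guard against float rounding

def update(strengths, path, utility, lr=0.2):
    # Beneficial outcomes reinforce the pathway that generated them;
    # harmful ones weaken it (floored so it can still be re-explored).
    strengths[path] = max(0.05, strengths[path] + lr * utility)

for _ in range(200):
    p = choose(strengths)
    utility = 1.0 if p == "pathA" else -1.0  # hypothetical: A succeeds, B fails
    update(strengths, p, utility)
# Over trials, pathA comes to dominate the generative repertoire.
```

The floor on pathway strength is a deliberate design choice in the sketch: a weakened pathway is suppressed but not deleted, leaving room for the re-exploration of alternative generative patterns that the paragraph above describes.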
This process is directly analogous to how modern Large Language Models learn and refine their vast parameters during training. An LLM isn't learning to predict an external "ground truth" about the world in a separate model; it's learning to predict the most probable next token given its internal context, and its parameters are adjusted based on the statistical success and coherence of its generated sequences. The brain, similarly, isn't primarily learning to make better predictions about external inputs; it's learning to become a more effective and robust generator of its own internal states and its dynamic interactions with the world. This means the system continuously refreshes its "grammar" of generation based on the consequences and utility of its dynamic output, enabling it to produce more effective perceptual and behavioral sequences in novel situations.
Conclusion: Reaffirming the Paradigm Shift
The prevailing "predictive brain" hypothesis, while offering valuable insights into specific computational aspects, ultimately casts cognition in a fundamentally reactive light. It frames the brain's primary endeavor as minimizing surprise by constantly forecasting external sensory inputs and correcting deviations. However, as argued herein, this perspective often becomes computationally cumbersome, conceptually strained, and fails to adequately capture the inherent dynamism, self-generating capacity, and purposeful flow of the mind.
Instead, we have introduced an alternative, more elegant, and intuitively compelling framework: the autoregressive brain. Rooted in the principle that the brain's activity is a continuous, self-conditioning sequence generation, this model asserts that each internal state directly gives rise to the next, forming an unbroken trajectory of cognition, perception, and action. Inspired by the profound capabilities of Large Language Models, this view posits that the brain's "knowledge" is not an explicit model used to predict external causes, but rather its learned capacity to produce coherent and functionally beneficial sequences.
We have demonstrated how this autoregressive lens provides a fresh and powerful account for core cognitive phenomena:
Perception is seen as the active, utility-driven generation of internal perceptual states that cohere with sensory conditioning, rather than an inference process minimizing prediction error against external reality. Illusions and attention are understood as the system's learned capacity to generate robust, behaviorally optimal interpretations even from ambiguous or noisy input, prioritizing functional coherence over literal external prediction.
Action and behavior are viewed as the unfolding of internally generated motor sequences and plans, driven by high-level goals that bias the generative process, rather than as a means to fulfill sensory predictions. Feedback from the world is crucial, not for correcting prediction errors, but for optimizing the very rules of this generative machinery, ensuring that future productions are more effective.
Learning is reinterpreted as the optimization of these underlying generative rules, adjusting the parameters that govern state transitions based on the real-world utility and success of the brain's outputs, rather than solely on minimizing errors in forecasting.
The autoregressive brain offers a unified account of cognition that is both parsimonious in its core mechanism and expansive in its explanatory power. It shifts the focus from a brain primarily concerned with merely predicting the world to one fundamentally geared towards proactively generating its engagement with the world. This paradigm embraces the inherent temporal flow of consciousness, the proactive nature of agency, and the creative capacity for novel thought and action. It suggests that the future is not merely predicted but actively unfolded, with the "pregnant present" inherently guiding the continuous, self-conditioned genesis of the next cognitive state.
This reframing opens exciting avenues for future research. It encourages us to look for autoregressive dynamics in neural circuits, to model cognitive processes through continuous state-space trajectories, and to further explore how biological systems might implement the kind of powerful, self-organizing generative intelligence we are now witnessing in advanced AI. By recognizing the brain as a masterful generative engine, we can move beyond simply forecasting its next input and begin to truly understand how it produces our mental lives.



Thank you for this. As a person with a blind spot, I think exploration of pathology in the system (confabulation in dementia, for instance) gives us insights into the autogenerative system, showing that language, perception, and cognition all have the same basis, as you state. So expressive dysphasia, visual blind spots, and confabulation demonstrate that all three areas use the same generative structure, i.e., the "fill in the gaps" autogeneration is the same. Might be an interesting area to explore.
This is what I was looking for back when I was a teenager reading everything I could find about psychology, philosophy, neuroscience. This is what I was looking for when I was reading the Bhagavad Gita and the Tao Te Ching. It seems to be a description of the way I operate. Of the way humans work.