machine-learning – Thinking with Nate

How is AI like human intelligence?

(The image for this post is a photo of a jacquard loom I took in the workshop of Luigi Bevilacqua in Venice. It’s a heavy wooden frame covered in pulleys and string, in the midst of weaving fine velvet in gold and red. The loom is automated using punch cards, and served as a model for early computers. It’s not something we’d usually think of as “AI,” but it is a symbolic mechanism that automates a complex human behavior.)

The term “Artificial Intelligence” has been around since the 1950s, and it’s always been ambiguous. Generally, AI is about reproducing the intelligence of living things using math and machines, and there have been many different approaches to that problem. However, in the past decade or so, the word AI has become synonymous with a class of algorithms known as Deep Learning (DL). This technology has produced stunning results and has rapidly been integrated into software of all kinds. This has led to mixed reactions, including ethical concerns and active debates about what “human-level general intelligence” means and whether or not AI has it. But what does DL actually do? How is it like and unlike human intelligence? Let’s at least scratch the surface of this important question.

Personally, I’m frustrated by the AI field’s obsession with DL. We act like brains are the secret to intelligence, DL is just a “brain in a computer,” and anything else is of marginal interest. But, in truth, the brain is just one small part of a vast and diverse intelligent system. Consider:

Evolution “designed” sophisticated solutions to real-life challenges without the use of top-down engineering or even conscious thought.
Organisms of all kinds perceive, analyze, decide, and react to the world in real time, with or without a brain.
Brains come in all shapes and sizes, from simple to complex, with many species-specific architectures and special-purpose modules for things like sensory perception, emotions, memory, and motion planning.
Mammals have an additional brain structure called the neocortex (birds have an analogous structure called the dorsal ventricular ridge) which provides a layer of abstract cognition on top of the older, more specialized parts.
Individuals compete and collaborate to form ecosystems, colonies, and societies that are intelligent in their own right.

So the brain itself is only a small part of a vast network of intelligence. But also, DL is only sorta like one part of the brain. Neural networks as they exist in DL are very loosely inspired by the fine structure of the neocortex. I like to think of that as the “cognitive fabric” from which the neocortex is built. Evolution has shaped that fabric into special-purpose brain regions, each tuned to solve different problems. These regions are networked together into a particular architecture, providing multiple layers of analysis, careful mixing of perceptions and cognitive faculties, and multiple kinds of self-monitoring. All of that is orchestrated by the lower-level brain structures, which still define the basic emotions, modes of thinking, flow of thought, and the relationship between abstract mental activity and the concrete needs of the body. Generally speaking, DL ignores all of that structure.

If DL captures just one facet of our mind’s intelligence, then what does that part do? It observes data, finds patterns, and learns stereotypes about what it sees. It can apply those stereotypes to extrapolate rich and coherent scenes from noisy fragments of data, filling in gaps with reasonable guesses. The brain uses this tool everywhere, and it’s a crucial ingredient for how humans perceive and think about the world. DL makes that tool available to software developers.

What makes our minds “human like” is the evolved structure that applies this pattern matching / stereotyping faculty in particular ways that generate our perceptions, intuition, biases, self-awareness, train of thought, attention, and dreams. When people make algorithms using DL, we provide the structure that determines how the AI uses that faculty, and thus how it behaves. These algorithms resemble the human mind only as much as we try to reproduce the human thought process in our code. We typically don’t, and that’s probably a good thing. Attempting to bring something human-like to life in a computer sounds even more ethically problematic than cloning. Recent experiments into “chain of thought” for language models are a step in this direction, though they aren’t really trying to make the model “think like a person” so much as to get high scores on tests for “reasoning skills,” which is not the same thing.

This raises some interesting questions. Can DL algorithms really understand the world? Sorta. Large language models like GPT provide an interesting example. By consuming vast quantities of text, these algorithms can master nuanced patterns in words that reflect human ideas and the physical world. They understand these ideas well enough to use them, interpreting and generating both text and images. Yet, they only experience the physical world indirectly, so in some sense they don’t fully understand, and they get many details wrong. It’s an open philosophical question just how different human and machine understanding really are.

Do DL algorithms think or have desires? Generally, no. Most often DL is used to implement a function, in the mathematical sense. These functions take some input (e.g., an image) and produce some output (e.g., a label for that image). That is the entire scope of their existence. They don’t reflect, compare alternatives, or make decisions. They have no needs to fulfill, and no way to perceive themselves or their environment. More complex DL architectures start to blur the line, though. We give them analogs of memory and attention. In reinforcement learning, we even use DL to make agents that inhabit virtual worlds, have a sense of self, and make their own choices. Perhaps these algorithms could be said to “think,” but their “minds” are alien, adapted to a world of experiences totally unlike our own.

Do we need to worry about AI taking over the world? No, but also yes. The Terminator scenario seems unlikely. Those evil robots are human-like, in ways current AI cannot even begin to approach. In particular, they want to destroy humanity, and take the initiative to act on that. Today’s AI has no desires, and does nothing until prompted. However, there are other, more realistic concerns. Today, we mostly use ML for two purposes: to help computers understand human expression, and to automate human behaviors. Both of these can be problematic, especially if we (incorrectly) assume these algorithms think like people do.

The real danger is trusting these algorithms too much. DL is incredibly good at one thing: stereotyping. It does not have any notion of cause and effect, common sense, morality, or logic. Stereotypes can be effective shortcuts to solving hard problems, but they can cause real harm. Think of Microsoft’s racist chatbot, Google’s smart camera that can’t see Black people, or the tyranny of “the algorithm” in social media. When we allow DL algorithms to understand data for us, or make decisions that influence our lives, we’re trusting a system that has no judgment, sense of consequences, or accountability. That’s taking a big risk, and usually it’s just a few people at a tech company making the decision for millions of others around the world.

I have mixed feelings about DL. It’s an incredible tool, it does some really cool stuff, and it has already created tremendous value for society. It has also done a lot of damage, especially to minority communities. I’m concerned about all the hype, and how rapidly we’ve integrated DL into every facet of life. We don’t understand this technology well enough to know what the consequences will be. I also hate that our focus on DL has blinded the field of AI to other opportunities. Life is full of brilliant designs! By exploring more broadly, we might find other useful tools, but also come to understand ourselves better and how we fit into the bigger picture of living intelligence. Isn’t that more important?

How did AI get so much smarter?

(This month’s photo is a picture of a brown bat. It’s small and fluffy with a stubby nose, clinging to the gray bark of a tree. Photo by N. J. Stewart wildlife, unmodified and used under the Creative Commons license.)

When I write about intelligence, I tend to downplay AI and Deep Learning. These are powerful problem solving tools, but they’re over-hyped, and they don’t “think” the way people do. They have no memory, no sense of self, and no goals, at least in the usual sense of the words. But, large language models (LLMs) like OpenAI’s GPT are shockingly good at generating text that seems like something a person might make. They’re much more human-like than anything that came before. Why is that? The short answer is that they use a new kind of Deep Learning architecture known as a Transformer, which introduced a few small tricks that make a very big difference.

The first thing to note is that, while lots of people argue about whether LLMs can answer questions, reason, solve problems, brainstorm, or make art, what they really do is text prediction. They take some words as a starting point and then they guess what comes next based on their training data. If LLMs have any deeper cognitive abilities than that, they must be somehow tapping into the human cultural intelligence that is embedded within that text. Or, maybe they’re just parroting back fragments of intelligent things other people have said, without any understanding or integration—mindless idiots, randomly stringing words together in ways that sound just smart enough to distract us. We honestly don’t know yet! But whatever intelligence they possess, it exists entirely in the realm of language.
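To make "text prediction" concrete, here is a deliberately tiny sketch. It is not how an LLM works internally: it just counts which word follows which in a made-up training sentence and guesses the most common follower. The names `train_bigrams` and `predict_next` and the toy corpus are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Guess the most frequent follower of `word`, or None if it was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Toy training data, invented for this example.
corpus = "the bat flew away . the bat flew home . the bat slept"
model = train_bigrams(corpus)

print(predict_next(model, "bat"))    # the follower seen most often after "bat"
print(predict_next(model, "zebra"))  # a word the model never saw
```

A real LLM replaces the lookup table with a neural network and looks at far more than one previous word, but the job is the same: given some text, guess what comes next.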

Research into getting computers to understand text and speech (known as Natural Language Processing, or NLP) started back in the 1950s. Back then, computers were specialists’ tools, and making one that anyone could use just by telling it what to do was a dream. At first, researchers tried to formally describe language as we use it, feeding computers dictionaries, grammar rules, and lists of facts, but this never worked! It turns out, we don’t explicitly know all the rules of human language that we intuitively follow, and they’re usually fuzzy rules, with lots of conditions and exceptions. The key challenge of NLP was getting computers (which are obsessively logical and precise) to deal with this messiness and ambiguity, which we don’t even fully understand ourselves. Perhaps the most important advance was when researchers gave up trying to explain language to computers, and instead started teaching them by example.

Modern NLP represents words as lists of numbers called “vectors.” Like an (X, Y) coordinate, each vector represents a point in space. Not physical space, though, more like an abstract space of concepts. Maybe nouns go to the right, verbs to the left. Natural concepts are up, man-made concepts are down. Except, instead of two dimensions, maybe there are 10,000 of them. The layout of this space is pretty arbitrary. The absolute position of a word doesn’t mean anything, only where it is relative to other words. Nearby words have similar meanings, and relationships between words are represented by the distance and angle between them. This is all weirdly self-referential. Words are only defined in terms of other words! But it works surprisingly well. You don’t need explicit rules about which words go together and how, you can just look at lots of examples, and infer those relationships with statistics. People talk about “training” an AI by having it “read” lots of text, but really all that means is iteratively tweaking the lists of numbers, slowly moving the words through this abstract meaning space until they settle into positions that reflect how they co-occur together in the training text.
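Here is a small sketch of the "nearby words have similar meanings" idea. The three-dimensional vectors are hand-written for illustration (a trained model would learn them from text and use far more dimensions), and cosine similarity is one common way to compare the angle between two vectors.

```python
import math

# Hypothetical word vectors, invented for this example.
VECTORS = {
    "cat":    [0.9, 0.8, 0.1],
    "dog":    [0.8, 0.9, 0.2],
    "hammer": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word):
    """Return the other word whose vector points in the most similar direction."""
    others = [w for w in VECTORS if w != word]
    return max(others, key=lambda w: cosine_similarity(VECTORS[word], VECTORS[w]))

print(nearest("cat"))
```

Notice that nothing here says what a cat is. The only information is where "cat" sits relative to "dog" and "hammer."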

There’s one big problem with representing words as vectors, though: ambiguity. What do you do with a word like “bat,” which has several meanings? There’s no way one vector can represent this. The trick is to look for context. When you see a phrase like “brown bat” or “wooden bat,” the meaning is clear. Instead of thinking of these as pairs of words, you might think of them as compound words, each with its own distinct meaning. This is a powerful idea, but hard to generalize. Take a more difficult example: “Hearing a strange flutter and crash in the dark, he grabbed his bat for defense and went to investigate.” Which kind of “bat” are we talking about? Words like “flutter” and “dark” might suggest the animal, but “grabbing” a bat for “defense” suggests the object instead. We need context to disambiguate, but which context? We’d like to ignore the first half of the sentence (which isn’t talking about the “bat”) and focus on the second half of the sentence (which is).

NLP has found elegant ways to solve this problem. Researchers call these techniques “attention,” since the model is learning to “pay attention” to some words and not others, but I find that name misleading. For human beings, attention is something very different. We seem to have a “mind’s eye” that we can move about at will. We can choose to pay attention to this or that, our attention gets drawn to salient features, and we may even notice our attention drifting and redirect it. But these AIs have no mind’s eye, no will, and no intuition about relevance. The attention models we’re talking about are just more vector math. In addition to finding vectors to represent the meaning of each word individually, they also find vectors to represent patterns of words. They learn, “in this context, these words together mean that.” Adding an extra layer of complexity lets the model represent how words interact to change the meaning of other words or the sentence as a whole.
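To show how little magic is involved, here is a minimal sketch of scaled dot-product attention, the form used in Transformers. Everything concrete in it is an assumption for illustration: the two-dimensional vectors are hand-written, and the same vectors serve as both keys and values, whereas a real model learns separate projections for each.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Blend the value vectors according to the attention weights.
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, blended

# Hypothetical vectors: axis 0 is roughly "animal-ness", axis 1 "tool-ness".
context = {
    "flutter": [1.0, 0.0],
    "grabbed": [0.0, 2.0],
    "defense": [0.0, 1.5],
}
bat_query = [0.5, 1.0]  # made-up query vector for the ambiguous word "bat"
keys = list(context.values())
weights, meaning = attend(bat_query, keys, keys)

for word, weight in zip(context, weights):
    print(f"{word}: {weight:.2f}")
```

With these invented numbers, "grabbed" and "defense" get more weight than "flutter," so the blended vector for "bat" leans toward the tool-like axis. The model isn't choosing where to look. It's computing dot products.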

Researchers have explored many variations on this attention trick. Transformer models use an advanced kind of attention that represents context bi-directionally. They model how different words tend to get modified by context, and how different contexts tend to modify nearby words. The benefit of this is that such a model doesn’t just learn that “brown bat” is the name of an animal, but it might learn that “brown” is an adjective that applies to physical objects, that in English adjectives tend to modify the noun that follows them, and that “bat” can refer to one of several animal species, sometimes distinguished by color. That is, rather than modeling some particular context, models like this can learn general rules and relationships between different kinds of words. They can learn grammar. Not just the “official” grammar of a language like English, but any system of relationships and interactions between words, including dialects, domain-specific jargon, storytelling tropes, or the gender roles of a society.

The other trick that makes Transformers better with language is pluralism. Some NLP systems represent more complex meanings by using bigger vectors. More numbers in each vector means a larger conceptual space. Instead, Transformers use more vectors. They don’t learn the one meaning of this word, they learn to represent the many meanings of this word in the many contexts that contain it. This works a bit like voting. When processing a sentence, several different “attention heads” each consider one possible interpretation of a word, attending to different patterns of contextual cues. The overall meaning is determined by adding them all together. This is really useful for weighing subtle cues against each other to resolve ambiguity, but also to represent sentences with multiple layers of meaning. A word can have many meanings at the same time, and the many meanings of all the words in a sentence can interact in complex ways. The fancy kind of attention used in Transformers can automatically discover this sort of layered structure in language.
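A rough sketch of the multi-head idea follows, under simplifying assumptions. In the original Transformer design, each head's output is concatenated and passed through a learned linear layer; here there are no learned weights at all, each head simply works on its own slice of hand-written four-dimensional vectors, and the results are concatenated. The function names and numbers are invented for illustration.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def head_attention(query, keys, values):
    """One attention head: weight each value by query-key similarity."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale
                       for key in keys])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

def multi_head(query, keys, values, num_heads):
    """Split every vector into equal slices, attend per slice, concatenate."""
    size = len(query) // num_heads
    output = []
    for h in range(num_heads):
        lo, hi = h * size, (h + 1) * size
        output += head_attention(query[lo:hi],
                                 [k[lo:hi] for k in keys],
                                 [v[lo:hi] for v in values])
    return output

# Hypothetical vectors: dims 0-1 carry "animal vs. tool" cues,
# dims 2-3 carry "noun vs. verb" cues. All numbers are invented.
words = [[1.0, 0.0, 0.0, 1.0],
         [0.0, 2.0, 1.0, 0.0]]
query = [0.5, 1.0, 1.0, 0.0]
combined = multi_head(query, words, words, num_heads=2)
print(combined)
```

Each head weighs the context by a different kind of cue, and the final vector carries both readings side by side rather than forcing a single interpretation.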

As clever as these attention methods are, they are not the secret to Transformers’ success. They do greatly improve the richness of NLP models, but at first they were mostly used with “recurrent neural networks,” a kind of Deep Learning model that processes data sequentially. That’s probably because they work a bit like how we imagine a human reader does: they “read” each word in a text, one at a time, using attention to figure out how each new word should update the meaning of the text so far. This works pretty well, but it doesn’t scale up to long passages of text. These models have a limited attention span, eventually forgetting important details they read several sentences ago. Also, processing long texts one word at a time is painfully slow. Even on the world’s fastest computer, reading a book from beginning to end takes time per page, and training a model like this takes vast amounts of text, so this was a major limitation.

The paper that first introduced Transformers was called “Attention Is All You Need,” which highlights the key innovation: they got rid of the recurrent network, and built an AI using just this attention mechanism, all on its own. In other words, they found a way to do the same vector math, but solving for a large block of text all at once (and possibly out of order) rather than word-by-word. This doesn’t make the model “smarter.” It doesn’t even reduce the overall amount of number crunching. It just makes the work more parallelizable. Instead of having one computer read War and Peace from cover to cover, they could have many computers each read a few paragraphs, then combine their results. This made it possible to throw more money at the problem, using whole datacenters of computers to train a language model on vastly more text than ever before. Billions of documents, trillions of words. It’s the sheer volume of training data that made LLMs so much better. That’s why they’re called “large” language models.

So, how should we think about LLMs like GPT? Well, first off, human language is irregular and complex, but it’s also highly structured. Cleverly designed statistical learning tools can automatically discover that hidden structure just by processing obscene amounts of text. Neural networks are great for letting computers work with these sorts of fuzzy rules. They can extract meaning from text, manipulate it, and generate new text. But to an LLM, words are just vectors, defined by their relationships to each other. They have no connection to physical reality, because LLMs have no physical existence. There is no communication going on when you have a “conversation” with an LLM. To the AI, a dialog is just a sequence of vectors that follow one another according to some grammar. The AI has no mind, no intentions, and no meaning it wishes to convey. It has no conception of being truthful or helpful, only what words tend to follow certain questions. It does not learn from a conversation, it just re-reads the full chat history each time it makes a response. It appears like a good conversational partner, because it is made to imitate one, but what’s happening behind the screen isn’t “thinking” as we know it.

Still, LLMs really are much more human-like than any other AI that came before. Representing language with a high-dimensional abstract concept space works surprisingly well, and so do the “attention” methods described above. They let us represent a huge, open-ended space of ideas that can build on and interact with each other. They let us represent ambiguity, nuance, and innuendo. So, maybe those vector math tricks could actually teach us something about how language processing works in the brain? On the other hand, LLMs are also remarkable in how different they are from humans. An LLM can learn English, but only by reading every document on the internet, not one word at a time, but all at once. In contrast, babies learn language by interacting with the world, learning how words relate to objects, people, events, actions, and desires. Even though they’re exposed to far less language, they learn much faster, and in a way that tightly integrates all of their senses, relationships, and the lifestyle they were born into. Since LLMs seem so human-like, it’s very tempting to imagine them with the same kind of awareness, purpose, and empathy that we have, but they simply aren’t there. Those are a product of being alive in the world, and can’t be found in text, no matter how much of it.

Status Update: Semester 3

I’m at an interesting moment in my studies, so I thought I’d let you know what’s going on!

Year two of my PhD program has begun. I’m about a month into my third semester, and things are going well. I’m taking two classes right now: Evolutionary Computation, and Deep Learning. Most of my Computer Science education has been about how to design algorithms and write software to solve different kinds of problems, but these classes are different. This semester, I’m learning how to get computers to discover their own algorithms, and write their own software. Honestly, the state of the art here is still quite primitive. We’ve found some very impressive techniques, but they each apply to a narrow domain, and we don’t understand them nearly as well as we’d like. Which makes them fun topics to study. 🙂

The other fun thing about this semester is that both of my classes are built around student projects. More or less, I get to pick projects that fit with my research, and the class is there to help me find the time, resources, and guidance to complete the projects successfully. I like this much better than undergraduate style courses built around assignments and exams that are very generic and may not be relevant to my work. We’ll see how things unfold, but I’m currently planning to work on two projects that I’m excited about.

For Evolutionary Computation, I’m working on an experiment about endosymbiosis. I was inspired by this classic experiment, which examined how bacteria evolve antibiotic resistance, and how genetic innovations spread through the population spatially. I’m going to try evolving a host environment that supports an inner population, a bit like how my gut supports a microbiome. The hope is that the host will be able to design a supportive environment, with different regions that cultivate “microbes” with different traits, such that it can guide and coax them into evolving more specialized forms. This is an exciting experiment for me, because I’m not sure what to expect, but I’m pretty confident that something interesting will happen.

A screenshot from the video linked above, showing strains of bacteria gradually growing into bands with increasing concentrations of antibiotic, fanning out from points where key mutations occurred.

For Deep Learning, I’m going to use computer vision techniques to detect interesting patterns in the Game of Life, since I’ve been using that as an environment for my evolution experiments. The Game of Life has very simple rules, but it evolves in complex ways. Most patterns quickly dissolve into empty space or settle into a few boring, stable forms. But rarely, you get something much more interesting. For decades, people have been exploring this space, finding interesting patterns and classifying them. You get huge complex structures that stabilize themselves, change continuously in repeating cycles, or even propel themselves and move at a steady pace. I’ll build a system that can detect and categorize these patterns, so that when my evolutionary algorithm finds them, I can reward it and ask for “more like that.”

Examples of interesting patterns in the Game of Life. The first is Eater 2, a static shape that persists forever, but has the special property of being able to “eat” gliders that collide with it, recovering its shape after. The second is Monogram, a period-four oscillator, which is small, but occurs very rarely from random conditions. The third is a middleweight spaceship, which moves forward two spaces as it repeats itself in four time steps.
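For readers who haven't played with the Game of Life, here is a small sketch of its rules, plus a brute-force way to tell still lifes, oscillators, and spaceships apart. To be clear, this is not the computer vision system described above. It is a simple rule-based baseline, and the helper names (`step`, `normalize`, `classify`) and the `max_period` cutoff are my own assumptions. The examples use three small, well-known patterns (block, blinker, glider) rather than the ones pictured.

```python
from collections import Counter

def step(live):
    """Advance a set of live (row, col) cells by one generation."""
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next turn with exactly 3 neighbors, or 2 if already alive.
    return {cell for cell, n in neighbor_counts.items()
            if n == 3 or (n == 2 and cell in live)}

def normalize(live):
    """Shift a pattern so its bounding box starts at (0, 0)."""
    top = min(r for r, _ in live)
    left = min(c for _, c in live)
    return frozenset((r - top, c - left) for r, c in live), (top, left)

def classify(live, max_period=8):
    """Label a pattern by whether and where its shape recurs."""
    start_shape, start_origin = normalize(live)
    current = set(live)
    for period in range(1, max_period + 1):
        current = step(current)
        if not current:
            return "dies out"
        shape, origin = normalize(current)
        if shape == start_shape:
            if origin != start_origin:
                return f"spaceship (period {period})"
            return "still life" if period == 1 else f"oscillator (period {period})"
    return "unknown"

block = {(0, 0), (0, 1), (1, 0), (1, 1)}
blinker = {(0, 0), (0, 1), (0, 2)}
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

for name, pattern in [("block", block), ("blinker", blinker), ("glider", glider)]:
    print(name, "->", classify(pattern))
```

This only works on an isolated pattern that starts in a clean state, which is exactly why a learned detector is appealing for messy, evolving boards.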

This month’s essay is inspired by my Evolutionary Computation class, and the work I’ve been doing to develop the specific research questions I want to focus on for my PhD. So, check back on Wednesday to learn more about how evolution got started, and why it’s worth asking: how does evolution evolve?