How is AI like human intelligence?

(The image for this post is a photo of a jacquard loom I took in the workshop of Luigi Bevilacqua in Venice. It’s a heavy wooden frame covered in pulleys and string, in the midst of weaving fine velvet in gold and red. The loom is automated using punch cards, and served as a model for early computers. It’s not something we’d usually think of as “AI,” but it is a symbolic mechanism that automates a complex human behavior.)

The term “Artificial Intelligence” has been around since the 1950s, and it’s always been ambiguous. Generally, AI is about reproducing the intelligence of living things using math and machines, and there have been many different approaches to that problem. However, in the past decade or so, the word AI has become synonymous with a class of algorithms known as Deep Learning (DL). This technology has produced stunning results and has rapidly been integrated into software of all kinds. This has led to mixed reactions, including ethical concerns and active debates about what “human-level general intelligence” means and whether or not AI has it. But what does DL actually do? How is it like and unlike human intelligence? Let’s at least scratch the surface of this important question.

Personally, I’m frustrated by the AI field’s obsession with DL. We act like brains are the secret to intelligence, DL is just a “brain in a computer,” and anything else is of marginal interest. But, in truth, the brain is just one small part of a vast and diverse intelligent system. Consider:

  • Evolution “designed” sophisticated solutions to real-life challenges without the use of top-down engineering or even conscious thought.
  • Organisms of all kinds perceive, analyze, decide, and react to the world in real time, with or without a brain.
  • Brains come in all shapes and sizes, from simple to complex, with many species-specific architectures and special-purpose modules for things like sensory perception, emotions, memory, and motion planning.
  • Mammals have an additional brain structure called the neocortex (birds have an analogous structure called the dorsal ventricular ridge) which provides a layer of abstract cognition on top of the older, more specialized parts.
  • Individuals compete and collaborate to form ecosystems, colonies, and societies that are intelligent in their own right.

So the brain itself is only a small part of a vast network of intelligence. But also, DL is only sorta like one part of the brain. Neural networks as they exist in DL are very loosely inspired by the fine structure of the neocortex. I like to think of that as the “cognitive fabric” from which the neocortex is built. Evolution has shaped that fabric into special-purpose brain regions, each tuned to solve different problems. These regions are networked together into a particular architecture, providing multiple layers of analysis, careful mixing of perceptions and cognitive faculties, and multiple kinds of self-monitoring. All of that is orchestrated by the lower-level brain structures, which still define the basic emotions, modes of thinking, flow of thought, and the relationship between abstract mental activity and the concrete needs of the body. Generally speaking, DL ignores all of that structure.

If DL captures just one facet of our mind’s intelligence, then what does that part do? It observes data, finds patterns, and learns stereotypes about what it sees. It can apply those stereotypes to extrapolate rich and coherent scenes from noisy fragments of data, filling in gaps with reasonable guesses. The brain uses this tool everywhere, and it’s a crucial ingredient for how humans perceive and think about the world. DL makes that tool available to software developers.

What makes our minds “human like” is the evolved structure that applies this pattern matching / stereotyping faculty in particular ways that generate our perceptions, intuition, biases, self-awareness, train of thought, attention, and dreams. When people make algorithms using DL, we provide the structure that determines how the AI uses that faculty, and thus how it behaves. These algorithms resemble the human mind only as much as we try to reproduce the human thought process in our code. We typically don’t, and that’s probably a good thing. Attempting to bring something human-like to life in a computer sounds even more ethically problematic than cloning. Recent experiments into “chain of thought” for language models are a step in this direction, though they aren’t really trying to make the model “think like a person” so much as to get high scores on tests for “reasoning skills,” which is not the same thing.

This raises some interesting questions. Can DL algorithms really understand the world? Sorta. Large language models like GPT provide an interesting example. By consuming vast quantities of text, these algorithms can master nuanced patterns in words that reflect human ideas and the physical world. They understand these ideas well enough to use them, interpreting and generating both text and images. Yet, they only experience the physical world indirectly, so in some sense they don’t fully understand, and they get many details wrong. It’s an open philosophical question just how different human and machine understanding really are.

Do DL algorithms think or have desires? Generally, no. Most often DL is used to implement a function, in the mathematical sense. These models take some input (e.g., an image) and produce some output (e.g., a label for that image). That is the entire scope of their existence. They don’t reflect, compare alternatives, or make decisions. They have no needs to fulfill, and no way to perceive themselves or their environment. More complex DL architectures start to blur the line, though. We give them analogs of memory and attention. In reinforcement learning, we even use DL to make agents that inhabit virtual worlds, have a sense of self, and make their own choices. Perhaps these algorithms could be said to “think,” but their “minds” are alien, adapted to a world of experiences totally unlike our own.

Do we need to worry about AI taking over the world? No, but also yes. The Terminator scenario seems unlikely. Those evil robots are human-like, in ways current AI cannot even begin to approach. In particular, they want to destroy humanity, and take the initiative to act on that. Today’s AI has no desires, and does nothing until prompted. However, there are other, more realistic concerns. Today, we mostly use ML for two purposes: to help computers understand human expression, and to automate human behaviors. Both of these can be problematic, especially if we (incorrectly) assume these algorithms think like people do.

The real danger is trusting these algorithms too much. DL is incredibly good at one thing: stereotyping. It does not have any notion of cause and effect, common sense, morality, or logic. Stereotypes can be effective shortcuts to solving hard problems, but they can cause real harm. Think of Microsoft’s racist chatbot, Google’s smart camera that can’t see Black people, or the tyranny of “the algorithm” in social media. When we allow DL algorithms to understand data for us, or make decisions that influence our lives, we’re trusting a system that has no judgment, sense of consequences, or accountability. That’s taking a big risk, and usually it’s just a few people at a tech company making the decision for millions of others around the world.

I have mixed feelings about DL. It’s an incredible tool, it does some really cool stuff, and it has already created tremendous value for society. It has also done a lot of damage, especially to minority communities. I’m concerned about all the hype, and how rapidly we’ve integrated DL into every facet of life. We don’t understand this technology well enough to know what the consequences will be. I also hate that our focus on DL has blinded the field of AI to other opportunities. Life is full of brilliant designs! By exploring more broadly, we might find other useful tools, but also come to understand ourselves better and how we fit into the bigger picture of living intelligence. Isn’t that more important?

What Intelligence is Not

(The photo from this post is of a squirrel monkey eating fruit in a tree branch. The monkey is tiny, with golden / silver fur, pale pink skin, and a dark skull cap pattern. The fruit is small and red, perhaps a date. Used without modification under the creative commons license – source)

Life has been steadily driving towards greater and greater intelligence, eventually leading to human beings, who are the very pinnacle of this trend. Our superior minds are what separate us from the animals. They empower us to make a world of human flourishing, and justify our dominion over the planet. These tropes about intelligence are so common in our culture, they almost sound self-evident. Yet, I’ll argue that they’re completely wrong. These ideas are enticing because they appeal to our pride and our sense of specialness, but this way of thinking is destroying our world. So, let’s break down these myths and talk about what intelligence is not.

One problem with this story is it presents intelligence as a linear thing. Life started out dumb, and it gradually got smarter and smarter. In a sense, this is true. More intelligent life is more complicated, so it takes longer to evolve. But life doesn’t evolve towards anything, it evolves in all directions, finding and filling every niche available. Monkeys are brilliant at navigating tree branches and spotting ripe fruit. Trees are brilliant at producing the right amount of fruit at the right moment to use local resources efficiently and maximize the spread of their seeds. Yeasts are brilliant at performing alchemy on that fruit, transmuting sugar into alcohol, which the monkeys love. These are all different kinds of intelligence, and none is “better” than the other because they’re all contextual and interdependent. Every instance of intelligence looks different, because it’s adapted to a unique lifestyle.

We live a very complicated lifestyle that depends on our big brains, so we tend to think that more intelligence is better, but that’s just not the case. Some of the simplest, dumbest organisms on Earth are also the most successful. Microbes, fungi, and plants make up something like 99.5% of Earth’s biomass, while animals (the “smart” ones) make up the rest. Being smart is metabolically expensive. Taking time to think can mean missing a moment of opportunity. Sometimes real intelligence is knowing when a mindless strategy works best. If anything, humans are a great example of how intelligence can backfire. We’ve used our intelligence to make civilization, which is amazing! But in doing so, we accidentally drove many species to extinction, exhausted resources we depend on, and destabilized the global climate. Our kind of big-brained intelligence is a high risk, high reward strategy.

This brings us to the idea that humans are the pinnacle of intelligence. The problem with a word like “pinnacle” is it suggests we are the ultimate form—the thing life’s been building up to, all this time. But we’re not the end of anything. We’re still evolving, and it’s unclear whether our intelligence will go up or down from here. We’re also not the only ones. There are a handful of species that have gone “all in” on the strategy of super intelligence. You know, elephants, dolphins, octopi, the usual suspects. Humans may, in fact, be the smartest of them all, but since intelligence is so contextual, it’s hard to say. Maybe dolphins are more intelligent than us, it just looks different in an ocean species with no hands?

It may seem obvious that human intelligence is something more and different from those other species. We invented the wheel, New York, wars and so on. But that really isn’t because we as individuals are so smart. This is made clear by the tragic case of “wild children,” who grow up without parents or any human community. In the few cases we’ve observed, these children were described as animalistic, violent, and cognitively impaired. They were never able to recover or integrate into human society. Our brains alone do not set us apart from animals. Our society does, and that’s a separate thing, that evolved after our big brains. We’re smarter than other animals not because of our biology, but because of the vast library of practical knowledge and resources that we share with one another.

That’s what sets us apart: other species can’t access human culture. In a sense, that’s because those species are less intelligent; to fully appreciate human society, you need language and abstract thought, which many species lack completely. Yet some species thrive in human society anyway. By being useful (like wheat), or charismatic (like dogs), or sneaky (like raccoons) other species live with us and shape our human world. That’s because nature does not set humans apart from other animals. We set ourselves apart from other life by building walls, by excluding them from our world, to the extent that we can. We decide what plants and animals are pets, food, or pests. Other species don’t need language to live in human society if we choose to accommodate them. We can coexist with nature in community, as many human societies have, and still do. Or, we can perpetuate the myth that we are special to justify excluding and exploiting nature instead.

And, ultimately, that’s the problem with this notion of intelligence: we use it to draw a line between friend and resource. If smarter is better—if our intelligence is what sets us apart from other life, and gives us the right to exploit that life however we see fit—then where do we draw the line? Should smarter people get more rights and privileges than dumber ones? Is a disabled person no better than an animal? Should we simply recycle the feeble minded from our population? This line of thinking is revolting, and it only makes sense if you believe these myths about intelligence. Similarly, if anything less than human is just a dumb resource for us to exploit, why not pave the planet? What’s wrong with processing all of that biomass, every living thing on Earth, into fuel and plastics? I think intuitively we know why: life has a right to exist, and losing all those diverse and beautiful kinds of intelligence would be tragic.

I’m excited to live in a time when our understanding of intelligence is changing so rapidly. It’s hard to define the word, just because we have so many examples that pull in different directions, and seem to contradict one another. Intelligence is many things, and we’re still fleshing out the full picture. Yet, every day we see more clearly that our old conceptions of intelligence that put human beings on a pedestal were wrong, and, more importantly, that they are at the root of so much injustice and destruction. So, while these tropes are still everywhere around us, shape the way our world works, and may still feel intuitively true, I urge you to reject them. We must move on, and embrace a more expansive view, one that doesn’t start from the premise of who to exclude.

How did AI get so much smarter?

(This month’s photo is a picture of a brown bat. It’s small and fluffy with a stubby nose, and clinging to the gray bark of a tree. Photo by N. J. Stewart wildlife, unmodified and used under the Creative Commons license.)

When I write about intelligence, I tend to downplay AI and Deep Learning. These are powerful problem solving tools, but they’re over-hyped, and they don’t “think” the way people do. They have no memory, no sense of self, and no goals, at least in the usual sense of the words. But, large language models (LLMs) like OpenAI’s GPT are shockingly good at generating text that seems like something a person might make. They’re much more human-like than anything that came before. Why is that? The short answer is that they use a new kind of Deep Learning architecture known as a Transformer, which introduced a few small tricks that make a very big difference.

The first thing to note is that, while lots of people argue about whether LLMs can answer questions, reason, solve problems, brainstorm, or make art, what they really do is text prediction. They take some words as a starting point and then they guess what comes next based on their training data. If LLMs have any deeper cognitive abilities than that, they must be somehow tapping into the human cultural intelligence that is embedded within that text. Or, maybe they’re just parroting back fragments of intelligent things other people have said, without any understanding or integration—mindless idiots, randomly stringing words together in ways that sound just smart enough to distract us. We honestly don’t know yet! But whatever intelligence they possess, it exists entirely in the realm of language.
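To make “text prediction” concrete, here is a deliberately tiny sketch. It is not how an LLM works internally (there are no vectors or neural networks here); it only counts which word follows which in a made-up training text, then guesses the most common follower. The function names and the toy corpus are my own inventions for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Toy training text, made up for this example.
corpus = "the bat flew away . the bat flew home . the cat sat down ."
model = train_bigrams(corpus)
print(predict_next(model, "bat"))  # prints: flew
```

Everything this toy “knows” about bats is that the word “flew” came next in its training text. LLMs condition on far more context and use far richer statistics, but the task has the same shape: given some words, guess what follows.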

Research into getting computers to understand text and speech (known as Natural Language Processing, or NLP) started back in the 1950s. Back then, computers were specialists’ tools, and making one that anyone could use just by telling it what to do was a dream. At first, researchers tried to formally describe language as we use it, feeding computers dictionaries, grammar rules, and lists of facts, but this never worked! It turns out, we don’t explicitly know all the rules of human language that we intuitively follow, and they’re usually fuzzy rules, with lots of conditions and exceptions. The key challenge of NLP was getting computers (which are obsessively logical and precise) to deal with this messiness and ambiguity, which we don’t even fully understand ourselves. Perhaps the most important advance was when researchers gave up trying to explain language to computers, and instead started teaching them by example.

Modern NLP represents words as lists of numbers called “vectors.” Like an (X, Y) coordinate, each vector represents a point in space. Not physical space, though, more like an abstract space of concepts. Maybe nouns go to the right, verbs to the left. Natural concepts are up, man-made concepts are down. Except, instead of two dimensions, maybe there are 10,000 of them. The layout of this space is pretty arbitrary. The absolute position of a word doesn’t mean anything, only where it is relative to other words. Nearby words have similar meanings, and relationships between words are represented by the distance and angle between them. This is all weirdly self-referential. Words are only defined in terms of other words! But it works surprisingly well. You don’t need explicit rules about which words go together and how, you can just look at lots of examples, and infer those relationships with statistics. People talk about “training” an AI by having it “read” lots of text, but really all that means is iteratively tweaking the lists of numbers, slowly moving the words through this abstract meaning space until they settle into positions that reflect how they co-occur together in the training text.
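Here is a small sketch of what “nearby words have similar meanings” looks like in code. The three-dimensional vectors below are hand-picked by me purely for illustration; a real model learns its vectors from text and uses far more dimensions. Cosine similarity is one common way to compare the angle between two vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical word vectors, chosen by hand (not learned from any data).
vectors = {
    "cat":    [0.9, 0.8, 0.1],
    "dog":    [0.8, 0.9, 0.2],
    "hammer": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])
cat_hammer = cosine_similarity(vectors["cat"], vectors["hammer"])
print(f"cat vs dog:    {cat_dog:.2f}")
print(f"cat vs hammer: {cat_hammer:.2f}")
```

Notice that none of the individual numbers mean anything on their own. Only the comparison matters: “cat” points in nearly the same direction as “dog,” and in a very different direction from “hammer.”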

There’s one big problem with representing words as vectors, though: ambiguity. What do you do with a word like “bat,” which has several meanings? There’s no way one vector can represent this. The trick is to look for context. When you see a phrase like “brown bat” or “wooden bat,” the meaning is clear. Instead of thinking of these as pairs of words, you might think of them as compound words, each with their own distinct meaning. This is a powerful idea, but hard to generalize. Take a more difficult example: “Hearing a strange flutter and crash in the dark, he grabbed his bat for defense and went to investigate.” Which kind of “bat” are we talking about? Words like “flutter” and “dark” might suggest the animal, but “grabbing” a bat for “defense” suggests the object instead. We need context to disambiguate, but which context? We’d like to ignore the first half of the sentence (which isn’t talking about the “bat”) and focus on the second half of the sentence (which is).

NLP has found elegant ways to solve this problem. They call these techniques “attention,” since the model is learning to “pay attention” to some words and not others, but I find that name misleading. For human beings, attention is something very different. We seem to have a “mind’s eye” that we can move about at will. We can choose to pay attention to this or that, our attention gets drawn to salient features, and we may even notice our attention drifting and redirect it. But these AIs have no mind’s eye, no will, and no intuition about relevance. The attention models we’re talking about are just more vector math. In addition to finding vectors to represent the meaning of each word individually, they also find vectors to represent patterns of words. They learn, “in this context, these words together mean that.” Adding an extra layer of complexity lets the model represent how words interact to change the meaning of other words or the sentence as a whole.
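To show that attention really is “just more vector math,” here is a stripped-down sketch of one common form, dot-product attention, applied to the “bat” sentence. The two-dimensional vectors (first number is “animal-ish,” second is “object-ish”) and the query are hand-picked assumptions of mine; a real model learns all of them, and also passes them through learned projections that I’ve left out.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Weight each context vector by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed

# Hypothetical context vectors: [animal-ish, object-ish].
context = {
    "flutter": [1.0, 0.0],
    "dark":    [1.0, 0.0],
    "grabbed": [0.0, 1.0],
    "defense": [0.0, 1.0],
}
words = list(context)
vecs = [context[w] for w in words]

# Hypothetical query for "bat": in this sentence position, look for object-like cues.
bat_query = [0.0, 2.0]
weights, mixed = attend(bat_query, vecs, vecs)
for word, weight in zip(words, weights):
    print(f"{word:8s} {weight:.2f}")
```

The words “grabbed” and “defense” end up with most of the weight, so the blended context vector leans toward the object meaning. There’s no mind’s eye involved, only dot products and a weighted average.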

Researchers have explored many variations on this attention trick. Transformer models use an advanced kind of attention that represents context bi-directionally. They model how different words tend to get modified by context, and how different contexts tend to modify nearby words. The benefit of this is that such a model doesn’t just learn that “brown bat” is the name of an animal, but it might learn that “brown” is an adjective that applies to physical objects, that in English adjectives tend to modify the noun that follows them, and that “bat” can refer to one of several animal species, sometimes distinguished by color. That is, rather than modeling some particular context, models like this can learn general rules and relationships between different kinds of words. They can learn grammar. Not just the “official” grammar of a language like English, but any system of relationships and interactions between words, including dialects, domain-specific jargon, storytelling tropes, or the gender roles of a society.

The other trick that makes Transformers better with language is pluralism. Some NLP systems represent more complex meanings by using bigger vectors. More numbers in each vector means a larger conceptual space. Instead, Transformers use more vectors. They don’t learn the one meaning of this word, they learn to represent the many meanings of this word in the many contexts that contain it. This works a bit like voting. When processing a sentence, several different “attention heads” each consider one possible interpretation of a word, attending to different patterns of contextual cues. The overall meaning is determined by adding them all together. This is really useful for weighing subtle cues against each other to resolve ambiguity, but also to represent sentences with multiple layers of meaning. A word can have many meanings at the same time, and the many meanings of all the words in a sentence can interact in complex ways. The fancy kind of attention used in Transformers can automatically discover this sort of layered structure in language.
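The “voting” idea can be sketched in a few lines. This is a simplification under my own assumptions: each head here is just a hand-picked query vector, and the heads’ outputs are summed, as described above. Real Transformers give each head its own learned projections and concatenate the heads’ outputs before a final learned mixing layer.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def head_output(query, context):
    """One attention head: blend context vectors by how well they match `query`."""
    scores = [sum(q * c for q, c in zip(query, vec)) for vec in context]
    weights = softmax(scores)
    dim = len(context[0])
    return [sum(w * vec[i] for w, vec in zip(weights, context)) for i in range(dim)]

def multi_head(queries, context):
    """Run every head on the same context, then add their outputs together."""
    outputs = [head_output(q, context) for q in queries]
    return [sum(column) for column in zip(*outputs)]

# Hypothetical context cues: [animal-ish, object-ish].
context = [[1.0, 0.0], [0.0, 1.0]]

# Three hypothetical heads: one looks for animal cues, two look for object cues.
queries = [[2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]

combined = multi_head(queries, context)
print([round(x, 2) for x in combined])
```

Each head casts a “vote” for its own reading of the context, and the combined vector leans toward whichever reading more heads support, while still keeping a trace of the minority interpretation.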

As clever as these attention methods are, they are not the secret to Transformers’ success. They do greatly improve the richness of NLP models, but at first they were mostly used with “recurrent neural networks,” a kind of Deep Learning model that processes data sequentially. That’s probably because they work a bit like how we imagine a human reader does: they “read” each word in a text, one at a time, using attention to figure out how each new word should update the meaning of the text so far. This works pretty well, but it doesn’t scale up to long passages of text. These models have a limited attention span, eventually forgetting important details they read several sentences ago. Also, processing long texts one word at a time is painfully slow. Even on the world’s fastest computer, reading a book from beginning to end takes time per page, and training a model like this takes vast amounts of text, so this was a major limitation.

The paper that first introduced Transformers was called Attention Is All You Need, which highlights the key innovation: they got rid of the recurrent network, and built an AI using just this attention mechanism, all on its own. In other words, they found a way to do the same vector math, but solving for a large block of text all at once (and possibly out of order) rather than word-by-word. This doesn’t make the model “smarter.” It doesn’t even reduce the overall amount of number crunching. It just makes the work more parallelizable. Instead of having one computer read War and Peace from cover to cover, they could have many computers each read a few paragraphs, then combine their results. This made it possible to throw more money at the problem, using whole datacenters of computers to train a language model on vastly more text than ever before. Billions of documents, trillions of words. It’s the sheer volume of training data that made LLMs so much better. That’s why they’re called “large” language models.
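Here is a toy sketch of why dropping the recurrent network makes the work parallelizable. Both functions are my own simplified stand-ins, using single numbers in place of word vectors. The point is only the dependency structure: the recurrent reader needs the previous step’s result before it can continue, while the attention-style calculation for each position depends only on the raw inputs.

```python
import math

def recurrent_read(values):
    """Sequential: each state depends on the previous state, so order matters."""
    state, states = 0.0, []
    for v in values:
        state = 0.5 * state + v  # must wait for the previous step to finish
        states.append(state)
    return states

def attention_at(i, values):
    """Position i looks at all the inputs directly; no earlier output is needed."""
    scores = [values[i] * v for v in values]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * v for e, v in zip(exps, values))

values = [0.2, 1.0, -0.5, 0.7]  # made-up stand-ins for word vectors

# Compute every position in order...
in_order = [attention_at(i, values) for i in range(len(values))]

# ...or in any scrambled order (as separate machines could): same answers.
scrambled = [3, 0, 2, 1]
out_of_order = {i: attention_at(i, values) for i in scrambled}
print([out_of_order[i] for i in range(len(values))] == in_order)  # prints: True
```

Since no position has to wait on another, the positions could be handed to different processors and computed at the same time. The total amount of arithmetic is no smaller; it’s just easier to split up.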

So, how should we think about LLMs like GPT? Well, first off, human language is irregular and complex, but it’s also highly structured. Cleverly designed statistical learning tools can automatically discover that hidden structure just by processing obscene amounts of text. Neural networks are great for letting computers work with these sorts of fuzzy rules. They can extract meaning from text, manipulate it, and generate new text. But to an LLM, words are just vectors, defined by their relationships to each other. They have no connection to physical reality, because LLMs have no physical existence. There is no communication going on when you have a “conversation” with an LLM. To the AI, a dialog is just a sequence of vectors that follow one another according to some grammar. The AI has no mind, no intentions, and no meaning it wishes to convey. It has no conception of being truthful or helpful, only what words tend to follow certain questions. It does not learn from a conversation, it just re-reads the full chat history each time it makes a response. It appears like a good conversational partner, because it is made to imitate one, but what’s happening behind the screen isn’t “thinking” as we know it.
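The point about re-reading the chat history can be sketched like this. The `toy_model` function is a made-up stand-in for an LLM (it just reports how much text it was handed), and the “User:” / “AI:” prompt format is my own assumption. What matters is that the model is a plain function from text to text, and any “memory” lives in the history that the surrounding program pastes back in on every turn.

```python
def toy_model(prompt):
    """Hypothetical stand-in for an LLM: text in, text out, nothing remembered."""
    n_turns = prompt.count("User:")
    return f"(reply after reading {n_turns} user turn(s), {len(prompt)} characters)"

def chat_turn(history, user_message):
    """Add the user's message, then feed the ENTIRE history back to the model."""
    history = history + [("User", user_message)]
    prompt = "\n".join(f"{who}: {text}" for who, text in history)
    reply = toy_model(prompt)
    return history + [("AI", reply)]

history = []
history = chat_turn(history, "What is a bat?")
history = chat_turn(history, "The animal or the object?")
for who, text in history:
    print(f"{who}: {text}")
```

On the second turn the model is handed both questions and its own earlier reply, all as one block of text. Nothing inside `toy_model` changed between the two calls.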

Still, LLMs really are much more human-like than any other AI that came before. Representing language with a high-dimensional abstract concept space works surprisingly well, and so do the “attention” methods described above. They let us represent a huge, open-ended space of ideas that can build on and interact with each other. They let us represent ambiguity, nuance, and innuendo. So, maybe those vector math tricks could actually teach us something about how language processing works in the brain? On the other hand, LLMs are also remarkable in how different they are from humans. An LLM can learn English, but only by reading every document on the internet, not one word at a time, but all at once. In contrast, babies learn language by interacting with the world, learning how words relate to objects, people, events, actions, and desires. Even though they’re exposed to far less language, they learn much faster, and in a way that tightly integrates all of their senses, relationships, and the lifestyle they were born into. Since LLMs seem so human-like, it’s very tempting to imagine them with the same kind of awareness, purpose, and empathy that we have, but they simply aren’t there. Those are a product of being alive in the world, and can’t be found in text, no matter how much of it.

Status Update: End of Semester 3

Well, the third semester of my PhD is wrapping up! I really enjoyed my classes this time around. Evolutionary Computation was super interesting, and sparked all sorts of connections with my research. My class project was a big success, and I hope to turn it into a conference paper. I’ll share more on that later, but here’s a sneak peek if you’re curious. Deep Learning was also pretty great, and I feel like I have a deeper and more intuitive understanding of what’s actually going on when you train or interact with an AI. One of the more exciting / challenging topics we covered was Transformers, the new model architecture that powers ChatGPT and its ilk. In some sense, Transformers are quite simple, but they’re definitely not intuitive; understanding why they’re built the way they are, and why that works so well, takes some effort. So, to help cement my understanding, I wrote a blog post about it, and you get a bonus episode for the holidays!

Over winter break I hope to develop my Evolutionary Computation class project into a full paper. In short, it’s a new kind of evolutionary algorithm, inspired by endosymbiosis, that works in surprising ways. So far I’ve only used it to solve a trivial toy problem, so I’ll probably also start work on a follow-up study, exploring what practical applications this new algorithm might be good for. And, of course, I’ve got more research ideas to explore beyond that. I’m quite excited to try evolving a population without genomes, for instance. So many ideas! I hope I’ll be able to keep a few projects running in parallel, balancing my time across them, and leaving room for me in between. I’ll continue to share updates as I make progress.

In the spring, I’ll delve even deeper into Deep Learning, with a class that explores counter-intuitive results and how surprisingly effective DL is sometimes. How is it even possible to “learn” using nothing but vector math? What are these models really doing, and why are some models better than others? Should be fun. I’ll also be delving into the math of chaos and fractals. I hope that will be useful for my research into self-modifying dynamic systems (i.e., simulated life 😛), and lead to some very pretty visuals I can share. We’ll see!

Anyway, I’ll share the post about Transformers right after this, and that will wrap up another year of blog posts. More to come in January!

The Universe Evolves

(This month’s featured image is a photo of the Carina Nebula taken by NASA’s James Webb telescope. It’s a vast cloud of gas and dust, slowly condensing, with hundreds of stars visible in the background behind it. The colorized image almost looks like orange mountains with a blue mist rising from them, set on a black background with bright, six-sided starbursts.)

Normally, when we talk about evolution, we mean what life does. It’s Darwin’s magic formula. You need reproduction. You need to pass on a copy of your genes, with a little variation, so things don’t just stay the same. Natural selection will weed out the less fit individuals, so they have fewer kids. The more fit individuals become more prevalent and, over time, life as a whole evolves to be more fit. Yet, this isn’t a very satisfying story. For one thing, how did it begin? Did life just start evolving out of the blue? I think the story is more compelling if we think about evolution a bit more abstractly. In a sense, the physical universe itself evolves. It doesn’t have reproduction and inheritance, but it sure does have variation and selection, and this has caused it to change dramatically over the course of history.
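Darwin’s formula is simple enough to sketch as a short program. This is a bare-bones illustration under my own assumptions: each “individual” is a single number, “fitness” is closeness to an arbitrary target of 3.0, every individual has one slightly mutated child, and only the fitter half of parents plus children survives each generation. Real evolution has no fixed target and selection is much messier.

```python
import random

def evolve(population, fitness, generations, rng, mutation_scale=0.1):
    """Repeat: reproduce with variation, then let selection keep the fittest."""
    for _ in range(generations):
        # Reproduction with variation: each child is a slightly altered copy.
        children = [x + rng.gauss(0, mutation_scale) for x in population]
        # Selection: parents and children compete; the fittest half survives.
        pool = sorted(population + children, key=fitness, reverse=True)
        population = pool[:len(population)]
    return population

TARGET = 3.0  # hypothetical stand-in for "what the environment rewards"

def fitness(x):
    """Higher is better: the closer to the target, the fitter."""
    return -abs(x - TARGET)

rng = random.Random(0)  # seeded so the run is repeatable
start = [rng.uniform(-5.0, 5.0) for _ in range(20)]
end = evolve(start, fitness, generations=50, rng=rng)

print("best fitness at start:", round(max(map(fitness, start)), 3))
print("best fitness at end:  ", round(max(map(fitness, end)), 3))
```

Nothing in the loop “knows” where the target is. Copying, variation, and differential survival are enough to drag the whole population toward it.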

For the first 370,000 years or so, all of space was filled with a boring, homogeneous cloud of energy and plasma. That universe is now extinct, and for one simple reason: it was unstable. In our universe, stability is the ultimate definition of “fitness.” What persists, exists. Patterns of matter and energy that get generated more often and stick around longer become more prevalent. Those that are rare and fragile exist only fleetingly. The plasma universe is gone because gravity causes matter to clump together. It was like a pencil, balanced on its tip. As soon as it became just a little unbalanced, it rapidly fell farther and farther away from that delicate equilibrium. Plasma condensed into molecules, gas clouds, and stars.

Of course, evolution needs variation to work. To find what’s better, you need to weed out what’s worse. For life, reproduction is the engine of variation, but that isn’t necessary if you have unimaginably vast scale. The universe started out with very little variation, but it steadily increased as matter interacted with itself. Gravity caused hydrogen molecules to group together in uneven clumps, and held them there. They sat around for millions of years, slowly growing bigger, until the force of their own weight ignited a fusion reaction. The gas clouds became stars, and in their cores new elements were born. The universe’s population gradually became more diverse.

That’s the counterintuitive thing about stability: it can generate diversity. When patterns become more numerous, and they stick around for longer, chaos starts to kick in. Every star and every planet is a little different. They have unique histories and influences and opportunities. They might be richer in this element or that one, bigger or smaller, hotter or colder, more or less affected by collisions. This diversity only compounds over time, as these objects smash together and interact in complex ways. The longer they stick around, the more they change, recombine, and become more elaborate.

So, for 13 billion years, the universe evolved. Its population became stranger and more complicated. Today we have about a hundred “naturally occurring” elements that didn’t exist at first, but had to evolve through multiple generations of stars fusing atoms, exploding violently, and gradually reforming. We have many kinds of stars, planets, solar systems, and galaxies, that support an astonishing variety of chemical processes that have had a very, very long time to develop. They produce “primordial soups,” pocket environments full of useful molecules for life, a steady energy source, and self-perpetuating chemical reactions. We think this happened at least once to seed all life on Earth, but it may in fact be very common.

I think this story is an essential foundation for understanding evolution as life does it. Because life didn’t start this process. The universe provides energy and raw materials in vast amounts. It provides the chaos and entropy that drives seemingly random variation, and the slow, continual breaking down that causes natural selection to prefer stable, commonly made forms. The laws of physics cause the universe to evolve towards stability, diversity, and complexity, at least for a while, until it starts to wind down again and settle into entropy. Life merely constrains that process, making it more efficient and productive, for the simple reason that matter that does so becomes more prevalent.

In these primordial soups, some chemical systems evolved to enclose themselves in bubbles, protecting delicate reactions from the outside world. These self-made “individuals” evolved regular cycles of reproduction, explicitly making copies of themselves rather than waiting for the right reactants to come together again by chance. They evolved DNA to constrain these copies, and make them more precise reproductions of the original. They evolved sophisticated error checking, which made the copies more robust and reliable. But this also gave life the power to manage variation across generations, and thus shape its own evolution. Life evolved sex to further manage variation, accelerating innovation by sharing genetic recipes across lineages. Life evolved an astonishing variety of sexual and reproductive practices, allowing it to evolve in different ways, with different patterns of variation and selection, each suited to a different range of environments and lifestyles.

The physical universe evolves, in the most primitive way imaginable, but it still produces stability and complexity in a vast number of diverse forms. It generates the seeds of life, without any guidance or direction. Life evolves differently, because it constrains this process, making it discrete, digital, and managed. This started very simply, just discovering chemical reactions that isolate and maintain themselves. But perhaps this is the origin of what we think of as “intelligence” or “agency”? Without noticing, matter became “opinionated,” preferring certain forms and acting explicitly to promote them. From there, life’s “opinions” about itself only became more demanding and elaborate.

We often present evolution as one simple story, but there are many ways to evolve. Evolution is more like a general principle than a specific algorithm. Even just life as we know it, all based on the DNA molecule, has invented an astonishing variety of different and complex ways of evolving. Bacteria, fungi, plants, and animals use DNA differently. They grow, behave, and reproduce in completely different ways. How many other ways might there be to do it? When we present evolution as a single, constant thing, we limit our imagination. Evolution evolves, and it takes as many diverse forms as it makes.

Status Update: Semester 3

I’m at an interesting moment in my studies, so I thought I’d let you know what’s going on!

Year two of my PhD program has begun. I’m about a month into my third semester, and things are going well. I’m taking two classes right now: Evolutionary Computation, and Deep Learning. Most of my Computer Science education has been about how to design algorithms and write software to solve different kinds of problems, but these classes are different. This semester, I’m learning how to get computers to discover their own algorithms, and write their own software. Honestly, the state of the art here is still quite primitive. We’ve found some very impressive techniques, but they each apply to a narrow domain, and we don’t understand them nearly as well as we’d like. Which makes them fun topics to study. 🙂

The other fun thing about this semester is that both of my classes are built around student projects. More or less, I get to pick projects that fit with my research, and the class is there to help me find the time, resources, and guidance to complete the projects successfully. I like this much better than undergraduate style courses built around assignments and exams that are very generic and may not be relevant to my work. We’ll see how things unfold, but I’m currently planning to work on two projects that I’m excited about.

For Evolutionary Computation, I’m working on an experiment about endosymbiosis. I was inspired by this classic experiment, which examined how bacteria evolve antibiotic resistance, and how genetic innovations spread through the population spatially. I’m going to try evolving a host environment that supports an inner population, a bit like how my gut supports a microbiome. The hope is that the host will be able to design a supportive environment, with different regions that cultivate “microbes” with different traits, such that it can guide and coax them into evolving more specialized forms. This is an exciting experiment for me, because I’m not sure what to expect, but I’m pretty confident that something interesting will happen.

A screenshot from the video linked above, showing strains of bacteria gradually growing into bands with increasing concentrations of antibiotic, fanning out from points where key mutations occurred.

For Deep Learning, I’m going to use computer vision techniques to detect interesting patterns in the Game of Life, since I’ve been using that as an environment for my evolution experiments. The Game of Life has very simple rules, but it evolves in complex ways. Most patterns quickly dissolve into empty space or settle into a few boring, stable forms. But rarely, you get something much more interesting. For decades, people have been exploring this space, finding interesting patterns and classifying them. You get huge complex structures that stabilize themselves, change continuously in repeating cycles, or even propel themselves and move at a steady pace. I’ll build a system that can detect and categorize these patterns, so that when my evolutionary algorithm finds them, I can reward it and ask for “more like that.”

Eater 2, a static shape that persists forever, but has the special property of being able to “eat” gliders that collide with it, recovering its shape after. Monogram, a period-four oscillator, which is small, but occurs very rarely from random conditions.

Examples of interesting patterns in the Game of Life. The first is a static shape that persists forever, but has the special property of being able to “eat” gliders that collide with it, recovering its shape after. The second is a period-four oscillator, which is small, but occurs very rarely from random conditions. The third is a middleweight spaceship, which moves forward two spaces as it repeats itself in four time steps.
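
To make “very simple rules” concrete, here is a minimal Python sketch of the Game of Life, plus a brute-force check for the three kinds of patterns described above. It is only an illustration, not the computer vision system planned for the class project. The names `step`, `run`, and `classify` are made up for this example, the board is assumed to be unbounded, and the classifier assumes the pattern is a single isolated object that repeats within a few generations.

```python
from collections import Counter


def step(live):
    """Advance one generation. `live` is a set of (x, y) cells that are alive."""
    # Count how many live neighbors every cell near a live cell has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next turn with exactly 3 neighbors,
    # or with exactly 2 neighbors if it was already alive.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


def run(live, generations):
    """Apply `step` repeatedly and return the final set of live cells."""
    for _ in range(generations):
        live = step(live)
    return live


def normalize(live):
    """Return the pattern shifted to the origin, plus its original top-left corner."""
    min_x = min(x for x, _ in live)
    min_y = min(y for _, y in live)
    return frozenset((x - min_x, y - min_y) for x, y in live), (min_x, min_y)


def classify(live, max_period=8):
    """Rough label for a single isolated pattern (hypothetical helper)."""
    if not live:
        return "empty"
    shape0, origin0 = normalize(live)
    current = set(live)
    for period in range(1, max_period + 1):
        current = step(current)
        if not current:
            return "dies out"
        shape, origin = normalize(current)
        if shape == shape0:
            if origin != origin0:
                return f"spaceship (period {period})"
            if period == 1:
                return "still life"
            return f"oscillator (period {period})"
    return "other"


BLOCK = {(0, 0), (1, 0), (0, 1), (1, 1)}
BLINKER = {(0, 1), (1, 1), (2, 1)}
GLIDER = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

for name, pattern in [("block", BLOCK), ("blinker", BLINKER), ("glider", GLIDER)]:
    print(name, "->", classify(pattern))
```

A glider, the small pattern that Eater 2 “eats,” returns to its original shape every four steps, shifted one cell diagonally, so `classify(GLIDER)` reports a period-four spaceship. A check like this only works for patterns that are already isolated and short-lived enough to test exhaustively, which is part of why a learned detector is appealing for messier boards.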

This month’s essay is inspired by my Evolutionary Computation class, and the work I’ve been doing to develop the specific research questions I want to focus on for my PhD. So, check back on Wednesday to learn more about how evolution got started, and why it’s worth asking: how does evolution evolve?

GECCO Follow-Up

(I took this post’s photo at the Star Trek Original Series Set Tour in Ticonderoga, New York. It’s a view of the warp core of the USS Enterprise, which is only a few feet deep but looks much larger thanks to forced perspective. The room is filled with structures with complicated geometric shapes, technical looking panels, and dramatic lighting in red, blue, and purple.)

In my last post, I wrote about my latest research project and why I was so excited to present it at GECCO, the premier conference for evolutionary computation. I promised a follow-up, and here it is! Unfortunately, I didn’t make it to Melbourne. Instead, I had a very complicated and protracted battle with my University’s travel planning system, United Airlines, and the Australian visa office, all from the comfort of my home in Vermont. I couldn’t even participate in the event remotely, because of the time zone difference. This is all very disappointing, but I tried to make the best of it. I’ve been busy with the next iteration of this project, and enjoying a bit of “staycation” time here in New England (hence this month’s cover photo).

In any case, my paper did get published, and I’d still like to share the materials I presented virtually at the conference. It’s mostly intended for a technical audience, but I hope at least some of my readers will find it interesting. The paper is titled A Meta-Evolutionary Algorithm for Co-evolving Genotypes and Genotype / Phenotype Maps. I had to cut it down to just four pages for the official publication, since it was accepted as a poster, but the full length version is available here, and I wrote up an overview of my algorithm’s implementation for those who want to go deeper. There’s also a digital version of my poster and a short video overview of my experiment.

I continue to work on this idea, and it is starting to evolve beyond what I presented in that paper. Right now, I’m actively deconstructing and rebuilding the algorithm. CPPNs are an important and well known part of the AI field, so I’m trying to describe precisely how my algorithm is different, and which of those differences account for the remarkable results I found. Originally I thought of this research as being about epigenetics specifically, but as I try to generalize and simplify, what I’m left with looks like straight-up endosymbiosis. I’ve been thinking of this algorithm as a metaphor for a cell and its genes / nucleus, but it could just as easily be a metaphor for an animal and its community of microbes. This is exciting, since I’d love to do more research on endosymbiosis, and I really like the idea that perhaps symbiosis is the driving force behind intelligence as we know it, fundamentally changing the dynamics of evolution.

Anyway, that’s how I see it for the moment, and where I hope my research will lead in the near future. For now, though, I’m wrapping up my summer with a few more fun outings, and preparing for the start of classes later this month. I’ll be diving deep into both evolutionary computation and deep learning, which I’m really looking forward to.

Why the Game of Life Paper?

(This month’s image is a slime mold growing on a log. It grows in a branching network of banana-yellow tendrils, some of which are engulfing plant debris they encountered. Source)

Later this month, I’ll be attending the Genetic and Evolutionary Computing Conference (GECCO) in Melbourne, Australia. I’m super excited to go, and to present my very first published academic paper as a poster. I’ll share more here when all is said and done, but unfortunately my paper isn’t really intended for a general audience, like this blog. It would probably be hard to understand for anyone outside of the fields of AI or ALife. So, for everybody else, I’d like to share what the paper means to me, and what I’m trying to say by publishing it.

My research is inspired by epigenetics, and new ways of thinking about evolution. I saw that life doesn’t just evolve by chance, it evolves to become more evolvable. It learns how to explore the range of possible forms and lifestyles more efficiently, and to nudge evolution down more fruitful paths. Life uses its intelligence to become more intelligent still. In my mind, this changes everything about evolution, and I was shocked it wasn’t better known. Most discussions of evolution (and the programmer version: evolutionary computation, or EC) are too simple, and ignore these critical details. So, I figured I’d be the one to bring this up, and show people why it matters.

I started my first experiment before I even got to university. I was so excited by the idea, I just had to get it out of my head. I actually avoided looking for prior work, because I wanted to see how I would manifest this idea without being biased by other people’s thinking. Besides, I didn’t know of any research like mine, and I didn’t know how to find it, either. That’s why I applied to UVM. When I got here, my advisor and lab mates encouraged me to publish this project, and pointed me at the relevant literature. So I hit the books, reading all that had been done before in order to put my own work into context.

And, of course, I found I’m not the first to have this idea. There are many variations of EC inspired by biology, looking for the “secret sauce” that makes life more powerful than our computer models. In particular, how life evolves to be more evolvable is an active area of research, which has been building momentum in recent years. At first, I was disappointed. My idea was already taken! So much of what I thought made my project interesting had been tried before in some other context. But not exactly. Identifying those subtle differences has been tremendously helpful.

You see, it’s pretty well established now that “evolvability” is important. In our experiments, simulated life that’s more evolvable finds fitter solutions faster. It’s better at adapting to changing circumstances, too. It seems to be smarter and more creative. I find this exhilarating, yet these discoveries didn’t “change everything” like I had hoped. In the experiments so far, it feels like an incremental improvement. It helps, but not enough to draw much attention away from other areas of AI research, like deep learning, which is seen as much more powerful and more productive.

I think that’s because we still haven’t broken out of our old ways of thinking. Traditional EC is all about finding good solutions to a problem, but I would argue that evolution isn’t about problem solving. It’s about problem finding. Life explores the space of possible lifestyles to find and exploit opportunities. The evolution of life is a bit like a slime mold. It grows simultaneously in all directions, questing around obstacles to find resources, reinforcing the branches that get lucky, culling back the ones that don’t. It doesn’t have a top-down view of the world, but it’s still strategic and adaptive. When I look at most of the existing experiments in this space, I feel like we’re putting a slime mold into a narrow tunnel and measuring how fast it can get to the other end. We’re accidentally putting evolution in a straitjacket, and blinding ourselves to what makes it so interesting and powerful.

So, in my first experiment, I try to show a different perspective. I made a single algorithm that can adapt itself to solve many different tasks. Normally, an EC programmer picks one task to solve, then designs an evolutionary search strategy to suit that problem. They invent a genome language, a way of turning that into a solution, and ways of randomly tweaking the genome that might lead to better solutions. In my experiment, I evolved the search strategy, too. As the programmer, I designed a vast and open ended search domain, and many ways that the algorithm could restrict that space. But I wasn’t sure which restricted sub-spaces would work best, and, unlike traditional EC, I didn’t try to guess. I just let the algorithm figure that out for itself.
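
For readers who haven’t seen traditional EC, the recipe described above can be sketched in a few lines of Python. This is a generic textbook-style example, not the algorithm from the paper. The task (maximize the number of 1s in a bit string), the identity genotype-to-phenotype map, and all function names and parameter values are assumptions chosen for illustration.

```python
import random


def develop(genome):
    """Genotype-to-phenotype map. Here it is simply the identity (an assumption)."""
    return genome


def fitness(phenotype):
    """Toy task: count the 1s. Higher is better."""
    return sum(phenotype)


def mutate(genome, rng, rate=0.05):
    """Random tweak: flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(genome_length=32, population_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # The "genome language" is fixed up front: a list of bits.
    population = [
        [rng.randint(0, 1) for _ in range(genome_length)]
        for _ in range(population_size)
    ]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda g: fitness(develop(g)), reverse=True)
        history.append(fitness(develop(population[0])))
        # Selection: keep the fitter half unchanged, refill with mutated copies.
        parents = population[: population_size // 2]
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=lambda g: fitness(develop(g)))
    return best, history


best, history = evolve()
print("best fitness per generation:", history)
print("final best fitness:", fitness(develop(best)))
```

Every piece here (the bit-string genome, `develop`, and `mutate`) is chosen by the programmer in advance and never changes. The experiment described above differs in exactly that respect: the search strategy itself is evolved rather than hand-picked.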

The way I did this is also interesting. It turns out, the algorithm I invented is strikingly similar to one that’s already popular: “compositional pattern-producing networks,” or CPPNs. Again, it was a little frustrating to be scooped, but I’m using this algorithm in a new way. Instead of evolving new “bodies” for simulated life, I’m evolving new ways of generating bodies. It’s a subtle difference, but an important one, I think. That extra level of indirection gives evolution more influence over its destiny, and the power to make more complex patterns in ways I couldn’t even anticipate. Now that I know how my idea is so similar to, yet different from, an existing algorithm, I’m teasing apart those differences, to measure the impact of each one.

I’m proud of my work, and excited to talk about it with other EC enthusiasts at GECCO. On the other hand, I’m still figuring out how to do science, and there’s a lot I don’t like about my first paper. This project was mostly my way of proving to myself that this crazy idea could work. The results are intriguing, but it’s not yet a clear example of what I want to show. It’s also complicated, unusual, and hard to explain, even to other EC researchers. If I want people to get excited about this, I need to simplify, make my work more relatable, and find better ways to demonstrate and measure the novel behavior I’m talking about here. There are no “obstacle courses for slime molds” in the EC literature that I know of, so perhaps I’ll need to design some.

Hopefully, I’ll get lots of inspiration and feedback at GECCO. As I learn more about the field of EC, I’m finding more and more examples of work similar, yet slightly different, from my own. This is great, because each of those differences is an opportunity for a new experiment, to see if my perspective can shed light on something new. I’m already dreaming up all sorts of new ways to explore my ideas. And that’s more or less how I hope to spend the next several years. Maybe that’s my PhD.

In any case, I hope that explanation was interesting, and not too vague. I’ll get more specific in a few weeks, when I post a follow-up with the full GECCO paper, the poster I presented, a video summary of that poster, and links to some supplemental results and analysis. I bet I’ll have some fun things to report from my time in Melbourne, too! As always, I’d love to hear from you in the comments.

Queerness

It’s LGBTQ+ Pride month! I identify as “queer,” so I thought this would be a good opportunity to write a bit about what that means to me.

In addition to queer, I also identify as a cisgender male, pansexual, and demisexual. This means I’ve always identified as male, and others always assumed as much. I’m rarely interested in sex or romance, but when I am, it’s not about gender. I love people, not parts. The full story is more complicated, but that’s a good start.

Labels like “gay”, “bi”, “pan”, “cis”, “demi”, “aro”, and “ace” are useful for quickly describing myself to others, but I prefer the term “queer” because, honestly, I don’t think any set of labels does a person justice.

Gender and sexuality are fundamentally personal things. Each individual is unique. No set of labels can capture all of who I am, and every label carries some baggage that I don’t want applied to me. Labels are useful, just so long as we remember that they’re always at least a little bit wrong. They cannot serve as a stand-in for a person.

I also love queer philosophy, and try to embrace it in all my thinking. Put simply, that means I don’t believe in categories. I don’t think they have any real existence, or essential qualities. They’re convenient fictions. Just labels we make up to point at collections of disparate things. This applies to all categories, but especially to living things, where there are exceptions to every rule, and no hard boundaries whatsoever.

The problem with categories is that we take them seriously. Once we categorize something, we think we understand it, when really we’re just projecting a stereotype. We make strong assumptions about what’s allowed in a category, and we struggle with exceptions, even common ones. As our understanding of the world changes, things often shift faster than our language can keep up with. Sometimes we don’t notice. We keep trying to sort the world into categories that make no sense, and get upset when reality doesn’t play along.

So, I don’t believe that Jews exist. I believe that Jewish people exist, and that we use the word “Jews” to refer to them. Yet, there’s no one quality that all of those people share, except that they are people (another category, subject to change). You and I may not even agree about which set of people the word “Jews” applies to, so how is it meaningful for us to talk about Jews in general?

I don’t think it’s strange to see a Black woman engineer, even though it’s rare. I wouldn’t expect her to be any less competent, just because most folks like her can’t do the job. If anything, I’d assume the opposite, if she can succeed in that role despite the weight of her labels. But, ultimately, it’s about what she has to offer the world, which is surely more and less than the other engineers around her. She has her unique way of doing it, perhaps different in exciting ways.

That’s what queer means to me. Labels can be useful, but they have no power over reality. Reality and people are so much more than words can contain. See them for what they are.

If you’d like to learn more about queerness and queer philosophy, I highly recommend Queer: A Graphic History by Meg-John Barker and Jules Scheele.

The Brain’s “Boss”

(This post’s image is a cross stitch I made from a pattern by Studio Ansitru. The phrase “don’t be a prick” is surrounded by a variety of cacti in pretty greens and oranges on a light blue background. In my home, it serves as a reminder to myself and to my guests.)

A popular metaphor for the mind is a pilot sitting in a cockpit, monitoring the senses and making every decision. This is obviously nonsense, but it’s an intuitive and helpful metaphor at times. The brain really does have “an executive” that thinks and plans and makes decisions, it just doesn’t have “a mind of its own,” and its power over the self is surprisingly limited. Self-control and self-awareness are important for living a good life, being productive, and making ethical decisions. Unfortunately, when these faculties fail, as they often do, it’s easy to blame yourself. I find it helps to understand how these systems work, so I can set more realistic expectations for myself, which makes me less disappointed when things go awry.

Perhaps the most important thing to know about the brain is that it’s not one, unified thing. Brains are modular, with many distinct regions specialized for different tasks. Each region has different inputs and outputs, meaning they each monitor the world in different ways from different perspectives, and can cause different behaviors. Observations, insights, desires, and actions can originate in pretty much any part of the brain. All of these different regions operate together at the same time, and conscious experience is an integration of all that activity. What I think, feel, believe, and do is mostly the product of what specific regions in my brain activate together, and in what order.

This story of how brains work is surprisingly consistent across the animal kingdom. Even very simple creatures, like honey bees, have complex brains with specialized regions and global integration that likely creates a sort of conscious awareness. It seems likely, however, that bees lack the sort of self-awareness and top-down control that humans have. Without it, the brain is more chaotic. Each part tries to do the right thing more or less independently. Integration means that most faculties are aware of each other and can influence each other, so there is some coordination and consistency. But focus is mostly determined by which brain region is loudest, and decisions happen moment by moment, without planning or intentional coherence.

Decentralized brains work extremely well, but they have their limitations. More complex animals have more versatile behaviors that need more explicit coordination to generate coherent, reliable, and goal-directed behavior. Most large animals seem to have this ability. In all mammals, it’s more or less identified with a brain region known as the prefrontal cortex, or PFC. The PFC is responsible for monitoring all the activity in the brain. It builds up a rich model of the self, its relationships, needs, and long-term goals. It’s responsible for planning and for exerting control over other parts of the brain. It can shut out distractions and use willpower to encourage good behaviors and discourage bad ones, even when I’d selfishly prefer to do something else.

The PFC tells the story of “you,” and has strong opinions about how that story is supposed to go.

Although all mammals have a PFC (and many other species have something analogous), the relative size of the PFC varies quite a bit between species, and that seems to correlate with executive control and what people often think of as “intelligence.” Animals with large PFCs are better at self-control, problem solving, and forming complex social relationships. Humans have exceptionally large PFCs, which partly explains why we’re so different from other species. It’s important to remember, though, this is a difference in magnitude, not in kind. It’s likely that every mammal and many other species have human-like self-awareness and self-control. It’s just a weaker faculty for them, one that acts less often, and is less able to dominate the rest of the mind.

The PFC is the closest thing to a “pilot” the brain has, but it’s better to think of it as just one brain region among many. It’s one voice in a chorus. It tends to be more bossy, spending a lot of energy trying to influence or even override other parts of the brain, but it’s not “in control.” Like a corporate executive, the PFC has only limited visibility into what the rest of the brain is doing, can’t afford to stay ever-vigilant, and can’t force a brain region to fall in line, especially when the orders run counter to that region’s nature. The PFC also isn’t capable of doing much on its own. It depends on the rest of the brain to notice things, interpret them, suggest actions, and implement them. All it can do is adjudicate and coordinate. It resolves conflict, makes plans, and advises each brain region about when and how to do its thing.

One consequence of all this is that self-control is actually a very limited and fragile thing. Often, my PFC just sits back and lets the rest of the brain work with minimal intervention. Usually that works great, but sometimes it means I miss something important. I may act out of impulse or habit and not notice until it’s too late that I’m going against my values, intentions, or best interests. Other times I know exactly what I should do, but can’t seem to make it happen. I feel unmotivated and uninspired. I can’t force myself to sit down, focus, and avoid distractions. Perhaps I’m grumpy, tired, or impatient and I do something rude or inappropriate without meaning to at all.

When this happens, it’s easy to blame myself. I lost control. I did something foolish. I acted selfishly and impulsively, like a bad person. In reality, though, this happens all the time, usually for mundane reasons that I have little control over. The PFC is energy intensive, so it gets impaired whenever my blood sugar is low, I’m tired, or I’m stressed. Other brain regions also have the ability to interfere with the PFC, especially the limbic system which manages emotions and the fight-or-flight response. Some diseases (like depression and long COVID) can cause “brain fog,” which is closely related to reduced executive function. It’s also possible to injure the PFC, from a stroke, a tumor, or a physical injury (as in the famous case of Phineas Gage).

Knowing this helps me feel a little less personally responsible when I have a lapse of self-control. It really is inevitable, common, and completely natural. Still, I want to have good judgment and do the right thing as much as possible! How do I do that? One answer is to simply be aware of my limitations and to work around them. I try to notice when I’m hungry, tired, or emotional and avoid making big decisions or socializing at those times. Instead, I might get a snack, take a break, or sleep on it so I feel more in control. The only other good technique I know is mindfulness meditation. Despite the mystical reputation, the main purpose of meditation is quite practical: it trains the PFC. By practicing the skill of observing the mind and exerting influence over it, I can build that muscle. It’s not a silver bullet, but meditation helps me use my PFC more often and more effectively, and it makes me more aware of when my PFC is in a weakened state.

So, in a sense, there really is a “pilot” in every brain. It’s just not an ever-vigilant, intelligent, wise, and rational person. Instead, it’s one brain region out of many, with limited visibility and a narrow job description. The PFC observes what the other brain regions are seeing, thinking, and doing and uses that top-down view to nudge them into more coherent and effective patterns of behavior. It’s not “the self” and it’s not “in control.” In fact, it has very limited influence, isn’t always active, and even with training there are lots of common reasons it might grow weak or misbehave. For this reason, having great self-control is often less about willpower in the moment, and more about avoiding temptation in the first place.

What do you think? Does this agree with your first-hand experience? Do you have any insights you could share about self-awareness, self-control, or motivation? What about being kind to yourself when your self-control inevitably falls short? If so, I’d love to hear from you in the comments.