(I took this post’s photo of a banana slug crawling through leaf litter. Its shape and color resemble some of the leaves, which makes it hard to spot if you don’t know what to look for.)
People say all sorts of things about the world, but how can you tell what’s right? If you’re not sure, you probably want to see for yourself. Those other people might be confused, mistaken, suffering from wishful thinking, or actively trying to mislead you, but you see reality for what it is. Right? At the least, you won’t have the same misperceptions as them, so another look is useful. But how much can you trust your own senses? How does perception even work, and how come we’re so often misled?
Like most people, I “just see” everything around me. Sometimes, I become aware of my perspective. I move around to get a clear view. I notice where I’m glancing, and I know I can’t see what’s behind my head. Yet, most of the time I don’t think about those things at all. The visual world just seems to surround me seamlessly, with rich, consistent detail in all directions. Objects are plain to see, trivial to discern from most angles and distances. It all seems so obvious, like a simple “window on reality,” yet nothing could be further from the truth.
Human eyes have tunnel vision. I only see a tiny spot in clear focus at a time. My eyes constantly dart around, collecting many snapshots of the world as I move through it. My brain gets a continuous stream of these disconnected snatches of imagery that it somehow must turn into an integrated whole. It tracks my position and perspective as I move through the world, to piece the images together and infer a 3D model of my surroundings. This takes a great deal of real-time data processing, and more than a little creativity.
One thing humans don’t do is scan a scene from left to right, top to bottom, like a TV camera, capturing equally high-fidelity data across the whole scene. My eyes are drawn to “interesting” features of the visual field, gathering much more detail about those, and leaving large gaps over the “boring” parts of the image where I never bothered to look closely. To get a sense of this yourself, check out this selective attention test on YouTube. It’s pretty shocking how well the brain filters relevant details from irrelevant ones, and shows you only what it thinks is useful. Of course, what’s “interesting” or “useful” is a judgment call, and I’m biased by my context, culture, and evolution. That means I’m blind to important things that I don’t expect, recognize, or know about.
Yet, I don’t notice any gaps in my perception. My brain creates the illusion of a clear and complete view of reality, using a technique called hierarchical segmentation. The image from my eyes is projected into my brain, then layer after layer of neurons interprets that image. The first layer detects patterns and discontinuities in the raw image data: edges. The second layer detects patterns in those edges: shapes. Layers above detect patterns of patterns of patterns, finding textures, objects, faces, bodies, groups, situations, and more. I don’t see pixels, colors, and shapes. I directly perceive the objects and agents in a scene, their properties, activities, and relationships. I experience that as if it were “really there,” even though it’s just a model in my mind, distantly derived from sense data.
The first pass of vision notices low-level features present in the image (edges, corners, curves), but doesn’t know what they mean. Later passes piece those features together to represent larger features (in a desk drawer, that arrangement of curves must be a fidget spinner). Most likely, the lower-level processing didn’t see all the relevant details clearly, but that’s okay. The fidget spinner neurons see enough to recognize what’s there. They tell the edge-detecting neurons what they should have seen, filling in the missing details. This is how I can clearly perceive a whole fidget spinner, even though it’s in shadow and half covered. My brain uses past knowledge of objects, where they appear, what they look like, and how they behave to imagine what was obscured.
This works extremely well, and it’s necessary, since low-level sensory data is noisy and ambiguous. It often helps to have some idea what I’m looking at to make sense of what I’m seeing. Yet, sometimes my brain’s predictions are wrong. That’s not actually a fidget spinner in the drawer. It’s a pile of coins. How could I tell? Well, the fidget spinner neurons projected their predictions down, but when I looked a little more closely, some of those guesses were clearly wrong. There were some edges that weren’t accounted for, some angles that didn’t fit. The lower-level neurons noticed the gap between expectation and reality, so they had to push back and negotiate with the higher-level neurons, eventually arriving at an interpretation that was the best compromise across multiple levels of analysis.
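For the programmers in the audience, here’s a toy sketch of that negotiation in Python. It’s only an illustration of the idea, not a model of real neurons. The feature names, the templates, and the prior values are all made up for this example, and the scoring rule (prior expectation minus the number of unexplained features) is just one simple assumption about how the two levels might trade off.

```python
# Hypothetical templates: the low-level features each high-level guess expects.
TEMPLATES = {
    "fidget spinner": {"round outline", "metallic shine", "three lobes", "center hub"},
    "pile of coins": {"round outline", "metallic shine", "overlapping discs", "stamped faces"},
}


def prediction_error(guess, observed):
    """Count the observed features that the guess cannot account for."""
    return len(observed - TEMPLATES[guess])


def perceive(observed, prior):
    """Pick the guess that best balances expectation against unexplained evidence.

    Returns the winning guess and the features it "fills in" without
    their ever having been observed.
    """
    def score(guess):
        # Assumed trade-off: expectation counts for the guess,
        # every unexplained feature counts against it.
        return prior.get(guess, 0.0) - prediction_error(guess, observed)

    best = max(TEMPLATES, key=score)
    filled_in = TEMPLATES[best] - observed
    return best, filled_in


# Made-up prior: I expect fidget spinners in this drawer.
prior = {"fidget spinner": 0.5, "pile of coins": 0.0}

# A quick glance only picks up the most eye-catching features.
glance = {"round outline", "metallic shine"}
# A closer look adds details the spinner guess can't explain.
closer_look = glance | {"overlapping discs", "stamped faces"}

first_guess, imagined = perceive(glance, prior)
second_guess, _ = perceive(closer_look, prior)
print(first_guess, sorted(imagined))
print(second_guess)
```

At a glance, both guesses explain everything I saw, so expectation wins and the missing spinner parts get imagined in. With a closer look, the unexplained features outweigh the expectation and the interpretation flips to coins.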
What I perceive is a blend of what my senses took in and what “makes sense” for me to see based on past experience. At first glance, I only notice the most eye-catching details and my mind fills in the rest. If I take my time to really look over a scene, exploring every corner and paying attention to details, then my past experience has less influence and I perceive reality more as it truly is. I’m giving my lower-level perceptions the best chance to find evidence that I wasn’t expecting to see, which might revise my first impression. The problem is, I can’t afford to do this all the time, and often don’t think to. When should I bother to put in the extra effort? When should I distrust my own perception of reality enough to double-check?
My brain automatically groups every object I see into categories, collections of objects with similar properties. Each category has a mental stereotype, an image that sort of averages all my experiences. This is how I know the “normal” shape of a fidget spinner, even though no two are the same. It’s what my mind draws on when it fills the gaps in my perception. As I gain experience, I learn more useful ways to group things into categories that better predict their similarities and differences. I build more accurate, nuanced, and fine-grained stereotypes, which makes my perceptions clearer. That said, it’s easy to hold onto bad stereotypes. They warp my perception, overwriting key details of a visual scene that might prove me wrong, rendering them literally invisible to me until someone points them out.
Stereotypes play a central role in perception, and in all the fancy understanding, thinking, and being human that layers on top of it. Stereotypes are great tools. They’re bite-sized models of reality that let us generalize past experience and predict the future. But they aren’t real. In fact, many of my stereotypes aren’t based on my experience at all. I learned them from other people! Some may be wrong, hurtful, and dangerous, but I wouldn’t know without personal experience. So far we’ve just been talking about objects, but it gets serious when we move on to people.
I saw this when I worked at Google. They would spoil engineers, with easy access to everything from staplers to lunch to massages. That meant lots of staff to keep the place clean, well stocked, and in good working order. These service workers, these people, were generally ignored, treated as part of the environment rather than part of the team. That’s problematic in itself. On top of that, engineers with darker skin tones often reported being mistaken for the service staff. Despite wearing a nerdy T-shirt and an engineering badge, they got categorized as “the help” based on skin alone. They were ignored, or worse, asked to clean up spills. This was demoralizing, even though there was no ill intent. They just weren’t seen, by folks who were misled by stereotypes and didn’t even notice.
Knowing all this makes me distrust my own senses, but I think that’s a good thing. They’re mostly reliable, but they can fail in specific ways, and it’s important to remember that. It’s also useful to know when to trust my stereotypes. That mostly comes down to knowing where I have deep personal experience and have paid close attention. Where I don’t, my stereotype might be a shallow hand-me-down, even though it feels just as “real” in my mind. What about you? Have you noticed folks seeing what they want to see, or hearing what they want to hear? How does this generalize to other kinds of perceptions? How do you try to see reality for what it truly is? I’d love to hear from you in the comments.