Programming, Evolved: Lessons & Observations

Drafted December 2025 (v1), revised January 2026 (v2)

TL;DR: If you are a software engineer, regardless of level, pick a model and shape it into your best pair-programming buddy. This sentiment generalizes beyond software engineering.

I have been using Copilot, Cody, and Cursor for about a year and a half. I also started using Claude Code, Codex, and Gemini shortly after their release earlier in 2025. With these tools, I’ve written code and debugged problems with teams, and I’m writing more code for myself, friends, and family. This is a brief write-up of lessons and observations from the past ~18 months.

Let me first define some terms.

Software Engineering vs. Programming

I like Titus Winters's definition of Software Engineering. Software Engineering is a function of programming, people, and time. Programming is one person using code to solve a problem they understand. When you add time, teams, and trade-offs to programming, it becomes software engineering. Software Engineering is a team sport. It has always been about collaborating with people to learn and understand a problem, writing code to solve the problem, finding and fixing bugs, and evolving the solution over time as the problem evolves.

Programming has undergone several major evolutions. Languages progressed from machine code to low-level and high-level languages, and later to object-oriented and functional paradigms. Programming environments evolved from punch cards to line editors, screen editors, and eventually integrated development environments. At each stage, incidental complexity was stripped away, refining programming toward its essence: the organization of logic. This refinement lowered barriers to entry, expanded the pool of programmers, and enabled each generation to tackle a broader and more complex class of problems than the last.

It is happening again, and more broadly and rapidly than any previous evolution[1]. We are in the midst of a Cambrian explosion of programming[2].

Agents vs. Assistants

I don’t intend to add another definition to the already extensive list of agent definitions. Instead, for the sake of clarity in this document, I will distinguish between agents and assistants. Claude Code and Codex are coding assistants composed of multiple agents working in concert. Coding assistants and agents are built around a model.

Lessons & Observations

  1. Coding assistants have improved significantly over the past 12 months in three dimensions:

    • The models generate better quality code for languages in their distribution (e.g. Python, TypeScript, Rust, Go).
    • Coding assistants generate code that is more grounded in the codebase they are working on rather than in the codebases they were trained on.
    • Thanks to innovations in the harness built around the models, coding assistants can now work on problems reliably for long periods of time while producing coherent output.
  2. Coding assistants are very good at solving known problems. You are not going to make them consistently one-shot a well-optimized renderer or an RL algorithm, but they can write run-of-the-mill business logic better and faster than I can as an average programmer. When I optimize for both speed of production and quality, they win!

  3. There is room for improvement in their ability to generate functioning frontends and good quality frontend code. Based on my experience using coding agents to develop web UIs and TUIs, they struggle to generate good-looking, well-functioning user interfaces backed by idiomatic code. Current models are bad at Tailwind, bad at Ink, bad at Textual, and OK at Ratatui. It is unclear whether this is a sampling problem or whether the heavy abstractions in UI frameworks are tripping them up. Those abstractions certainly trip me up. For web and mobile UIs I start with design mocks from Google Stitch; Stitch cannot produce mocks for TUIs yet. I think there is work to be done in both model training and the harness to better guide models toward building higher-quality UIs.

  4. A model’s default “personality” is to solve the problem in front of it as quickly as possible and earn your praise. This tendency leads it to make sub-optimal decisions. For example, I have caught Opus 4.5 trying to solve a deadlock by letting a process "sleep for 2 seconds." This personality can be altered with appropriate guidance. One shortcut I lean on is using the word "idiomatic" in my prompts: "come up with an idiomatic solution" or "is that the most idiomatic way of solving the problem?" Similarly, when writing or reviewing tests, I pepper in the phrase "intent of the function under test," which makes the model produce better tests. Claude Code's harness uses similar tricks to keep the model on the rails.

  5. These models, esp. Opus 4.5 and GPT 5.2, are remarkable bug hunters. I can point them at a symptom, and they read the code and come away with the bug. I then ask them to explain why the bug happens and follow the code to check whether the explanation is correct. I have not yet come across a bug they failed to identify[3]. They can find deadlocks and starvation, though you then have to guide them to a good fix (see above). When I know a particular component has a bug, I sometimes ask them to first build a mental model of it for themselves; with that in place, they can find some gnarly bugs. [Jan 2026 Update] This doesn't always work, though. One bug both models fail to pick up is a memory leak in Ghostty. I am still trying to scaffold their mental model to see if they can find that bug through static analysis, starting from a commit prior to the fix.

  6. Code quality is not sufficient to create product quality. However, it is a necessary ingredient to sustain product quality. In my experience, even with the best scaffolding, the half-life of product quality is shorter for codebases generated primarily by coding agents. As you pick up coding assistants, be sure to also build robust skills around them to improve code quality. A study of the code quality of open source coding assistants would be revealing.

  7. Much like journaling, the process of writing software gives you a good mental model of what's being built. I find this mental model useful in two scenarios: when making decisions to evolve the software and when debugging a problem (esp. during an incident). When coding assistants write most of the software, the fidelity of the mental model I hold degrades quickly[4]. Instead of fighting this new normal, I have been trying to create methods that use the model as a tool to query and develop the mental model on demand. It's not the same, but I think it is where things are heading. We need tooling in this space, and we probably need to train software engineers regularly on the failure modes of their systems, much as the aviation industry trains its pilots.

  8. Over the years I must have spent hundreds of hours fine-tuning my terminal and editor to make them feel just right. I am using that editor to write this document; it is my editor. But I no longer spend as much time in it as I used to. Now, I am the “editor” for my coding assistants (Claude Code, Codex, and OpenCode). So I spend as much time learning about them as I do teaching them new tricks, skills, and commands. I built Catsyphon and Aiobscura just so I could review and learn from our interactions. A lot of the lessons in this list come from such reviews. I look at this as an opportunity to grow and mentor my pair programming buddy.

  9. If you have abstained from incorporating coding assistants thus far, perhaps the best way to start is to use them for your toil. They are good at making sense of stack traces and poorly written code, summarizing documentation, and querying documentation for specifics. They should be part of your toolkit.

  10. Coding assistants come with a sandbox. However, the sandbox tends to get in the way of the agents that make up the assistants. So I rely on an exo-sandbox: a sandbox outside of the coding assistants. I now use sandbox-exec to contain my sessions and have turned off the sandboxes in the coding assistants. I'm not recommending this for everyone, but know that you have choices.

  11. There is so much fun, beauty, and pleasure in writing code by hand. You can still handcraft code. Just don’t expect it to be your job; it is your passion.
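Item 4's deadlock anecdote shows why a shortcut is not a fix: a two-second sleep only makes the bad interleaving less likely. The post doesn't say what caused that deadlock, so the following is only a minimal Python sketch assuming the textbook case, two threads taking the same two locks in opposite order. `run_with_locks` and `demo_opposite_order` are hypothetical names, not code from the post; they illustrate the kind of idiomatic fix (a single global lock-acquisition order) worth steering a model toward.

```python
import threading
from contextlib import ExitStack


def run_with_locks(locks, work):
    """Run work() while holding every lock in `locks`.

    Hypothetical helper: locks are acquired in one global order (sorted by
    id()), so two threads that request the same locks in opposite orders can
    never each hold one lock while waiting for the other.
    """
    with ExitStack() as stack:
        # set() drops duplicates; re-acquiring a plain Lock would self-deadlock.
        for lock in sorted(set(locks), key=id):
            stack.enter_context(lock)
        return work()


def demo_opposite_order(iterations=10_000):
    """Two threads ask for (a, b) and (b, a); ordering keeps them deadlock-free."""
    a, b = threading.Lock(), threading.Lock()
    counter = {"value": 0}

    def bump():
        counter["value"] += 1  # safe: both locks are held here

    def worker(first, second):
        for _ in range(iterations):
            run_with_locks([first, second], bump)

    threads = [
        threading.Thread(target=worker, args=(a, b)),
        threading.Thread(target=worker, args=(b, a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)  # a deadlock would show up as a thread still alive
    return counter["value"], any(t.is_alive() for t in threads)


if __name__ == "__main__":
    print(demo_opposite_order())  # (20000, False)
```

A consistent acquisition order removes the circular wait that a lock-ordering deadlock needs, rather than just shrinking the race window. Real deadlocks can involve other resources (channels, connection pools, I/O), where the idiomatic fix may look quite different.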


Discussion on Hacker News

Footnotes

  1. This evolution is accelerating because distribution channels are mature, most layers of the stack are now software, and the network of practitioners is both large and densely connected.

  2. Calling this a Cambrian explosion of software engineering (as opposed to programming) may sound grandiose, but it is directionally correct. I’ll save the retrospective verdict for a 2026 look-back.

  3. I have since encountered one counterexample: Opus 4.5 attributed system instability to the macOS virtualization layer when the root cause was connection pool exhaustion. I eventually had it bisect the code to find the issue, though not before it replaced vz with QEMU :-)

  4. How AI assistance impacts the formation of coding skills, January 29, 2026