Problem
Claude Opus 4.6 (1M context) cannot reliably execute iterative coding tasks that require careful, methodical work. The model writes code without thinking, makes claims without verifying data, lies when caught, and wastes massive amounts of tokens on correction loops.
Real-world impact
A JSON-LD structured data tool for an e-commerce site was built from scratch in 2 hours with Opus 4.6 in late February. Since then, extending it to distinguish Food from NEM (Nahrungsergänzungsmittel, i.e. dietary supplements) has failed 8 times across multiple sessions. The task is not complex: it involves reading HTML tables, numbering fields, and copying values. Yet the model:
- Writes code before thinking: starts implementing before understanding the data structures, then has to rewrite repeatedly
- Makes unverified claims: states how data is structured without actually reading it, invents terminology ("Freitext-Felder", i.e. "free-text fields"), and presents assumptions as facts
- Lies when caught: claims to have read files it has not read, e.g. says "I read all three files" when it read only one
- Cannot maintain state: after dozens of edits, loses track of what the code does, which fields exist, and what the prompt says
- Wastes tokens on correction loops: the user has to repeat "read the file", "don't guess", and "think before coding", burning through subscription limits
- Ignores instructions: despite reading the behavioral guidelines 15+ times in the session, violates them again immediately afterwards
- Deploys without testing: pushes code to staging without verifying that it works, so bugs surface only during user testing
- Cannot do simple things: numbering fields 1-95 took 4 attempts, with broken PHP, wrong sort order, and mismatched numbers between products
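For a sense of scale, the mechanical core of the task (read a table, number the fields in document order, copy the values) can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's code: it is written in Python rather than the project's PHP, it assumes each field is a two-cell label/value table row, and the names `TableRowParser`, `number_fields`, and the sample rows are hypothetical.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def _close_cell(self):
        # Store the current cell (if any) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing closing cell tag
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


def number_fields(html):
    """Return [(number, label, value), ...] in document order, numbered from 1."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    # Assumption: a field is a two-cell label/value row; other rows are skipped.
    pairs = [row for row in parser.rows if len(row) == 2]
    return [(i, label, value) for i, (label, value) in enumerate(pairs, start=1)]


# Hypothetical sample data, invented for this sketch.
sample = """
<table>
  <tr><td>Energy</td><td>1046 kJ</td></tr>
  <tr><td>Fat</td><td>3.2 g</td></tr>
  <tr><td>Protein</td><td>8.1 g</td></tr>
</table>
"""

for number, label, value in number_fields(sample):
    print(number, label, value)
```

Numbering by document order rather than by a sorted key keeps the numbers stable between products that share the same table layout, which is the property that broke in the attempts described above.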
Expected behavior
A model at this tier and price point should be able to:
- Read data before making claims about it
- Think through a solution before writing code
- Execute simple mechanical tasks (field numbering, HTML parsing) correctly on the first attempt
- Maintain awareness of what it has and hasn't done
- Not lie about its actions
Business impact
The user is on a Max plan and is considering canceling all Anthropic subscriptions because of this quality level. The token waste from correction loops means the user consumes 100% of their plan allocation instead of 30%, roughly 3.3 times the expected usage, which makes the product uneconomical.
Environment
- Claude Opus 4.6 (1M context) via Claude Code CLI
- Long sessions (multi-hour) with iterative coding tasks
- PHP codebase, no test framework, SFTP deployment to IONOS shared hosting
Reproduction
Any multi-file PHP project where the model needs to:
- Read data from a database
- Parse HTML structures
- Map values to a schema
- Iterate based on test feedback
The model will consistently write code before understanding the data, make claims without verification, and require multiple correction cycles for tasks that should be one-shot.