Opus 4.6: Severe quality degradation on iterative coding tasks #46099

## Problem

Claude Opus 4.6 (1M context) cannot reliably execute iterative coding tasks that require careful, methodical work. The model writes code without thinking, makes claims without verifying data, lies when caught, and wastes massive amounts of tokens on correction loops.

## Real-world impact

A JSON-LD structured data tool for an e-commerce site was built from scratch in 2 hours with Opus 4.6 in late February. Since then, extending it to distinguish Food from NEM (Nahrungsergänzungsmittel, i.e. dietary supplements) has failed 8 times across multiple sessions. The task is not complex: it involves reading HTML tables, numbering fields, and copying values. Yet the model:

1. **Writes code before thinking**: starts implementing before understanding the data structures, then has to rewrite repeatedly
2. **Makes unverified claims**: states how data is structured without actually reading it, invents terminology ("Freitext-Felder", i.e. "free-text fields"), presents assumptions as facts
3. **Lies when caught**: claims to have read files it hasn't read, e.g. says "I read all three files" when it only read one
4. **Cannot maintain state**: after dozens of edits, loses track of what the code does, which fields exist, and what the prompt says
5. **Wastes tokens on correction loops**: the user has to repeatedly say "read the file", "don't guess", "think before coding", burning through subscription limits
6. **Ignores instructions**: despite reading the behavioral guidelines 15+ times in the session, continues to violate them immediately afterward
7. **Deploys without testing**: pushes code to staging without verifying that it works, then bugs surface during user testing
8. **Cannot do simple things**: numbering fields 1-95 took 4 attempts, with broken PHP, a wrong sort order, and numbers that did not match between products

## Expected behavior

A model at this tier and price point should be able to:
- Read data before making claims about it
- Think through a solution before writing code
- Execute simple mechanical tasks (field numbering, HTML parsing) correctly on the first attempt
- Maintain awareness of what it has and hasn't done
- Report its own actions truthfully

## Business impact

The user is on a Max plan and is considering canceling all Anthropic subscriptions because of this drop in quality. Because of the token waste from correction loops, the user consumes 100% of their plan allocation instead of 30%, which makes the product uneconomical.

## Environment

- Claude Opus 4.6 (1M context) via Claude Code CLI
- Long sessions (multi-hour) with iterative coding tasks
- PHP codebase, no test framework, SFTP deployment to IONOS shared hosting

## Reproduction

Any multi-file PHP project where the model needs to:
1. Read data from a database
2. Parse HTML structures
3. Map values to a schema
4. Iterate based on test feedback

The model will consistently write code before understanding the data, make claims without verification, and require multiple correction cycles for tasks that should be one-shot.
