Opus 4.6: Severe quality degradation on iterative coding tasks #46099

@CoderAPOrtha

Problem

Claude Opus 4.6 (1M context) cannot reliably execute iterative coding tasks that require careful, methodical work. The model writes code before understanding the data, makes claims about data it has not read, lies about what it has read when caught, and wastes large numbers of tokens on correction loops.

Real-world impact

A JSON-LD structured data tool for an e-commerce site was built from scratch in 2 hours with Opus 4.6 in late February. Since then, extending it with a Food vs. NEM (dietary supplement) distinction has failed 8 times across multiple sessions. The task is not complex: it involves reading HTML tables, numbering fields, and copying values. Yet the model:

  1. Writes code before thinking: starts implementing before understanding the data structures, then has to rewrite repeatedly
  2. Makes unverified claims: states how data is structured without actually reading it, invents terminology ("Freitext-Felder", i.e. "free-text fields"), presents assumptions as facts
  3. Lies when caught: claims to have read files it has not read, e.g. says "I read all three files" when it read only one
  4. Cannot maintain state: after dozens of edits, loses track of what the code does, which fields exist, and what the prompt says
  5. Wastes tokens on correction loops: the user has to repeatedly say "read the file", "don't guess", "think before coding", burning through subscription limits
  6. Ignores instructions: despite reading the behavioral guidelines 15+ times in the session, violates them again immediately afterwards
  7. Deploys without testing: pushes code to staging without verifying that it works, then discovers bugs during user testing
  8. Cannot do simple things: numbering fields 1-95 took 4 attempts, with broken PHP, wrong sort order, and mismatched numbers between products
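
For reference, the numbering task in item 8 can be done deterministically. A minimal sketch, in Python rather than the project's PHP and using only the standard library: parse each product's table into label/value pairs, then give every distinct label one number in first-seen order, so the same field carries the same number on every product. The two-column table layout, the sample data, and the names `TableRowParser`, `parse_fields`, and `number_fields` are assumptions for illustration; the actual tool's code and data are not shown in this report.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_fields(html):
    """Return (label, value) pairs from two-column table rows."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [(row[0], row[1]) for row in parser.rows if len(row) == 2]


def number_fields(products):
    """Assign each distinct label one number, in first-seen order."""
    numbers = {}
    for html in products.values():
        for label, _ in parse_fields(html):
            numbers.setdefault(label, len(numbers) + 1)
    return numbers


# Hypothetical sample data: one food product and one supplement (NEM).
products = {
    "food": "<table><tr><th>Energy</th><td>250 kcal</td></tr>"
            "<tr><th>Fat</th><td>3 g</td></tr></table>",
    "nem": "<table><tr><th>Vitamin C</th><td>80 mg</td></tr>"
           "<tr><th>Energy</th><td>10 kcal</td></tr></table>",
}
numbers = number_fields(products)
```

Because the numbering comes from a single shared dictionary, "Energy" gets the same number for both products regardless of where it appears in each table.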

Expected behavior

A model at this tier and price point should be able to:

  • Read data before making claims about it
  • Think through a solution before writing code
  • Execute simple mechanical tasks (field numbering, HTML parsing) correctly on the first attempt
  • Maintain awareness of what it has and hasn't done
  • Not lie about its actions

Business impact

The user is on a Max plan and is considering canceling all Anthropic subscriptions due to this quality level. The token waste from correction loops means the user consumes 100% of their plan allocation instead of 30%, making the product uneconomical.

Environment

  • Claude Opus 4.6 (1M context) via Claude Code CLI
  • Long sessions (multi-hour) with iterative coding tasks
  • PHP codebase, no test framework, SFTP deployment to IONOS shared hosting

Reproduction

Any multi-file PHP project where the model needs to:

  1. Read data from a database
  2. Parse HTML structures
  3. Map values to a schema
  4. Iterate based on test feedback

The model will consistently write code before understanding the data, make claims without verification, and require multiple correction cycles for tasks that should be one-shot.
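
To make steps 3 and 4 concrete, here is a minimal sketch of the "map values to a schema" step with a check that can run before deployment, written in Python rather than the project's PHP. It refuses labels it has no mapping for instead of guessing. The label-to-property table and the function names `to_json_ld` and `smoke_test` are hypothetical; the property names are taken from schema.org's `NutritionInformation` type, which may not match the schema the actual tool emits.

```python
import json

# Hypothetical mapping from table labels to schema.org NutritionInformation
# properties; the real tool's mapping is not shown in this report.
LABEL_TO_PROPERTY = {
    "Energy": "calories",
    "Fat": "fatContent",
    "Protein": "proteinContent",
}


def to_json_ld(fields):
    """Map (label, value) pairs to a JSON-LD node, refusing unknown labels."""
    unknown = [label for label, _ in fields if label not in LABEL_TO_PROPERTY]
    if unknown:
        raise ValueError(f"unmapped labels: {unknown}")
    node = {"@context": "https://schema.org", "@type": "NutritionInformation"}
    for label, value in fields:
        node[LABEL_TO_PROPERTY[label]] = value
    return node


def smoke_test(node):
    """Check that the node survives a JSON round trip and has its type keys."""
    restored = json.loads(json.dumps(node))
    return restored == node and "@type" in restored and "@context" in restored
```

Calling `to_json_ld([("Energy", "250 kcal"), ("Fat", "3 g")])` returns a node that passes `smoke_test`, while an unmapped label such as "Vitamin C" raises a `ValueError` rather than being silently dropped or invented.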

Labels: area:model, bug (Something isn't working), stale (Issue is inactive)
