Objective: in support of the Agent Docs Spec, measure what happens between “agent fetches URL” and “model sees content” (retrieval mechanism behavior, HTML processing, truncation limits) for platforms that don’t document these details.
Methodology: a two-track measurement approach. The interpreted track asks the model to describe what it received, capturing developer experience and revealing perception gaps. The raw track extracts the raw output and measures it programmatically (character and token counts, truncation boundaries) to produce spec-ready, citable measurements.
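As an illustration of the raw track, a minimal sketch of the measurement step (all function and marker names are hypothetical; the token count uses the rough ~4 characters/token heuristic rather than a real tokenizer):

```python
import re


def measure_raw_output(raw: str, marker_prefix: str = "MARK-") -> dict:
    """Count characters, estimate tokens, and locate a truncation boundary.

    Assumes the fetched test page embeds sequential beacons (MARK-0000,
    MARK-0001, ...) at known offsets; the highest beacon that survives in
    the raw output tells us roughly where the platform cut the content.
    """
    char_count = len(raw)
    byte_count = len(raw.encode("utf-8"))
    approx_tokens = char_count // 4  # crude heuristic, not tokenizer-accurate

    markers = re.findall(re.escape(marker_prefix) + r"(\d{4})", raw)
    last_marker = max((int(m) for m in markers), default=None)

    return {
        "chars": char_count,
        "bytes": byte_count,
        "approx_tokens": approx_tokens,
        "last_marker_seen": last_marker,
    }


print(measure_raw_output("intro MARK-0000 body MARK-0001 tail"))
```

The same function can be run unchanged against every platform's raw capture, which is what makes the numbers comparable across the Results Summary rows.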
Documentation Organization
| Section | Purpose |
|---|---|
| Methodology | Testing approach details & constraints |
| Interpreted vs Raw | Two-track values and measurements |
| Findings: Interpreted | What the model reports vs what it received, run variation |
| Findings: Raw | Metrics extracted programmatically - reproducible, spec-ready |
| Friction Note | Known issues, gaps, or edge cases encountered during testing |
Results Summary
| Platform | Key Finding | Focus |
|---|---|---|
| Anthropic Claude API | Character-based truncation at ~100KB of rendered content | Baseline reference; establishing two-track methodology |
| Anysphere Cursor | Agent-routed fetch with undocumented truncation (28KB–240KB+); high cross-session variance | Reverse-engineering opaque, closed consumer tools |
| Google Gemini API | Hard limit: 20 URLs per request; supports PDF & JSON | Identifying architectural constraints and format support |
| Microsoft GitHub Copilot | Agent-routed fetch_webpage (relevance-ranked excerpts, no fixed ceiling detected) and/or curl (byte-perfect full retrieval) | Separating retrieval mechanism from retrieval quality through tool-call visibility |
| OpenAI Web Search | Tool invocation conditional and model-dependent; differs by API surface | Comparing behavior across API endpoints |
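Findings like the ~100KB Claude boundary or Cursor's 28KB–240KB+ variance require a test page whose truncation point can be read back precisely. A sketch of such a generator (names and sizes are illustrative, not from the original measurements): it emits filler text seeded with a sequential beacon every kilobyte, so whichever beacon the model can still quote marks the approximate cut-off.

```python
def build_probe_page(total_kb: int = 256, stride: int = 1024) -> str:
    """Build ~total_kb KB of filler HTML with a MARK-NNNN beacon every `stride` bytes."""
    chunks = []
    size = 0
    i = 0
    while size < total_kb * 1024:
        beacon = f"MARK-{i:04d} "
        # Pad each chunk to roughly `stride` bytes with repeated filler.
        filler = "lorem ipsum " * ((stride - len(beacon)) // 12)
        chunk = beacon + filler
        chunks.append(chunk)
        size += len(chunk)
        i += 1
    body = "\n".join(chunks)
    return f"<!doctype html><html><body><pre>\n{body}\n</pre></body></html>"


page = build_probe_page(total_kb=128)
print(len(page))  # roughly 128 KB of markup
```

Hosting this page and asking each platform's agent to fetch it turns "how much did the model see?" into "what is the highest MARK beacon you can quote?", a question both tracks of the methodology can answer.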