Objective: in support of the Agent Docs Spec, measure what happens between “agent fetches URL” and “model sees content” (retrieval mechanism behavior, HTML processing, truncation limits) for platforms that don’t document these details.
Methodology: a two-track measurement approach. The interpreted track asks the model to describe what it received, capturing developer experience and revealing perception gaps. The raw track extracts the raw output and measures it programmatically (character and token counts, truncation boundaries) to produce spec-ready, citable measurements.
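As an illustration of the raw track, a minimal sketch of the measurement step (all function and marker names are hypothetical; the token count uses the rough ~4 characters/token heuristic rather than a real tokenizer):

```python
import re


def measure_raw_output(raw: str, marker_prefix: str = "MARK-") -> dict:
    """Count characters, estimate tokens, and locate a truncation boundary.

    Assumes the fetched test page embeds sequential beacons (MARK-0000,
    MARK-0001, ...) at known offsets; the highest beacon that survives in
    the raw output tells us roughly where the platform cut the content.
    """
    char_count = len(raw)
    byte_count = len(raw.encode("utf-8"))
    approx_tokens = char_count // 4  # crude heuristic, not tokenizer-accurate

    markers = re.findall(re.escape(marker_prefix) + r"(\d{4})", raw)
    last_marker = max((int(m) for m in markers), default=None)

    return {
        "chars": char_count,
        "bytes": byte_count,
        "approx_tokens": approx_tokens,
        "last_marker_seen": last_marker,
    }


print(measure_raw_output("intro MARK-0000 body MARK-0001 tail"))
```

The same function can be run unchanged against every platform's raw capture, which is what makes the numbers comparable across the Results Summary rows.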
Documentation Organization
| Section | Purpose |
|---|---|
| Methodology | Testing approach details & constraints |
| Interpreted vs Raw | Two-track values and measurements |
| Findings: Interpreted | What the model reports vs what it received, run variation |
| Findings: Raw | Metrics extracted programmatically - reproducible, spec-ready |
| Friction Note | Known issues, gaps, or edge cases encountered during testing |
Results Summary
| Platform | Key Finding | Focus |
|---|---|---|
| Anthropic Claude API | Character-based truncation at ~100KB of rendered content | Baseline reference; establishing two-track methodology |
| Anysphere Cursor | Agent-routed fetch with undocumented truncation (28KB–240KB+); high cross-session variance | Reverse-engineering opaque, closed consumer tools |
| Google Gemini API | Hard limit: 20 URLs per request; supports PDF & JSON | Identifying architectural constraints and format support |
| Microsoft GitHub Copilot | Agent-routed fetch_webpage (relevance-ranked excerpts, no fixed ceiling detected) and/or curl (byte-perfect full retrieval) | Separating retrieval mechanism from retrieval quality through tool-call visibility |
| OpenAI Web Search | Tool invocation conditional and model-dependent; differs by API surface | Comparing behavior across API endpoints |
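Findings like the ~100KB Claude boundary or Cursor's 28KB–240KB+ variance require a test page whose truncation point can be read back precisely. A sketch of such a generator (names and sizes are illustrative, not from the original measurements): it emits filler text seeded with a sequential beacon every kilobyte, so whichever beacon the model can still quote marks the approximate cut-off.

```python
def build_probe_page(total_kb: int = 256, stride: int = 1024) -> str:
    """Build ~total_kb KB of filler HTML with a MARK-NNNN beacon every `stride` bytes."""
    chunks = []
    size = 0
    i = 0
    while size < total_kb * 1024:
        beacon = f"MARK-{i:04d} "
        # Pad each chunk to roughly `stride` bytes with repeated filler.
        filler = "lorem ipsum " * ((stride - len(beacon)) // 12)
        chunk = beacon + filler
        chunks.append(chunk)
        size += len(chunk)
        i += 1
    body = "\n".join(chunks)
    return f"<!doctype html><html><body><pre>\n{body}\n</pre></body></html>"


page = build_probe_page(total_kb=128)
print(len(page))  # roughly 128 KB of markup
```

Hosting this page and asking each platform's agent to fetch it turns "how much did the model see?" into "what is the highest MARK beacon you can quote?", a question both tracks of the methodology can answer.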