# Agent Ecosystem Testing

**Objective:** In support of the Agent Docs Spec, measure what happens between "agent fetches URL" and "model sees content" (retrieval mechanism behavior, HTML processing, truncation limits) for platforms that do not document these details.

**Methodology:** A two-track measurement approach. The interpreted track asks the model to describe what it received, capturing developer experience and revealing perception gaps. The raw track extracts the raw output and measures it programmatically (character and token counts, truncation boundaries) to produce spec-ready, citable measurements.
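The raw track above can be sketched as a small measurement helper. This is a minimal illustration, not the project's actual harness: it assumes you have already captured the exact text the platform returned alongside the ground-truth page text, and it uses a crude whitespace split as a token-count proxy rather than a real tokenizer.

```python
def raw_track_metrics(returned: str, ground_truth: str) -> dict:
    """Programmatic measurements for one fetch: character count, a rough
    token count, and the truncation boundary if the content was cut off."""
    truncated = len(returned) < len(ground_truth)
    boundary = None
    if truncated:
        # Truncation boundary: first index where the returned text
        # diverges from the ground truth (== limit for a clean prefix cut).
        limit = min(len(returned), len(ground_truth))
        boundary = next(
            (i for i in range(limit) if returned[i] != ground_truth[i]),
            limit,
        )
    return {
        "chars": len(returned),
        "approx_tokens": len(returned.split()),  # crude proxy, not a tokenizer
        "truncated": truncated,
        "truncation_boundary": boundary,
    }

# Synthetic example: a ~250KB page cut at a ~100KB boundary.
page = "word " * 50_000
seen = page[:100_000]
print(raw_track_metrics(seen, page))
# → {'chars': 100000, 'approx_tokens': 20000, 'truncated': True,
#    'truncation_boundary': 100000}
```

Comparing against ground truth is what makes the numbers citable: the same helper run across sessions exposes whether a platform's cutoff is fixed or variable.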

## Documentation Organization

| Section | Purpose |
| --- | --- |
| Methodology | Testing approach details and constraints |
| Interpreted vs Raw | Two-track values and measurements |
| Findings: Interpreted | What the model reports vs. what it received; run variation |
| Findings: Raw | Metrics extracted programmatically; reproducible, spec-ready |
| Friction Note | Known issues, gaps, or edge cases encountered during testing |

## Results Summary

| Platform | Key Finding | Focus |
| --- | --- | --- |
| Anthropic Claude API | Character-based truncation at ~100KB of rendered content | Baseline reference; establishing the two-track methodology |
| Anysphere Cursor | Agent-routed fetch with undocumented truncation (28KB–240KB+); high cross-session variance | Reverse-engineering opaque, closed consumer tools |
| Google Gemini API | Hard limit: 20 URLs per request; supports PDF and JSON | Identifying architectural constraints and format support |
| Microsoft GitHub Copilot | Agent-routed `fetch_webpage` (relevance-ranked excerpts, no fixed ceiling detected) and/or `curl` (byte-perfect full retrieval) | Separating retrieval mechanism from retrieval quality through tool-call visibility |
| OpenAI Web Search | Tool invocation conditional and model-dependent; differs by API surface | Comparing behavior across API endpoints |
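The "high cross-session variance" finding above implies repeating the same fetch and summarizing the spread of retrieved sizes. A minimal sketch of that summary step, using hypothetical run sizes (the numbers below are illustrative placeholders, not measured data):

```python
from statistics import mean, pstdev

def variance_report(run_sizes_kb: list) -> dict:
    """Summarize the spread of retrieved-content sizes across
    repeated fetches of the same URL in separate sessions."""
    return {
        "min_kb": min(run_sizes_kb),
        "max_kb": max(run_sizes_kb),
        "mean_kb": round(mean(run_sizes_kb), 1),
        "stdev_kb": round(pstdev(run_sizes_kb), 1),
    }

# Hypothetical per-session sizes (KB) for one URL:
runs = [28.0, 96.5, 240.3, 61.2, 183.9]
print(variance_report(runs))
```

A wide min–max range with a large standard deviation is what distinguishes "undocumented but stable cutoff" (Claude's ~100KB) from "undocumented and variable" (Cursor's 28KB–240KB+).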