Yzx Wiki
- This is the entry page for the current wiki/. It indexes the entities, concepts, topics, and source summaries that have been captured in structured form. The current version covers material from Anthropic on SWE-bench Verified, Workflows vs agents, BrowseComp, Model Context Protocol (MCP), AI-resistant technical evaluations, Agent teams, Multi-agent research systems, Generator-evaluator loop, Tool ergonomics for agents, Programmatic tool calling, Inference infrastructure regressions, Infrastructure noise in evals, Permission delegation for agents, Sandboxing for agents, Meta-harness, Context engineering, Contextual Retrieval, Context hygiene for agents, Multi-context window workflows, Reasoning tools for agents, Evaluation harness, Spec-driven development, and contribution-driven hiring (贡献驱动招聘). It also covers OpenAI's engineering retrospective on Codex and Sora; CitriniResearch's scenario analysis of AI macroeconomic risk, the Intelligence displacement spiral, Ghost GDP, Agentic commerce, and the Intelligence premium unwind; material from Peter Steinberger, Simon Willison, and Matt Rickard on Agentic engineering workflows and intent constraints; and several pieces on personal growth and hiring.
- summaries: Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet, Building effective agents, How we built our multi-agent research system, Eval awareness in Claude Opus 4.6’s BrowseComp performance, Shipping at Inference-Speed, Just Talk To It - the no-bs Way of Agentic Engineering, Highlights from my conversation about agentic engineering on Lenny’s Podcast, My fireside chat about agentic engineering at the Pragmatic Summit, Your job is to deliver code you have proven to work, The Spec Layer, Code execution with MCP - Building more efficient agents, Designing AI-resistant technical evaluations, Building a C compiler with a team of parallel Claudes, Harness design for long-running application development, Writing effective tools for agents — with agents, Introducing advanced tool use on the Claude Developer Platform, A postmortem of three recent issues, Claude Code auto mode- a safer way to skip permissions, Beyond permission prompts- making Claude Code more secure and autonomous, Scaling Managed Agents-Decoupling the brain from the hands, Best Practices for Claude Code, Effective harnesses for long-running agents, The “think” tool- Enabling Claude to stop and think in complex tool use situations, Demystifying evals for AI agents, Effective context engineering for AI agents, Introducing Contextual Retrieval, Quantifying infrastructure noise in agentic coding evals, 我们如何使用 Codex 在 28 天内构建 Android 版 Sora, THE 2028 GLOBAL INTELLIGENCE CRISIS, 国务院关于深入实施“人工智能+”行动的意见, 孙宇晨为什么能这么成功?, 张一鸣2016年演讲, 《不要害怕任何人和任何事》, 品质, Hired Through GitHub: Part 2
- entities: Anthropic, Claude 3.5 Sonnet, Claude Opus 4.6, OpenAI, Codex, Sora, CitriniResearch, 国务院, 孙宇晨, 张一鸣, Zed
- concepts: SWE-bench Verified, Agent scaffold, Workflows vs agents, BrowseComp, Eval awareness, Model Context Protocol (MCP), AI-resistant technical evaluations, Agent teams, Multi-agent research systems, Generator-evaluator loop, Tool ergonomics for agents, Programmatic tool calling, Inference infrastructure regressions, Infrastructure noise in evals, Permission delegation for agents, Sandboxing for agents, Meta-harness, Context engineering, Contextual Retrieval, Context hygiene for agents, Multi-context window workflows, Reasoning tools for agents, Evaluation harness, Spec-driven development, 贡献驱动招聘, 人工智能+, Intelligence displacement spiral, Ghost GDP, Agentic commerce, Intelligence premium unwind, 行动优先于情绪, 主动性
- topics: Agentic coding evals, Agentic engineering, Agent 开发, AI 宏观经济风险
- log
- Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet
- Building effective agents
- How we built our multi-agent research system
- Eval awareness in Claude Opus 4.6’s BrowseComp performance
- Shipping at Inference-Speed
- Just Talk To It - the no-bs Way of Agentic Engineering
- Highlights from my conversation about agentic engineering on Lenny’s Podcast
- My fireside chat about agentic engineering at the Pragmatic Summit
- Your job is to deliver code you have proven to work
- The Spec Layer
- Code execution with MCP - Building more efficient agents
- Designing AI-resistant technical evaluations
- Building a C compiler with a team of parallel Claudes
- Harness design for long-running application development
- Writing effective tools for agents — with agents
- Introducing advanced tool use on the Claude Developer Platform
- A postmortem of three recent issues
- Claude Code auto mode- a safer way to skip permissions
- Beyond permission prompts- making Claude Code more secure and autonomous
- Scaling Managed Agents-Decoupling the brain from the hands
- Best Practices for Claude Code
- Effective harnesses for long-running agents
- The “think” tool- Enabling Claude to stop and think in complex tool use situations
- Demystifying evals for AI agents
- Effective context engineering for AI agents
- Introducing Contextual Retrieval
- Quantifying infrastructure noise in agentic coding evals
- 我们如何使用 Codex 在 28 天内构建 Android 版 Sora
- THE 2028 GLOBAL INTELLIGENCE CRISIS
- 国务院关于深入实施“人工智能+”行动的意见
- 孙宇晨为什么能这么成功?
- 张一鸣2016年演讲
- 《不要害怕任何人和任何事》
- 品质
- Hired Through GitHub: Part 2
- Anthropic
- Claude 3.5 Sonnet
- Claude Opus 4.6
- OpenAI
- Codex
- Sora
- CitriniResearch
- 国务院
- 孙宇晨
- 张一鸣
- Zed
- SWE-bench Verified
- Agent scaffold
- Workflows vs agents
- BrowseComp
- Eval awareness
- Model Context Protocol (MCP)
- AI-resistant technical evaluations
- Agent teams
- Multi-agent research systems
- Generator-evaluator loop
- Tool ergonomics for agents
- Programmatic tool calling
- Inference infrastructure regressions
- Infrastructure noise in evals
- Permission delegation for agents
- Sandboxing for agents
- Meta-harness
- Context engineering
- Contextual Retrieval
- Context hygiene for agents
- Multi-context window workflows
- Reasoning tools for agents
- Evaluation harness
- Spec-driven development
- 贡献驱动招聘
- 人工智能+
- Intelligence displacement spiral
- Ghost GDP
- Agentic commerce
- Intelligence premium unwind
- 行动优先于情绪
- 主动性
- Agentic coding evals
- Agentic engineering
- Agent 开发
- AI 宏观经济风险
- This page is maintained automatically from the structured pages generated by this import.
- Import source: [llm-wiki/raw/anthropic/Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet](/raw/anthropic/Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet.md)
- Import source: [llm-wiki/raw/anthropic/Building effective agents](/raw/anthropic/Building effective agents.md)
- Import source: [llm-wiki/raw/anthropic/How we built our multi-agent research system](/raw/anthropic/How we built our multi-agent research system.md)
- Import source: [llm-wiki/raw/anthropic/Eval awareness in Claude Opus 4.6’s BrowseComp performance](/raw/anthropic/Eval awareness in Claude Opus 4.6’s BrowseComp performance.md)
- Import source: [llm-wiki/raw/anthropic/Code execution with MCP - Building more efficient agents](/raw/anthropic/Code execution with MCP - Building more efficient agents.md)
- Import source: [llm-wiki/raw/anthropic/Designing AI-resistant technical evaluations](/raw/anthropic/Designing AI-resistant technical evaluations.md)
- Import source: [llm-wiki/raw/anthropic/Building a C compiler with a team of parallel Claudes](/raw/anthropic/Building a C compiler with a team of parallel Claudes.md)
- Import source: [llm-wiki/raw/anthropic/Harness design for long-running application development](/raw/anthropic/Harness design for long-running application development.md)
- Import source: [llm-wiki/raw/anthropic/Writing effective tools for agents — with agents](/raw/anthropic/Writing effective tools for agents — with agents.md)
- Import source: [llm-wiki/raw/anthropic/Introducing advanced tool use on the Claude Developer Platform](/raw/anthropic/Introducing advanced tool use on the Claude Developer Platform.md)
- Import source: [llm-wiki/raw/anthropic/A postmortem of three recent issues](/raw/anthropic/A postmortem of three recent issues.md)
- Import source: [llm-wiki/raw/anthropic/Claude Code auto mode- a safer way to skip permissions](/raw/anthropic/Claude Code auto mode- a safer way to skip permissions.md)
- Import source: [llm-wiki/raw/anthropic/Beyond permission prompts- making Claude Code more secure and autonomous](/raw/anthropic/Beyond permission prompts- making Claude Code more secure and autonomous.md)
- Import source: [llm-wiki/raw/anthropic/Scaling Managed Agents-Decoupling the brain from the hands](/raw/anthropic/Scaling Managed Agents-Decoupling the brain from the hands.md)
- Import source: [llm-wiki/raw/anthropic/Best Practices for Claude Code](/raw/anthropic/Best Practices for Claude Code.md)
- Import source: [llm-wiki/raw/anthropic/Effective harnesses for long-running agents](/raw/anthropic/Effective harnesses for long-running agents.md)
- Import source: [llm-wiki/raw/anthropic/The “think” tool- Enabling Claude to stop and think in complex tool use situations](/raw/anthropic/The “think” tool- Enabling Claude to stop and think in complex tool use situations.md)
- Import source: [llm-wiki/raw/anthropic/Demystifying evals for AI agents](/raw/anthropic/Demystifying evals for AI agents.md)
- Import source: [llm-wiki/raw/anthropic/Effective context engineering for AI agents](/raw/anthropic/Effective context engineering for AI agents.md)
- Import source: [llm-wiki/raw/anthropic/Introducing Contextual Retrieval](/raw/anthropic/Introducing Contextual Retrieval.md)
- Import source: [llm-wiki/raw/anthropic/Quantifying infrastructure noise in agentic coding evals](/raw/anthropic/Quantifying infrastructure noise in agentic coding evals.md)
- Import source: [llm-wiki/raw/openai/我们如何使用 Codex 在 28 天内构建 Android 版 Sora](/raw/openai/我们如何使用 Codex 在 28 天内构建 Android 版 Sora.md)
- Import source: [llm-wiki/raw/01_AI/THE 2028 GLOBAL INTELLIGENCE CRISIS](/raw/01_AI/THE 2028 GLOBAL INTELLIGENCE CRISIS.md)
- Import source: [llm-wiki/raw/peter blog/Shipping at Inference-Speed](/raw/peter blog/Shipping at Inference-Speed.md)
- Import source: [llm-wiki/raw/peter blog/Just Talk To It - the no-bs Way of Agentic Engineering](/raw/peter blog/Just Talk To It - the no-bs Way of Agentic Engineering.md)
- Import source: [llm-wiki/raw/01_AI/Highlights from my conversation about agentic engineering on Lenny’s Podcast](/raw/01_AI/Highlights from my conversation about agentic engineering on Lenny’s Podcast.md)
- Import source: [llm-wiki/raw/01_AI/My fireside chat about agentic engineering at the Pragmatic Summit](/raw/01_AI/My fireside chat about agentic engineering at the Pragmatic Summit.md)
- Import source: [llm-wiki/raw/01_AI/Your job is to deliver code you have proven to work](/raw/01_AI/Your job is to deliver code you have proven to work.md)
- Import source: [llm-wiki/raw/02_AI编程/The Spec Layer](/raw/02_AI编程/The Spec Layer.md)
- Import source: llm-wiki/raw/01_AI/国务院关于深入实施“人工智能+”行动的意见
- Import source: llm-wiki/raw/03_成长/孙宇晨为什么能这么成功?
- Import source: llm-wiki/raw/03_成长/张一鸣2016年演讲
- Import source: llm-wiki/raw/03_成长/《不要害怕任何人和任何事》
- Import source: llm-wiki/raw/03_成长/品质
- Import source: [llm-wiki/raw/03_成长/Hired Through GitHub Part 2](/raw/03_成长/Hired Through GitHub Part 2.md)