Skip to content

✨ feat: associate web crawl documents with agent documents#13893

Merged
arvinxx merged 6 commits into
canaryfrom
feat/web-crawl-agent-document-association
Apr 16, 2026
Merged

✨ feat: associate web crawl documents with agent documents#13893
arvinxx merged 6 commits into
canaryfrom
feat/web-crawl-agent-document-association

Conversation

@arvinxx

@arvinxx arvinxx commented Apr 16, 2026

Copy link
Copy Markdown
Member

πŸ’» Change Type

  • ✨ feat
  • βœ… test

πŸ”— Related Issue

Fixes LOBE-7242

πŸ”€ Description of Change

When web browsing crawls pages, the crawled documents are now also associated with the current agent's document workspace via the agentDocuments junction table. This enables cross-topic persistence and agent document load rules for crawled content.

Key changes:

  • AgentDocumentModel.associate() β€” New method to link an existing documents row to an agent via the agentDocuments table. Uses onConflictDoNothing for idempotency. Accepts object params { agentId, documentId, policyLoad? }.
  • AgentDocumentsService.associateDocument() β€” Service-layer wrapper.
  • TRPC associateDocument endpoint β€” New mutation for client-side usage.
  • Client AgentDocumentService.associateDocument() β€” Client service method with SWR revalidation.
  • Client executor (lobe-web-browsing.ts) β€” After notebook save, associates each crawled document with the agent.
  • Server-side runtime (webBrowsing.ts) β€” Converted from singleton to per-request factory. Creates documents and associates them with the agent when userId + serverDB + agentId context is available.
  • DocumentModel.findOrCreateFolder() β€” New utility for folder hierarchy support.
  • DOCUMENT_FOLDER_TYPE constant β€” Extracted 'custom/folder' string into a shared constant used across models and repositories.

πŸ§ͺ How to Test

  • Added/updated tests

Tests added:

  • AgentDocumentModel.associate β€” 3 integration tests (link, idempotency, no duplicate documents row)
  • DocumentModel.findOrCreateFolder β€” 5 integration tests (create, idempotency, user isolation, nested, same-name-different-parent)
  • AgentDocumentsService.associateDocument β€” 1 unit test

πŸ€– Generated with Claude Code

- Add `associate` method to AgentDocumentModel for linking existing documents
- Add `associateDocument` to AgentDocumentsService, TRPC router, and client service
- Update web browsing executor to associate crawled pages with agent after notebook save
- Add server-side crawl-to-agent-document persistence in webBrowsing runtime
- Add `findOrCreateFolder` to DocumentModel for folder hierarchy support
- Extract `DOCUMENT_FOLDER_TYPE` constant from hardcoded 'custom/folder' strings
- Add tests for associate, findOrCreateFolder, and service layer

Fixes LOBE-7242

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Apr 16, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
lobehub Ready Ready Preview, Comment Apr 16, 2026 2:09pm

Request Review

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @arvinxx, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@codecov

codecov Bot commented Apr 16, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 60.00000% with 50 lines in your changes missing coverage. Please review.
βœ… Project coverage is 66.79%. Comparing base (94b6827) to head (fed5598).
⚠️ Report is 5 commits behind head on canary.

Additional details and impacted files
@@            Coverage Diff            @@
##           canary   #13893     +/-   ##
=========================================
  Coverage   66.78%   66.79%             
=========================================
  Files        2044     2044             
  Lines      174052   174110     +58     
  Branches    17146    21230   +4084     
=========================================
+ Hits       116249   116293     +44     
- Misses      57679    57693     +14     
  Partials      124      124             
Flag Coverage Ξ”
app 59.06% <29.57%> (-0.01%) ⬇️
database 92.46% <100.00%> (+0.02%) ⬆️
packages/agent-runtime 79.72% <ΓΈ> (ΓΈ)
packages/context-engine 83.22% <ΓΈ> (ΓΈ)
packages/conversation-flow 92.36% <ΓΈ> (ΓΈ)
packages/file-loaders 87.02% <ΓΈ> (ΓΈ)
packages/memory-user-memory 74.74% <ΓΈ> (ΓΈ)
packages/model-bank 99.86% <ΓΈ> (ΓΈ)
packages/model-runtime 84.20% <ΓΈ> (ΓΈ)
packages/prompts 69.24% <ΓΈ> (ΓΈ)
packages/python-interpreter 92.90% <ΓΈ> (ΓΈ)
packages/ssrf-safe-fetch 0.00% <ΓΈ> (ΓΈ)
packages/utils 90.34% <ΓΈ> (ΓΈ)
packages/web-crawler 88.66% <ΓΈ> (ΓΈ)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Ξ”
Store 66.23% <43.75%> (+0.02%) ⬆️
Services 52.10% <20.00%> (-0.03%) ⬇️
Server 66.73% <17.64%> (-0.04%) ⬇️
Libs 51.29% <ΓΈ> (ΓΈ)
Utils 91.12% <ΓΈ> (ΓΈ)
πŸš€ New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • πŸ“¦ JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’‘ Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d1a22afeb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with πŸ‘.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/database/src/models/agentDocuments/agentDocument.ts
Comment thread packages/database/src/models/document.ts Outdated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace monkey-patching of crawlMultiPages with a proper onCrawlComplete
callback in the runtime constructor options.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
arvinxx and others added 2 commits April 16, 2026 21:12
Replace onCrawlComplete callback with documentService dependency injection.
The runtime now directly handles createDocument + associateDocument internally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ages

Add WebBrowsingDocumentContext (topicId, agentId) as a parameter to
crawlMultiPages, which flows through to documentService methods. This
allows a singleton runtime with per-call context on the client side.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… by null parentId

- associate: verify documentId belongs to current user before creating link
- findOrCreateFolder: add parentId IS NULL condition for root-level lookup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@arvinxx arvinxx force-pushed the feat/web-crawl-agent-document-association branch from ae7a103 to fed5598 Compare April 16, 2026 13:44
@arvinxx arvinxx merged commit c046d04 into canary Apr 16, 2026
43 of 48 checks passed
@arvinxx arvinxx deleted the feat/web-crawl-agent-document-association branch April 16, 2026 15:11
arvinxx added a commit that referenced this pull request Apr 27, 2026
# πŸš€ LobeHub v2.1.53 (20260427)

**Release Date:** April 27, 2026
**Since v2.1.52:** 194 merged PRs Β· 17 contributors

> Introduce Heterogeneous Agent β€” Claude Code and Codex run as
first-class desktop runtimes, paired with a new Agent Signal package,
sharper desktop UX, and a wave of flagship model additions.

---

## ✨ Highlights

- **Introduce Heterogeneous Agent** β€” Claude Code and Codex run as
first-class desktop agents: subagent rendering, partial-message
streaming, multi-turn resume, terminal error surfacing, rich tool
inspectors, and runtime polish. (#14162, #13754, #14067, #14001, #13970,
#13942)
- **Screen capture & Quick Chat tray** β€” New desktop screen capture
overlay (macOS permission-gated) with Quick Chat tray and upload
pipeline improvements; chat input auto-focuses on overlay mount.
(#13818, #14097, #14105)
- **Desktop topic & tab UX** β€” Dedicated topic popup window with
cross-window sync, Cmd+W/Cmd+T tab shortcuts, TabBar polish, recent
working directories expanded to 20, and human approval notifications.
(#13957, #13983, #13972, #14036, #14092)
- **Git workflow built-in** β€” One-click pull/push from the branch chip,
ahead/behind badge, and submodule/worktree repo detection. (#14041,
#13980, #13978)
- **Agent Signal package** β€” New `@lobechat/agent-signal` runtime for
dynamic memory feedback signals, with OTel metrics and self-iteration in
Lab. (#14157, #14170, #14159, #14169, #14187)
- **New models** β€” Claude Opus 4.7 with `xhigh` effort tier, GPT-5.5,
DeepSeek V4 Flash/Pro with reasoning slider, Kimi K2.6, MiMo-V2.5/Pro,
gpt-image-2, Qwen3.6 Flash/Plus, and Pixverse-c1. (#13903, #14147,
#14114, #14004, #14089, #14039, #13923)
- **New providers** β€” OpenCode Zen, OpenCode Go, and Azure OpenAI Router
runtime. (#13943, #14064, #13823)
- **Mobile settings overhaul** β€” Full settings menu and responsive
profile layout for mobile. (#14019)

---

## πŸ—οΈ Heterogeneous Agent

- Claude Code runtime, working-directory awareness, and sidebar polish.
(#13970)
- CC subagent rendering with persistent streamed text; parallel-tool
orphan fix. (#14001, #13968, #14024)
- Per-step usage persisted to each step assistant message. (#13964)
- Per-phase workflow expand defaults; full-expand toggle with
three-level expansion. (#14171, #13906)
- Hetero-mode actions bar; tool inspector polish. (#13963, #14034,
#14030)
- Codex desktop integration with rich tool rendering and devtools
preview. (#14067, #14100)
- Codex terminal error surfacing and CLI output tracing. (#14166)
- Tighten `isCanUseVision` default and add aggregator fallback. (#14172)
- Persist `ccSessionId` in topic metadata for CC multi-turn resume.
(#13902)
- CC account card, topic filter, and integration polish. (#13955,
#13942, #13950)
- Token-level deltas streamed via `--include-partial-messages`. (#13929)

---

## 🧠 Agent Signal & Self-Iteration

- New `@lobechat/agent-signal` package with dynamic feedback signals.
(#14157)
- AgentSignalRuntime wired through agent-tracing and observability-otel
metrics. (#14170, #14159)
- Self-iteration feature flag added to Lab; front-side flag check.
(#14169, #14186)
- Signal policy for receiving memory feedback dynamically. (#14187)

---

## πŸ’¬ Conversation

- Queue follow-up sends during running CC turns. (#14179)
- Persist per-topic chat scroll position; pin user message + fold long
messages. (#14191, #14056)
- Inline resend when editing last user message. (#14080)
- Disable first-block markdown streaming to prevent flicker. (#14193,
#13904)
- Prevent Markdown stream replay when vlist remounts streaming items.
(#14086)
- Stop repinning after manual scroll; unify scroll-to-user + spacer
hooks. (#14099, #14132)

---

## πŸ“± Platforms & Integrations

### Desktop / Electron

- Screen capture overlay, Quick Chat tray, and upload pipeline
improvements. (#13818)
- macOS permission gate for screen capture; auto-focus chat panel input.
(#14097, #14105)
- Dedicated topic popup window with cross-window sync. (#13957)
- TabBar polish: `+` button for new topic, dark theme blend, close icon
by default. (#13972, #14203, #13973)
- Recent working directories expanded from 5 to 20; submodule/worktree
repo detection. (#14036, #13978)
- Cmd+W / Cmd+T tab shortcuts and global shortcut consolidation.
(#13983, #13880)
- Linux icon configuration; human approval desktop notifications.
(#14042, #14092)

### Git Workflow

- One-click pull/push from branch chip; ahead/behind badge with
refactored GitCtr. (#14041, #13980)

### Mobile

- Full settings menu and responsive profile layout. (#14019)
- Agent route added to mobile router; mobile agent topic route
registered. (#14103, #14158)
- Session list skeleton row layout corrected. (#14040)

### Bot / Messaging

- DM strategy support; bot emoji and markdown render optimization.
(#14201, #14091, #14140)
- Slack webhook fix; bot platform setup guide reference. (#14052,
#14121)

---

## πŸ€– Models & Providers

### New models

- **Claude Opus 4.7** with `xhigh` effort tier; strip temperature/top_p.
(#13903, #13909)
- **GPT-5.5**. (#14147)
- **DeepSeek V4** Flash/Pro cards with reasoning slider; cache-hit and
Pro discount pricing. (#14114, #14209, #14196, #14131)
- **Kimi K2.6** model with LobeHub-hosted card. (#14004, #14006)
- **MiMo-V2.5 / V2.5-Pro**. (#14089)
- **gpt-image-2**, **Qwen3.6 Flash/Plus**, **Pixverse-c1**. (#14039,
#13923)

### New providers

- **OpenCode Zen** and **OpenCode Go** with env-var support. (#13943,
#14064)
- **Azure OpenAI Router** runtime support. (#13823)
- Model alias mapping for image and video runtimes. (#13896)
- Seedance video models migrated to Dreamina. (#14144)

### Runtime reliability

- Sanitize invalid tool_call arguments to unbreak strict providers.
(#14033)
- Tolerate null `function.name` in streaming tool_call deltas. (#14139)
- Preserve Gemini 3 `thoughtSignature` in `call_tools_batch`
normalization. (#14032)
- Downgrade `image_url` parts when target model lacks vision. (#14029)
- Preserve Cloudflare provider error context. (#14136)
- Use `safety_identifier` for OpenAI Responses API. (#14148)
- Unwrap underlying PG error in `formatErrorEventData`. (#14038)

---

## πŸ–₯️ User Experience

- **Onboarding** β€” Preset agent naming suggestions, structured hunk ops
for `updateDocument`, persona analytics snapshot, footer promotion
pipeline, wrap-up button. (#13931, #13989, #13930, #13853, #13934)
- **Document workflow** β€” Agent documents promoted as primary workspace
panel; history management and compare workflow; web-crawl docs
associated with agent documents. (#13924, #13725, #13893)
- **cmdk** β€” Agent identity surfaced on topic search results;
topic/message search scoped to current agent. (#14204, #13960)
- **Floating chat panel** and workspace improvements. (#13887)
- **Topic completion status** with dropdown action and filter. (#14005)

---

## πŸ”§ Tooling

- Redis-backed feature flag provider for runtime config. (#14098)
- Vite upgraded to 8.0.0 with Rolldown strict execution order. (#12720,
#14058)
- `@lobechat/model-bank` automated npm release with provenance. (#14015,
#14017, #14018)
- Skill activation fallback when `activateTools` cannot find identifier.
(#14010)
- Cron tool: timezone and existing jobs injected into system prompt;
clarified `lobe-gtd` and `lobe-cron` descriptions. (#14012, #14013)

---

## πŸ”’ Security & Reliability

- **Security:** uuid bumped to v14 (advisory). (#14083)
- **Security:** validate avatar URL and scope old-avatar deletion to
owner. (#13982)
- **Security:** clear OIDC sessions on better-auth signout; return 401
(not 500) for expired OIDC JWT. (#13916, #14014)
- **Reliability:** scope pending-approval check to current assistant
turn. (#14182)
- **Reliability:** sanitize heterogeneous-agent attachment cache
filenames. (#13937)
- **Reliability:** reduce subagent task status error noise. (#14026)

---

## πŸ‘₯ Contributors

Huge thanks to **17 contributors** who shipped **194 merged PRs** this
week.

@hardy Β· @shaun0927 Β· @hezhijie0327 Β· @sxjeru Β· @arvinxx Β· @Innei Β·
@tjx666 Β· @lijian Β· @neko Β· @rdmclin2 Β· @AmAzing129 Β· @sudongyuer Β·
@CanisMinor Β· @rivertwilight

Plus @lobehubbot and renovate[bot] for maintenance.

---

**Full Changelog**:
v2.1.52...v2.1.53
mrsimpson pushed a commit to mrsimpson/lobehub that referenced this pull request May 8, 2026
…3893)

* ✨ feat: associate web crawl documents with agent documents

- Add `associate` method to AgentDocumentModel for linking existing documents
- Add `associateDocument` to AgentDocumentsService, TRPC router, and client service
- Update web browsing executor to associate crawled pages with agent after notebook save
- Add server-side crawl-to-agent-document persistence in webBrowsing runtime
- Add `findOrCreateFolder` to DocumentModel for folder hierarchy support
- Extract `DOCUMENT_FOLDER_TYPE` constant from hardcoded 'custom/folder' strings
- Add tests for associate, findOrCreateFolder, and service layer

Fixes LOBE-7242

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* πŸ› fix: log errors in web crawl agent document association

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ♻️ refactor: add onCrawlComplete callback to WebBrowsingExecutionRuntime

Replace monkey-patching of crawlMultiPages with a proper onCrawlComplete
callback in the runtime constructor options.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ♻️ refactor: move document save logic into WebBrowsingExecutionRuntime

Replace onCrawlComplete callback with documentService dependency injection.
The runtime now directly handles createDocument + associateDocument internally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ♻️ refactor: pass per-call context to documentService via crawlMultiPages

Add WebBrowsingDocumentContext (topicId, agentId) as a parameter to
crawlMultiPages, which flows through to documentService methods. This
allows a singleton runtime with per-call context on the client side.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* πŸ› fix: enforce document ownership in associate and match root folders by null parentId

- associate: verify documentId belongs to current user before creating link
- findOrCreateFolder: add parentId IS NULL condition for root-level lookup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant