🐛 fix: add document parsing to knowledge base chunking pipeline by arvinxx · Pull Request #13221 · lobehub/lobehub

arvinxx · 2026-03-24T10:09:31Z

💻 Change Type

🐛 fix

🔗 Related Issue

Knowledge base file uploads were missing document parsing, causing detailed content to be unavailable.

🔀 Description of Change

When files are uploaded to a knowledge base, the parseFileToChunks async handler only performed chunking (splitting into chunks for RAG/semantic search) but did not create a documents record. This meant the parsed document content was not available for detailed viewing.

This fix adds document parsing as a pre-step in the chunking pipeline:

- Before chunking begins, the handler checks whether a documents record already exists for the file.
- If none exists, it calls DocumentService.parseFile() to create one.
- The pre-step is wrapped in try-catch, so a document parsing failure does not block the chunking flow.
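
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the names `parseFile` and `findByFileId` appear in this PR's text, and their signatures here, along with the `DocumentLookup`, `DocumentParser`, `Chunker`, and `PipelineResult` types and the `parseFileToChunksSketch` function, are assumptions made up for the example.

```typescript
// Hypothetical shapes: only `findByFileId` and `parseFile` are named in this PR;
// the signatures below are assumptions for illustration.
interface DocumentRecord {
  id: string;
  fileId: string;
}

interface DocumentLookup {
  findByFileId(fileId: string): Promise<DocumentRecord | undefined>;
}

interface DocumentParser {
  parseFile(fileId: string): Promise<DocumentRecord>;
}

interface Chunker {
  chunkFile(fileId: string): Promise<string[]>;
}

interface PipelineResult {
  chunks: string[];
  documentParsed: boolean;
  parseError?: string;
}

async function parseFileToChunksSketch(
  fileId: string,
  deps: { lookup: DocumentLookup; parser: DocumentParser; chunker: Chunker },
): Promise<PipelineResult> {
  let documentParsed = false;
  let parseError: string | undefined;

  // Pre-step: create the documents record only when none exists yet.
  try {
    const existing = await deps.lookup.findByFileId(fileId);
    if (!existing) {
      await deps.parser.parseFile(fileId);
      documentParsed = true;
    }
  } catch (error) {
    // A parsing failure is recorded, but it must not block chunking.
    parseError = error instanceof Error ? error.message : String(error);
  }

  // Chunking runs regardless of how the pre-step ended.
  const chunks = await deps.chunker.chunkFile(fileId);
  return { chunks, documentParsed, parseError };
}
```

Keeping the lookup and the parse call inside the same try-catch means that a failing lookup is treated the same way as a failing parse: the error is noted and chunking proceeds.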

🧪 How to Test

Tested locally
Added/updated tests
No tests needed

1. Upload a file (PDF, DOCX, MD) to a knowledge base
2. Verify that both chunking and document parsing complete
3. Check that the file's detailed content is viewable in the knowledge base

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel · 2026-03-24T10:09:36Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
lobehub	Ready	Preview, Comment	Mar 24, 2026 11:45am

sourcery-ai

We've reviewed this pull request using the Sourcery rules engine

lobehubbot · 2026-03-24T10:10:34Z

@rivertwilight @nekomeowww - This is a knowledge base fix that adds document parsing to the chunking pipeline in the server async router. Please take a look.

codecov · 2026-03-24T10:14:45Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.37%. Comparing base (383cace) to head (0ecfc1f).
⚠️ Report is 1 commit behind head on canary.

Additional details and impacted files

@@             Coverage Diff             @@
##           canary   #13221       +/-   ##
===========================================
+ Coverage   74.20%   87.37%   +13.17%     
===========================================
  Files        1537      578      -959     
  Lines      126450    44024    -82426     
  Branches    13930     6854     -7076     
===========================================
- Hits        93828    38466    -55362     
+ Misses      32511     5447    -27064     
  Partials      111      111

Flag	Coverage Δ
app	`?`
database	`97.89% <ø> (ø)`
packages/agent-runtime	`89.60% <ø> (ø)`
packages/context-engine	`83.57% <ø> (ø)`
packages/conversation-flow	`92.36% <ø> (ø)`
packages/file-loaders	`87.02% <ø> (ø)`
packages/memory-user-memory	`66.68% <ø> (ø)`
packages/model-bank	`99.84% <ø> (ø)`
packages/model-runtime	`84.79% <ø> (ø)`
packages/prompts	`74.60% <ø> (ø)`
packages/python-interpreter	`92.90% <ø> (ø)`
packages/ssrf-safe-fetch	`0.00% <ø> (ø)`
packages/utils	`90.09% <ø> (ø)`
packages/web-crawler	`88.82% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
Store	`∅ <ø> (∅)`
Services	`∅ <ø> (∅)`
Server	`∅ <ø> (∅)`
Libs	`∅ <ø> (∅)`
Utils	`93.47% <ø> (+2.06%)`	⬆️


lobehubbot · 2026-03-24T11:49:37Z

❤️ Great PR @arvinxx ❤️

The growth of the project is inseparable from user feedback and contributions. Thank you for your contribution! If you are interested in the lobehub developer community, please join our Discord and then DM @arvinxx or @canisminor1990. They will invite you to our private developer channel, where we discuss lobe-chat development and share AI news from around the world.

@bakiburakogun

# 🚀 release: 20260326

This release includes **91 commits**. Key updates are below.

- **Agent can now execute background tasks**: Agents can perform long-running operations without blocking your conversation. [#13289](#13289)
- **Better error messages**: Redesigned error UI across chat and image generation with clearer explanations and recovery options. [#13302](#13302)
- **Smoother topic switching**: No more full page reloads when switching topics while an agent is responding. [#13309](#13309)
- **Faster image uploads**: Large images are now automatically compressed to 1920px before upload, reducing wait times. [#13224](#13224)
- **Improved knowledge base**: Documents are now properly parsed before chunking, improving retrieval accuracy. [#13221](#13221)

### Bot Platform

- **WeChat Bot support**: You can now connect LobeChat to WeChat, in addition to Discord. [#13191](#13191)
- **Richer bot responses**: Bots now support custom markdown rendering and context injection. [#13294](#13294)
- **New bot commands**: Added `/new` to start fresh conversations and `/stop` to halt generation. [#13194](#13194)
- **Discord stability fixes**: Fixed thread creation issues and Redis connection drops. [#13228](#13228) [#13205](#13205)

### Models & Providers

- **GLM-5** is now available in the LobeHub model list. [#13189](#13189)
- **Coding Plan providers**: Added support for code planning assistant providers. [#13203](#13203)
- **Tencent Hunyuan 3.0 ImageGen**: New image generation model from Tencent. [#13166](#13166)
- **Gemini content handling**: Better handling when Gemini blocks content due to safety filters. [#13270](#13270)
- **Claude token limits fixed**: Corrected max window tokens for Anthropic Claude models. [#13206](#13206)

### Skills & Tools

- **Auto credential injection**: Skills can now automatically request and use required credentials. [#13124](#13124)
- **Smarter tool permissions**: Built-in tools skip confirmation for safe paths like `/tmp`. [#13232](#13232)
- **Model switcher improvements**: Quick access to provider settings and visual highlight for default model. [#13220](#13220)

### Memory

- **Bulk delete memories**: You can now delete all memory entries at once. [#13161](#13161)
- **Per-agent memory control**: Memory injection now respects individual agent settings. [#13265](#13265)

### Desktop App

- **Gateway connection**: Desktop app can now connect to LobeHub Gateway for enhanced features. [#13234](#13234)
- **Connection status indicator**: See gateway connection status in the titlebar. [#13260](#13260)
- **Settings persistence**: Gateway toggle state now persists across app restarts. [#13300](#13300)

### CLI

- **API key authentication**: CLI now supports API key auth for programmatic access. [#13190](#13190)
- **Shell completion**: Tab completion for bash/zsh/fish shells. [#13164](#13164)
- **Man pages**: Built-in manual pages for CLI commands. [#13200](#13200)

### Security

- **XSS protection**: Sanitized search result image titles to prevent script injection. [#13303](#13303)
- **Workflow hardening**: Fixed potential shell injection in release automation. [#13319](#13319)
- **Dependency update**: Updated nodemailer to address security advisory. [#13326](#13326)

### Bug Fixes

- Fixed skill page not redirecting correctly after import. [#13255](#13255) [#13261](#13261)
- Fixed token counting in group chats. [#13247](#13247)
- Fixed editor not resetting when switching to empty pages. [#13229](#13229)
- Fixed manual tool toggle not working. [#13218](#13218)
- Fixed Search1API response parsing. [#13207](#13207) [#13208](#13208)
- Fixed mobile topic menus rendering issues. [#12477](#12477)
- Fixed history count calculation for accurate context. [#13051](#13051)
- Added missing Turkish translations. [#13196](#13196)

### Credits

Huge thanks to these contributors: @bakiburakogun @hardy-one @Zhouguanyang @sxjeru @hezhijie0327 @arvinxx @cy948 @CanisMinor @Innei @lijian @lobehubbot @neko @rdmclin2 @rivertwilight @tjx666

🐛 fix: add document parsing to knowledge base chunking pipeline

5797f76

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sourcery-ai Bot reviewed Mar 24, 2026


vercel Bot deployed to Preview March 24, 2026 10:20 View deployment

arvinxx added 2 commits March 24, 2026 18:22

fix plugin title

a03ec26

update

0ecfc1f

vercel Bot deployed to Preview March 24, 2026 10:43 View deployment

🐛 fix: add missing findByFileId mock in document service tests

ab3797e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview March 24, 2026 11:45 View deployment

arvinxx merged commit 72ba8c8 into canary Mar 24, 2026
31 checks passed

arvinxx deleted the fix/kb-upload-document-parsing branch March 24, 2026 11:49

ONLY-yours mentioned this pull request Mar 27, 2026

🚀 release: 20260327 #13330

Merged

arvinxx merged 4 commits into canary from fix/kb-upload-document-parsing
