
docs + optional public asset URLs (failure modes, #272) #275

Merged. LarFii merged 2 commits into HKUDS:main from Abdeltoto:feat/docs-public-asset-urls on May 11, 2026.

Abdeltoto (Contributor) commented May 10, 2026

What this does

  • Adds a short multimodal RAG failure-modes checklist (#207, #213) and links it from the README and `env.example`.
  • Implements optional public HTTPS / CDN / S3-style URLs alongside local MinerU paths (#272): when `RAGANYTHING_PUBLIC_ASSET_BASE_URL` and `RAGANYTHING_PUBLIC_ASSET_STRIP_PREFIX` are both set, content list items gain `*_public_url` fields; filesystem paths stay unchanged for local processing.
  • Adds small unit tests for the URL helper.
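The URL mapping implied by the two environment variables (strip a local prefix, then join the remainder onto a public base URL) can be sketched as below. This is a minimal sketch, not the code in `raganything/asset_urls.py`: the function name `build_public_url` is hypothetical, paths are assumed to be POSIX-style, and returning `None` for paths outside the prefix is an assumed policy.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import quote


def build_public_url(local_path: str, base_url: str, strip_prefix: str) -> Optional[str]:
    """Map a local asset path to a public URL (hypothetical helper).

    base_url would come from RAGANYTHING_PUBLIC_ASSET_BASE_URL and
    strip_prefix from RAGANYTHING_PUBLIC_ASSET_STRIP_PREFIX.
    Assumes POSIX-style paths.
    """
    path = PurePosixPath(local_path)
    prefix = PurePosixPath(strip_prefix)
    try:
        # Part of the path below the published directory.
        relative = path.relative_to(prefix)
    except ValueError:
        # Asset lives outside the published directory: keep it local-only.
        return None
    # Percent-encode each segment (spaces etc.) but keep "/" separators.
    encoded = quote(relative.as_posix(), safe="/")
    return f"{base_url.rstrip('/')}/{encoded}"
```

For example, with base URL `https://cdn.example.com/assets/` and strip prefix `/data/output`, the path `/data/output/doc1/images/fig 1.png` maps to `https://cdn.example.com/assets/doc1/images/fig%201.png`. Returning `None` for paths outside the prefix avoids publishing a URL that points nowhere.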

Happy to rename fields or adjust the env contract if you prefer another pattern — just say the word.

Abdeltoto and others added 2 commits May 10, 2026 15:32
…HKUDS#272)

- Add docs/multimodal_rag_failure_modes.md and link from README + env.example
- Optional RAGANYTHING_PUBLIC_ASSET_* env vars add *_public_url fields
- New raganything/asset_urls.py; parser attaches URLs after path resolve
- Tests for asset URL helper (force-add; test_* is gitignored for new files)

Co-authored-by: Cursor <cursoragent@cursor.com>

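The first commit's note that the parser attaches URLs after path resolution can be pictured as a pass over the content list that adds a `*_public_url` field next to each path field. This is a hypothetical sketch: it assumes MinerU items store assets under `img_path`, and the name `attach_public_urls` and the `img_public_url` key are illustrative rather than taken from the PR.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed: MinerU content-list items keep local asset paths under "img_path".
# Extend this tuple if your parser emits other *_path keys.
PATH_KEYS = ("img_path",)


def attach_public_urls(
    items: List[Dict[str, Any]],
    url_for: Callable[[str], Optional[str]],
) -> List[Dict[str, Any]]:
    """Add "<name>_public_url" beside each "<name>_path" field, in place.

    Path fields are never rewritten, so local processing (for example a
    vision model opening the image file) keeps working unchanged.
    """
    for item in items:
        for key in PATH_KEYS:
            local_path = item.get(key)
            if not local_path:
                continue  # item has no asset of this kind
            url = url_for(local_path)
            if url is not None:
                # "img_path" -> "img_public_url"
                item[key[: -len("_path")] + "_public_url"] = url
    return items
```

Passing the URL builder in as `url_for` keeps this pass independent of how URLs are computed, and items without a resolvable URL are simply left as they were.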
- attach_public_media_urls now logs a single warning when only one of
  RAGANYTHING_PUBLIC_ASSET_BASE_URL / RAGANYTHING_PUBLIC_ASSET_STRIP_PREFIX
  is set, instead of silently no-oping. State resets when the env returns
  to either fully unset or fully set, so legitimate transitions still work.
- Strip prefix resolution is hoisted into _resolve_strip_prefix and
  skipped entirely when the path cannot be resolved.
- README and the failure-modes doc now flag that *_public_url is
  produced only on the MinerU parser path today; other parsers are not
  yet covered.
- tests: added warn-once cases for each misconfiguration shape and a
  silent case when both env vars are absent. Reformatted the file so
  pre-commit's ruff-format hook passes (this was the previously failing
  CI step on this PR).
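The warn-once behaviour described in the second commit can be sketched with a module-level flag that is set on the first partial configuration and cleared whenever the environment is fully set or fully unset. This is a hypothetical reconstruction, not the merged code: the function name `public_asset_config_enabled`, the logger name, and the message text are assumptions.

```python
import logging
import os

logger = logging.getLogger("raganything.asset_urls")  # assumed logger name

BASE_VAR = "RAGANYTHING_PUBLIC_ASSET_BASE_URL"
PREFIX_VAR = "RAGANYTHING_PUBLIC_ASSET_STRIP_PREFIX"

_warned_partial_config = False


def public_asset_config_enabled() -> bool:
    """Return True only when both env vars are set; warn once if just one is."""
    global _warned_partial_config
    base = os.environ.get(BASE_VAR)
    prefix = os.environ.get(PREFIX_VAR)
    if bool(base) != bool(prefix):
        # Exactly one variable is set: feature stays off, warn only once.
        if not _warned_partial_config:
            missing = PREFIX_VAR if base else BASE_VAR
            logger.warning("Public asset URLs disabled: %s is not set", missing)
            _warned_partial_config = True
        return False
    # Fully set or fully unset: reset so a later misconfiguration warns again.
    _warned_partial_config = False
    return bool(base)
```

Resetting the flag on both valid states means a process that moves from misconfigured to configured and back again still gets a fresh warning, while repeated calls under the same bad configuration stay quiet.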
LarFii force-pushed the feat/docs-public-asset-urls branch from 932d0ef to 30da9b6 on May 11, 2026 09:16
LarFii merged commit 146828f into HKUDS:main on May 11, 2026
1 check passed
