
feat: Add option to make /api/v1/retrieval/process/web endpoint additive #21336

@jfahrenkrug

Description


Check Existing Issues

  • I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

  • I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

I've written a script that reads my website's sitemap and calls /api/v1/retrieval/process/web for every URL to ingest it into a knowledge base. It works, but after successfully processing 602 web pages I was surprised by how little data ended up in my Qdrant DB.
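A minimal sketch of the kind of ingestion script described above, using only the Python standard library. The JSON body fields (`url`, `collection_name`) and bearer-token authentication are assumptions to check against your Open WebUI version; the function names are hypothetical. With the behavior reported in this issue, each request in the loop replaces what the previous one stored.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def build_process_web_request(
    base_url: str, token: str, page_url: str, collection_name: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/retrieval/process/web.

    ASSUMPTION: body fields `url` and `collection_name` plus bearer auth.
    """
    body = json.dumps({"url": page_url, "collection_name": collection_name})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/retrieval/process/web",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ingest_sitemap(
    base_url: str, token: str, sitemap_xml: str, collection_name: str
) -> None:
    """Send one process/web request per sitemap URL, in sitemap order."""
    for page_url in parse_sitemap(sitemap_xml):
        req = build_process_web_request(base_url, token, page_url, collection_name)
        with urllib.request.urlopen(req) as resp:
            resp.read()  # drain the response; error handling omitted
```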
It turns out that every call to /api/v1/retrieval/process/web completely overwrites the knowledge base. You can see that save_docs_to_vector_db is called with overwrite=True here:

Desired Solution you'd like

I'd like to be able to add hundreds of URLs (i.e., an entire processed website) to the same knowledge base, for example by passing overwrite=False in the request.
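To make the requested behavior concrete, here is a toy in-memory model (not Open WebUI's code) of the difference between overwrite=True and overwrite=False when every call targets the same collection. `ToyVectorStore`, `process_web`, and the three-chunks-per-page figure are hypothetical; the only idea taken from this issue is that the save step receives an overwrite flag. Defaulting the new flag to `True` is one possible design choice: existing callers would keep today's behavior, and only scripts that opt in with overwrite=False would get additive ingestion.

```python
from collections import defaultdict


class ToyVectorStore:
    """In-memory stand-in for a vector DB (illustration only)."""

    def __init__(self) -> None:
        self.collections: dict[str, list[str]] = defaultdict(list)

    def save_docs(self, collection: str, chunks: list[str], overwrite: bool = True) -> None:
        # overwrite=True mimics the reported behavior: empty the collection first.
        if overwrite:
            self.collections[collection] = []
        self.collections[collection].extend(chunks)


def process_web(store: ToyVectorStore, collection: str, page_url: str, overwrite: bool = True) -> None:
    """Hypothetical handler; `overwrite` defaults to True for backward compatibility."""
    chunks = [f"{page_url}#chunk{i}" for i in range(3)]  # pretend each page yields 3 chunks
    store.save_docs(collection, chunks, overwrite=overwrite)


def ingest(urls: list[str], overwrite: bool) -> int:
    """Process every URL into one collection and return the final chunk count."""
    store = ToyVectorStore()
    for url in urls:
        process_web(store, "my-site", url, overwrite=overwrite)
    return len(store.collections["my-site"])


urls = [f"https://example.com/page{i}" for i in range(602)]
print(ingest(urls, overwrite=True))   # 3: only the last page survives
print(ingest(urls, overwrite=False))  # 1806: every page is kept
```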

Alternatives Considered

Using the files API or the text API endpoint.

Additional Context

No response

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
