chunking support to vectorize.table() #142

@ChuckHend

Provide the ability to automatically chunk text in the input columns passed to the vectorize.table function, or provide a utility function (vectorize.chunk_table()?) that takes an input table, splits the data in each row into multiple rows, and writes the output to a new table. vectorize.table could then call vectorize.chunk_table under the hood as a convenience.

The use case is giant documents, where a user might want to retrieve just a subset of a document. A retrieved chunk would hopefully be more relevant and specific than the entire document.

See LangChain’s recursive_text_splitter for an example of this: https://python.langchain.com/docs/how_to/recursive_text_splitter/
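To make the request concrete, here is a rough sketch of the row-to-multiple-rows chunking described above, written in plain Python. It is not LangChain's implementation and not an existing pg_vectorize API: the names `recursive_chunk` and `chunk_rows`, the `max_chars` parameter, and the separator list are all hypothetical, and the sketch ignores chunk overlap and token-based sizing.

```python
from typing import Iterable, List, Sequence, Tuple


def recursive_chunk(
    text: str,
    max_chars: int = 1000,
    separators: Sequence[str] = ("\n\n", "\n", " "),
) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: tries the coarsest separator first, then falls
    back to finer separators (and finally a hard cut) for oversized pieces.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: cut at a fixed width.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            # Keep merging small pieces until the chunk would overflow.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # Piece is still too long: retry with finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


def chunk_rows(
    rows: Iterable[Tuple[int, str]], max_chars: int = 1000
) -> List[Tuple[int, int, str]]:
    """Expand (id, text) rows into (id, chunk_index, chunk) rows."""
    out: List[Tuple[int, int, str]] = []
    for row_id, text in rows:
        for idx, chunk in enumerate(recursive_chunk(text, max_chars)):
            out.append((row_id, idx, chunk))
    return out
```

In this sketch, each output row keeps the source row's id plus a chunk index, so chunks in the new table could be joined back to the original document. Whether vectorize.chunk_table should work this way is an open design question.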
