Added chunking support to vectorize.table() #161
harshtech123 wants to merge 6 commits into ChuckHend:main
Added a chunk_text function to split text into smaller chunks and set max_length to 500.
|
@harshtech123 - what will the table structure look like in Postgres given chunking during the embedding transformation call as it is proposed in this PR? |
|
@ChuckHend - |
|
Thank you! Please add a test with an assertion that the table gets set up correctly. |
Added tests that ensure the table gets set up correctly:
1. test_long_text_endpoint (tests long inputs)
2. test_small_input (tests small inputs)
3. test_empty_input (tests empty or None inputs)
4. test_boundary_chunking (checks behavior when the input is exactly the chunk size)
Also updated transform.py for better chunking and error handling.
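For readers without the diff at hand, the four cases listed above could look roughly like the sketch below. It is only an illustration: the `chunk_text` here is a hypothetical stand-in, not the code from this PR, it is called directly rather than through an endpoint, and `test_long_input` is a simplified name used for that reason.

```python
def chunk_text(text, max_length=500):
    # Hypothetical stand-in, NOT the implementation from this PR.
    # Empty or None input yields no chunks.
    if not text:
        return []
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]


def test_long_input():
    chunks = chunk_text("x" * 1234)
    assert [len(c) for c in chunks] == [500, 500, 234]
    # Joining the chunks must give back the original text.
    assert "".join(chunks) == "x" * 1234


def test_small_input():
    # Text shorter than max_length stays in a single chunk.
    assert chunk_text("hello") == ["hello"]


def test_empty_input():
    assert chunk_text("") == []
    assert chunk_text(None) == []


def test_boundary_chunking():
    # Exactly max_length characters: one chunk, no empty trailing chunk.
    assert chunk_text("y" * 500) == ["y" * 500]
    # One character over the limit spills into a second chunk.
    assert len(chunk_text("y" * 501)) == 2
```

These run under pytest as written. Note that they check only the splitting logic, not how the resulting rows are laid out in Postgres.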
|
@ChuckHend Kindly review the updates I have made. I also added some more functionality to transform.py to handle empty inputs, plus tests with an assertion that the table gets set up correctly. |
|
@harshtech123 I think this PR is a bit off the intention behind #142. The changes in this PR are made to |
|
@ChuckHend Thanks for the confirmation. I am working on adding this functionality, and I have a question about the SQL files in the extension: do I have to update them all as well if I embed this function? |
|
There will likely need to be changes made to the code here. This feature perhaps needs further scoping. |
Need a review for this functionality.
|
@ChuckHend I am closing this PR because I am working on #166. |


/claim #142
Added a chunk_text function to split text into smaller chunks. You can adjust max_length in chunk_table to your desired chunk length. Here, I’ve set it to 500 characters.
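
As a rough illustration of the idea, a fixed-size character splitter might look like the following. This is a minimal sketch, not the code from this PR: the signature, the handling of empty input, and the `ValueError` check are all assumptions, and it splits on raw character offsets without regard to word or sentence boundaries.

```python
def chunk_text(text, max_length=500):
    """Split text into consecutive chunks of at most max_length characters.

    Assumed behavior (not confirmed by the PR): empty or None input
    returns an empty list, and a non-positive max_length is rejected.
    """
    if max_length <= 0:
        raise ValueError("max_length must be a positive integer")
    if not text:
        return []
    # Step through the string in strides of max_length; the last slice
    # is simply shorter when the length is not an exact multiple.
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]


if __name__ == "__main__":
    chunks = chunk_text("a" * 1200)
    print([len(c) for c in chunks])  # [500, 500, 200]
```

Splitting on fixed offsets is the simplest option, but it can cut words in half; a splitter that respects whitespace or sentence boundaries may produce better embeddings at the cost of uneven chunk sizes.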