Added chunking support to vectorize.table() #161

Closed
harshtech123 wants to merge 6 commits into ChuckHend:main from harshtech123:main
Conversation

@harshtech123
Contributor

/claim #142
Added a chunk_text function that splits text into smaller chunks. You can adjust max_length in chunk_table to your desired chunk length; here I've set it to 500 characters.
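The PR's actual implementation is not reproduced in this thread, so the following is only a minimal sketch of what a fixed-width character chunker along these lines might look like. The signature and the `max_length` default of 500 are assumptions based on the description above, not the code that was submitted.

```python
def chunk_text(text: str, max_length: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most max_length characters.

    Hypothetical sketch: the name and default mirror the PR description,
    but the body is an assumption, not the submitted code.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    # Step through the string in strides of max_length; the last slice
    # may be shorter than max_length.
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]
```

A plain character split like this can cut words and sentences in half; splitting on whitespace or sentence boundaries would be a different design and is not what this sketch does.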

Added chunk_text function to split text into smaller chunks and set max_length to 500.
@ChuckHend
Owner

@harshtech123 - what will the table structure look like in Postgres given chunking during the embedding transformation call as it is proposed in this PR?

@harshtech123
Contributor Author

@ChuckHend -
After chunking, each document is split into multiple chunks, and each chunk is assigned its own row in the table.
The table structure might look like this:
Screenshot (4)
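The screenshot is not recoverable here, so as a rough illustration of the "one row per chunk" layout described above, the sketch below expands documents into rows. The column names (`doc_id`, `chunk_index`, `chunk`) and the helper `explode_to_rows` are hypothetical and are not taken from the PR or from the extension's schema.

```python
from typing import NamedTuple


class ChunkRow(NamedTuple):
    # Hypothetical columns: the real table layout is not shown in this thread.
    doc_id: int
    chunk_index: int
    chunk: str


def explode_to_rows(docs: dict[int, str], max_length: int = 500) -> list[ChunkRow]:
    """Turn each document into one row per chunk of at most max_length characters."""
    rows = []
    for doc_id, text in docs.items():
        starts = range(0, len(text), max_length)
        for chunk_index, start in enumerate(starts):
            rows.append(ChunkRow(doc_id, chunk_index, text[start:start + max_length]))
    return rows
```

Under these assumptions, a 600-character document with id 1 yields two rows, `(1, 0, ...)` and `(1, 1, ...)`, while a 10-character document yields a single row.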

@ChuckHend
Owner

Thank you! Please add a test with an assertion that the table gets set up correctly.

Added tests that ensure the table gets set up correctly:
1. test_long_text_endpoint (tests long inputs)
2. test_small_input (tests small inputs)
3. test_empty_input (tests empty or None inputs)
4. test_boundary_chunking (ensures that only the exact chunk size is there)

Also updated transform.py for better chunking and error handling.
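The tests themselves are not shown in this thread. As a sketch of what the four cases listed above might assert, here is a self-contained version that runs against a hypothetical stand-in for `chunk_text` rather than against the real vector-serve endpoint. Treating `None` as empty input and reading "boundary" as "input of exactly `max_length` characters" are both assumptions.

```python
def chunk_text(text, max_length=500):
    """Hypothetical stand-in: returns [] for None or empty input."""
    if not text:
        return []
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]


def test_long_text():
    # Long input is split into full chunks plus a shorter remainder.
    chunks = chunk_text("x" * 1234)
    assert [len(c) for c in chunks] == [500, 500, 234]
    # No characters are lost or reordered.
    assert "".join(chunks) == "x" * 1234


def test_small_input():
    # Input shorter than max_length stays in one chunk.
    assert chunk_text("hello") == ["hello"]


def test_empty_input():
    assert chunk_text("") == []
    assert chunk_text(None) == []


def test_boundary_chunking():
    # Exactly max_length characters fit in one chunk; one more spills over.
    assert chunk_text("y" * 500) == ["y" * 500]
    assert [len(c) for c in chunk_text("y" * 501)] == [500, 1]
```

These functions follow the pytest naming convention, so a runner such as pytest would collect them; they can also be called directly.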
@harshtech123
Contributor Author

@ChuckHend kindly review the updates I have made. I also added some functionality to transform.py to handle empty inputs, plus tests with an assertion that the table gets set up correctly.
Screenshot (5)

@ChuckHend
Owner

@harshtech123 I think this PR is a bit off from the intention behind #142. The changes in this PR are made to vector-serve, which is a Python service that runs outside of Postgres. We are looking to add the capability to vectorize.table(), which is a function call in the Postgres extension, and it is defined here.

@harshtech123
Contributor Author

@ChuckHend thanks for the clarification. I am working on adding this functionality, and I have a question about the SQL files in the extension: do I have to update all of them as well if I add this function?

@ChuckHend
Owner

There will likely need to be changes made to the code here. This feature perhaps needs further scoping.

@harshtech123
Contributor Author

@ChuckHend I am closing this PR because I am working on #166.
