
added function that chunks text inputs in vectorize.table() function #142 #166

Merged
ChuckHend merged 11 commits into ChuckHend:main from harshtech123:vectorize on Jan 21, 2025


Conversation

harshtech123 (Contributor) commented Oct 22, 2024

/claim #142
/closes #142
@ChuckHend, can you please review this function? Any suggested changes are welcome!

function that chunks text
harshtech123 changed the title from "added function that chunks text inputs in vectorize.table() function" to "added function that chunks text inputs in vectorize.table() function #142" on Oct 22, 2024
ChuckHend (Owner) commented

@harshtech123, are you collaborating with @asr2003 on #162? The implementations in these two PRs look very similar.

harshtech123 (Contributor Author) commented

@ChuckHend No, I created my PR before he did, and I had no idea what he was working on.

harshtech123 (Contributor Author) commented

@ChuckHend I just wanted a neat, clean PR; that's why I created a new one. Before this, I had made the changes in #161.

Comment on lines +98 to +106
chunk_input: default!(bool, false), // New parameter to enable chunking
max_chunk_size: default!(i32, 1000), // New parameter for chunk size
) -> Result<String> {
if chunk_input {
// Call chunk_table if chunking is enabled
chunk_table(table, &columns[0], max_chunk_size, "'chunked_data'")?;
}

// Proceed with the original table initialization logic
ChuckHend (Owner) commented

Let's make chunk_table() a standalone function, so please remove these lines of code from this PR.

harshtech123 (Contributor Author) commented

Removed.

ChuckHend (Owner) commented

Maybe I am using the function incorrectly?

postgres=# select vectorize.chunk_table('products', 'description', 2);
ERROR:  column "id" does not exist
LINE 1: SELECT id, description FROM products
               ^
QUERY:  SELECT id, description FROM products

harshtech123 (Contributor Author) commented Jan 18, 2025

  • @ChuckHend The function can be called with: SELECT vectorize.chunk_table('input_table', 'column_name', 1000, 'output_table');
  • In the meantime, I am making it a standalone function and adding tests.

harshtech123 (Contributor Author) commented Jan 19, 2025

@ChuckHend The function is ready to test. Here is how I tested it:
1. Installed every necessary extension and started psql.
2. Created a table:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    description TEXT
);

3. Inserted test data: INSERT INTO products (description) VALUES ('This is a test string that will be chunked into smaller pieces.');
4. Ran the chunk_table function: SELECT vectorize.chunk_table('products', 'description', 10, 'chunked_data');
5. Verified the chunked data: SELECT * FROM chunked_data;
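The chunking loop itself is not quoted in this conversation, so the following Rust sketch assumes the simplest reading of max_chunk_size: fixed-size pieces measured in characters. `chunk_text` and `checked_chunk_size` are hypothetical names for illustration, not the extension's API. The sketch also shows one hazard in the diff line `let max_chunk_size = max_chunk_size as usize;`: an `as` cast silently wraps a negative i32 into a huge usize, whereas `usize::try_from` lets the function reject it.

```rust
/// Hypothetical helper: split `text` into pieces of at most `max_chunk_size`
/// characters. Counting `char`s rather than bytes means multi-byte UTF-8
/// text is never cut in the middle of a character.
fn chunk_text(text: &str, max_chunk_size: usize) -> Vec<String> {
    // slice::chunks panics on a size of 0, so guard against it explicitly.
    assert!(max_chunk_size > 0, "max_chunk_size must be positive");
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(max_chunk_size)
        .map(|piece| piece.iter().collect::<String>())
        .collect()
}

/// Hypothetical helper: convert the SQL-facing i32 parameter without the
/// wrap-around of `as usize`, rejecting zero and negative sizes.
fn checked_chunk_size(max_chunk_size: i32) -> Result<usize, String> {
    match usize::try_from(max_chunk_size) {
        Ok(n) if n > 0 => Ok(n),
        _ => Err(format!("max_chunk_size must be > 0, got {max_chunk_size}")),
    }
}

fn main() {
    // Same input and size as steps 3 and 4 above.
    let text = "This is a test string that will be chunked into smaller pieces.";
    let size = checked_chunk_size(10).expect("valid chunk size");
    for (i, chunk) in chunk_text(text, size).iter().enumerate() {
        println!("{i}: {chunk:?}");
    }
}
```

With the 63-character test string and a size of 10, this yields six 10-character pieces followed by "es.". Fixed-size splitting is only an assumption here; splitting on whitespace or sentence boundaries would keep words intact at the cost of uneven chunk sizes.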

let max_chunk_size = max_chunk_size as usize;

// Retrieve rows from the input table, ensuring column existence
let query = format!("SELECT id, {} FROM {}", column_name, input_table);
ChuckHend (Owner) commented

This requires input_table to have a column named "id", right? Could this instead follow the same convention as vectorize.table, where there is a parameter for the primary_key, and select that column instead of hardcoding "id"?

harshtech123 (Contributor Author) commented

Added the primary key as a parameter; the query now selects that column instead of a hardcoded id.

added primary key as parameter so we don't have to hardcode id
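The thread above replaces the hardcoded id with a primary-key parameter. The merged code is not reproduced here, so this Rust sketch only illustrates one way the SELECT could be assembled from that parameter; `quote_ident` and `select_rows_query` are hypothetical helpers. Because the names are spliced into SQL with format!, quoting each one as a PostgreSQL identifier (wrap in double quotes, double any embedded double quote) keeps an unusual table or column name from breaking or altering the query.

```rust
/// Hypothetical helper: quote a SQL identifier for PostgreSQL by wrapping it
/// in double quotes and doubling any double quote it contains.
fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Hypothetical helper: build the row-selection query from a caller-supplied
/// primary key column instead of a hardcoded `id`.
fn select_rows_query(input_table: &str, primary_key: &str, column_name: &str) -> String {
    format!(
        "SELECT {}, {} FROM {}",
        quote_ident(primary_key),
        quote_ident(column_name),
        quote_ident(input_table)
    )
}

fn main() {
    // Prints: SELECT "id", "description" FROM "products"
    println!("{}", select_rows_query("products", "id", "description"));
}
```

Quoted identifiers are case-sensitive, and quoting a schema-qualified name such as public.products as one string would treat it as a single identifier, so a fuller version would quote schema and table separately or do the quoting on the database side with PostgreSQL's format('%I', ...) or quote_ident().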
ChuckHend (Owner) commented

/split @asr2003

ChuckHend merged commit 1b48190 into ChuckHend:main on Jan 21, 2025

Development

Successfully merging this pull request may close these issues.

chunking support to vectorize.table()

2 participants