Conversation
PR Summary
This pull request implements changes to receive and write external index data in chunks, aiming to reduce memory usage when handling large index files. The key modifications include:
- Modified StoreExternalIndex function in src/hnsw/external_index.c to support both streaming and non-streaming modes
- Introduced new functions external_index_receive_metadata and external_index_receive_index_part in src/hnsw/external_index_socket.c for chunked data transfer
- Added error handling for incomplete data reception and buffer management for processing chunks
- Updated function signatures and added new parameters in src/hnsw/external_index.h to accommodate the chunked transfer approach
- Replaced the previous method of receiving the entire index file at once with a more memory-efficient chunked approach
These changes should significantly reduce memory usage by processing large index files in smaller parts instead of loading them entirely into memory.
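As a rough illustration of the chunked approach described in this summary, here is a minimal sketch that reads a known number of bytes from a file descriptor in bounded pieces and hands each piece to a consumer, so peak memory is limited by the chunk size rather than the file size. The names `receive_in_chunks`, `chunk_consumer`, and `count_bytes` are hypothetical and are not taken from the PR; the actual code in `src/hnsw/external_index_socket.c` may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical callback type: consumes one chunk, returns 0 on success. */
typedef int (*chunk_consumer)(const uint8_t *chunk, size_t len, void *ctx);

/*
 * Read exactly `total` bytes from `fd` in pieces of at most `chunk_cap` bytes,
 * passing each piece to `consume`. Returns 0 on success and -1 on a read
 * error, premature end of stream, or consumer failure.
 */
static int receive_in_chunks(int fd, uint64_t total, uint8_t *chunk, size_t chunk_cap,
                             chunk_consumer consume, void *ctx)
{
    uint64_t received = 0;

    assert(chunk != NULL && consume != NULL && chunk_cap > 0);

    while (received < total) {
        uint64_t remaining = total - received;
        size_t   want = remaining < chunk_cap ? (size_t)remaining : chunk_cap;
        ssize_t  n = read(fd, chunk, want);

        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted, retry the read */
            return -1;
        }
        if (n == 0) return -1; /* peer closed before sending everything */
        if (consume(chunk, (size_t)n, ctx) != 0) return -1;
        received += (uint64_t)n;
    }
    return 0;
}

/* Example consumer (hypothetical): only counts the bytes it is given. */
static int count_bytes(const uint8_t *chunk, size_t len, void *ctx)
{
    (void)chunk;
    *(size_t *)ctx += len;
    return 0;
}
```

A real consumer would write the received nodes into index pages instead of counting bytes; the point of the sketch is only that the buffer is reused for every chunk.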
5 file(s) reviewed, no comment(s)
src/hnsw/external_index.c
Outdated
// rotate buffer
buffer_position = external_index_buffer_size - local_progress;
memcpy(external_index_data, external_index_data + local_progress, buffer_position);
Let's implement an actual ring buffer for this to avoid the copy here.
I think `external_index_receive_index_part` needs to change to take the write position, read position, and size of the ring buffer, instead of only the write position and size as it does now.
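To make the suggestion concrete, here is a minimal ring buffer sketch. It is not the PR's code: `ring_buffer`, `ring_init`, `ring_write`, and `ring_read` are hypothetical names, and the sketch tracks a read position plus a byte count rather than separate read and write positions, which is one common way to tell a full buffer from an empty one. Unconsumed bytes stay where they are; only the indices move, so no "rotate" copy is needed after a partial read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *data;
    size_t   size;     /* capacity in bytes */
    size_t   read_pos; /* index of the next byte to consume */
    size_t   used;     /* number of bytes currently stored */
} ring_buffer;

static void ring_init(ring_buffer *rb, uint8_t *storage, size_t size)
{
    assert(rb != NULL && storage != NULL && size > 0);
    rb->data = storage;
    rb->size = size;
    rb->read_pos = 0;
    rb->used = 0;
}

/* Copy up to `len` bytes in; returns how many bytes were accepted. */
static size_t ring_write(ring_buffer *rb, const uint8_t *src, size_t len)
{
    size_t free_bytes = rb->size - rb->used;
    size_t write_pos = (rb->read_pos + rb->used) % rb->size;
    size_t first;

    if (len > free_bytes) len = free_bytes;
    first = rb->size - write_pos; /* room before the end of the storage */
    if (first > len) first = len;
    if (first > 0) memcpy(rb->data + write_pos, src, first);
    if (len > first) memcpy(rb->data, src + first, len - first); /* wrapped part */
    rb->used += len;
    return len;
}

/* Copy up to `len` bytes out; returns how many bytes were produced. */
static size_t ring_read(ring_buffer *rb, uint8_t *dst, size_t len)
{
    size_t first;

    if (len > rb->used) len = rb->used;
    first = rb->size - rb->read_pos;
    if (first > len) first = len;
    if (first > 0) memcpy(dst, rb->data + rb->read_pos, first);
    if (len > first) memcpy(dst + first, rb->data, len - first); /* wrapped part */
    rb->read_pos = (rb->read_pos + len) % rb->size;
    rb->used -= len;
    return len;
}
```

In a receive path, the socket read would presumably target the free region directly instead of going through `ring_write`, and a node that straddles the wrap point would still need special handling; both details are outside this sketch.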
src/hnsw/external_index_socket.c
Outdated
elog(ERROR, "external index socket read failed");
break;
case EXTERNAL_INDEX_INDEXING_ERROR:
buffer[ size ] = '\0';
What guarantees that this does not overflow the buffer?
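One way to address the overflow question is to clamp the index against the buffer's capacity before writing the terminator. The helper below is a hypothetical sketch (`terminate_message` and its parameters are not from the PR) and assumes the caller knows the total capacity of `buffer`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * NUL-terminate a message of `received` bytes stored in `buffer`, whose total
 * capacity is `capacity` bytes. If the message fills the buffer, the last byte
 * is overwritten so the terminator always stays in bounds.
 * Returns the resulting string length.
 */
static size_t terminate_message(char *buffer, size_t capacity, size_t received)
{
    assert(buffer != NULL && capacity > 0);

    if (received >= capacity)
        received = capacity - 1; /* truncate rather than write past the end */
    buffer[received] = '\0';
    return received;
}
```

An alternative is to have the read itself request at most `capacity - 1` bytes, so that truncation never happens at termination time.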
a53f4d8 to e41a549
…om socket in chunks
- Simplify the `StoreExternalIndexNodes` function to write the provided buffer into index pages until fully flushed.
- Try to write as many nodes as possible before requesting a new chunk from the external indexing server, to optimize data streaming.
- Do not throw errors or process interrupts from `StoreExternalIndex` or the external index socket functions; instead, set the status and error msg in `buildstat->status`, then check the status and throw the error or process interrupts after resource cleanup.
- Receive and process the index file in 10MB chunks.
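The "set status, clean up, then report" pattern from this commit message can be sketched as follows. This is a standalone illustration, not the PR's code: `build_status`, `set_error`, `store_chunk`, and `run_build` are hypothetical names, and the real `buildstat->status` field may have a different shape.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { BUILD_OK = 0, BUILD_ERROR, BUILD_INTERRUPTED } build_state;

typedef struct {
    build_state code;
    char        msg[128];
} build_status;

/* Record a failure instead of raising it immediately. */
static void set_error(build_status *st, const char *msg)
{
    st->code = BUILD_ERROR;
    snprintf(st->msg, sizeof st->msg, "%s", msg);
}

/* Hypothetical worker step: on failure it only updates the status. */
static void store_chunk(build_status *st, const unsigned char *chunk, size_t len)
{
    if (st->code != BUILD_OK) return; /* an earlier step already failed */
    if (chunk == NULL || len == 0) set_error(st, "received empty chunk");
}

/*
 * Run the build, release resources, and only then return the status.
 * `cleaned_up` is set to 1 once the scratch buffer has been freed.
 */
static build_state run_build(const unsigned char *input, size_t len,
                             build_status *st, int *cleaned_up)
{
    unsigned char *scratch = malloc(16);

    assert(st != NULL && cleaned_up != NULL);
    st->code = BUILD_OK;
    st->msg[0] = '\0';
    *cleaned_up = 0;

    if (scratch == NULL)
        set_error(st, "out of memory");
    else
        store_chunk(st, input, len);

    free(scratch); /* cleanup always runs, whatever the status is */
    *cleaned_up = 1;

    /* The caller inspects st->code here and raises the error afterwards. */
    return st->code;
}
```

In a PostgreSQL extension the caller would raise the error (for example with `elog(ERROR, ...)`) only after `run_build` returns, which is why the worker functions never raise it themselves.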
e41a549 to 02ec07f
StoreExternalIndexNodes function and receive index file from socket in chunks
4b8019a to cdc0a19
d6c2349 to 1fe46f4
Refactor StoreExternalIndexNodes function and receive index file from socket in chunks
- Simplify the `StoreExternalIndexNodes` function to write the provided buffer into index pages until fully flushed.
- Do not throw errors or process interrupts from `StoreExternalIndex` or the external index socket functions; instead, set the status and error msg in `buildstat->status`, then check the status and throw the error or process interrupts after resource cleanup.