Refactor codebase to better reflect architectural structure #99
Merged
Move TpPostingEntry and TpPostingList from memtable.h to posting.h to co-locate structs with their functions. Add segment/pagemapper.h with centralized logical-to-physical address translation utilities, eliminating duplicate SEGMENT_DATA_PER_PAGE definitions from segment.c and segment_merge.c. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
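As a rough illustration of what a centralized logical-to-physical translation helper might look like, here is a minimal sketch. The page size, header size, struct, and function names are all assumptions made for this example; they are not taken from segment/pagemapper.h. The idea is that one shared definition of the per-page data capacity replaces the copies that previously lived in segment.c and segment_merge.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the real constants live in the PR. */
#define SKETCH_PAGE_SIZE        8192u
#define SKETCH_PAGE_HEADER_SIZE 24u
/* Single shared definition of the usable data bytes per page. */
#define SKETCH_DATA_PER_PAGE    (SKETCH_PAGE_SIZE - SKETCH_PAGE_HEADER_SIZE)

/* Hypothetical result type: which page, and where inside it. */
typedef struct SketchPhysicalAddr
{
    uint32_t page_index;   /* index into the segment's page map */
    uint32_t page_offset;  /* byte offset within the page, header included */
} SketchPhysicalAddr;

/* Translate a logical byte offset in the segment into a page and offset. */
static inline SketchPhysicalAddr
sketch_logical_to_physical(uint64_t logical_offset)
{
    SketchPhysicalAddr addr;

    /* The page index must fit in 32 bits for this sketch. */
    assert(logical_offset / SKETCH_DATA_PER_PAGE <= UINT32_MAX);

    addr.page_index = (uint32_t) (logical_offset / SKETCH_DATA_PER_PAGE);
    addr.page_offset = SKETCH_PAGE_HEADER_SIZE +
        (uint32_t) (logical_offset % SKETCH_DATA_PER_PAGE);
    return addr;
}

/* Bytes that can still be read on the current page before crossing over. */
static inline uint32_t
sketch_bytes_left_on_page(uint64_t logical_offset)
{
    return SKETCH_DATA_PER_PAGE -
        (uint32_t) (logical_offset % SKETCH_DATA_PER_PAGE);
}
```

With helpers like these in one header, readers and the merge code can share the same arithmetic instead of each defining its own per-page constant.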
Split monolithic index.c into focused components:
- src/am/ - access method (handler, build, scan, vacuum)
- src/types/ - data types (vector, query, score)
- src/state/ - index state management (state, registry, metapage, limit)
- src/planner/ - query planner hooks and cost estimation
- src/debug/ - debugging utilities (dump)
- src/segment/ - disk segment operations (renamed merge.c, query.c)

Also includes:
- Move TpPostingEntry/TpPostingList structs to posting.h
- Add segment/pagemapper.h for address translation utilities

🤖 Generated with [Claude Code](https://claude.com/claude-code)
postgres.h must be included before any other PostgreSQL headers to ensure proper type definitions (uint32, uint16, bool, etc.).
- Add <access/htup_details.h> for GETSTRUCT macro
- Remove excess elements from relopt_parse_elt initializers
Introduces a columnar interface that both memtable and segment can
implement, allowing scoring code to be storage-agnostic:
- src/source.h: TpPostingData (columnar ctids/frequencies) and
TpDataSource interface with get_postings, get_doc_length, close
- src/source.c: Helper functions for allocating/freeing posting data
- src/memtable/source.{h,c}: Memtable implementation of TpDataSource
This is the foundation for decoupling memtable from segment knowledge.
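The shape of such an interface can be sketched as a struct of function pointers that each storage backend fills in. This is only a sketch under stated assumptions: the struct layouts, the `Sketch`/`toy_` names, and the use of a plain `uint64_t` in place of a real PostgreSQL CTID are all invented for illustration and are not the definitions in src/source.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Columnar posting data: parallel arrays instead of an array of structs. */
typedef struct SketchPostingData
{
    uint64_t *ctids;        /* stand-in for real CTIDs (assumption) */
    int32_t  *frequencies;  /* term frequency per document */
    int       count;
} SketchPostingData;

/* Hypothetical interface that both backends would implement. */
typedef struct SketchDataSource SketchDataSource;
struct SketchDataSource
{
    SketchPostingData *(*get_postings)(SketchDataSource *self, const char *term);
    int32_t (*get_doc_length)(SketchDataSource *self, uint64_t ctid);
    void (*close)(SketchDataSource *self);
};

/* Helpers for allocating and freeing posting data. */
SketchPostingData *
sketch_alloc_posting_data(int count)
{
    SketchPostingData *p = calloc(1, sizeof(*p));

    assert(p != NULL && count > 0);
    p->ctids = calloc((size_t) count, sizeof(*p->ctids));
    p->frequencies = calloc((size_t) count, sizeof(*p->frequencies));
    assert(p->ctids != NULL && p->frequencies != NULL);
    p->count = count;
    return p;
}

void
sketch_free_posting_data(SketchPostingData *p)
{
    if (p == NULL)
        return;
    free(p->ctids);
    free(p->frequencies);
    free(p);
}

/* Toy in-memory backend; the base struct comes first so casts are valid. */
typedef struct ToySource
{
    SketchDataSource base;
    int closed;
} ToySource;

static SketchPostingData *
toy_get_postings(SketchDataSource *self, const char *term)
{
    SketchPostingData *p;

    (void) self;
    if (strcmp(term, "hello") != 0)
        return NULL;            /* unknown term: no postings */
    p = sketch_alloc_posting_data(2);
    p->ctids[0] = 1; p->frequencies[0] = 3;
    p->ctids[1] = 2; p->frequencies[1] = 1;
    return p;
}

static int32_t
toy_get_doc_length(SketchDataSource *self, uint64_t ctid)
{
    (void) self;
    return ctid == 1 ? 10 : 20;
}

static void
toy_close(SketchDataSource *self)
{
    ((ToySource *) self)->closed = 1;
}

void
toy_source_init(ToySource *s)
{
    s->base.get_postings = toy_get_postings;
    s->base.get_doc_length = toy_get_doc_length;
    s->base.close = toy_close;
    s->closed = 0;
}

/* Storage-agnostic consumer: it only sees the interface. */
int64_t
sketch_total_term_frequency(SketchDataSource *src, const char *term)
{
    SketchPostingData *p = src->get_postings(src, term);
    int64_t total = 0;

    if (p == NULL)
        return 0;
    for (int i = 0; i < p->count; i++)
        total += p->frequencies[i];
    sketch_free_posting_data(p);
    return total;
}
```

The consumer function never touches backend internals, which is the property that lets scoring code run unchanged over either storage layer.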
Implements the columnar TpDataSource interface for segments:
- Supports V2 block-based segment format with skip indices
- Converts block postings to columnar format with CTIDs
- Uses cached CTID arrays for doc_id to CTID translation
- Dequantizes fieldnorm for document length lookups

This completes the TpDataSource abstraction layer, allowing scoring code to work uniformly with both memtable and segment data.
Replace direct memtable internal access (string_table, doclength_table, TpPostingList, TpPostingEntry) with the abstract TpDataSource interface. This makes the scoring code independent of memtable implementation details and uses the new columnar TpPostingData format. The segment scoring continues to use tp_score_all_terms_in_segment_chain which has important optimizations for processing all terms in a single segment traversal.
Dead code removed:
- tp_destroy_index_dsa, tp_registry_detach_dsa, tp_registry_reset_dsa, tp_registry_get_shared_dp (unimplemented declarations)
- tp_release_local_index_state, tp_get_local_index_state_if_cached, tp_destroy_shared_index_state (unused implementations in state.c)
- tp_recover_from_docid_pages (~170 lines, unused in metapage.c)
- tp_string_table_delete, tp_string_table_get_or_create_posting_list, tp_realloc_posting_entries_dsa, tp_get_string_from_dp (stringtable)

Bug fix:
- calculate_term_score() was using hardcoded k1=1.2, b=0.75 instead of the actual index parameters from metapage. Now correctly passes metap->k1 and metap->b to the scoring function.

Total: 376 lines removed.
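To make the k1/b fix concrete, here is a minimal sketch of a per-term score that takes its parameters from the caller instead of hardcoding them. It assumes the standard BM25 term formula; the struct and function names are hypothetical and this is not the PR's calculate_term_score().

```c
#include <assert.h>

/* Hypothetical container for the per-index parameters (e.g. from a metapage). */
typedef struct SketchBm25Params
{
    double k1;  /* term-frequency saturation */
    double b;   /* document-length normalization strength */
} SketchBm25Params;

/*
 * Standard BM25 contribution of one term in one document:
 *   idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
 * The parameters are passed in rather than fixed at 1.2 and 0.75.
 */
double
sketch_bm25_term_score(double idf, double tf, double doc_len,
                       double avg_doc_len, SketchBm25Params params)
{
    double norm;

    assert(avg_doc_len > 0.0);
    norm = 1.0 - params.b + params.b * (doc_len / avg_doc_len);
    return idf * (tf * (params.k1 + 1.0)) / (tf + params.k1 * norm);
}
```

An index created with non-default parameters then scores differently from one using the defaults, which is the behavior the hardcoded constants were masking.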
Summary
Major codebase reorganization splitting the monolithic index.c (2300+ lines) into focused modules:

New directory structure:
- src/am/ - access method implementation (handler, build, scan, vacuum)
- src/types/ - data types (vector, query, score)
- src/state/ - index state management (state, registry, metapage, limit)
- src/planner/ - query planner hooks and cost estimation
- src/debug/ - debugging utilities (dump)
- src/segment/ - renamed merge.c, query.c for clarity

Additional improvements:
- Move TpPostingEntry/TpPostingList structs to posting.h (co-located with functions)
- Add segment/pagemapper.h for centralized address translation utilities
- Eliminate duplicate SEGMENT_DATA_PER_PAGE definitions