Refactor codebase to better reflect architectural structure #99

Merged: tjgreen42 merged 9 commits into main from tj/refactor-posting-pagemapper on Jan 1, 2026

tjgreen42 (Collaborator) commented Dec 31, 2025

Summary

Major codebase reorganization splitting the monolithic index.c (2300+ lines) into focused modules:

New directory structure:

  • src/am/ - access method implementation (handler, build, scan, vacuum)
  • src/types/ - data types (vector, query, score)
  • src/state/ - index state management (state, registry, metapage, limit)
  • src/planner/ - query planner hooks and cost estimation
  • src/debug/ - debugging utilities (dump)
  • src/segment/ - disk segment operations (merge.c and query.c renamed for clarity)

Additional improvements:

  • Move TpPostingEntry/TpPostingList structs to posting.h (co-located with functions)
  • Add segment/pagemapper.h for centralized address translation utilities
  • Remove duplicate SEGMENT_DATA_PER_PAGE definitions

Move TpPostingEntry and TpPostingList from memtable.h to posting.h
to co-locate structs with their functions.

Add segment/pagemapper.h with centralized logical-to-physical address
translation utilities, eliminating duplicate SEGMENT_DATA_PER_PAGE
definitions from segment.c and segment_merge.c.
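
The logical-to-physical translation that pagemapper.h centralizes can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: the 8192-byte block size, the 24-byte page header, the page-map array, and every `sketch_`/`Sketch` name are assumptions made for illustration. Only the `SEGMENT_DATA_PER_PAGE` name comes from the description above, and its value here is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real definitions live in pagemapper.h. */
#define SKETCH_BLCKSZ       8192u
#define SKETCH_PAGE_HEADER  24u
#define SEGMENT_DATA_PER_PAGE (SKETCH_BLCKSZ - SKETCH_PAGE_HEADER)

/* Hypothetical result type: which logical page, and where inside it. */
typedef struct SketchPageAddr
{
    uint32_t page_index;  /* index into the segment's page map */
    uint32_t page_offset; /* byte offset from the start of the page */
} SketchPageAddr;

/* Split a logical byte offset into (page index, offset past the header). */
static inline SketchPageAddr
sketch_logical_to_physical(uint64_t logical_offset)
{
    SketchPageAddr addr;

    addr.page_index = (uint32_t) (logical_offset / SEGMENT_DATA_PER_PAGE);
    addr.page_offset = SKETCH_PAGE_HEADER +
        (uint32_t) (logical_offset % SEGMENT_DATA_PER_PAGE);
    return addr;
}

/* Bytes readable from this offset before crossing into the next page. */
static inline uint32_t
sketch_bytes_left_on_page(uint64_t logical_offset)
{
    return SEGMENT_DATA_PER_PAGE -
        (uint32_t) (logical_offset % SEGMENT_DATA_PER_PAGE);
}

/* Resolve a logical offset to a physical block number via a page map. */
static inline uint32_t
sketch_logical_to_block(const uint32_t *page_map, uint32_t num_pages,
                        uint64_t logical_offset)
{
    SketchPageAddr addr = sketch_logical_to_physical(logical_offset);

    assert(addr.page_index < num_pages);
    return page_map[addr.page_index];
}
```

Keeping the constant and the arithmetic in one header means every caller agrees on where page boundaries fall, which is presumably the motivation for removing the duplicate definitions.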

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Split monolithic index.c into focused components:
- src/am/ - access method (handler, build, scan, vacuum)
- src/types/ - data types (vector, query, score)
- src/state/ - index state management (state, registry, metapage, limit)
- src/planner/ - query planner hooks and cost estimation
- src/debug/ - debugging utilities (dump)
- src/segment/ - disk segment operations (renamed merge.c, query.c)

Also includes:
- Move TpPostingEntry/TpPostingList structs to posting.h
- Add segment/pagemapper.h for address translation utilities

🤖 Generated with [Claude Code](https://claude.com/claude-code)

tjgreen42 force-pushed the tj/refactor-posting-pagemapper branch from 0beecf7 to d042e3f on December 31, 2025 04:39

postgres.h must be included before any other PostgreSQL headers
to ensure proper type definitions (uint32, uint16, bool, etc.).

- Add <access/htup_details.h> for GETSTRUCT macro
- Remove excess elements from relopt_parse_elt initializers

Introduces a columnar interface that both memtable and segment can
implement, allowing scoring code to be storage-agnostic:

- src/source.h: TpPostingData (columnar ctids/frequencies) and
  TpDataSource interface with get_postings, get_doc_length, close
- src/source.c: Helper functions for allocating/freeing posting data
- src/memtable/source.{h,c}: Memtable implementation of TpDataSource

This is the foundation for decoupling memtable from segment knowledge.
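
One way to picture the interface described above is a small table of function pointers over columnar posting data. The sketch below is an illustration under assumptions, not the contents of src/source.h: the `Sketch`-prefixed types, the field layout, the signatures, and the toy in-memory source are all hypothetical. Only the three operation names (get_postings, get_doc_length, close) and the ctids/frequencies columns come from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for PostgreSQL's ItemPointerData (block number + offset). */
typedef struct SketchCtid { uint32_t block; uint16_t offset; } SketchCtid;

/* Columnar postings: parallel arrays instead of an array of structs. */
typedef struct SketchPostingData
{
    SketchCtid *ctids;
    int32_t    *frequencies;
    int32_t     count;
} SketchPostingData;

/* Hypothetical storage-agnostic interface; signatures are assumptions. */
typedef struct SketchDataSource SketchDataSource;
struct SketchDataSource
{
    SketchPostingData *(*get_postings)(SketchDataSource *self, const char *term);
    int32_t (*get_doc_length)(SketchDataSource *self, SketchCtid ctid);
    void    (*close)(SketchDataSource *self);
};

static void
sketch_free_postings(SketchPostingData *pd)
{
    if (pd == NULL)
        return;
    free(pd->ctids);
    free(pd->frequencies);
    free(pd);
}

static SketchPostingData *
sketch_alloc_postings(int32_t count)
{
    SketchPostingData *pd;

    assert(count > 0);
    pd = calloc(1, sizeof(*pd));
    if (pd == NULL)
        return NULL;
    pd->ctids = calloc((size_t) count, sizeof(SketchCtid));
    pd->frequencies = calloc((size_t) count, sizeof(int32_t));
    if (pd->ctids == NULL || pd->frequencies == NULL)
    {
        sketch_free_postings(pd);
        return NULL;
    }
    pd->count = count;
    return pd;
}

/* Toy source: the term "hello" occurs in two documents. */
static SketchPostingData *
toy_get_postings(SketchDataSource *self, const char *term)
{
    SketchPostingData *pd;

    (void) self;
    if (strcmp(term, "hello") != 0)
        return NULL;
    pd = sketch_alloc_postings(2);
    if (pd == NULL)
        return NULL;
    pd->ctids[0] = (SketchCtid) {0, 1};
    pd->frequencies[0] = 3;
    pd->ctids[1] = (SketchCtid) {0, 2};
    pd->frequencies[1] = 1;
    return pd;
}

static int32_t
toy_get_doc_length(SketchDataSource *self, SketchCtid ctid)
{
    (void) self;
    return ctid.offset == 1 ? 10 : 4;
}

static void toy_close(SketchDataSource *self) { (void) self; }

/* Caller code sees only the interface, never the storage behind it. */
static int32_t
sketch_total_frequency(SketchDataSource *src, const char *term)
{
    SketchPostingData *pd = src->get_postings(src, term);
    int32_t total = 0;

    if (pd == NULL)
        return 0;
    for (int32_t i = 0; i < pd->count; i++)
        total += pd->frequencies[i];
    sketch_free_postings(pd);
    return total;
}
```

A caller such as `sketch_total_frequency` would work unchanged against a memtable-backed or segment-backed source, which is the decoupling this commit sets up.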

Implements the columnar TpDataSource interface for segments:

- Supports V2 block-based segment format with skip indices
- Converts block postings to columnar format with CTIDs
- Uses cached CTID arrays for doc_id to CTID translation
- Dequantizes fieldnorm for document length lookups

This completes the TpDataSource abstraction layer, allowing scoring
code to work uniformly with both memtable and segment data.

Replace direct memtable internal access (string_table, doclength_table,
TpPostingList, TpPostingEntry) with the abstract TpDataSource interface.
This makes the scoring code independent of memtable implementation
details and uses the new columnar TpPostingData format.

The segment scoring continues to use tp_score_all_terms_in_segment_chain
which has important optimizations for processing all terms in a single
segment traversal.

Dead code removed:
- tp_destroy_index_dsa, tp_registry_detach_dsa, tp_registry_reset_dsa,
  tp_registry_get_shared_dp (unimplemented declarations)
- tp_release_local_index_state, tp_get_local_index_state_if_cached,
  tp_destroy_shared_index_state (unused implementations in state.c)
- tp_recover_from_docid_pages (~170 lines, unused in metapage.c)
- tp_string_table_delete, tp_string_table_get_or_create_posting_list,
  tp_realloc_posting_entries_dsa, tp_get_string_from_dp (stringtable)

Bug fix:
- calculate_term_score() was using hardcoded k1=1.2, b=0.75 instead of
  the actual index parameters from metapage. Now correctly passes
  metap->k1 and metap->b to the scoring function.
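
To see why the hardcoded values mattered, here is a minimal sketch of a BM25 per-term score that takes k1 and b as arguments. It uses the standard BM25 term formula and is not the body of calculate_term_score(); the function name, the signature, and the use of double are assumptions, and the IDF is passed in precomputed.

```c
#include <assert.h>

/*
 * Standard BM25 contribution of one term to one document:
 *   idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
 * k1 and b are parameters here, so the caller can pass the values
 * configured for the index rather than fixed defaults.
 */
static double
sketch_term_score(double idf, double tf, double doc_len,
                  double avg_doc_len, double k1, double b)
{
    double length_norm;

    assert(avg_doc_len > 0.0);
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len);
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm);
}
```

For a term with tf = 2 in a document of average length and idf = 1, the defaults k1 = 1.2, b = 0.75 give 4.4 / 3.2 = 1.375, while an index configured with k1 = 2.0, b = 0.0 gives 6 / 4 = 1.5. With the hardcoded values, the second index would have been scored as if it were the first.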

Total: 376 lines removed.

tjgreen42 changed the title from "Refactor posting structs and segment page mapping" to "Refactor codebase to better reflect architectural structure" on Jan 1, 2026
tjgreen42 merged commit 3f05966 into main on Jan 1, 2026
11 checks passed
tjgreen42 deleted the tj/refactor-posting-pagemapper branch on January 1, 2026 01:44