fix: security hardening for user-input-facing code paths#251
Merged
Conversation
Address multiple security issues found during audit, focusing on vulnerabilities triggerable by end-user input (queries, binary protocol, corrupted index data): - tpvector_recv: validate total_size to prevent undersized/huge allocations and validate deserialized fields (index_name_len, entry_count) from untrusted binary input - tpvector_out: add bounds checks on entry iteration to prevent walking past varlena bounds; validate lexeme_len >= 0 before alloca to prevent stack overflow - tpvector_in: replace atoi() with pg_strtoint32() for proper overflow/error handling on user-provided frequency values - query.c IDF cache: fix off-by-one OOB read where num_terms++ sentinel caused lookup_cached_idf to iterate past array bounds; use separate bool flag instead - tp_gettuple: convert recursive call to loop to prevent stack overflow (up to 100K frames) when results have invalid blocks - tp_segment_lookup_ctid: convert Assert-only doc_id bounds check to runtime check (Assert compiled out in release builds) - segment/scan.c, merge.c: convert Assert-only doc_id bounds checks in cached CTID array accesses to runtime checks - Validate string_entry.length from on-disk segments against TP_MAX_TERM_LENGTH (1MB) before allocating buffers, across 5 locations in scan.c, segment.c, and merge.c - posting.c: add overflow check before capacity * growth_factor multiplication; cast to Size for palloc argument - state.c: fix use-after-free (metap->magic after pfree), fix use-after-close (index_rel after index_close), fix access to hash entry after HASH_REMOVE
Cast total_size to Size before comparing with MaxAllocSize to avoid signed/unsigned comparison warning that GCC treats as error with -Werror=sign-compare. The cast is safe because the preceding check already rejects negative values.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Security audit and hardening of code paths reachable from user input (queries, binary protocol, corrupted stored data). Fixes 9 distinct vulnerability classes:
Binary protocol validation (
tpvector_recv): Added bounds checks ontotal_size,index_name_len, andentry_countfrom untrusted binary input. Previously, a malicious client could trigger integer underflow, undersized allocations, or OOB writes.Stored data validation (
tpvector_out): Added bounds checks on entry iteration to prevent reading past varlena allocation. Validateslexeme_len >= 0beforeallocato prevent stack overflow with corrupted data.Input parsing (
tpvector_in): Replacedatoi()withpg_strtoint32()for proper overflow detection and error reporting on user-provided frequency values.IDF cache OOB read (
query.c): Fixed off-by-one wherenum_terms++sentinel (to suppress repeated warnings) causedlookup_cached_idfto iterate one element past theterms[]array bounds. Uses a separateboolflag instead.Stack overflow (
tp_gettuple): Converted recursive call to a loop. Previously, up to 100K recursive frames could exhaust the stack when results reference invalid blocks.Assert-only bounds checks (
segment.c,scan.c,merge.c): ConvertedAssert(doc_id < num_docs)to runtime checks intp_segment_lookup_ctidand cached CTID array accesses. Asserts are compiled out in release builds, leaving no protection against corrupted segment data causing OOB heap reads.Unchecked term length from disk (5 locations): Added
TP_MAX_TERM_LENGTH(1MB) validation forstring_entry.lengthread from on-disk segments before allocating buffers. Previously, corrupted segments could trigger unbounded allocations or buffer overflows viauint32wrap-around.Integer overflow (
posting.c): Added overflow check beforecapacity * growth_factorin posting list expansion. Cast toSizeforpallocargument to prevent 32-bit truncation.Use-after-free/close (
state.c): Fixed three instances:metap->magicaccessed afterpfree(metap),index_rel->rd_indexaccessed afterindex_close, and hash entry accessed afterHASH_REMOVE.Testing
make format-checkpasses