fix(gc): avoid SELECT * when querying old requests/executions for cle…#1700
Greptile Summary
This PR optimizes the GC cleanup functions to avoid
Confidence Score: 5/5
Safe to merge. The change is narrowly scoped to restricting the SELECT column list in two batch-query loops, and every field consumed by the downstream cleanup helpers is present in the new select list.
Both cleanup functions only use id, project_id, data_storage_id, and (for executions) request_id to build external-storage object keys and DB delete predicates. The schema confirms data_storage_id is nullable and stored as a plain int in the Go struct; the existing == 0 guard already handles the NULL case correctly both before and after this change. Pagination relies on DELETE removing the fetched rows so the next query naturally advances; that logic is unchanged. No unselected field is read anywhere in the affected code paths.
No files require special attention.
Sequence Diagram
sequenceDiagram
participant GC as GC Worker
participant DB as PostgreSQL
participant ES as External Storage
Note over GC,DB: Before fix: SELECT * (all blob columns)
GC->>DB: "SELECT * FROM request_executions WHERE created_at < cutoff ORDER BY id LIMIT 500"
DB-->>GC: 500 rows incl. request_body / response_body / response_chunks (MBs)
Note over GC,DB: After fix: SELECT id, project_id, data_storage_id, request_id
loop Batch loop
GC->>DB: "SELECT id, project_id, data_storage_id, request_id FROM request_executions WHERE created_at < cutoff ORDER BY id LIMIT 500"
DB-->>GC: 500 rows (integers only, tiny payload)
GC->>ES: DeleteData(keys derived from id / project_id / request_id)
ES-->>GC: OK
GC->>DB: DELETE FROM request_executions WHERE id IN (...)
DB-->>GC: deleted count
end
Reviews (1): Last reviewed commit: "fix(gc): avoid SELECT * when querying ol..."
Code Review
This pull request optimizes the garbage collection process by adding field selection to database queries for request executions and records. By only fetching necessary fields, the changes reduce memory consumption and prevent potential out-of-memory issues. I have no further feedback to provide.
…anup (looplj#1700) (cherry picked from commit 3ed308e)
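Fix the heavy data transfer caused by selecting all columns when cleaning up old data
In cleanupOldRequestExecutions and cleanupOldRequestsRecords, the queries previously used .All(ctx), which issues a SELECT * and pulls large JSON blob fields such as request_body, response_body, and response_chunks, resulting in large amounts of unnecessary data transfer.
Fix: use .Select() to query only the int fields needed for cleanup (ID, ProjectID, DataStorageID, RequestID).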
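As a rough in-memory analogy of the column projection (not the ORM call itself), the sketch below keeps only the four integer fields and measures how many blob bytes are left behind. The types `fullExecution` and `cleanupFields`, the helper names, and the 1 KiB blob sizes are illustrative assumptions.

```go
package main

import "fmt"

// fullExecution models a row as SELECT * would return it, blobs included.
type fullExecution struct {
	ID, ProjectID, DataStorageID, RequestID   int
	RequestBody, ResponseBody, ResponseChunks []byte
}

// cleanupFields models the narrowed projection: integers only.
type cleanupFields struct {
	ID, ProjectID, DataStorageID, RequestID int
}

// selectCleanupFields copies just the fields the cleanup needs.
func selectCleanupFields(rows []fullExecution) []cleanupFields {
	out := make([]cleanupFields, 0, len(rows))
	for _, r := range rows {
		out = append(out, cleanupFields{r.ID, r.ProjectID, r.DataStorageID, r.RequestID})
	}
	return out
}

// skippedBlobBytes sums the blob payload that the projection does not carry.
func skippedBlobBytes(rows []fullExecution) int {
	total := 0
	for _, r := range rows {
		total += len(r.RequestBody) + len(r.ResponseBody) + len(r.ResponseChunks)
	}
	return total
}

func main() {
	// One batch of 500 rows with three 1 KiB blobs each (made-up sizes).
	rows := make([]fullExecution, 500)
	for i := range rows {
		rows[i] = fullExecution{
			ID: i + 1, ProjectID: 1, RequestID: i + 1,
			RequestBody:    make([]byte, 1024),
			ResponseBody:   make([]byte, 1024),
			ResponseChunks: make([]byte, 1024),
		}
	}
	slim := selectCleanupFields(rows)
	fmt.Println(len(slim), skippedBlobBytes(rows)) // prints "500 1536000"
}
```

In the real fix the saving happens at the database boundary, since the blob columns are never sent over the wire; the sketch only shows which fields survive the projection.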