Skip to content

fix(gc): avoid SELECT * when querying old requests/executions for cle…#1700

Merged
looplj merged 1 commit into
looplj:unstablefrom
FlyLoongZ:unstable
May 23, 2026
Merged

fix(gc): avoid SELECT * when querying old requests/executions for cle…#1700
looplj merged 1 commit into
looplj:unstablefrom
FlyLoongZ:unstable

Conversation

@FlyLoongZ

Copy link
Copy Markdown
Contributor

修复清理旧数据时 SELECT 全字段导致大量数据传输的问题

在 cleanupOldRequestExecutions 和 cleanupOldRequestsRecords 中,原先使用 .All(ctx) 进行 SELECT * 查询,会拉取 request_body、response_body、response_chunks 等大 JSON blob 字段导致:

  1. PostgreSQL 到 AxonHub 的大流量数据传输
  2. AxonHub内存需求暴涨导致 OOM/节点掉线
  3. DELETE 语句尚未执行进程已崩溃,数据未被实际删除

修复方式:使用 .Select() 只查询清理所需的 int 类型字段(ID、ProjectID、DataStorageID、RequestID)

@greptile-apps

greptile-apps Bot commented May 23, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR optimizes the GC cleanup functions to avoid SELECT * queries when fetching old request executions and request records, replacing them with targeted SELECT calls that only retrieve the integer fields required for the cleanup logic (id, project_id, data_storage_id, request_id). This eliminates unnecessary transfer of large JSON blob columns (request_body, response_body, response_chunks) from PostgreSQL, which could cause OOM crashes during GC runs.

  • cleanupOldRequestExecutions now selects only id, project_id, data_storage_id, and request_id — exactly the fields used by cleanupExecutionExternalStorage to build object-storage keys.
  • cleanupOldRequestsRecords now selects only id, project_id, and data_storage_id — exactly the fields used by cleanupRequestExternalStorage.

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to restricting the SELECT column list in two batch-query loops, and every field consumed by the downstream cleanup helpers is present in the new select list.

Both cleanup functions only use id, project_id, data_storage_id, and (for executions) request_id to build external-storage object keys and DB delete predicates. The schema confirms data_storage_id is nullable and stored as a plain int in the Go struct; the existing == 0 guard already handles the NULL case correctly both before and after this change. Pagination relies on DELETE removing the fetched rows so the next query naturally advances — that logic is unchanged. No unselected field is read anywhere in the affected code paths.

No files require special attention.

Important Files Changed

Filename Overview
internal/server/gc/gc.go Adds targeted .Select() calls in both batch-query loops to restrict fetched columns to the integer fields actually consumed downstream; all fields referenced by the external-storage cleanup helpers are correctly included.

Sequence Diagram

sequenceDiagram
    participant GC as GC Worker
    participant DB as PostgreSQL
    participant ES as External Storage

    Note over GC,DB: Before fix: SELECT * (all blob columns)
    GC->>DB: "SELECT * FROM request_executions WHERE created_at < cutoff ORDER BY id LIMIT 500"
    DB-->>GC: 500 rows incl. request_body / response_body / response_chunks (MBs)

    Note over GC,DB: After fix: SELECT id, project_id, data_storage_id, request_id
    loop Batch loop
        GC->>DB: "SELECT id, project_id, data_storage_id, request_id FROM request_executions WHERE created_at < cutoff ORDER BY id LIMIT 500"
        DB-->>GC: 500 rows (integers only, tiny payload)
        GC->>ES: DeleteData(keys derived from id / project_id / request_id)
        ES-->>GC: OK
        GC->>DB: DELETE FROM request_executions WHERE id IN (...)
        DB-->>GC: deleted count
    end
Loading

Reviews (1): Last reviewed commit: "fix(gc): avoid SELECT * when querying ol..." | Re-trigger Greptile

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes the garbage collection process by adding field selection to database queries for request executions and records. By only fetching necessary fields, the changes reduce memory consumption and prevent potential out-of-memory issues. I have no further feedback to provide.

@looplj looplj merged commit 3ed308e into looplj:unstable May 23, 2026
4 checks passed
yoke233 pushed a commit to yoke233/axonhub that referenced this pull request May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants