feat: backup usage stats #1632

Merged
looplj merged 15 commits into looplj:unstable from henryz78:fix-backup-usage-stats on May 12, 2026

Conversation

@henryz78

Contributor

Summary

  • include usage statistics in manual backup, restore, and auto-backup flows
  • add includeUsageStats to GraphQL inputs/settings and the backup settings UI
  • export usage requests and usage logs with project/channel/API-key reference metadata for restore
  • strip Ent edges from usage-stat backup JSON to avoid API key leakage and duplicated nested request data
  • only include API key values when IncludeAPIKeys is explicitly enabled; avoid restoring API key links from raw numeric IDs
  • avoid logging raw API key values when restore references cannot be resolved
  • restore usage stats with cached FK resolution, safe optional-reference handling, and actionable missing-reference warnings
  • deduplicate restored usage requests by stable fingerprints, including re-restore cases where API key values were omitted from the backup
  • reduce re-restore lookup amplification by deduplicating timestamp probes before querying existing requests
  • use each usage log’s own APIKeyID when adding optional API-key restore metadata
  • default auto backup to include usage stats, with backward-compatible settings parsing

Testing

  • git diff --check
  • Not run: Go tests / gofmt, because go and gofmt are not available in this local environment

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request adds support for backing up and restoring usage statistics, including usage requests and logs. The implementation includes UI toggles for manual and automatic backups, GraphQL schema updates, and backend logic for batched data processing and entity remapping. Feedback suggests optimizing memory usage by batching API key lookups during backup and refining the deduplication query during restore to handle potential performance issues with high-concurrency timestamps.

Comment on lines +225 to +235
	apiKeys, err := svc.db.APIKey.Query().
		Select(apikey.FieldID, apikey.FieldKey).
		All(ctx)
	if err != nil {
		return nil, err
	}

	for _, ak := range apiKeys {
		apiKeyKeys[ak.ID] = ak.Key
	}
}

medium

Querying all API keys at once to build a lookup map might lead to high memory consumption if the system has a very large number of API keys. Consider using a more memory-efficient approach or batching this query if the number of keys is expected to be significant.
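One way to act on this suggestion is to resolve only the API key IDs that a batch actually references, in bounded chunks, while caching earlier results. The sketch below is an illustration under assumptions: `keyFetcher` and `lookupKeysInBatches` are hypothetical names, and the fetcher stands in for an "ID IN (...)" style database query rather than the PR's Ent code.

```go
package main

import (
	"errors"
	"fmt"
)

// keyFetcher is a hypothetical stand-in for a database query that returns
// key values for the given API key IDs.
type keyFetcher func(ids []int) (map[int]string, error)

// lookupKeysInBatches fills cache with the keys for ids. It skips IDs that
// are already cached or repeated, and passes at most batchSize IDs per call.
func lookupKeysInBatches(ids []int, batchSize int, cache map[int]string, fetch keyFetcher) error {
	if batchSize <= 0 {
		return errors.New("batchSize must be positive")
	}

	pending := make([]int, 0, len(ids))
	seen := make(map[int]struct{}, len(ids))
	for _, id := range ids {
		if _, ok := cache[id]; ok {
			continue
		}
		if _, dup := seen[id]; dup {
			continue
		}
		seen[id] = struct{}{}
		pending = append(pending, id)
	}

	for start := 0; start < len(pending); start += batchSize {
		end := start + batchSize
		if end > len(pending) {
			end = len(pending)
		}
		found, err := fetch(pending[start:end])
		if err != nil {
			return err
		}
		for id, key := range found {
			cache[id] = key
		}
	}
	return nil
}

func main() {
	store := map[int]string{1: "key-a", 2: "key-b", 3: "key-c"}
	calls := 0
	fetch := func(ids []int) (map[int]string, error) {
		calls++
		out := make(map[int]string, len(ids))
		for _, id := range ids {
			if k, ok := store[id]; ok {
				out[id] = k
			}
		}
		return out, nil
	}

	cache := map[int]string{}
	if err := lookupKeysInBatches([]int{1, 2, 2, 3, 1}, 2, cache, fetch); err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(len(cache), "keys resolved in", calls, "queries")
}
```

With this shape, memory is bounded by the number of distinct keys referenced so far rather than by the size of the whole API key table.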

Comment on lines +1004 to +1019
for start := 0; start < len(createdAt); start += usageBackupBatchSize {
	end := min(start+usageBackupBatchSize, len(createdAt))
	requests, err := db.Request.Query().
		Where(request.CreatedAtIn(createdAt[start:end]...)).
		WithProject().
		WithChannel().
		WithAPIKey().
		All(ctx)
	if err != nil {
		return nil, err
	}

	for _, req := range requests {
		addExistingUsageRequest(lookup, req)
	}
}

medium

Querying existing requests by CreatedAt in batches of 500 timestamps might still return a large number of rows if many requests share the exact same timestamp (e.g., in high-concurrency scenarios). While CreatedAt usually has high precision, consider if additional filters (like ProjectID) could be added to the query to further narrow down the results and improve performance during restore.

@greptile-apps

greptile-apps Bot commented May 10, 2026

Contributor

Greptile Summary

This PR adds usage statistics (requests and logs) to the manual backup, restore, and auto-backup flows, including a new includeUsageStats option wired through GraphQL, the business layer, and the UI. The implementation handles FK resolution by name with ID fallback, deduplicates restored records via both ID and stable fingerprint, redacts credentials from backup output unless explicitly opted in, and uses a backward-compatible *bool pointer trick for the auto-backup settings JSON field.

  • Backup: batched cursor pagination (500 at a time) exports usage requests with project/channel/credential metadata; custom MarshalJSON strips Ent edge structs to prevent nested data leakage.
  • Restore: usageRestoreResolver caches project, channel, and credential lookups; existingUsageRequests pre-fetches by ID and timestamp to detect duplicates before inserting; usage logs are bulk-inserted in batches after deduplication against existing and in-session records.
  • Settings: autoBackupSettingsJSON intermediate struct with *bool for IncludeUsageStats preserves backward compatibility; defaults to false for auto-backup and true for manual backup/restore.
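The `*bool` approach mentioned for the settings can be sketched as follows. This is an illustration under assumptions: the field names, JSON keys, and the `parseAutoBackupSettings` helper are hypothetical, and the default is passed in as a parameter so that both default choices discussed in this PR can be tried.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AutoBackupSettings is a hypothetical, trimmed-down settings struct.
type AutoBackupSettings struct {
	Enabled           bool
	IncludeUsageStats bool
}

// autoBackupSettingsJSON mirrors the stored JSON. The *bool distinguishes
// "field absent" (nil) from "field explicitly set to false".
type autoBackupSettingsJSON struct {
	Enabled           bool  `json:"enabled"`
	IncludeUsageStats *bool `json:"includeUsageStats,omitempty"`
}

// parseAutoBackupSettings decodes stored settings and applies defaultInclude
// only when the stored JSON predates the includeUsageStats field.
func parseAutoBackupSettings(raw []byte, defaultInclude bool) (AutoBackupSettings, error) {
	var in autoBackupSettingsJSON
	if err := json.Unmarshal(raw, &in); err != nil {
		return AutoBackupSettings{}, err
	}

	out := AutoBackupSettings{Enabled: in.Enabled, IncludeUsageStats: defaultInclude}
	if in.IncludeUsageStats != nil {
		out.IncludeUsageStats = *in.IncludeUsageStats
	}
	return out, nil
}

func main() {
	inputs := []string{
		`{"enabled":true}`,                            // legacy settings, field absent
		`{"enabled":true,"includeUsageStats":true}`,  // explicit opt-in
		`{"enabled":true,"includeUsageStats":false}`, // explicit opt-out
	}
	for _, raw := range inputs {
		s, err := parseAutoBackupSettings([]byte(raw), false)
		if err != nil {
			fmt.Println("parse failed:", err)
			continue
		}
		fmt.Printf("%s -> includeUsageStats=%v\n", raw, s.IncludeUsageStats)
	}
}
```

A plain `bool` field could not tell a legacy document from an explicit `false`, which matters whenever the desired default is not the zero value.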

Confidence Score: 5/5

Safe to merge. The backup and restore logic is well-structured with batched queries, proper dedup, and credential redaction.

The change introduces significant new code for backup and restore of usage data, but the core paths are covered by round-trip tests, deduplication logic is thorough (both ID and fingerprint-based), and credential handling is correctly guarded. The only finding is a silently-ignored omitzero JSON tag that has no runtime impact since the relevant timestamp fields are always populated for real records.

internal/server/backup/types.go has the misleading omitzero tag; internal/server/backup/restore.go carries the bulk of the new logic and would benefit from an eye on the transaction scope for large datasets.

Important Files Changed

| Filename | Overview |
| --- | --- |
| internal/server/backup/restore.go | Adds full restore pipeline for usage requests and logs with FK resolution, deduplication by ID and fingerprint, batched DB queries, and safe optional-reference handling. |
| internal/server/backup/types.go | Introduces BackupUsageRequest/BackupUsageLog with custom MarshalJSON to strip Ent edges; uses non-standard omitzero tag (silently ignored by encoding/json) on time.Time fields. |
| internal/server/backup/backup_ops.go | Adds batched backup of usage requests and logs; redaction of sensitive key values works correctly via empty map fallback; switches to compact json.Marshal when usage stats are included. |
| internal/server/biz/system.go | Uses intermediate autoBackupSettingsJSON with *bool for IncludeUsageStats to safely handle backward-compatible JSON parsing of existing stored settings. |
| internal/server/backup/restore_test.go | Adds a round-trip restore test for usage stats covering token counts, cost, project/channel/API-key linkage. |
| internal/server/backup/backup_test.go | Adds backup test covering usage stats inclusion, credential redaction by default, and optional credential inclusion; verifies no Ent edge data leaks into JSON output. |
| frontend/src/features/system/components/backup-settings.tsx | Adds includeUsageStats toggle to manual backup, restore, and auto-backup forms; defaults to true for manual ops and false for auto-backup. |
| internal/server/gql/backup.graphql | Adds includeUsageStats to BackupOptionsInput, RestoreOptionsInput (default true), and AutoBackupSettings/UpdateAutoBackupSettingsInput. |

Sequence Diagram

sequenceDiagram
    participant UI as BackupSettings UI
    participant GQL as GraphQL Resolver
    participant BackupSvc as backup.BackupService
    participant DB as Database

    Note over UI,DB: Backup Flow
    UI->>GQL: "backup(opts: {includeUsageStats: true})"
    GQL->>BackupSvc: BackupWithoutAuth(ctx, opts)
    BackupSvc->>DB: Request.Query (batched, cursor ID)
    DB-->>BackupSvc: []ent.Request (with Project, Channel edges)
    BackupSvc->>DB: UsageLog.Query (batched, cursor ID)
    DB-->>BackupSvc: []ent.UsageLog (with Project, Channel edges)
    BackupSvc-->>GQL: JSON (MarshalJSON strips edges, redacts keys)
    GQL-->>UI: backup file

    Note over UI,DB: Restore Flow
    UI->>GQL: "restore(data, opts: {includeUsageStats: true})"
    GQL->>BackupSvc: Restore(ctx, data, opts)
    BackupSvc->>DB: Load all projects, channels, keys (resolver cache)
    BackupSvc->>DB: existingUsageRequests (by ID + createdAt batches)
    DB-->>BackupSvc: existing requests for dedup
    loop For each backup request
        BackupSvc->>BackupSvc: resolve project/channel/key IDs
        BackupSvc->>BackupSvc: check byID + byFingerprint dedup
        BackupSvc->>DB: Request.Create (if new)
    end
    loop For each backup log (batched 500)
        BackupSvc->>BackupSvc: resolve requestIDMap + dedup checks
        BackupSvc->>DB: UsageLog.CreateBulk
    end
    BackupSvc-->>GQL: nil (success)

Reviews (4): Last reviewed commit: "Fix usage backup lint formatting"

Comment on lines +177 to +179
if includeAPIKeyValues {
	query.WithAPIKey()
}

P2 Discarded WithAPIKey() return value

query.WithAPIKey() is called without capturing its return value. Ent's With* methods use a pointer receiver (_q *RequestQuery) and mutate the query in-place, so the eager load is registered correctly — but the pattern is non-idiomatic and easy to misread as a no-op. It's also fragile: if a future Ent refactor changes With* to return a new query object rather than self, this silently stops loading API key edges and req.Edges.APIKey would always be nil for callers expecting key values.
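The difference between the two call styles can be shown with a small hand-written builder. This is a sketch only: `requestQuery` is a hypothetical stand-in that mimics a pointer-receiver builder and is not generated Ent code.

```go
package main

import "fmt"

// requestQuery is a hypothetical stand-in for a generated query builder
// whose With* methods mutate the receiver and return it.
type requestQuery struct {
	with []string
}

func (q *requestQuery) WithProject() *requestQuery {
	q.with = append(q.with, "project")
	return q
}

func (q *requestQuery) WithAPIKey() *requestQuery {
	q.with = append(q.with, "api_key")
	return q
}

// buildQuery registers the optional eager load by reassigning the result.
// With a mutating builder this behaves the same as discarding the return
// value, but it stays correct if the builder ever returned a fresh copy.
func buildQuery(includeAPIKeyValues bool) *requestQuery {
	query := (&requestQuery{}).WithProject()
	if includeAPIKeyValues {
		query = query.WithAPIKey()
	}
	return query
}

func main() {
	fmt.Println(buildQuery(false).with)
	fmt.Println(buildQuery(true).with)
}
```

Reassigning costs nothing at runtime here and removes the need for a reader to know how the receiver is implemented.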

Comment on lines +166 to +200
func (svc *BackupService) backupUsageRequests(ctx context.Context, includeAPIKeyValues bool) ([]*BackupUsageRequest, error) {
	var usageRequestDataList []*BackupUsageRequest
	lastID := 0

	for {
		query := svc.db.Request.Query().
			Where(request.IDGT(lastID)).
			Order(ent.Asc(request.FieldID)).
			Limit(usageBackupBatchSize).
			WithProject().
			WithChannel()
		if includeAPIKeyValues {
			query.WithAPIKey()
		}

		usageRequests, err := query.All(ctx)
		if err != nil {
			return nil, err
		}

		if len(usageRequests) == 0 {
			break
		}

		for _, req := range usageRequests {
			usageRequestDataList = append(usageRequestDataList, backupUsageRequest(req, includeAPIKeyValues))
			lastID = req.ID
		}

		if len(usageRequests) < usageBackupBatchSize {
			break
		}
	}

	return usageRequestDataList, nil

P2 Full dataset accumulated in memory before serialization

backupUsageRequests and backupUsageLogs each batch-query from the DB (500 at a time) but accumulate every record into an in-memory slice before returning. For large deployments with millions of requests, both slices — plus the final json.Marshal(backupData) call that holds another representation — will coexist in heap. A streaming JSON approach (e.g. writing records as they are fetched and encoding directly to an io.Writer) would bound peak memory to roughly one batch rather than the full dataset.

Comment on lines +896 to +911
		log.Warn(ctx, "API key not found for restoring usage request, restoring with null API key",
			log.Int("request_id", oldID),
		)
	}

	if existing, ok := existingRequests.byID[oldID]; ok {
		if sameUsageRequest(existing, reqData, projectID, channelID, apiKeyID) {
			idMap[oldID] = existing.ID
			continue
		}
	}
	if existing, ok := existingRequests.byFingerprint[usageRequestBackupFingerprint(reqData)]; ok {
		idMap[oldID] = existing.ID
		continue
	}

P2 sameUsageRequest compares ChannelID directly without accounting for failed resolution

When resolveChannelID cannot resolve the channel (returns ok=false), channelID is 0. The comparison existing.ChannelID == channelID then only matches an existing request whose DB channel is already null. If a request from the first restore was written with a valid channel ID (because the channel existed then but was later deleted), this check returns false, the fingerprint check also fails (channel name differs), and the re-restore inserts a duplicate record. This is an edge case only triggered when a referenced channel disappears between the original restore and a re-restore.
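One possible mitigation, offered here as a sketch and not as what the PR implements, is to carry the "resolved or not" flag into the comparison and skip the channel check when resolution failed. All names below (`existingRequest`, `resolvedRef`, `sameUsageRequestLenient`) are hypothetical, and the compared fields are a reduced subset chosen for illustration.

```go
package main

import "fmt"

// existingRequest is a hypothetical subset of a stored usage request.
// ChannelID 0 represents a NULL channel in this sketch.
type existingRequest struct {
	ProjectID int
	ChannelID int
	ModelID   string
}

// resolvedRef carries a resolved foreign key plus whether resolution worked.
type resolvedRef struct {
	ID int
	OK bool
}

// sameUsageRequestLenient treats an unresolved channel as "unknown" instead
// of "null", so a row written earlier with a since-deleted channel can still
// be recognised as the same request.
func sameUsageRequestLenient(existing existingRequest, projectID int, channel resolvedRef, modelID string) bool {
	if existing.ProjectID != projectID || existing.ModelID != modelID {
		return false
	}
	if !channel.OK {
		return true // channel could not be resolved: do not compare it
	}
	return existing.ChannelID == channel.ID
}

func main() {
	stored := existingRequest{ProjectID: 1, ChannelID: 7, ModelID: "example-model"}

	fmt.Println(sameUsageRequestLenient(stored, 1, resolvedRef{}, "example-model"))
	fmt.Println(sameUsageRequestLenient(stored, 1, resolvedRef{ID: 8, OK: true}, "example-model"))
}
```

The trade-off is that a lenient comparison can over-match when the remaining fields are not distinctive enough, so it would need to be paired with the other identifying fields that the real comparison already uses.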

@looplj

looplj commented May 10, 2026

Owner

The usage table may be too large, so backing it up every day doesn't seem appropriate.

@henryz78

Contributor Author

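Good point. Usage stats can get very large, so I've changed auto-backup to exclude usage stats by default, which avoids generating oversized backup files on every daily run.

The option is still available for users to enable manually, though, because some users do need to back up usage statistics: for example when migrating an instance, for disaster recovery, or to preserve historical request counts, token usage, cost statistics, and usage logs so that the statistics page doesn't lose all of its data after a restore.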

@looplj looplj changed the title from "Fix backup usage stats" to "feat: backup usage stats" on May 11, 2026
@looplj

looplj commented May 11, 2026

Owner

CI failed.

@henryz78

Contributor Author

Sorry, I missed an import. It's fixed in the latest commit; waiting for workflow approval to re-run the checks.

@henryz78

Contributor Author

The only remaining CI failure was a lint formatting issue. I've fixed it and pushed; it just needs your approval for the workflow to re-run.

@looplj looplj merged commit 19d1101 into looplj:unstable on May 12, 2026
4 checks passed
2 participants