feat: backup usage stats#1632
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds support for backing up and restoring usage statistics, including usage requests and logs. The implementation includes UI toggles for manual and automatic backups, GraphQL schema updates, and backend logic for batched data processing and entity remapping. Feedback suggests optimizing memory usage by batching API key lookups during backup and refining the deduplication query during restore to handle potential performance issues with high-concurrency timestamps.
| apiKeys, err := svc.db.APIKey.Query(). | ||
| Select(apikey.FieldID, apikey.FieldKey). | ||
| All(ctx) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| for _, ak := range apiKeys { | ||
| apiKeyKeys[ak.ID] = ak.Key | ||
| } | ||
| } |
There was a problem hiding this comment.
| for start := 0; start < len(createdAt); start += usageBackupBatchSize { | ||
| end := min(start+usageBackupBatchSize, len(createdAt)) | ||
| requests, err := db.Request.Query(). | ||
| Where(request.CreatedAtIn(createdAt[start:end]...)). | ||
| WithProject(). | ||
| WithChannel(). | ||
| WithAPIKey(). | ||
| All(ctx) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| for _, req := range requests { | ||
| addExistingUsageRequest(lookup, req) | ||
| } | ||
| } |
There was a problem hiding this comment.
Querying existing requests by CreatedAt in batches of 500 timestamps might still return a large number of rows if many requests share the exact same timestamp (e.g., in high-concurrency scenarios). While CreatedAt usually has high precision, consider if additional filters (like ProjectID) could be added to the query to further narrow down the results and improve performance during restore.
Greptile SummaryThis PR adds usage statistics (requests and logs) to the manual backup, restore, and auto-backup flows, including a new
Confidence Score: 5/5Safe to merge. The backup and restore logic is well-structured with batched queries, proper dedup, and credential redaction. The change introduces significant new code for backup and restore of usage data, but the core paths are covered by round-trip tests, deduplication logic is thorough (both ID and fingerprint-based), and credential handling is correctly guarded. The only finding is a silently-ignored omitzero JSON tag that has no runtime impact since the relevant timestamp fields are always populated for real records. internal/server/backup/types.go has the misleading omitzero tag; internal/server/backup/restore.go carries the bulk of the new logic and would benefit from an eye on the transaction scope for large datasets. Important Files Changed
Sequence DiagramsequenceDiagram
participant UI as BackupSettings UI
participant GQL as GraphQL Resolver
participant BackupSvc as backup.BackupService
participant DB as Database
Note over UI,DB: Backup Flow
UI->>GQL: "backup(opts: {includeUsageStats: true})"
GQL->>BackupSvc: BackupWithoutAuth(ctx, opts)
BackupSvc->>DB: Request.Query (batched, cursor ID)
DB-->>BackupSvc: []ent.Request (with Project, Channel edges)
BackupSvc->>DB: UsageLog.Query (batched, cursor ID)
DB-->>BackupSvc: []ent.UsageLog (with Project, Channel edges)
BackupSvc-->>GQL: JSON (MarshalJSON strips edges, redacts keys)
GQL-->>UI: backup file
Note over UI,DB: Restore Flow
UI->>GQL: "restore(data, opts: {includeUsageStats: true})"
GQL->>BackupSvc: Restore(ctx, data, opts)
BackupSvc->>DB: Load all projects, channels, keys (resolver cache)
BackupSvc->>DB: existingUsageRequests (by ID + createdAt batches)
DB-->>BackupSvc: existing requests for dedup
loop For each backup request
BackupSvc->>BackupSvc: resolve project/channel/key IDs
BackupSvc->>BackupSvc: check byID + byFingerprint dedup
BackupSvc->>DB: Request.Create (if new)
end
loop For each backup log (batched 500)
BackupSvc->>BackupSvc: resolve requestIDMap + dedup checks
BackupSvc->>DB: UsageLog.CreateBulk
end
BackupSvc-->>GQL: nil (success)
Reviews (4): Last reviewed commit: "Fix usage backup lint formatting" | Re-trigger Greptile |
| if includeAPIKeyValues { | ||
| query.WithAPIKey() | ||
| } |
There was a problem hiding this comment.
Discarded
WithAPIKey() return value
query.WithAPIKey() is called without capturing its return value. Ent's With* methods use a pointer receiver (_q *RequestQuery) and mutate the query in-place, so the eager load is registered correctly — but the pattern is non-idiomatic and easy to misread as a no-op. It's also fragile: if a future Ent refactor changes With* to return a new query object rather than self, this silently stops loading API key edges and req.Edges.APIKey would always be nil for callers expecting key values.
| func (svc *BackupService) backupUsageRequests(ctx context.Context, includeAPIKeyValues bool) ([]*BackupUsageRequest, error) { | ||
| var usageRequestDataList []*BackupUsageRequest | ||
| lastID := 0 | ||
|
|
||
| for { | ||
| query := svc.db.Request.Query(). | ||
| Where(request.IDGT(lastID)). | ||
| Order(ent.Asc(request.FieldID)). | ||
| Limit(usageBackupBatchSize). | ||
| WithProject(). | ||
| WithChannel() | ||
| if includeAPIKeyValues { | ||
| query.WithAPIKey() | ||
| } | ||
|
|
||
| usageRequests, err := query.All(ctx) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| if len(usageRequests) == 0 { | ||
| break | ||
| } | ||
|
|
||
| for _, req := range usageRequests { | ||
| usageRequestDataList = append(usageRequestDataList, backupUsageRequest(req, includeAPIKeyValues)) | ||
| lastID = req.ID | ||
| } | ||
|
|
||
| if len(usageRequests) < usageBackupBatchSize { | ||
| break | ||
| } | ||
| } | ||
|
|
||
| return usageRequestDataList, nil |
There was a problem hiding this comment.
Full dataset accumulated in memory before serialization
backupUsageRequests and backupUsageLogs each batch-query from the DB (500 at a time) but accumulate every record into an in-memory slice before returning. For large deployments with millions of requests, both slices — plus the final json.Marshal(backupData) call that holds another representation — will coexist in heap. A streaming JSON approach (e.g. writing records as they are fetched and encoding directly to an io.Writer) would bound peak memory to roughly one batch rather than the full dataset.
| log.Warn(ctx, "API key not found for restoring usage request, restoring with null API key", | ||
| log.Int("request_id", oldID), | ||
| ) | ||
| } | ||
|
|
||
| if existing, ok := existingRequests.byID[oldID]; ok { | ||
| if sameUsageRequest(existing, reqData, projectID, channelID, apiKeyID) { | ||
| idMap[oldID] = existing.ID | ||
| continue | ||
| } | ||
| } | ||
| if existing, ok := existingRequests.byFingerprint[usageRequestBackupFingerprint(reqData)]; ok { | ||
| idMap[oldID] = existing.ID | ||
| continue | ||
| } | ||
|
|
There was a problem hiding this comment.
sameUsageRequest compares ChannelID directly without accounting for failed resolution
When resolveChannelID cannot resolve the channel (returns ok=false), channelID is 0. The comparison existing.ChannelID == channelID then only matches an existing request whose DB channel is already null. If a request from the first restore was written with a valid channel ID (because the channel existed then but was later deleted), this check returns false, the fingerprint check also fails (channel name differs), and the re-restore inserts a duplicate record. This is an edge case only triggered when a referenced channel disappears between the original restore and a re-restore.
|
usage 表可能太大了,每天备份也不合适吧。 |
|
有道理,usage stats 可能会非常大,所以我已经把自动备份里的 usage stats 改成默认不包含,避免每天备份时生成过大的备份文件。 不过这个选项还是保留给用户手动开启,因为有些用户确实需要备份使用统计,比如迁移实例、灾难恢复,或者保留历史请求量、token 用量、成本统计和使用日志,避免恢复后统计页面的数据全部丢失。 |
|
ci 挂了 |
|
不好意思漏了个import。已在最新提交中修复,等待工作流批准以重新运行检查。 |
|
CI 之前只剩 lint 的格式问题,我已经修复并推送了,现在等你批准 workflow 重新跑即可。 |
Summary
includeUsageStatsto GraphQL inputs/settings and the backup settings UIIncludeAPIKeysis explicitly enabled; avoid restoring API key links from raw numeric IDsAPIKeyIDwhen adding optional API-key restore metadataTesting
git diff --checkgoandgofmtare not available in this local environment