Trim large content body from old ai audits#23140
Conversation
Pull request overview
Adds a scheduled Sidekiq job to periodically trim large AiAudit JSON payload fields for older records, helping control ai_audits table growth over time.
Changes:
- Introduces `AiAudit.fast_trim_old_audits` to batch-clear old `request_body`/`response_body` JSONB fields.
- Adds `AiAudits::TrimOldAuditsWorker` and wires it into `config/schedule.yml` (daily).
- Adds RSpec coverage for the worker and the trimming behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `app/models/ai_audit.rb` | Adds the batch trimming class method for old audits. |
| `app/workers/ai_audits/trim_old_audits_worker.rb` | New Sidekiq worker to invoke trimming on a schedule. |
| `config/schedule.yml` | Schedules the new trimming worker to run daily. |
| `spec/models/ai_audit_spec.rb` | Adds model-level specs for trimming behavior. |
| `spec/workers/ai_audits/trim_old_audits_worker_spec.rb` | Adds a worker spec verifying the worker calls the trim method. |
```ruby
def self.fast_trim_old_audits(trim_before_timestamp = ENV.fetch("FAST_TRIM_AI_AUDITS_DAYS", 30).to_i.days.ago)
  where("created_at < ?", trim_before_timestamp)
    .where("request_body != '{}'::jsonb OR response_body != '{}'::jsonb")
    .in_batches(of: 10_000).update_all(request_body: "{}", response_body: "{}")
```
update_all(request_body: "{}", response_body: "{}") will serialize those Ruby strings into JSONB string values (e.g. "{}"), not an empty JSON object. Because the filter compares against '{}'::jsonb (an empty object), trimmed rows may continue to match and get rewritten on every run, causing unnecessary writes/bloat. Use empty objects ({}) for the update (and ideally keep the comparison consistent with that type).
Suggested change:

```diff
-    .in_batches(of: 10_000).update_all(request_body: "{}", response_body: "{}")
+    .in_batches(of: 10_000).update_all(request_body: {}, response_body: {})
```
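The distinction is easy to see in plain Ruby. This minimal sketch uses the `json` gem (the same encoding ActiveRecord applies when casting values for `jsonb` columns) to show that a Ruby String `"{}"` and a Ruby Hash `{}` serialize to different JSON values:

```ruby
require "json"

# A Ruby String "{}" encodes as the JSON *string* "\"{}\"",
# while a Ruby Hash {} encodes as the JSON *object* {}.
# Only the object form equals '{}'::jsonb, so rows updated with
# the string form keep matching the "!= '{}'::jsonb" filter.
string_form = "{}".to_json  # => "\"{}\""
object_form = {}.to_json    # => "{}"

puts string_form  # "\"{}\""
puts object_form  # "{}"
puts string_form == object_form  # false
```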
```ruby
  where("created_at < ?", trim_before_timestamp)
    .where("request_body != '{}'::jsonb OR response_body != '{}'::jsonb")
    .in_batches(of: 10_000).update_all(request_body: "{}", response_body: "{}")
```
This query filters on created_at but ai_audits has no created_at index (schema only indexes affected_user_id and affected_content). If the table is large, the daily job will require a full scan of older rows. Consider adding an index on created_at (or an appropriate partial index) to support this maintenance query.
Suggested change:

```diff
-  where("created_at < ?", trim_before_timestamp)
-    .where("request_body != '{}'::jsonb OR response_body != '{}'::jsonb")
-    .in_batches(of: 10_000).update_all(request_body: "{}", response_body: "{}")
+  in_batches(of: 10_000) do |batch|
+    batch
+      .where("created_at < ?", trim_before_timestamp)
+      .where("request_body != '{}'::jsonb OR response_body != '{}'::jsonb")
+      .update_all(request_body: "{}", response_body: "{}")
+  end
```
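One way to support this maintenance query is a partial index on `created_at` restricted to rows that still carry payloads. The following is a sketch only, not part of this PR: the migration class name, index name, and Rails version are assumptions, and `algorithm: :concurrently` is suggested to avoid locking writes on a large table.

```ruby
# Hypothetical migration sketch (assumed names; not part of this PR).
# A partial index on created_at covering only rows that still have
# payloads lets the daily trim query skip already-trimmed history.
class AddTrimIndexToAiAudits < ActiveRecord::Migration[7.0]
  # Required for algorithm: :concurrently on PostgreSQL.
  disable_ddl_transaction!

  def change
    add_index :ai_audits, :created_at,
              where: "request_body != '{}'::jsonb OR response_body != '{}'::jsonb",
              name: "index_ai_audits_on_created_at_untrimmed",
              algorithm: :concurrently
  end
end
```

As rows are trimmed they drop out of the partial index, so the index stays small even as the table grows.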
```ruby
let!(:empty_old_audit) do
  create(:ai_audit,
    created_at: 35.days.ago,
    request_body: "{}",
    response_body: "{}"
  )
  expect(audit).to be_valid
end

it "trims the request and response bodies of audits older than the threshold" do
  described_class.fast_trim_old_audits(30.days.ago)

  old_audit.reload
  expect(old_audit.request_body).to eq("{}")
  expect(old_audit.response_body).to eq("{}")
end
```
These specs treat request_body/response_body as strings ("{}"), but the columns are jsonb and the factory uses Ruby hashes. If trimming sets an empty JSON object, Rails will deserialize it back as {} (Hash), so these expectations will fail (or worse: they could push the implementation toward storing JSONB strings). Update the setup/expectations to use {} for empty bodies to match the column type.
```ruby
let!(:empty_old_audit) do
  create(:ai_audit,
    created_at: 35.days.ago,
    request_body: "{}",
    response_body: "{}"
  )
  expect(audit).to be_valid
end
```
The method includes a guard to skip rows where both JSONB bodies are already empty, but the spec doesn't assert this behavior. Add a regression example that verifies an old audit with empty bodies is not rewritten/selected (e.g., by keeping request_body/response_body as {} and asserting they remain unchanged, and/or checking no update occurs).
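A regression example along these lines could pin the guard down. This is a sketch under assumptions: it presumes the `:ai_audit` factory accepts these attributes and that `updated_at` is only touched when a row is actually written.

```ruby
# Sketch of a regression spec (assumed factory/attribute names):
# an old audit whose bodies are already empty should be skipped
# by the trim query, so no write occurs.
it "does not rewrite audits whose bodies are already empty" do
  empty_old_audit = create(:ai_audit,
    created_at: 35.days.ago,
    request_body: {},
    response_body: {}
  )

  expect do
    described_class.fast_trim_old_audits(30.days.ago)
  end.not_to change { empty_old_audit.reload.updated_at }
end
```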
```yaml
  cron: "0 5 * * *"
  class: "Notifications::RemoveOldNotificationsWorker"
trim_old_ai_audits:
  description: "Trims large JSON payload fields from AiAudits older than 30 days (runs daily at 06:00 UTC)"
```
The schedule description hard-codes “older than 30 days”, but the trimming threshold is configurable via FAST_TRIM_AI_AUDITS_DAYS. Consider adjusting the description to reflect that it defaults to 30 days but can be changed via env var, so ops expectations match runtime behavior.
Suggested change:

```diff
-  description: "Trims large JSON payload fields from AiAudits older than 30 days (runs daily at 06:00 UTC)"
+  description: "Trims large JSON payload fields from AiAudits older than the configured threshold (defaults to 30 days via FAST_TRIM_AI_AUDITS_DAYS; runs daily at 06:00 UTC)"
```
What type of PR is this? (check all applicable)
Description
Trims the large JSON bodies from older audit logs to keep the ai_audits table's size under control.