Skip to content

Trim large content body from old ai audits#23140

Merged
benhalpern merged 2 commits intomainfrom
ben/prune-ai-audits
Apr 14, 2026
Merged

Trim large content body from old ai audits#23140
benhalpern merged 2 commits intomainfrom
ben/prune-ai-audits

Conversation

@benhalpern
Copy link
Copy Markdown
Contributor

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update

Description

Trims the large bodies from older audit logs in order to keep this table under control from a size perspective.

Copilot AI review requested due to automatic review settings April 14, 2026 14:48
@benhalpern benhalpern requested review from a team as code owners April 14, 2026 14:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a scheduled Sidekiq job to periodically trim large AiAudit JSON payload fields for older records, helping control ai_audits table growth over time.

Changes:

  • Introduces AiAudit.fast_trim_old_audits to batch-clear old request_body/response_body JSONB fields.
  • Adds AiAudits::TrimOldAuditsWorker and wires it into config/schedule.yml (daily).
  • Adds RSpec coverage for the worker and the trimming behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
app/models/ai_audit.rb Adds the batch trimming class method for old audits.
app/workers/ai_audits/trim_old_audits_worker.rb New Sidekiq worker to invoke trimming on a schedule.
config/schedule.yml Schedules the new trimming worker to run daily.
spec/models/ai_audit_spec.rb Adds model-level specs for trimming behavior.
spec/workers/ai_audits/trim_old_audits_worker_spec.rb Adds a worker spec verifying the worker calls the trim method.

Comment thread app/models/ai_audit.rb Outdated
def self.fast_trim_old_audits(trim_before_timestamp = ENV.fetch("FAST_TRIM_AI_AUDITS_DAYS", 30).to_i.days.ago)
where("created_at < ?", trim_before_timestamp)
.where("request_body != '{}'::jsonb OR response_body != '{}'::jsonb")
.in_batches(of: 10_000).update_all(request_body: "{}", response_body: "{}")
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update_all(request_body: "{}", response_body: "{}") will serialize those Ruby strings into JSONB string values (e.g. "{}"), not an empty JSON object. Because the filter compares against '{}'::jsonb (an empty object), trimmed rows may continue to match and get rewritten on every run, causing unnecessary writes/bloat. Use empty objects ({}) for the update (and ideally keep the comparison consistent with that type).

Suggested change
.in_batches(of: 10_000).update_all(request_body: "{}", response_body: "{}")
.in_batches(of: 10_000).update_all(request_body: {}, response_body: {})

Copilot uses AI. Check for mistakes.
Comment thread app/models/ai_audit.rb Outdated
Comment on lines +6 to +8
where("created_at < ?", trim_before_timestamp)
.where("request_body != '{}'::jsonb OR response_body != '{}'::jsonb")
.in_batches(of: 10_000).update_all(request_body: "{}", response_body: "{}")
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query filters on created_at but ai_audits has no created_at index (schema only indexes affected_user_id and affected_content). If the table is large, the daily job will require a full scan of older rows. Consider adding an index on created_at (or an appropriate partial index) to support this maintenance query.

Suggested change
where("created_at < ?", trim_before_timestamp)
.where("request_body != '{}'::jsonb OR response_body != '{}'::jsonb")
.in_batches(of: 10_000).update_all(request_body: "{}", response_body: "{}")
in_batches(of: 10_000) do |batch|
batch
.where("created_at < ?", trim_before_timestamp)
.where("request_body != '{}'::jsonb OR response_body != '{}'::jsonb")
.update_all(request_body: "{}", response_body: "{}")
end

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +35
let!(:empty_old_audit) do
create(:ai_audit,
created_at: 35.days.ago,
request_body: "{}",
response_body: "{}"
)
expect(audit).to be_valid
end

it "trims the request and response bodies of audits older than the threshold" do
described_class.fast_trim_old_audits(30.days.ago)

old_audit.reload
expect(old_audit.request_body).to eq("{}")
expect(old_audit.response_body).to eq("{}")
end
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These specs treat request_body/response_body as strings ("{}"), but the columns are jsonb and the factory uses Ruby hashes. If trimming sets an empty JSON object, Rails will deserialize it back as {} (Hash), so these expectations will fail (or worse: they could push the implementation toward storing JSONB strings). Update the setup/expectations to use {} for empty bodies to match the column type.

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +27
let!(:empty_old_audit) do
create(:ai_audit,
created_at: 35.days.ago,
request_body: "{}",
response_body: "{}"
)
expect(audit).to be_valid
end
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method includes a guard to skip rows where both JSONB bodies are already empty, but the spec doesn't assert this behavior. Add a regression example that verifies an old audit with empty bodies is not rewritten/selected (e.g., by keeping request_body/response_body as {} and asserting they remain unchanged, and/or checking no update occurs).

Copilot generated this review using guidance from repository custom instructions.
Comment thread config/schedule.yml Outdated
cron: "0 5 * * *"
class: "Notifications::RemoveOldNotificationsWorker"
trim_old_ai_audits:
description: "Trims large JSON payload fields from AiAudits older than 30 days (runs daily at 06:00 UTC)"
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schedule description hard-codes “older than 30 days”, but the trimming threshold is configurable via FAST_TRIM_AI_AUDITS_DAYS. Consider adjusting the description to reflect that it defaults to 30 days but can be changed via env var, so ops expectations match runtime behavior.

Suggested change
description: "Trims large JSON payload fields from AiAudits older than 30 days (runs daily at 06:00 UTC)"
description: "Trims large JSON payload fields from AiAudits older than the configured threshold (defaults to 30 days via FAST_TRIM_AI_AUDITS_DAYS; runs daily at 06:00 UTC)"

Copilot uses AI. Check for mistakes.
@benhalpern benhalpern merged commit 57e0846 into main Apr 14, 2026
20 checks passed
@benhalpern benhalpern deleted the ben/prune-ai-audits branch April 14, 2026 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants