[hotfix] process eval task OOM #207

Merged
JayaSurya-27 merged 2 commits into main from hotfix/process_eval_task_oom
May 5, 2026

Conversation

Contributor

@JayaSurya-27 JayaSurya-27 commented May 5, 2026

Summary

The task previously hydrated full ObservationSpan instances — including
large attribute and I/O payloads — before dispatching evaluations. On
eval tasks covering a high volume of spans this caused worker OOMs.

This change limits the query to span IDs and removes redundant
materialization:

  • Restrict fetched columns via .only("id") and values_list("id", flat=True)
    in place of full model instances.
  • In the sampling path, reuse the IDs returned by the random-sample query
    directly rather than issuing a second filter(id__in=...) round-trip.
  • Apply the cnt cap by slicing the queryset (list(qs[:cnt])) so the LIMIT
    is pushed to Postgres instead of forcing full evaluation via len().
  • Dispatch celery tasks by iterating span IDs directly.
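
The query-side changes above can be illustrated at the SQL level. The following is a minimal sketch using the standard-library `sqlite3` module rather than the Django ORM or Postgres; the table name `observation_span`, its columns, the `ORDER BY id` clause, and the `ORDER BY RANDOM()` sampling strategy are all assumptions made for illustration and are not taken from this PR. It shows the general shape of what an ID-only, sliced queryset such as `values_list("id", flat=True)[:cnt]` asks the database to do: select one narrow column and apply the cap in the query itself.

```python
import sqlite3
from typing import List


def make_demo_db(n_spans: int) -> sqlite3.Connection:
    """Build an in-memory table with deliberately large payload columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation_span ("
        "id INTEGER PRIMARY KEY, attributes TEXT, input TEXT, output TEXT)"
    )
    payload = "x" * 10_000  # stands in for large attribute and I/O payloads
    conn.executemany(
        "INSERT INTO observation_span (id, attributes, input, output) "
        "VALUES (?, ?, ?, ?)",
        ((i, payload, payload, payload) for i in range(1, n_spans + 1)),
    )
    return conn


def fetch_span_ids(conn: sqlite3.Connection, cnt: int) -> List[int]:
    """Fetch at most `cnt` IDs; the payload columns are never read."""
    rows = conn.execute(
        "SELECT id FROM observation_span ORDER BY id LIMIT ?", (cnt,)
    )
    return [row[0] for row in rows]


def sample_span_ids(conn: sqlite3.Connection, sample_size: int) -> List[int]:
    """Random sample of IDs, returned directly with no second lookup by ID."""
    rows = conn.execute(
        "SELECT id FROM observation_span ORDER BY RANDOM() LIMIT ?",
        (sample_size,),
    )
    return [row[0] for row in rows]
```

Because only the `id` column is selected and the limit is part of the statement, the memory held by the worker scales with the number of IDs requested, not with the size of the span payloads.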

Behavior is unchanged: the same set of span IDs is enqueued for
evaluation and persisted to spanids_processed.
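
As a rough sketch of the dispatch step, the loop below enqueues one evaluation per span ID and returns the IDs that were enqueued, which is the list that would be persisted. The function name `dispatch_evaluations` and the `enqueue` callable are hypothetical; in a Celery setup `enqueue` would typically be a task's `.delay` method, but the actual task used in this PR is not shown here.

```python
from typing import Callable, Iterable, List, TypeVar

SpanId = TypeVar("SpanId")


def dispatch_evaluations(
    span_ids: Iterable[SpanId],
    enqueue: Callable[[SpanId], object],
) -> List[SpanId]:
    """Enqueue one evaluation per span ID and return the IDs that were sent.

    Only IDs pass through this function, so no span payload is loaded
    in the dispatching process.
    """
    processed: List[SpanId] = []
    for span_id in span_ids:
        enqueue(span_id)  # hypothetical stand-in for a Celery task's .delay
        processed.append(span_id)
    return processed
```

Called with a plain list's `append` as the `enqueue` argument, the function records the same IDs, in the same order, that it returns, which gives a simple way to check that the enqueued set and the persisted set match.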

Type of change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation only
  • 🧹 Chore / refactor (no user-visible change)
  • 🚀 Performance improvement
  • 🧪 Test-only change

How was this tested?

Screenshots / recordings (if UI)

Checklist

  • My code follows the style guide
  • I've added tests that prove my fix is effective or that my feature works
  • make check-all / yarn check-all passes locally
  • I've updated the documentation where relevant
  • No hardcoded secrets, URLs, or PII
  • I've signed the CLA

JayaSurya-27 self-assigned this May 5, 2026
JayaSurya-27 added the bug label May 5, 2026
JayaSurya-27 merged commit 33a6977 into main May 5, 2026
2 of 3 checks passed
JayaSurya-27 deleted the hotfix/process_eval_task_oom branch May 5, 2026 07:30
khushalsonawat restored the hotfix/process_eval_task_oom branch May 5, 2026 07:33

Labels

backend, bug, tracing

2 participants