Migrate StarEvent from MongoDB to PostgreSQL with periodic GitHub API fetch #96
tricknotes merged 56 commits into main on Apr 8, 2026
…orage
GitHub API changes made WatchEvent data unavailable. Instead of relying on
stored events, StarEvent now fetches starred repos from GitHub API
(/users/{login}/starred) on demand and persists them in PostgreSQL for
caching and archival purposes.
- Rewrite StarEvent and Repository from Mongoid to ActiveRecord
- Add fetch_and_upsert class method to StarEvent for GitHub API integration
- Store repository metadata in a separate repositories table
- Update controllers to fetch on demand before reading from DB
- Remove mongoid gem, config/mongoid.yml, and MongoDB from CI/Docker
- Remove fetch_repositories.rake (no longer needed)
- Update views, helpers, rake tasks, and test support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
alias_method doesn't work with ActiveRecord attribute methods; use a regular method definition instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
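A plain-Ruby analogy for the failure described above, assuming the cause is that attribute readers are not yet defined as real methods when the class body runs. `LazyRecord`, `Repo`, and the `watchers_count` / `stargazers_count` pairing are illustrative stand-ins, not the PR's actual code, and ActiveRecord's internals are more involved than this sketch.

```ruby
# Stand-in for a model whose attribute readers are resolved lazily
# (here via method_missing) instead of being defined up front.
class LazyRecord
  def initialize(attrs)
    @attrs = attrs
  end

  def method_missing(name, *args)
    return @attrs[name] if args.empty? && @attrs.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @attrs.key?(name) || super
  end
end

class Repo < LazyRecord
  begin
    # Fails: :stargazers_count is not a defined method at this point.
    alias_method :watchers_count, :stargazers_count
  rescue NameError => e
    ALIAS_ERROR = e
  end

  # A regular method definition resolves the reader at call time instead.
  def watchers_count
    stargazers_count
  end
end

repo = Repo.new(stargazers_count: 42)
repo.watchers_count
```

The regular method works because the lookup of `stargazers_count` is deferred until the method is called.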
- Use ::OpenStruct to avoid NameError in Repository model
- Handle Time objects from Octokit in fetch_starred_since
- Add webmock and stub StarEvent.fetch_and_upsert in tests to prevent unintended GitHub API calls during test suite
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ruby 3.4+ requires explicit ostruct gem as it was removed from the default gems. Revert ::OpenStruct to OpenStruct now that the gem is properly declared. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
9d5685a to b63b05b
- Update avatar_image_tag/image_link_to_github_url to handle OpenStruct (repo.owner) using respond_to?(:login) duck typing
- Update notify.text.erb to use event.actor_login instead of event['actor']['login']
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Restore Hash fallback in avatar_image_tag (lost during rebase merge)
- Update notify.text.erb to use event.actor_login instead of event['actor']['login']
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
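The duck-typing check with a Hash fallback described in these two commits might look roughly like the sketch below. `owner_login` and the `Owner` struct are hypothetical names for illustration; the real helpers (`avatar_image_tag`, `image_link_to_github_url`) render HTML and are not reproduced here.

```ruby
# Hypothetical value object standing in for repo.owner.
Owner = Struct.new(:login, :avatar_url)

# Returns the login whether the owner is an object exposing #login
# or a Hash with a "login" / :login key; returns nil otherwise.
def owner_login(owner)
  if owner.respond_to?(:login)
    owner.login
  elsif owner.is_a?(Hash)
    owner["login"] || owner[:login]
  end
end

owner_login(Owner.new("tricknotes", nil))  # object path
owner_login({ "login" => "tricknotes" })   # Hash fallback
```

Checking `respond_to?(:login)` first keeps the helper independent of the concrete owner class, which matters here because that class changed more than once during the PR.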
tricknotes
commented
Mar 24, 2026
Contributor
|
@tricknotes I've opened a new pull request, #97, to work on those changes. Once the pull request is ready, I'll request review from you. |
e90d9b3 to 0d916ef
- Remove StarEvent.fetch_and_upsert calls from controllers (activities, dashboard, stars); controllers now read from DB only
- Remove User#fetch_star_events (no longer needed)
- Add lib/tasks/fetch_star_events.rake for periodic background fetch
- Remove fetch_and_upsert stub from rails_helper (no longer needed)
- Skip Settings.url_options in test env to avoid BASE_URL port leaking into action_mailer.default_url_options
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The repo_* keys in the intermediate hash were not DB columns but appeared alongside DB columns, causing confusion. Refactor so that fetch_starred_since returns two distinct collections: star_events (only the star_events table fields) and repos (repository fields). upsert_repositories now receives repo data directly without needing the repo_* prefixed keys. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
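A minimal sketch of the split described above, assuming each fetched item carries a `starred_at` timestamp and a nested `repo` hash. `split_starred` and the item shape are assumptions for illustration; the PR's actual `fetch_starred_since` is not shown on this page.

```ruby
require "time"

# Splits raw starred items into two collections:
#   star_events - only fields belonging to the star_events table
#   repos       - repository fields, de-duplicated by full name
def split_starred(items, actor_login)
  star_events = []
  repos = {}

  items.each do |item|
    repo = item.fetch(:repo)
    name = repo.fetch(:full_name)

    star_events << {
      actor_login: actor_login,
      repo_name: name,
      starred_at: Time.iso8601(item.fetch(:starred_at)),
    }
    repos[name] = { name: name, stargazers_count: repo.fetch(:stargazers_count) }
  end

  [star_events, repos.values]
end

items = [
  { starred_at: "2026-03-01T00:00:00Z",
    repo: { full_name: "rails/rails", stargazers_count: 100 } },
]
star_events, repos = split_starred(items, "tricknotes")
```

Keeping the two collections separate means neither hash contains keys that are not columns of its own table.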
all_by and by were identical — both delegate to where(actor_login:) which accepts a single value or an array. Remove the by class method, rename all_by scope to by, and update the one call site in User. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace LIKE-based owner scope with an equality check on the new repo_owner column (indexed). Populate repo_owner from repo.owner.login during fetch, and derive it from repo_name in stub_star_event! for tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
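Deriving the owner from an `owner/name` string, as the test helper is said to do, can be sketched as follows. `repo_owner_from` is a hypothetical helper name, and the in-memory filter only imitates what an equality check on the `repo_owner` column achieves in SQL.

```ruby
# Derives the owner part of a "owner/name" repository name.
def repo_owner_from(repo_name)
  repo_name.split("/", 2).first
end

events = [
  { repo_name: "rails/rails" },
  { repo_name: "railsware/example" },
]
events.each { |event| event[:repo_owner] = repo_owner_from(event[:repo_name]) }

# Equality on the stored owner, instead of a prefix match on repo_name.
by_owner = events.select { |event| event[:repo_owner] == "rails" }
```

An exact match on a dedicated column also avoids pattern-matching concerns such as escaping wildcard characters in the owner name.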
Data.define provides an immutable value object with an explicit interface, no method_missing overhead, and errors on unknown attributes. Also removes the ostruct gem dependency since Data is built into Ruby 3.2+. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
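The properties listed above can be seen in a short sketch. `RepoOwner` and its attributes are assumed names, not necessarily the ones used in the PR; the code requires Ruby 3.2 or later.

```ruby
# Immutable value object with an explicit attribute list.
RepoOwner = Data.define(:login, :avatar_url)

owner = RepoOwner.new(login: "tricknotes", avatar_url: "https://example.com/a.png")
owner.login

# Unknown attributes are rejected at construction time.
unknown_error =
  begin
    RepoOwner.new(login: "x", avatar_url: "y", nickname: "z")
  rescue ArgumentError => e
    e
  end

# There are no setters; a changed copy is made with #with.
renamed = owner.with(login: "someone-else")
```

Unlike OpenStruct, a misspelled attribute fails at the call site instead of silently returning nil.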
- by_name: was used in the old Mongoid-based repository! lookup on StarEvent, which was removed during the PostgreSQL migration
- watchers_count: compatibility alias left over from when repositories were fetched directly from GitHub API responses; views access stargazers_count directly on the ActiveRecord model
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: tricknotes <290782+tricknotes@users.noreply.github.com> Agent-Logs-Url: https://github.com/tricknotes/starseeker/sessions/88dafe1b-34ea-48bd-b7f5-bf70463355de
Include repo_owner column and its index directly in the initial create_star_events migration, removing the separate add_repo_owner_to_star_events migration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Controllers no longer call StarEvent.fetch_and_upsert directly; GitHub API access is only triggered via the rake task, which is never executed during the test suite. There are no stub_request usages, so webmock provides no value. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner
Author
|
It's production ready 🎉 |
Three targeted improvements to address R15 (memory quota exceeded):
1. Reuse a single persistent HTTPS connection in fetch_and_upsert_graphql
   Previously execute_graphql created a new Net::HTTP object per batch, causing repeated TLS handshakes and OpenSSL context allocations that accumulated across N_logins/GRAPHQL_BATCH_SIZE batches before GC could reclaim them. Net::HTTP.start now opens one connection for all batches, reducing connection objects from O(N_batches) to O(1).
2. Slice users in fetch_per_user rake task with GC.compact between slices
   find_in_batches(batch_size: FETCH_CONCURRENCY) replaces find_each with a growing futures array. Each slice of FETCH_CONCURRENCY users is fully processed and GC.compact is called before the next slice loads, preventing all users' logins arrays and Sawyer response objects from coexisting in memory simultaneously.
3. Periodic GC.compact in fetch_and_upsert_per_user (REST path)
   Compact the heap every FETCH_CONCURRENCY iterations to release Sawyer / Faraday response objects that accumulate in a tight sequential loop before the GC gets a chance to collect them.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
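A rough sketch of points 1 and 2, under stated assumptions: `post_batches`, `each_slice_compacting`, and the injectable `connect:` parameter are hypothetical names introduced here so the example can run without network access. The PR's real `fetch_and_upsert_graphql` and rake task are not shown on this page.

```ruby
require "net/http"
require "json"

# Default connector: opens one HTTPS connection and yields it to the block.
DEFAULT_CONNECT = lambda do |host, port, **opts, &block|
  Net::HTTP.start(host, port, **opts, &block)
end

# Sends every batch over a single connection instead of one per batch.
def post_batches(batches, token:, connect: DEFAULT_CONNECT)
  headers = {
    "Authorization" => "Bearer #{token}",
    "Content-Type" => "application/json",
  }
  connect.call("api.github.com", 443, use_ssl: true) do |http|
    batches.map { |payload| http.post("/graphql", JSON.generate(payload), headers) }
  end
end

# Processes items slice by slice, compacting the heap between slices
# where the platform supports GC.compact.
def each_slice_compacting(items, slice_size)
  items.each_slice(slice_size) do |slice|
    yield slice
    GC.compact if GC.respond_to?(:compact)
  end
end
```

With the default connector, the TLS handshake happens once per call to `post_batches`, however many batches are sent.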
fetch_and_upsert_per_user now delegates to fetch_and_upsert_graphql, reducing HTTP round-trips from N_logins to N_logins / GRAPHQL_BATCH_SIZE while still using each app-user's own token for their independent rate-limit budget. fetch_and_upsert_graphql gains an optional fallback_client: parameter so the REST pagination fallback uses the same token context as the GraphQL phase instead of falling back to the shared Settings.github_client.
7b97679 to e50117a
GRAPHQL_BATCH_SIZE: 20 → 5, GRAPHQL_PAGE_SIZE: 30 → 10 (now env-configurable). The previous defaults triggered GitHub's per-query resource limit due to the multiplicative complexity of batched aliases × nested fields. Smaller values keep each query well within the threshold; the persistent HTTPS connection absorbs the extra round-trips without TLS overhead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
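To illustrate how batched aliases and page size multiply, here is a sketch of a query builder with env-configurable sizes. `build_batch_query`, `batched_queries`, the `u0`, `u1`, ... alias scheme, and the selected fields are assumptions; the PR's actual query text does not appear on this page.

```ruby
require "json"

# Defaults match the new values above; both can be overridden via ENV.
GRAPHQL_BATCH_SIZE = Integer(ENV.fetch("GRAPHQL_BATCH_SIZE", "5"))
GRAPHQL_PAGE_SIZE  = Integer(ENV.fetch("GRAPHQL_PAGE_SIZE", "10"))

# Builds one query containing an aliased user(login:) field per login.
def build_batch_query(logins, page_size: GRAPHQL_PAGE_SIZE)
  fields = logins.each_with_index.map do |login, index|
    <<~GQL
      u#{index}: user(login: #{login.to_json}) {
        starredRepositories(first: #{page_size}, orderBy: {field: STARRED_AT, direction: DESC}) {
          edges { starredAt node { nameWithOwner isPrivate } }
          pageInfo { hasNextPage }
        }
      }
    GQL
  end
  "query {\n#{fields.join}}"
end

# One query string per slice of logins.
def batched_queries(logins, batch_size: GRAPHQL_BATCH_SIZE, page_size: GRAPHQL_PAGE_SIZE)
  logins.each_slice(batch_size).map { |batch| build_batch_query(batch, page_size: page_size) }
end
```

In this shape a single query can request up to batch size × page size starred edges, so the old defaults ask for up to 20 × 30 = 600 per query and the new ones for up to 5 × 10 = 50.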
github_client.following returns both Users and Organizations, but the GraphQL query uses user(login:) which only resolves User accounts. Filtering by type == 'User' at the source removes orgs before they reach any fetch path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
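The filter described above amounts to a one-line selection. In this sketch `Account` is a hypothetical stand-in for the objects returned by `github_client.following`, and `user_logins` is an illustrative name.

```ruby
# Stand-in for a followed account as returned by the API client.
Account = Struct.new(:login, :type)

# Keeps only accounts of type "User", dropping Organizations.
def user_logins(accounts)
  accounts.select { |account| account.type == "User" }.map(&:login)
end

following = [
  Account.new("tricknotes", "User"),
  Account.new("rails", "Organization"),
]
user_logins(following)
```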
e50117a to e2469cc
fetch_and_upsert (shared-token REST path) is superseded by the per-user GraphQL approach. Remove the method and its corresponding rake task to eliminate dead code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The method was a thin wrapper around fetch_and_upsert_graphql. Call fetch_and_upsert_graphql directly from the rake task and remove the wrapper. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The transport protocol is an implementation detail and does not belong in the public interface name. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…etch
fetch_graphql used a single shared token and was a degraded version of fetch, which multiplies the rate limit by using each user's own token. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers:
- GraphQL happy path
- private repo filtering
- since threshold
- mixed-age edges
- org/NOT_FOUND handling
- multi-login batching (one HTTP request per batch)
- REST fallback when hasNextPage is true
- idempotency
Net::HTTP and Octokit are stubbed so no real GitHub requests are made.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
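Two of the behaviours covered by these specs, private repo filtering and the since threshold, could be expressed as a small pure function. `recent_public_edges` is a hypothetical name, the edge shape follows GitHub's GraphQL field names, and treating the threshold as exclusive is an assumption.

```ruby
require "time"

# Drops edges for private repositories and edges starred at or before `since`.
def recent_public_edges(edges, since)
  edges
    .reject { |edge| edge.dig("node", "isPrivate") }
    .select { |edge| Time.iso8601(edge.fetch("starredAt")) > since }
end

edges = [
  { "starredAt" => "2026-03-10T00:00:00Z",
    "node" => { "nameWithOwner" => "rails/rails", "isPrivate" => false } },
  { "starredAt" => "2026-03-11T00:00:00Z",
    "node" => { "nameWithOwner" => "acme/secret", "isPrivate" => true } },
  { "starredAt" => "2026-01-01T00:00:00Z",
    "node" => { "nameWithOwner" => "ruby/ruby", "isPrivate" => false } },
]
since = Time.iso8601("2026-03-01T00:00:00Z")
kept = recent_public_edges(edges, since)
```

A function like this is easy to cover with mixed-age fixtures because it needs no HTTP stubbing.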
…clusion
- User#followings: verify that Organization type accounts are excluded and only User type accounts are returned
- StarEvent.fetch_and_upsert: verify that private repos are skipped in the REST fallback path (fetch_each_page), consistent with the GraphQL path behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner
Author
|
Now, this PR works on production 🚀 |
Background
GitHub API changes made WatchEvent data unavailable, so star information is now fetched from the /users/{login}/starred endpoint. The data is stored in PostgreSQL for caching and archival purposes.
Main changes
Data layer
- Migrate StarEvent / Repository from Mongoid to ActiveRecord (PostgreSQL)
- star_events table: actor_login, repo_name, repo_owner, starred_at, etc.
- repositories table: holds separately the metadata that is updated from time to time, such as stargazers_count
- Add a repo_owner column and drop the LIKE query in StarEvent.owner
Fetch strategy
- The rake star_events:fetch task periodically fetches stars for all users
Removed
- mongoid gem, config/mongoid.yml
- lib/tasks/fetch_repositories.rake
🤖 Generated with Claude Code