Skip to content

Migrate StarEvent from MongoDB to PostgreSQL with periodic GitHub API fetch#96

Merged
tricknotes merged 56 commits intomainfrom
claude/xenodochial-engelbart
Apr 8, 2026
Merged

Migrate StarEvent from MongoDB to PostgreSQL with periodic GitHub API fetch#96
tricknotes merged 56 commits intomainfrom
claude/xenodochial-engelbart

Conversation

@tricknotes
Copy link
Copy Markdown
Owner

@tricknotes tricknotes commented Mar 19, 2026

背景

GitHub API の変更により WatchEvent データが取得できなくなったため、
/users/{login}/starred エンドポイントから star 情報を取得する方式に変更する。
データはキャッシュ・アーカイブ目的で PostgreSQL に保存する。

主な変更

データ層

  • StarEvent / Repository を Mongoid → ActiveRecord (PostgreSQL) に移行
  • star_events テーブル: actor_login, repo_name, repo_owner, starred_at
  • repositories テーブル: stargazers_count など随時更新されるメタデータを分離
  • repo_owner カラムを追加し StarEvent.owner の LIKE クエリを廃止

フェッチ戦略

  • コントローラーでのオンデマンドフェッチを廃止
  • rake star_events:fetch タスクで全ユーザーの star を定期取得
  • コントローラーは DB に存在するデータのみを表示

削除

  • mongoid gem、config/mongoid.yml
  • lib/tasks/fetch_repositories.rake

🤖 Generated with Claude Code

@tricknotes tricknotes marked this pull request as draft March 19, 2026 18:23
tricknotes and others added 7 commits March 21, 2026 17:56
…orage

GitHub API changes made WatchEvent data unavailable. Instead of relying on
stored events, StarEvent now fetches starred repos from GitHub API
(/users/{login}/starred) on demand and persists them in PostgreSQL for
caching and archival purposes.

- Rewrite StarEvent and Repository from Mongoid to ActiveRecord
- Add fetch_and_upsert class method to StarEvent for GitHub API integration
- Store repository metadata in a separate repositories table
- Update controllers to fetch on demand before reading from DB
- Remove mongoid gem, config/mongoid.yml, and MongoDB from CI/Docker
- Remove fetch_repositories.rake (no longer needed)
- Update views, helpers, rake tasks, and test support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
alias_method doesn't work with ActiveRecord attribute methods;
use a regular method definition instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use ::OpenStruct to avoid NameError in Repository model
- Handle Time objects from Octokit in fetch_starred_since
- Add webmock and stub StarEvent.fetch_and_upsert in tests to
  prevent unintended GitHub API calls during test suite

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ruby 3.4+ requires explicit ostruct gem as it was removed from
the default gems. Revert ::OpenStruct to OpenStruct now that the
gem is properly declared.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tricknotes tricknotes force-pushed the claude/xenodochial-engelbart branch from 9d5685a to b63b05b Compare March 21, 2026 08:58
tricknotes and others added 2 commits March 23, 2026 12:27
- Update avatar_image_tag/image_link_to_github_url to handle OpenStruct
  (repo.owner) using respond_to?(:login) duck typing
- Update notify.text.erb to use event.actor_login instead of event['actor']['login']

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Restore Hash fallback in avatar_image_tag (lost during rebase merge)
- Update notify.text.erb to use event.actor_login instead of event['actor']['login']

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Claude
Copy link
Copy Markdown
Contributor

Claude AI commented Mar 24, 2026

@tricknotes I've opened a new pull request, #97, to work on those changes. Once the pull request is ready, I'll request review from you.

@tricknotes tricknotes force-pushed the claude/xenodochial-engelbart branch from e90d9b3 to 0d916ef Compare March 24, 2026 15:11
@tricknotes tricknotes changed the title Replace MongoDB-based StarEvent with GitHub API fetch + PostgreSQL storage Migrate StarEvent from MongoDB to PostgreSQL with periodic GitHub API fetch Mar 24, 2026
tricknotes and others added 11 commits March 25, 2026 01:10
- Remove StarEvent.fetch_and_upsert calls from controllers
  (activities, dashboard, stars) - controllers now read from DB only
- Remove User#fetch_star_events (no longer needed)
- Add lib/tasks/fetch_star_events.rake for periodic background fetch
- Remove fetch_and_upsert stub from rails_helper (no longer needed)
- Skip Settings.url_options in test env to avoid BASE_URL port
  leaking into action_mailer.default_url_options

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The repo_* keys in the intermediate hash were not DB columns but
appeared alongside DB columns, causing confusion. Refactor so that
fetch_starred_since returns two distinct collections: star_events
(only the star_events table fields) and repos (repository fields).
upsert_repositories now receives repo data directly without needing
the repo_* prefixed keys.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
all_by and by were identical — both delegate to where(actor_login:)
which accepts a single value or an array. Remove the by class method,
rename all_by scope to by, and update the one call site in User.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace LIKE-based owner scope with an equality check on the new
repo_owner column (indexed). Populate repo_owner from repo.owner.login
during fetch, and derive it from repo_name in stub_star_event! for tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Data.define provides an immutable value object with an explicit
interface, no method_missing overhead, and errors on unknown
attributes. Also removes the ostruct gem dependency since Data
is built into Ruby 3.2+.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- by_name: was used in the old Mongoid-based repository! lookup on
  StarEvent, which was removed during the PostgreSQL migration
- watchers_count: compatibility alias left over from when repositories
  were fetched directly from GitHub API responses; views access
  stargazers_count directly on the ActiveRecord model

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Include repo_owner column and its index directly in the initial
create_star_events migration, removing the separate
add_repo_owner_to_star_events migration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Controllers no longer call StarEvent.fetch_and_upsert directly;
GitHub API access is only triggered via the rake task, which is
never executed during the test suite. There are no stub_request
usages, so webmock provides no value.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tricknotes
Copy link
Copy Markdown
Owner Author

It's production ready 🎉

heroku run "rails star_events:fetch_per_user[24] FETCH_CONCURRENCY=20" -a   2.98s user 1.20s system 0% cpu 1:12:56.84 total

tricknotes and others added 2 commits April 5, 2026 15:49
Three targeted improvements to address R15 (memory quota exceeded):

1. Reuse a single persistent HTTPS connection in fetch_and_upsert_graphql
   Previously execute_graphql created a new Net::HTTP object per batch,
   causing repeated TLS handshakes and OpenSSL context allocations that
   accumulated across N_logins/GRAPHQL_BATCH_SIZE batches before GC could
   reclaim them.  Net::HTTP.start now opens one connection for all batches,
   reducing connection objects from O(N_batches) to O(1).

2. Slice users in fetch_per_user rake task with GC.compact between slices
   find_in_batches(batch_size: FETCH_CONCURRENCY) replaces find_each with
   a growing futures array.  Each slice of FETCH_CONCURRENCY users is fully
   processed and GC.compact is called before the next slice loads, preventing
   all users' logins arrays and Sawyer response objects from coexisting in
   memory simultaneously.

3. Periodic GC.compact in fetch_and_upsert_per_user (REST path)
   Compact the heap every FETCH_CONCURRENCY iterations to release Sawyer /
   Faraday response objects that accumulate in a tight sequential loop before
   the GC gets a chance to collect them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fetch_and_upsert_per_user now delegates to fetch_and_upsert_graphql,
reducing HTTP round-trips from N_logins to N_logins / GRAPHQL_BATCH_SIZE
while still using each app-user's own token for their independent rate-limit
budget.

fetch_and_upsert_graphql gains an optional fallback_client: parameter so
the REST pagination fallback uses the same token context as the GraphQL
phase instead of falling back to the shared Settings.github_client.
@tricknotes tricknotes force-pushed the claude/xenodochial-engelbart branch from 7b97679 to e50117a Compare April 5, 2026 07:30
tricknotes and others added 2 commits April 5, 2026 16:38
GRAPHQL_BATCH_SIZE: 20 → 5, GRAPHQL_PAGE_SIZE: 30 → 10 (now env-configurable).
The previous defaults triggered GitHub's per-query resource limit due to the
multiplicative complexity of batched aliases × nested fields.  Smaller values
keep each query well within the threshold; the persistent HTTPS connection
absorbs the extra round-trips without TLS overhead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github_client.following returns both Users and Organizations, but the GraphQL
query uses user(login:) which only resolves User accounts.  Filtering by
type == 'User' at the source removes orgs before they reach any fetch path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tricknotes tricknotes force-pushed the claude/xenodochial-engelbart branch from e50117a to e2469cc Compare April 5, 2026 07:39
tricknotes and others added 7 commits April 5, 2026 16:43
fetch_and_upsert (shared-token REST path) is superseded by the per-user
GraphQL approach.  Remove the method and its corresponding rake task to
eliminate dead code.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The method was a thin wrapper around fetch_and_upsert_graphql.  Call
fetch_and_upsert_graphql directly from the rake task and remove the wrapper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The transport protocol is an implementation detail and does not belong in
the public interface name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…etch

fetch_graphql used a single shared token and was a degraded version of fetch,
which multiplies the rate limit by using each user's own token.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tricknotes and others added 2 commits April 6, 2026 10:52
Covers: GraphQL happy path, private repo filtering, since threshold,
mixed-age edges, org/NOT_FOUND handling, multi-login batching (one
HTTP request per batch), REST fallback when hasNextPage, and idempotency.
Net::HTTP and Octokit are stubbed so no real GitHub requests are made.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…clusion

- User#followings: verify that Organization type accounts are excluded
  and only User type accounts are returned
- StarEvent.fetch_and_upsert: verify that private repos are skipped
  in the REST fallback path (fetch_each_page), consistent with the
  GraphQL path behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tricknotes tricknotes marked this pull request as ready for review April 7, 2026 05:05
@tricknotes
Copy link
Copy Markdown
Owner Author

Now, this PR works on production 🚀

@tricknotes tricknotes merged commit f5c4a5b into main Apr 8, 2026
3 checks passed
@tricknotes tricknotes deleted the claude/xenodochial-engelbart branch April 8, 2026 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants