Implement maxUsers cap for DevIndex GitHub Spider #9224
Closed
Description
The DevIndex user database (users.jsonl) has grown to ~20MB (44k users). We need to implement a user cap (maxUsers) to prevent unbounded growth.
The strategy is to maintain a maximum number of users by pruning those with the lowest total contributions when new, higher-contributing users are discovered.
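The pruning strategy above can be sketched in a few lines. This is only an illustration, not the project's actual code: `pruneToCap` is a hypothetical helper, and the record shape (`login`, `tc`) is assumed from the `tc` field mentioned in the tasks below.

```javascript
// Hypothetical helper: keep only the `maxUsers` records with the highest
// total contributions (`tc`). The {login, tc} record shape is an assumption.
function pruneToCap(users, maxUsers) {
    // Copy before sorting so the caller's array is not mutated
    return [...users]
        .sort((a, b) => b.tc - a.tc) // highest contributors first
        .slice(0, maxUsers);         // drop everything below the cap
}

// Illustrative data only
const sampleUsers = [
    {login: 'alice', tc: 120},
    {login: 'bob',   tc: 15},
    {login: 'carol', tc: 900},
    {login: 'dave',  tc: 40}
];

const keptUsers = pruneToCap(sampleUsers, 2);
console.log(keptUsers.map(user => user.login));
```

A full sort is the simplest way to express the idea. Whether it is fast enough at ~50,000 records per run is one of the open questions for `Storage.mjs`.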
Tasks:
- Configuration: Add `maxUsers` to `apps/devindex/services/config.mjs` (e.g., default to 50,000).
- Spider Adjustment (`Spider.mjs`): The Spider currently adds any valid discovered user to the Tracker (`tracker.json`). If we are at the `maxUsers` cap, the Spider should only add new candidates if they are likely to displace a bottom-tier user. However, since the Spider doesn't fetch full stats, it might need to add them anyway, leaving the evaluation to the Updater. Alternative: The Spider might need a lightweight check, or we just let the Updater handle all pruning.
- Updater Adjustment (`Updater.mjs`): The Updater is where the actual evaluation happens. After fetching a user's stats, if the user meets the `minTotalContributions` threshold AND the total tracked users exceed `maxUsers`, the Updater must:
  - Compare the new user's total contributions (`tc`) against the lowest `tc` in the current `users.jsonl`.
  - If the new user has more, add the new user and prune the bottom user(s) to maintain the cap.
  - If the new user has fewer, discard the new user.
- Storage adjustments (`Storage.mjs`): Ensure the sorting and pruning logic is efficient given the file size.
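The Updater's decision for a single candidate could look roughly like the sketch below. Everything here is an assumption for illustration: `evaluateCandidate` is a hypothetical function, users are held as an in-memory array of `{login, tc}` records, and a tie with the lowest `tc` is treated as a discard, since the task list only covers the "more" and "fewer" cases.

```javascript
/**
 * Hypothetical sketch of the Updater's cap check for one candidate.
 * `users` is assumed to be the in-memory contents of users.jsonl.
 * Returns the resulting user list plus what happened to the candidate.
 */
function evaluateCandidate(users, candidate, {maxUsers, minTotalContributions}) {
    const rejected = {users, accepted: false, pruned: []};

    // Below the existing contribution threshold: never admitted
    if (candidate.tc < minTotalContributions) return rejected;

    // Guard against a nonsensical cap
    if (maxUsers <= 0) return rejected;

    // Still room under the cap: admit without pruning
    if (users.length < maxUsers) {
        return {users: [...users, candidate], accepted: true, pruned: []};
    }

    // At the cap: find the current lowest contributor with a single linear scan
    let lowestIndex = 0;
    for (let i = 1; i < users.length; i++) {
        if (users[i].tc < users[lowestIndex].tc) lowestIndex = i;
    }
    const lowest = users[lowestIndex];

    // Assumption: ties do not displace an existing user
    if (candidate.tc <= lowest.tc) return rejected;

    // Candidate wins: swap it in for the lowest contributor
    const next = users.filter((_, i) => i !== lowestIndex);
    next.push(candidate);
    return {users: next, accepted: true, pruned: [lowest]};
}

// Illustrative values only; the real defaults would live in config.mjs
const exampleConfig = {maxUsers: 3, minTotalContributions: 10};
const trackedUsers = [
    {login: 'a', tc: 50},
    {login: 'b', tc: 20},
    {login: 'c', tc: 30}
];

const outcome = evaluateCandidate(trackedUsers, {login: 'd', tc: 40}, exampleConfig);
console.log(outcome.accepted, outcome.pruned.map(user => user.login));
```

The linear scan costs O(n) per candidate. If many candidates are evaluated in one run, keeping the list sorted or using a min-heap keyed on `tc` would avoid rescanning, which may matter for the `Storage.mjs` efficiency task.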
Note: The exact division of labor between Spider (discovery) and Updater (evaluation) needs to be finalized during implementation.