
Implement maxUsers cap for DevIndex GitHub Spider #9224

@tobiu

Description

The DevIndex user database (users.jsonl) has grown to ~20MB (44k users). We need to implement a user cap (maxUsers) to prevent unbounded growth.

The strategy is to maintain a maximum number of users by pruning those with the lowest total contributions when new, higher-contributing users are discovered.
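
At its simplest, pruning to the cap means keeping the top `maxUsers` records ranked by total contributions. The following is a minimal sketch. It assumes each line of users.jsonl parses to an object with a numeric `tc` field and a `login` field (both field names are assumptions for illustration). `pruneToCap` is a hypothetical helper, not existing DevIndex code:

```javascript
/**
 * Hypothetical helper: keep at most `maxUsers` records, dropping those
 * with the lowest total contributions (`tc`, an assumed field name).
 * Returns a new array sorted by tc descending; the input is not mutated.
 */
function pruneToCap(users, maxUsers) {
    // Copy before sorting so the caller's array stays untouched
    const sorted = [...users].sort((a, b) => b.tc - a.tc);
    // Everything beyond the cap is the bottom tier and gets pruned
    return sorted.slice(0, maxUsers);
}

// Usage with a tiny cap for illustration (the real cap would be ~50,000)
const users = [
    {login: 'a', tc: 120},
    {login: 'b', tc: 15},
    {login: 'c', tc: 300},
    {login: 'd', tc: 42}
];

const kept = pruneToCap(users, 3);
console.log(kept.map(u => u.login)); // [ 'c', 'a', 'd' ]
```

Each call sorts the full list, which costs O(n log n). That is reasonable for a periodic batch compaction of users.jsonl but wasteful as a per-user check. `Array.prototype.sort` is stable, so users with equal `tc` keep their original relative order.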

Tasks:

  1. Configuration: Add maxUsers to apps/devindex/services/config.mjs (e.g., default to 50,000).
  2. Spider adjustment (Spider.mjs): The Spider currently adds every valid discovered user to the Tracker (tracker.json). Once the maxUsers cap is reached, it should ideally add only candidates likely to displace a bottom-tier user. However, the Spider doesn't fetch full stats, so it cannot judge this reliably and may need to add candidates anyway, leaving the evaluation to the Updater. Options: give the Spider a lightweight pre-check, or let the Updater handle all pruning.
  3. Updater adjustment (Updater.mjs): The Updater is where the actual evaluation happens. After fetching a user's stats, if the user meets the minTotalContributions threshold and adding them would push the number of tracked users past maxUsers, the Updater must:
    • Compare the new user's total contributions (tc) against the lowest tc in the current users.jsonl.
    • If the new user has more, add the new user and prune the bottom user(s) to maintain the cap.
    • If the new user has fewer, discard the new user.
  4. Storage adjustments (Storage.mjs): Ensure the sorting and pruning logic stays efficient given the file size (~20MB).
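
The following is a minimal sketch of the Updater-side check in task 3, written in plain JavaScript to match the .mjs codebase. It assumes the in-memory user list is kept sorted by `tc` descending, so the lowest contributor is always the last element. `admitUser`, its options object, and the `login`/`tc` field names are hypothetical, not existing DevIndex APIs. The sketch treats a tie with the lowest `tc` as a discard. This is an assumption, since the issue only covers the "more" and "fewer" cases.

```javascript
/**
 * Hypothetical Updater-side admission check. Assumes `users` is kept sorted
 * by `tc` (total contributions) in descending order, so the bottom-tier
 * user is always the last element. Mutates `users` in place.
 * Returns { admitted: boolean, pruned: object[] }.
 */
function admitUser(users, candidate, {maxUsers, minTotalContributions}) {
    // Below the global threshold: never tracked, regardless of the cap
    if (candidate.tc < minTotalContributions) {
        return {admitted: false, pruned: []};
    }

    // At the cap: the candidate must strictly beat the current lowest tc
    // (the !lowest guard covers maxUsers === 0, where the list is empty)
    if (users.length >= maxUsers) {
        const lowest = users[users.length - 1];
        if (!lowest || candidate.tc <= lowest.tc) {
            return {admitted: false, pruned: []};
        }
    }

    // Binary search for the first index whose tc is lower than the candidate's,
    // so existing users with an equal tc stay ahead of the newcomer
    let lo = 0;
    let hi = users.length;
    while (lo < hi) {
        const mid = (lo + hi) >>> 1;
        if (users[mid].tc >= candidate.tc) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    users.splice(lo, 0, candidate);

    // Drop bottom-tier users until the cap holds again
    const pruned = users.length > maxUsers ? users.splice(maxUsers) : [];
    return {admitted: true, pruned};
}

// Usage: in practice maxUsers would come from config.mjs (50,000 suggested);
// a cap of 3 and a threshold of 10 are illustrative values only
const tracked = [
    {login: 'c', tc: 300},
    {login: 'a', tc: 120},
    {login: 'd', tc: 42}
];
const opts = {maxUsers: 3, minTotalContributions: 10};

const r1 = admitUser(tracked, {login: 'e', tc: 200}, opts);
console.log(r1.admitted, r1.pruned.map(u => u.login)); // true [ 'd' ]

const r2 = admitUser(tracked, {login: 'f', tc: 50}, opts);
console.log(r2.admitted); // false
```

Binary search finds the insertion point in O(log n), while `splice` shifts elements in O(n). At ~50k entries, that shift is cheap. A min-heap would make removing the minimum O(log n) as well, but it does not keep the list in rank order, so the data would need a full sort before being written back. The sketch also does not cover an existing user whose `tc` changed after a refresh. That case would need a remove-then-reinsert.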

Note: The exact division of labor between Spider (discovery) and Updater (evaluation) needs to be finalized during implementation.
