Optimize Spider Core Strategy with Power-Law Distribution #9229
Closed
Description
The DevIndex Spider's "Core: High Stars" strategy currently uses a uniform (linear) random distribution to pick repository star ranges between `minStars` (1000) and 20,000.
GitHub repository star counts follow a steep power-law distribution: far more repositories fall in the 1k-5k range than in the 15k-20k range. Uniform selection therefore wastes API quota on high-star slices that contain few or no repositories.
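A rough way to quantify the skew, assuming a star value is drawn as $S = m + U^{p}(M - m)$ with $U$ uniform on $[0, 1)$, $m$ = `minStars` = 1000, $M$ = 20,000, and exponent $p$ ($p = 1$ is the current uniform behavior, $p = 3$ is the cubic curve `Math.pow(Math.random(), 3)`). This is an idealized model, not the exact offset logic in `Spider.mjs`:

$$
P(S < s) = P\!\left(U^{p} < \frac{s - m}{M - m}\right) = \left(\frac{s - m}{M - m}\right)^{1/p}, \qquad m \le s \le M
$$

$$
P(S < 5000) = \left(\tfrac{4}{19}\right)^{1/p} \approx \begin{cases} 0.211 & p = 1 \\ 0.595 & p = 3 \end{cases}
\qquad
P(S \ge 15000) = 1 - \left(\tfrac{14}{19}\right)^{1/p} \approx \begin{cases} 0.263 & p = 1 \\ 0.097 & p = 3 \end{cases}
$$

Under these assumptions the cubic curve nearly triples the share of draws in the dense 1k-5k band and cuts the share in the sparse 15k-20k band by more than half.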
Tasks:
- Update `apps/devindex/services/Spider.mjs` to apply a power curve (e.g., `Math.pow(Math.random(), 3)`) to the random offset calculation in the `pickStrategy` method.
- This tweak skews the Spider's discovery effort heavily toward the denser, lower-star ranges, where the vast majority of undiscovered high-quality repositories reside, significantly increasing discovery efficiency per API call.
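A minimal sketch of the proposed change, assuming the offset is drawn over the full `minStars`..20,000 span. The function names `pickStarOffset` and `shareBelow`, their parameters, and the injectable `random` argument are hypothetical illustrations, not code from `Spider.mjs`:

```javascript
/**
 * Hypothetical helper illustrating the proposed change; the real
 * pickStrategy() in apps/devindex/services/Spider.mjs may differ.
 * Returns an integer star offset in [minStars, maxStars).
 */
function pickStarOffset(minStars = 1000, maxStars = 20000, exponent = 3, random = Math.random) {
  const span = maxStars - minStars;
  // random() is uniform on [0, 1). Raising it to a power > 1 keeps the
  // result in [0, 1) but pushes most draws toward 0, i.e. toward minStars.
  // exponent = 1 reproduces the current uniform behavior.
  const skewed = Math.pow(random(), exponent);
  return minStars + Math.floor(skewed * span);
}

// Monte Carlo check: share of draws that land below `threshold` stars.
function shareBelow(threshold, exponent, trials = 100000) {
  let hits = 0;
  for (let i = 0; i < trials; i++) {
    if (pickStarOffset(1000, 20000, exponent) < threshold) hits++;
  }
  return hits / trials;
}

console.log("uniform, <5k:", shareBelow(5000, 1).toFixed(3));
console.log("cubic,   <5k:", shareBelow(5000, 3).toFixed(3));
```

Because `random` is injectable, tests can pass a fixed function such as `() => 0.5` to get deterministic offsets: with the default bounds that yields 3375 for the cubic curve versus 10500 for the uniform one. The exponent acts as a tuning knob; larger values concentrate sampling even harder near `minStars`, at the cost of revisiting the same dense slices more often.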