Optimize Spider Core Strategy with Power-Law Distribution #9229

@tobiu

The DevIndex Spider's "Core: High Stars" strategy currently uses a uniform (linear) random distribution to select repository star ranges between minStars (1000) and 20,000.

Since GitHub repository star counts follow a steep power-law distribution (far more repos exist in the 1k-5k range than in the 15k-20k range), a uniform random selection wastes API quota on high-star slices that contain few or no repositories.

Tasks:

  1. Update apps/devindex/services/Spider.mjs to apply a power curve (e.g., Math.pow(Math.random(), 3)) to the random offset calculation in the pickStrategy method.
  2. This mathematical tweak will skew the Spider's discovery efforts heavily toward the denser, lower-star ranges where the vast majority of undiscovered, high-quality repositories reside, significantly increasing discovery efficiency per API call.
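
A minimal sketch of the proposed skew, for illustration only. The helper name `pickMinStars`, the constants, and the injectable `rng` parameter are assumptions made for this example; the actual `pickStrategy` method in Spider.mjs may be structured differently.

```javascript
// Hypothetical constants mirroring the values named in this issue.
const MIN_STARS = 1000;  // minStars
const MAX_STARS = 20000; // upper bound of the "Core: High Stars" range

/**
 * Hypothetical helper: picks the lower bound of a star-range slice,
 * biased toward MIN_STARS.
 * @param {Function} rng      Returns a uniform number in [0, 1)
 * @param {Number}   exponent 1 keeps the pick uniform; larger values skew it toward MIN_STARS
 * @returns {Number} An integer in [MIN_STARS, MAX_STARS)
 */
function pickMinStars(rng = Math.random, exponent = 3) {
    // Raising a uniform [0, 1) value to a power > 1 keeps it in [0, 1)
    // but concentrates the results near 0.
    const skewed = Math.pow(rng(), exponent);

    return MIN_STARS + Math.floor(skewed * (MAX_STARS - MIN_STARS));
}
```

With an exponent of 3, a uniform draw of 0.5 maps to 0.125, so the pick lands at 3375 stars instead of 10,500. Under these assumptions, roughly 60% of picks fall below 5,000 stars, compared with about 21% for the uniform selection.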
