
Optimize & Minify DevRank Data Schema #9059

@tobiu

Description


The current users.json dataset is ~6MB for ~6k users (~1KB/user). At that rate, scaling to 100k users would produce an unmanageable ~100MB payload.

We need to drastically optimize the data structure to reduce file size by ~70% while maintaining functionality.

Optimization Strategy:

  1. Short Keys (Mapping): Map verbose keys to short codes (e.g., login -> l, total_contributions -> tc).
  2. Avatar IDs: Store only the integer ID for avatars (User & Orgs), not full URLs.
  3. Years Array: Convert the years object map to a sequential array starting from first_year.
  4. Org Simplification: Limit orgs to top 5 and store as a compact tuple [login, id].
  5. Smart Formatting: Use a "One-Record-Per-Line" format (a valid JSON array in which each object is minified onto a single line) to balance compactness with git-diffability.
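A minimal Python sketch of steps 1 to 4 applied to a single record follows. It is an illustration, not the project's code, and it rests on several assumptions. The old records are assumed to use the keys login, total_contributions, avatar_url, first_year, years and organizations. Only the codes l and tc come from this issue, and the other short codes are hypothetical. Avatars are assumed to be GitHub-style URLs ending in /u/<id>. The org "id" is taken to be the integer from the org's avatar URL. Orgs are assumed to be pre-ranked, so the first five count as the top five.

```python
import re
from typing import Any

# Verbose key -> short code. Only "l" and "tc" come from the issue;
# the other codes are hypothetical placeholders.
KEY_MAP = {
    "login": "l",
    "total_contributions": "tc",
    "avatar_url": "a",
    "first_year": "fy",
    "years": "y",
    "organizations": "o",
}

MAX_ORGS = 5

# Assumes GitHub-style avatar URLs, e.g. https://avatars.githubusercontent.com/u/583231?v=4
AVATAR_ID_RE = re.compile(r"/u/(\d+)")


def avatar_id(url: str) -> int:
    """Extract the integer account ID from an avatar URL."""
    match = AVATAR_ID_RE.search(url)
    if match is None:
        raise ValueError(f"unrecognized avatar URL: {url}")
    return int(match.group(1))


def years_to_array(years: dict[str, int], first_year: int) -> list[int]:
    """{"2019": 12, "2021": 40} with first_year=2019 -> [12, 0, 40].

    Missing years become 0; years before first_year are dropped.
    """
    if not years:
        return []
    last_year = max(int(y) for y in years)
    return [years.get(str(y), 0) for y in range(first_year, last_year + 1)]


def minify_user(user: dict[str, Any]) -> dict[str, Any]:
    """Apply short keys, avatar IDs, the years array and the org limit to one record."""
    out: dict[str, Any] = {}
    for key, value in user.items():
        if key == "avatar_url":
            value = avatar_id(value)
        elif key == "years":
            value = years_to_array(value, user["first_year"])
        elif key == "organizations":
            # Assumes orgs are already ranked, so the first MAX_ORGS are the "top 5".
            value = [[org["login"], avatar_id(org["avatar_url"])] for org in value[:MAX_ORGS]]
        out[KEY_MAP.get(key, key)] = value  # unmapped keys pass through unchanged
    return out
```

Each record keeps its field order. It also loses its repeated URL prefixes, its year-string keys, and every organization past the fifth. Those fields account for most of the per-user weight.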

Tasks:

  • Update DevRank.model.Contributor with field mappings and reconstruction logic.
  • Update DevRank.services.Updater to persist data in this new format.
  • Migrate existing users.json to the new schema.
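The reconstruction side, which DevRank.model.Contributor would need, can be sketched in Python as well. The same sketch covers the one-record-per-line writer that DevRank.services.Updater would use. This is a language-neutral illustration under stated assumptions, not the project's implementation. The short codes mirror the hypothetical mapping above. The URL template https://avatars.githubusercontent.com/u/{id}?v=4 is an assumption. Years with zero contributions are treated as absent when the array is expanded back into a map.

```python
import json
from typing import Any

# Short code -> verbose key; must mirror the encoder's mapping (codes other
# than "l" and "tc" are hypothetical).
SHORT_TO_LONG = {
    "l": "login",
    "tc": "total_contributions",
    "a": "avatar_url",
    "fy": "first_year",
    "y": "years",
    "o": "organizations",
}

# Assumed URL template for rebuilding avatars from their integer IDs.
AVATAR_URL = "https://avatars.githubusercontent.com/u/{}?v=4"


def expand_user(record: dict[str, Any]) -> dict[str, Any]:
    """Rebuild the verbose record from its minified form."""
    user: dict[str, Any] = {}
    for short, value in record.items():
        key = SHORT_TO_LONG.get(short, short)
        if key == "avatar_url":
            value = AVATAR_URL.format(value)
        elif key == "years":
            first_year = record["fy"]
            # Index i maps to first_year + i; zero entries are treated as absent.
            value = {str(first_year + i): n for i, n in enumerate(value) if n}
        elif key == "organizations":
            value = [{"login": login, "avatar_url": AVATAR_URL.format(org_id)}
                     for login, org_id in value]
        user[key] = value
    return user


def dumps_one_per_line(records: list[dict[str, Any]]) -> str:
    """Serialize as a valid JSON array with one minified record per line."""
    lines = [json.dumps(r, separators=(",", ":"), ensure_ascii=False) for r in records]
    return "[\n" + ",\n".join(lines) + "\n]\n"
```

Under these assumptions, migration means loading the old users.json and minifying each record. The result is then written with a writer like dumps_one_per_line. Any standard JSON parser still reads the output. Because each user occupies exactly one line, a change to one user shows up as a one-line git diff.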

Labels

ai, enhancement (New feature or request), performance (Performance improvements and optimizations)