
kimjune01/pageleft


🍄 PageLeft

Semantic search over copyleft-licensed web pages.

pageleft.cc | Manifesto

API

Search

GET /api/search?q=<query>&limit=20&compiles
  • Chunks: each page is split into paragraphs, and each paragraph is embedded separately. A query matches the specific paragraph, not a page-level average.
  • Ranking: semantic_score (cosine similarity) * rank_score (PageRank) * quality (compounding review scores). Compilable pages get a 2x boost.
  • To index a page, POST /api/contribute/page with {"url":"..."}. The server fetches the page, verifies its copyleft license, chunks it, and queues the chunks for embedding.
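The ranking rule above can be read as a plain product of factors. The following is a minimal sketch of that reading, not the server's code: the function name `combined_score` is hypothetical, and applying the 2x boost as a final multiplier is an assumption based on the wording "Compilable pages get 2x".

```python
def combined_score(
    semantic_score: float,
    rank_score: float,
    quality: float,
    compilable: bool = False,
) -> float:
    """Multiply the three ranking factors, doubling the result for compilable pages.

    Assumption: the 2x boost is a multiplier on the final product. The README
    states the factors but not how the server combines them internally.
    """
    boost = 2.0 if compilable else 1.0
    return semantic_score * rank_score * quality * boost


# Two otherwise identical pages: the compilable one scores twice as high.
plain = combined_score(0.5, 0.5, 1.0)
boosted = combined_score(0.5, 0.5, 1.0, compilable=True)
```

Because the factors multiply, a low value in any one of them (for example a poor quality factor) pulls the whole score down, which matches the "sink gradually" behaviour described for low-quality pages.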

Params:

  • q — search query or URL to index
  • limit — max results (default 20)
  • compiles — return only pages with reference implementations
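As an illustration of the parameters above, here is a small sketch that builds a search URL. The helper name `search_url` is hypothetical, and the base URL `https://pageleft.cc` is an assumption taken from the site link at the top of this README; the response format is not documented here, so the sketch stops at constructing the request.

```python
from urllib.parse import urlencode

# Assumption: the API is served from the site linked in this README.
BASE_URL = "https://pageleft.cc"


def search_url(query: str, limit: int = 20, compiles: bool = False,
               base_url: str = BASE_URL) -> str:
    """Build a GET /api/search URL from the documented parameters."""
    url = f"{base_url}/api/search?{urlencode({'q': query, 'limit': limit})}"
    if compiles:
        # The README shows `compiles` as a bare flag with no value.
        url += "&compiles"
    return url


url = search_url("copyleft license", limit=5, compiles=True)
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen(url)`.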

Federated workers

Workers donate crawl, embedding, and quality-review compute. Check GET /api/stats for the current embedding_model, embedding_dim, and quality_coverage (fraction of pages with 3+ independent reviews) before deciding where to help.

  1. GET /api/frontier?limit=10 — claim URLs to crawl
  2. POST /api/contribute/page — submit crawled page (license is re-verified server-side)
  3. GET /api/work/embed?limit=10 — claim chunks that need embeddings. Every item has {chunk_id, page_id, text}. Pages without chunks are auto-chunked on demand.
  4. POST /api/embed — compute embedding via the server's model. Send {"text":"..."} or {"texts":["..."]} (max 32), get back {embedding, dim} or {embeddings, dim}. No local model or HF token needed.
  5. POST /api/contribute/embeddings — batch submit: [{"chunk_id":N,"embedding":[...]}] (max 100)
  6. GET /api/work/quality?limit=10 — claim random pages for quality review (returns page_id, url, title, text_content)
  7. POST /api/contribute/quality — submit quality score (page_id, score 0.0–1.0, model used). Each score compounds into the page's quality factor, which scales search ranking. No binary eviction — low-quality pages sink gradually.
  8. POST /api/contribute/compilable — flag a page as compilable (page_id, compilable bool). Pages with reference implementations get a 2x ranking boost.
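Steps 3 to 5 involve two different batch limits (32 texts per embed call, 100 items per submission). The sketch below shows one way a worker might shape those request bodies without exceeding either limit. All helper names are hypothetical, and it assumes that `/api/embed` returns embeddings in the same order as the submitted texts, which the README does not state.

```python
from typing import Iterator, List, Sequence

EMBED_MAX_TEXTS = 32    # documented limit for POST /api/embed
SUBMIT_MAX_ITEMS = 100  # documented limit for POST /api/contribute/embeddings


def batches(items: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_request(work_items: Sequence[dict]) -> dict:
    """Body for POST /api/embed, built from items claimed via GET /api/work/embed."""
    if len(work_items) > EMBED_MAX_TEXTS:
        raise ValueError(f"at most {EMBED_MAX_TEXTS} texts per request")
    return {"texts": [item["text"] for item in work_items]}


def embeddings_submission(work_items: Sequence[dict],
                          embeddings: Sequence[Sequence[float]]) -> List[dict]:
    """Body for POST /api/contribute/embeddings.

    Assumption: embeddings[i] corresponds to work_items[i].
    """
    if len(work_items) != len(embeddings):
        raise ValueError("one embedding is required per work item")
    if len(work_items) > SUBMIT_MAX_ITEMS:
        raise ValueError(f"at most {SUBMIT_MAX_ITEMS} items per submission")
    return [
        {"chunk_id": item["chunk_id"], "embedding": list(vector)}
        for item, vector in zip(work_items, embeddings)
    ]
```

A worker would claim items, split them with `batches(items, EMBED_MAX_TEXTS)`, send each `embed_request(...)` body, and then post the paired results built by `embeddings_submission(...)`.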

Leaderboard

GET /api/leaderboard?type=review&n=10

Anonymous contributor rankings by hashed fingerprint. Filter by type (review, embed, crawl) or omit for all. n defaults to 10.
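The optional `type` filter and the default for `n` can be sketched as a small URL builder. The helper name `leaderboard_url` is hypothetical, and the base URL `https://pageleft.cc` is an assumption taken from the site link in this README.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_TYPES = {"review", "embed", "crawl"}  # filter values listed in the README


def leaderboard_url(kind: Optional[str] = None, n: int = 10,
                    base_url: str = "https://pageleft.cc") -> str:
    """Build a GET /api/leaderboard URL; omit `type` to rank all contributions."""
    params = {}
    if kind is not None:
        if kind not in VALID_TYPES:
            raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
        params["type"] = kind
    params["n"] = n
    return f"{base_url}/api/leaderboard?{urlencode(params)}"
```

Calling `leaderboard_url("review")` reproduces the query string shown above, while `leaderboard_url()` leaves out the filter.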

Claude Code Plugin

claude plugin marketplace add kimjune01/pageleft
claude plugin install pageleft@pageleft

Then use /pageleft <query> to search copyleft sources from any project.

Contributing

See CONTRIBUTING.md or read Why Contribute.

License

AGPL-3.0-or-later

About

Copyleft-only semantic search engine: embeddings + PageRank + license detection
