feat: add PowerRAG SDK text QA retrieval demo#78

Merged: FutureUnreal merged 3 commits into datawhalechina:main from pi-dal:feat/issue-75 on Jan 30, 2026
Conversation

@pi-dal (Contributor) commented Jan 29, 2026

Summary

  • Implement Markdown upload, parsing, and top-k chunk retrieval using PowerRAG SDK
  • Add CLI with configurable parameters (top_k, similarity_threshold, etc.)
  • Include comprehensive README with Docker setup, embedding configuration, and API key generation
  • Provide sample document and example questions

Related Issue: #75
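
For illustration only, here is a minimal sketch of how a CLI exposing `top_k` and `similarity_threshold` might select chunks. The function names, flag spellings, and default values are assumptions and are not taken from the demo or from the PowerRAG SDK; similarity scores are assumed to be computed elsewhere by the retrieval backend.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI; flag names and defaults are illustrative."""
    parser = argparse.ArgumentParser(description="Text QA retrieval (sketch)")
    parser.add_argument("question", help="question to retrieve chunks for")
    parser.add_argument("--top-k", dest="top_k", type=int, default=5,
                        help="maximum number of chunks to return")
    parser.add_argument("--similarity-threshold", dest="similarity_threshold",
                        type=float, default=0.2,
                        help="drop chunks scoring below this value")
    return parser


def select_top_k(scored_chunks: List[Tuple[str, float]],
                 top_k: int,
                 similarity_threshold: float) -> List[Tuple[str, float]]:
    """Keep chunks at or above the threshold, then return the best top_k."""
    kept = [(text, score) for text, score in scored_chunks
            if score >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

Applying the threshold before truncating means that fewer than `top_k` chunks may come back when most candidates score poorly, which is usually preferable to padding the result with weak matches.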

- Implement Markdown upload, parsing, and top-k chunk retrieval
- Add CLI with configurable parameters (top_k, similarity_threshold, etc.)
- Include README with setup instructions (Docker, embedding config, API key)
- Provide sample document and example questions

Fixes datawhalechina#75
@pi-dal pi-dal mentioned this pull request Jan 29, 2026
6 tasks

@FutureUnreal (Member)

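There are two problems. First, this is a teaching project, so how to use it is secondary; what readers mainly need to learn is the implementation approach, ideally with text and diagrams combined. Please put yourself in the position of a beginner and judge whether someone could learn how to implement this project by reading your documentation. Second, there are too many second-level headings. I suggest finding a tutorial you consider well written and using its format and content as a reference. This is not a task to be dashed off with AI; we have a responsibility to our readers.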

- Move code/C10 content to Extra-chapter/PowerRAG-SDK-Text-QA
- Add comprehensive tutorial documentation with diagrams
- Include detailed implementation guide for text retrieval
- Add step-by-step configuration instructions for embedding setup
- Improve project organization and documentation structure

@pi-dal (Contributor, Author) commented Jan 29, 2026

Thank you for the valuable feedback! I completely agree with your points about making this a proper educational resource.

I've made significant improvements based on your suggestions:

1. Implementation-focused tutorial with diagrams:

  • Moved the project to Extra-chapter/PowerRAG-SDK-Text-QA/ following the repository's structure
  • Added comprehensive tutorial with 3 detailed diagrams explaining:
    • End-to-end workflow (upload → parse → vectorize → retrieve)
    • Object relationships (dataset/document/chunk/embedding)
    • API interaction sequence
  • Wrote step-by-step implementation guide from a beginner's perspective
  • Each code section now includes explanations of "why" not just "how"

2. Improved structure:

  • Reduced excessive headings - reorganized into logical sections
  • Combined related content into cohesive explanations
  • Moved environment/API setup to appendix to keep the main flow focused

3. Educational approach:

  • Added "小白自检" (beginner checkpoints) throughout to verify understanding
  • Included common pitfalls and troubleshooting tips
  • Explained core concepts before diving into code
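
As a rough illustration of the workflow and object relationships listed above (upload → parse → vectorize → retrieve; dataset/document/chunk/embedding), here is a self-contained sketch. It does not use the PowerRAG SDK: the class names, the heading-based chunking, and the bag-of-words "embedding" are all stand-in assumptions chosen so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    text: str
    embedding: Dict[str, float]  # toy sparse vector, not a real model output


@dataclass
class Document:
    name: str
    chunks: List[Chunk] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    documents: List[Document] = field(default_factory=list)


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalized word counts (assumption for the demo)."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def parse_markdown(name: str, markdown: str) -> Document:
    """Parse step: split before each heading line and embed every section."""
    sections = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    chunks = [Chunk(s.strip(), embed(s)) for s in sections if s.strip()]
    return Document(name, chunks)


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two normalized sparse vectors equals their cosine."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def retrieve(dataset: Dataset, question: str,
             top_k: int = 3) -> List[Tuple[Chunk, float]]:
    """Retrieve step: score every chunk against the question, best first."""
    query = embed(question)
    scored = [(chunk, cosine(query, chunk.embedding))
              for doc in dataset.documents for chunk in doc.chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A dataset holds documents, each document holds chunks, and each chunk carries its own embedding, so retrieval only has to compare the question vector against chunk vectors. In a real deployment the embedding would come from a configured model rather than word counts.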

The new documentation aims to teach readers the implementation approach (思路) behind RAG retrieval, not just provide a working script. I'd appreciate your review of the updated version.

Thank you for holding contributors to high standards - it makes the project better for all learners!

- Convert diagram images from JPEG to WebP for better compression
- Update image references in readme.md
- Reduce repository size while maintaining image quality

@FutureUnreal (Member)

Thank you for the further improvements.

@FutureUnreal FutureUnreal merged commit 700c1ee into datawhalechina:main Jan 30, 2026