Outline: Shrink llms-txt output to <200_000 input tokens #57
Sequence Diagram(s)
sequenceDiagram
participant User
participant OutlineLoader
participant OutlineModel
participant llmstxt.py
User->>llmstxt.py: Generate llms.txt / llms-full.txt
llmstxt.py->>OutlineLoader: Load YAML outline
OutlineLoader->>OutlineModel: Parse items (with markdown_enabled flag)
llmstxt.py->>OutlineModel: Export to Markdown (skip markdown_enabled: false)
OutlineModel-->>llmstxt.py: Filtered Markdown output
llmstxt.py->>User: Write llms.txt / llms-full.txt files
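The filtering step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's actual model: `OutlineItem`, `OutlineSection`, and the `example.org` links are hypothetical stand-ins, and the flag is named `enabled` as in the review comments (the diagram labels it `markdown_enabled`).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutlineItem:
    """Hypothetical stand-in for one outline entry (not the real model class)."""

    title: str
    link: str
    # Defaults to True, so outlines written before the flag existed keep working.
    enabled: bool = True


@dataclass
class OutlineSection:
    """Hypothetical section that groups items under a heading."""

    name: str
    items: List[OutlineItem] = field(default_factory=list)

    def to_markdown(self) -> str:
        # Emit a heading, then one bullet per item whose flag is switched on.
        lines = [f"## {self.name}", ""]
        lines += [
            f"- [{item.title}]({item.link})"
            for item in self.items
            if item.enabled
        ]
        return "\n".join(lines)


section = OutlineSection(
    name="SQL",
    items=[
        OutlineItem("SQL syntax", "https://example.org/sql-syntax"),
        OutlineItem("SQL: Data types", "https://example.org/data-types", enabled=False),
    ],
)
markdown = section.to_markdown()
print(markdown)
```

Defaulting the flag to `True` means only the entries that should be dropped need to be touched in the YAML file.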
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
They also influence the behaviour when the records are queried.
parents: [ sql-syntax ]
tags: [ sql ]
# FIXME: This needs about 40_000 input tokens. Maybe a stripped-down variant could help?
enabled: false
The section "SQL: Data types" had to be disabled to save a large number of input tokens.
FIXME: This needs about 40_000 input tokens. Maybe a stripped-down variant could help?
Please bring it back.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
CHANGES.md (1 hunks)
src/cratedb_about/bundle/llmstxt.py (1 hunks)
src/cratedb_about/outline/cratedb-outline.yaml (13 hunks)
src/cratedb_about/outline/model.py (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:20:35.508Z
Learning: When creating content for an `llms.txt` file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
CHANGES.md (4)
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:20:35.508Z
Learning: When creating content for an llms.txt file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
Learnt from: amotl
PR: #32
File: src/cratedb_about/outline/cratedb-outline.yaml:321-329
Timestamp: 2025-05-15T21:25:54.870Z
Learning: In the CrateDB outline YAML, content organization prioritizes thematic grouping (keeping related topics together) over content type grouping (separating tutorials from reference docs), as demonstrated by placing the multi-tenancy tutorial alongside user management and privileges documentation in the API section.
src/cratedb_about/bundle/llmstxt.py (3)
Learnt from: amotl
PR: crate/about#0
File: :0-0
Timestamp: 2025-04-16T14:20:35.508Z
Learning: When creating content for an llms.txt file (following the llmstxt.org specification), consistent and straightforward language takes precedence over stylistic variation since the primary audience is language models rather than human readers.
src/cratedb_about/outline/cratedb-outline.yaml (1)
Learnt from: amotl
PR: #32
File: src/cratedb_about/outline/cratedb-outline.yaml:321-329
Timestamp: 2025-05-15T21:25:54.870Z
Learning: In the CrateDB outline YAML, content organization prioritizes thematic grouping (keeping related topics together) over content type grouping (separating tutorials from reference docs), as demonstrated by placing the multi-tenancy tutorial alongside user management and privileges documentation in the API section.
🧬 Code Graph Analysis (1)
src/cratedb_about/bundle/llmstxt.py (2)
src/cratedb_about/cli.py (1)
outline (55-75)
src/cratedb_about/outline/model.py (2)
to_markdown (72-84)
to_llms_txt (92-115)
🔇 Additional comments (6)
src/cratedb_about/outline/model.py (1)
28-28: Good addition of the enabled flag with appropriate default.
The `enabled: bool = True` attribute provides a clean way to control item visibility while maintaining backward compatibility.
CHANGES.md (1)
4-4: Clear and accurate changelog entry.
The entry succinctly documents the token limit reduction achievement, aligning with the PR objectives and using straightforward language appropriate for the llms.txt specification.
src/cratedb_about/bundle/llmstxt.py (1)
47-51: Clean refactoring with improved separation of concerns.
The changes correctly implement the intended behavior:
- `llms.txt` now contains the filtered outline via `to_markdown()` (respecting `enabled` flags)
- `llms-full.txt` contains the complete expanded documentation with `optional=False`
- The intermediate Path variables improve code readability
This aligns well with the PR objective to reduce token count in the outline while maintaining full documentation access.
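For illustration, writing the two files through intermediate `Path` variables might look like the sketch below. It is not the code from `llmstxt.py`: `write_bundle` and both text arguments are hypothetical placeholders for whatever the outline model actually produces.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def write_bundle(outdir: Path, outline_markdown: str, full_text: str) -> Tuple[Path, Path]:
    """Write the outline and the expanded bundle next to each other.

    Hypothetical helper: both strings are assumed to be rendered elsewhere.
    """
    outdir.mkdir(parents=True, exist_ok=True)

    # Naming the targets up front keeps the two write calls easy to read.
    index_path = outdir / "llms.txt"
    full_path = outdir / "llms-full.txt"

    index_path.write_text(outline_markdown, encoding="utf-8")
    full_path.write_text(full_text, encoding="utf-8")
    return index_path, full_path


# Usage with placeholder content in a throwaway directory.
outdir = Path(tempfile.mkdtemp())
index_path, full_path = write_bundle(
    outdir,
    outline_markdown="# CrateDB\n",
    full_text="# CrateDB\n\nExpanded documentation goes here.\n",
)
print(index_path.name, full_path.name)
```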
src/cratedb_about/outline/cratedb-outline.yaml (3)
296-297: Strategic disabling of high-token content.
The comment indicates this data types section requires ~40,000 tokens, making it a logical choice for exclusion to meet the <200,000 token target. The decision is well-documented.
530-552: Appropriate exclusion of optional and example content.
Disabling example projects, cloud-specific documentation, and some blog content is a reasonable approach to reduce token count while preserving core functionality documentation. These sections are typically less critical for basic CrateDB usage.
Also applies to: 566-603, 724-750
393-407: Enable Core Driver Documentation
Disabling all driver entries reduces token count but removes essential connection guidance. At a minimum, please re-enable the CrateDB Python Client reference:
• File: src/cratedb_about/outline/cratedb-outline.yaml
• Section: Drivers (around the “CrateDB Python Client” entry)
Suggested diff:
  - title: "CrateDB Python Client"
    tags: [ driver ]
-   enabled: false
+   enabled: true
This ensures users still have a basic, well-supported example for connecting to CrateDB.
⛔ Skipped due to learnings
Learnt from: amotl
PR: crate/about#32
File: src/cratedb_about/outline/cratedb-outline.yaml:321-329
Timestamp: 2025-05-15T21:25:54.870Z
Learning: In the CrateDB outline YAML, content organization prioritizes thematic grouping (keeping related topics together) over content type grouping (separating tutorials from reference docs), as demonstrated by placing the multi-tenancy tutorial alongside user management and privileges documentation in the API section.
Problem
When using the llms-full.txt file with Anthropic Claude Sonnet or Opus, the request fails because the file contains too many input tokens.
Solution
Omit a few resources to shrink the bundle file.
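
A quick way to sanity-check whether a bundle is likely to fit is a rough size estimate. The sketch below assumes about four characters per token, which is only a rule of thumb and not how any specific model tokenizes; `estimate_tokens` and `fits_budget` are hypothetical helpers, and the 200_000 budget is taken from the PR title.

```python
TOKEN_BUDGET = 200_000  # target from the PR title
CHARS_PER_TOKEN = 4  # assumption: crude average, real tokenizers will differ


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character count alone."""
    return len(text) // CHARS_PER_TOKEN


def fits_budget(text: str, budget: int = TOKEN_BUDGET) -> bool:
    """Tell whether the estimate stays strictly below the budget."""
    return estimate_tokens(text) < budget


# Placeholder standing in for the content of a generated bundle file.
sample = "x" * 1_000_000
print(estimate_tokens(sample), fits_budget(sample))
```

For a reliable number, count tokens with the tokenizer of the target model instead of relying on this estimate.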