Bundle: Fix semantics of llms.txt vs. llms-full.txt#42
Conversation
WalkthroughThis update refactors the bundle generation process by introducing a subclass for resource management, adds HTML outline generation, and updates the CLI and tests to accommodate these changes. It also clarifies the semantics of bundle files in the changelog and enhances the outline model with HTML conversion functionality. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI
participant CrateDbLllmsTxtBuilder
participant OutlineDocument
CLI->>CrateDbLllmsTxtBuilder: Instantiate with resource paths and outline_url
CrateDbLllmsTxtBuilder->>OutlineDocument: Load outline from outline_url
CLI->>CrateDbLllmsTxtBuilder: run()
CrateDbLllmsTxtBuilder->>OutlineDocument: to_html()
CrateDbLllmsTxtBuilder->>FileSystem: Write outline.html
Possibly related PRs
Suggested reviewers
Poem
Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. Note ⚡️ Faster reviews with cachingCodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (4)
✨ Finishing Touches
🧪 Generate Unit Tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/cratedb_about/outline/model.py (1)
83-84: Add a docstring to maintain API documentation consistency.The new
to_htmlmethod would benefit from a docstring to maintain consistency with other methods in the class and provide clear documentation for users of this API.def to_html(self) -> str: + """Convert outline into HTML format using Markdown as an intermediate step.""" return markdown(self.to_markdown())src/cratedb_about/bundle/llmstxt.py (3)
25-25: Use a more specific type annotation for theoutlinefield.The
outlinefield is annotated witht.Any, but based on its usage in the code and the initialization in__post_init__, it's clearly an instance ofOutlineDocumentreturned byCrateDbKnowledgeOutline.load().- outline: t.Any = dataclasses.field(init=False) + outline: "OutlineDocument" = dataclasses.field(init=False)Consider adding an import for
OutlineDocumentor using a forward reference string as shown.
65-74: Consider adding error handling for HTML generation.While the
copy_readmemethod has error handling for HTML generation, thecopy_sourcesmethod doesn't have similar error handling forself.outline.to_html().def copy_sources(self): """ Provide the source document in the original YAML format, but also converted to HTML. The intermediary Markdown format is already covered by the `llms.txt` file itself. """ shutil.copy( str(self.outline_yaml), self.outdir / "outline.yaml", ) - Path(self.outdir / "outline.html").write_text(self.outline.to_html()) + try: + Path(self.outdir / "outline.html").write_text(self.outline.to_html()) + except Exception as e: + logger.warning(f"Failed to generate HTML outline: {e}")
23-23: Add type annotation foroutline_urlparameter.For consistency with other fields, consider adding a type annotation for the
outline_urlparameter.- outline_url: str + outline_url: t.Optional[str]This would match the type used in
CrateDbKnowledgeOutline.load()(from the provided relevant code snippets), which accepts an optional URL.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
CHANGES.md(1 hunks)src/cratedb_about/bundle/llmstxt.py(4 hunks)src/cratedb_about/cli.py(2 hunks)src/cratedb_about/outline/model.py(2 hunks)tests/test_cli.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/cratedb_about/cli.py (1)
src/cratedb_about/bundle/llmstxt.py (2)
CrateDbLllmsTxtBuilder(78-87)run(29-47)
src/cratedb_about/bundle/llmstxt.py (3)
src/cratedb_about/cli.py (1)
outline(54-74)src/cratedb_about/outline/model.py (3)
to_markdown(71-81)to_llms_txt(86-109)to_html(83-84)src/cratedb_about/outline/core.py (2)
CrateDbKnowledgeOutline(9-76)load(60-76)
🔇 Additional comments (6)
tests/test_cli.py (1)
68-68: LGTM! Test updated to verify the new HTML output format.The test has been correctly updated to check for the existence of the new HTML output file, which aligns with the changes made in the bundle generation process.
src/cratedb_about/cli.py (1)
8-8: LGTM! Updated to use the CrateDB-specific builder class.The change correctly updates the import and instantiation to use
CrateDbLllmsTxtBuilder, which centralizes resource paths and outline loading for CrateDB resources.Also applies to: 98-98
CHANGES.md (1)
5-8: LGTM! Clear documentation of fixed semantics and new features.The changelog clearly explains the important semantic fix for bundle files and documents the new HTML outline feature. The reference link to the relevant issue is a good practice for traceability.
src/cratedb_about/bundle/llmstxt.py (3)
37-45: Comprehensive documentation of the llms.txt and llms-full.txt semantics.The updated comments clearly explain the purpose and differences between
llms.txtandllms-full.txtfiles, which aligns with the PR objectives of fixing the semantics between these files.
49-63: Good addition of docstring and improved resource handling.The method now uses the class field
self.readme_mdinstead of a hardcoded path, making the code more flexible and maintainable. The added docstring clearly explains the purpose of the method.
77-88: Well-structured subclass for CrateDB-specific implementation.The new
CrateDbLllmsTxtBuildersubclass appropriately sets default values for resource paths and initializes theoutlinefield. This improves modularity by separating the generic builder logic from CrateDB-specific resource handling.However, consider adding a docstring to the
__post_init__method to explain its purpose:def __post_init__(self): + """Initialize the outline by loading it from the provided URL.""" self.outline = CrateDbKnowledgeOutline.load(self.outline_url)
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/cratedb_about/bundle/llmstxt.py (2)
25-27: Consider providing initializers for non-init fields.These fields are marked with
init=Falsebut aren't initialized in the base class. While they're set in the subclass, it would be helpful to add documentation or default values to clarify how these fields should be populated when using the base class directly.- outline: OutlineDocument = dataclasses.field(init=False) - readme_md: Traversable = dataclasses.field(init=False) - outline_yaml: Traversable = dataclasses.field(init=False) + outline: OutlineDocument = dataclasses.field(init=False, default=None) + readme_md: Traversable = dataclasses.field(init=False, default=None) + outline_yaml: Traversable = dataclasses.field(init=False, default=None)
80-91: Well-structured subclass implementation.The creation of
CrateDbLllmsTxtBuildersubclass is a good approach to provide specific implementations while maintaining the flexibility of the base class:
- Default values for resource paths are provided
__post_init__is used to properly initialize theoutlinefield- This design supports dependency injection and testing
However, consider adding a comment to the base class explaining that it's intended to be subclassed, with the non-init fields expected to be set by subclasses.
@dataclasses.dataclass class LllmsTxtBuilder: """ Build llms.txt files for CrateDB. + + This is a base class intended to be subclassed. The non-init fields + (outline, readme_md, outline_yaml) should be initialized by subclasses. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/cratedb_about/bundle/llmstxt.py(4 hunks)src/cratedb_about/outline/model.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cratedb_about/outline/model.py
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/cratedb_about/bundle/llmstxt.py
[warning] 76-77: src/cratedb_about/bundle/llmstxt.py#L76-L77
Added lines #L76 - L77 were not covered by tests
🔇 Additional comments (7)
src/cratedb_about/bundle/llmstxt.py (7)
5-5: Appropriate import addition for type annotations.Adding the
Traversableimport is necessary for the new type annotations in the dataclass fields.
11-11: Good import update for the OutlineDocument class.The import of
OutlineDocumentis properly added to support the new dataclass field.
37-45: Well-documented semantic clarification of llms.txt files.The addition of clear comments explaining the purpose and differences between
llms.txtandllms-full.txtfiles is excellent. The implementation now properly uses the instance'soutlinefield, improving code organization.
50-52: Good addition of descriptive docstring.Adding a clear docstring to the
copy_readmemethod improves code readability and maintainability.
55-55: Appropriate use of instance field.Using the instance's
readme_mdfield instead of a hardcoded path improves flexibility and testability.
66-69: Clear docstring explains file purpose and relationships.The docstring effectively explains the purpose of the source files and their relationship to the
llms.txtfile.
71-71: Good use of instance field for improved flexibility.Using the instance's
outline_yamlfield instead of a hardcoded path enhances configurability.
e20be49 to
3179fbb
Compare
3179fbb to
99f4e21
Compare
Problem
The current llms.txt was wrong. Many other publications demonstrate it should be a Markdown file with referenced content NOT inlined.
References
llms.txtvs.llms-full.txt#39