
Data backend: Refactor the source of truth to cratedb-outline.yaml #15

Merged: amotl merged 1 commit into main from source-yaml on May 9, 2025

@amotl (Member) commented May 9, 2025

About

Added a new structured YAML file, cratedb-outline.yaml, that organizes CrateDB documentation resources into four groups: Docs, API, Examples, and Optional.

Details

The change also provides the new CLI subcommand cratedb-about outline and the cratedb_about.CrateDbKnowledgeOutline API. The API retrieves information from the knowledge base outline within Python programs, in order to support cratedb-mcp.

coderabbitai bot commented May 9, 2025

Walkthrough

This update introduces a YAML-based outline as the new source of truth for CrateDB documentation structure, replacing the previous Markdown approach. It adds a new CLI command to display the outline in various formats, updates documentation and build/test workflows accordingly, and implements supporting data models, utilities, and tests for the new features.

Changes

| File(s) | Change Summary |
|---|---|
| `.github/workflows/tests.yml` | Split the combined linter/test/build step into separate steps; removed commands cratedb-about --version and cratedb-about list-questions. |
| `.gitignore` | Added .coverage, coverage.xml, and dist to ignore coverage reports and build artifacts. |
| `CHANGES.md` | Updated changelog to record the backend refactor (YAML outline), new CLI subcommand outline, and new API entity CrateDbKnowledgeOutline. |
| `README.md`, `src/content/about/llms-txt.md` | Updated documentation to reference cratedb-outline.yaml instead of the old Markdown overview, clarified usage instructions, and detailed the new workflow for generating and querying documentation data. |
| `pyproject.toml` | Added dependency cattrs<25, optional dependency groups release and test, included YAML files in package data, added pytest and coverage configurations, updated Poe tasks to add build, test, and release tasks, and adjusted linting rules. |
| `src/cratedb_about/cli.py` | Added a new CLI command outline supporting output formats markdown, yaml, and json, displaying the CrateDB documentation outline. |
| `src/cratedb_about/outline/cratedb-outline.yaml` | Added a new structured YAML file organizing CrateDB documentation resources into groups: Docs, API, Examples, and Optional. |
| `src/cratedb_about/outline/model.py` | Added data model classes for the outline document, including loading from YAML and serialization to markdown, plus methods to access sections and items. |
| `src/cratedb_about/util.py` | Added utility classes for metadata and serializable data structures with JSON and YAML conversion support using cattrs. |
| `src/index/cratedb-overview.md` | Deleted the old Markdown overview file, superseded by the new YAML outline. |
| `tests/test_cli.py` | Added CLI tests covering version, help, list-questions, and the new outline command with markdown output. |
| `tests/test_outline.py` | Added tests for the CrateDbKnowledgeOutline API, verifying section retrieval, item access, error handling, and overall data completeness. |
| `src/cratedb_about/__init__.py` | Added export of CrateDbKnowledgeOutline as the public API of the package. |
| `src/cratedb_about/outline/__init__.py` | Added import and export of CrateDbKnowledgeOutline for clean API exposure. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant OutlineModel
    participant YAMLResource

    User->>CLI: cratedb-about outline --format markdown
    CLI->>OutlineModel: CrateDbKnowledgeOutline.load()
    OutlineModel->>YAMLResource: Read cratedb-outline.yaml
    YAMLResource-->>OutlineModel: YAML data
    OutlineModel-->>CLI: OutlineDocument instance
    CLI->>OutlineModel: to_markdown()
    OutlineModel-->>CLI: Markdown string
    CLI-->>User: Print outline in markdown
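
The flow in the diagram can be approximated with a small, self-contained sketch. The class and field names (OutlineItem, OutlineSection, OutlineDocument, title, link, description) follow names mentioned in this review, but everything else is an assumption: the real model uses attrs and loads cratedb-outline.yaml, whereas this sketch uses stdlib dataclasses, a hypothetical inline dictionary in place of the parsed YAML, and an llms.txt-style Markdown layout that may differ from the actual to_markdown() output.

```python
from dataclasses import dataclass, field
from io import StringIO
from typing import Any, Dict, List


@dataclass
class OutlineItem:
    title: str
    link: str
    description: str


@dataclass
class OutlineSection:
    name: str
    items: List[OutlineItem] = field(default_factory=list)


@dataclass
class OutlineDocument:
    title: str
    sections: List[OutlineSection] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "OutlineDocument":
        # Structure nested dictionaries into typed objects.
        sections = [
            OutlineSection(
                name=section["name"],
                items=[OutlineItem(**item) for item in section.get("items", [])],
            )
            for section in data.get("sections", [])
        ]
        return cls(title=data["title"], sections=sections)

    def to_markdown(self) -> str:
        # Assumed layout: H1 title, H2 per section, one bullet per item.
        buffer = StringIO()
        buffer.write(f"# {self.title}\n")
        for section in self.sections:
            buffer.write(f"\n## {section.name}\n\n")
            for item in section.items:
                buffer.write(f"- [{item.title}]({item.link}): {item.description}\n")
        return buffer.getvalue()


# Hypothetical stand-in for the parsed YAML content.
SAMPLE = {
    "title": "CrateDB",
    "sections": [
        {
            "name": "Docs",
            "items": [
                {
                    "title": "CrateDB README",
                    "link": "https://raw.githubusercontent.com/crate/crate/refs/heads/master/README.rst",
                    "description": "README about CrateDB.",
                }
            ],
        }
    ],
}

markdown = OutlineDocument.from_dict(SAMPLE).to_markdown()
print(markdown)
```

Building the text in a StringIO buffer keeps the rendering logic linear, one write per structural element, which mirrors the approach the review attributes to OutlineDocument.to_markdown().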

Possibly related PRs

  • What is CrateDB? / What can you do with CrateDB, and how? #1: This PR initially introduced the original src/index/cratedb-overview.md file and updated README references to it. The current PR builds upon that by replacing the Markdown overview with a structured YAML outline and adding CLI support to render it, making the two PRs strongly connected in terms of documentation source evolution.

  • Build llms.txt files from source cratedb-overview.md #3: Introduced the YAML outline and CLI commands for documentation outline management, closely related to this PR’s restructuring of documentation sources and CLI interface.

Poem

🐇
The outline now in YAML lives,
With sections neat, each doc it gives.
The CLI can show it all—
In markdown, JSON, big or small!
Tests and docs are fresh and bright,
This bunny hops with pure delight.
Hooray for structured docs tonight!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (8)
src/cratedb_about/util.py (2)

10-14: Well-structured Metadata class definition.

The Metadata class is appropriately defined with type annotations using Union types for optional fields. Consider using Optional[T] from typing instead of Union[T, None] for better readability.

- version: t.Union[float, None] = None
- type: t.Union[str, None] = None
+ version: t.Optional[float] = None
+ type: t.Optional[str] = None

34-37: Simple from_dict implementation could be improved.

The current implementation doesn't handle nested objects properly. Consider using the converter's structure method instead of direct unpacking.

@classmethod
def from_dict(cls, data: t.Dict[str, t.Any]):
-    return cls(**data)
+    converter = make_json_converter(dict_factory=OrderedDict)
+    return converter.structure(data, cls)
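
To illustrate the nested-object concern, here is a minimal sketch using stdlib dataclasses instead of attrs and cattrs. The classes Header and Document and the helper structure() are hypothetical stand-ins. In the real code, the cattrs converter's structure() method would play the role of the hand-written recursion shown here.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Header:
    title: str


@dataclass
class Document:
    header: Header

    @classmethod
    def from_dict_naive(cls, data):
        # Direct unpacking: nested dictionaries stay plain dictionaries.
        return cls(**data)

    @classmethod
    def from_dict_structured(cls, data):
        return structure(data, cls)


def structure(data, cls):
    """Recursively build dataclass instances from plain dictionaries."""
    kwargs = {}
    for f in fields(cls):
        value = data[f.name]
        if is_dataclass(f.type) and isinstance(value, dict):
            value = structure(value, f.type)
        kwargs[f.name] = value
    return cls(**kwargs)


raw = {"header": {"title": "CrateDB"}}
naive = Document.from_dict_naive(raw)
structured = Document.from_dict_structured(raw)

print(type(naive.header).__name__)
print(type(structured.header).__name__)
```

With direct unpacking, attribute access such as naive.header.title fails because header is still a dictionary. The structured variant returns a typed Header instance.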
src/cratedb_about/cli.py (1)

17-36: Well-implemented CLI command for displaying documentation outline

The implementation is clean and follows the existing code patterns. It properly uses Click's option decorators, has clear documentation, and handles different output formats correctly.

Consider adding error handling for potential exceptions from CrateDBOutline.load() or the serialization methods to provide more user-friendly error messages:

-    cratedb_outline = CrateDBOutline.load()
-    if format_ == "json":
-        print(cratedb_outline.to_json())  # noqa: T201
-    elif format_ == "yaml":
-        print(cratedb_outline.to_yaml())  # noqa: T201
-    elif format_ == "markdown":
-        print(cratedb_outline.to_markdown())  # noqa: T201
-    else:
-        raise ValueError(f"Invalid output format: {format_}")
+    try:
+        cratedb_outline = CrateDBOutline.load()
+        if format_ == "json":
+            print(cratedb_outline.to_json())  # noqa: T201
+        elif format_ == "yaml":
+            print(cratedb_outline.to_yaml())  # noqa: T201
+        elif format_ == "markdown":
+            print(cratedb_outline.to_markdown())  # noqa: T201
+        else:
+            raise ValueError(f"Invalid output format: {format_}")
+    except Exception as e:
+        raise click.ClickException(f"Error generating outline: {e}") from e
README.md (1)

20-52: Improved documentation structure and clarity

The README now has a clearer organization with separate Install and Usage sections, and includes instructions for the new outline command.

Minor preposition correction needed in line 31:

-Convert documentation outline from `cratedb-outline.yaml` in Markdown format.
+Convert documentation outline from `cratedb-outline.yaml` to Markdown format.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~31-~31: The preposition “to” seems more likely in this position.
Context: ...ion outline from cratedb-outline.yaml in Markdown format. This is the source for...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)

51-51: Bare URL used

(MD034, no-bare-urls)

tests/test_cli.py (1)

47-60: Good basic test for the new outline command

The test verifies that the outline command works with the markdown format and checks for expected content in the output.

Consider adding tests for the YAML and JSON formats to ensure all output formats work correctly:

def test_cli_outline_yaml():
    runner = CliRunner()
    result = runner.invoke(
        cli,
        args=["outline", "--format", "yaml"],
        catch_exceptions=False,
    )
    assert result.exit_code == 0, result.output
    assert "title: CrateDB" in result.output
    assert "name: Concepts" in result.output

def test_cli_outline_json():
    runner = CliRunner()
    result = runner.invoke(
        cli,
        args=["outline", "--format", "json"],
        catch_exceptions=False,
    )
    assert result.exit_code == 0, result.output
    assert "\"title\": \"CrateDB\"" in result.output
    assert "\"name\": \"Concepts\"" in result.output
src/cratedb_about/outline/model.py (1)

11-19: Consider adding error handling in read/load methods

The class methods to read and load the outline YAML file work for the happy path, but lack error handling for cases where the file might be missing or malformed.

@classmethod
def read(cls):
-    return resources.read_text("cratedb_about.outline", "cratedb-outline.yaml")
+    try:
+        return resources.read_text("cratedb_about.outline", "cratedb-outline.yaml")
+    except (FileNotFoundError, ImportError) as e:
+        raise RuntimeError(f"Could not read CrateDB outline YAML: {e}") from e

@classmethod
def load(cls):
-    return OutlineDocument.from_yaml(cls.read())
+    try:
+        return OutlineDocument.from_yaml(cls.read())
+    except Exception as e:
+        raise RuntimeError(f"Could not parse CrateDB outline YAML: {e}") from e
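
The raise ... from e pattern in the suggestion keeps the original exception available as __cause__, so the friendlier message does not hide the root cause. A minimal sketch, assuming a hypothetical read_outline() helper that reads from a filesystem path rather than from package resources:

```python
import tempfile
from pathlib import Path


def read_outline(path: Path) -> str:
    """Hypothetical reader: wraps a missing file in a RuntimeError."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError as e:
        raise RuntimeError(f"Could not read CrateDB outline YAML: {e}") from e


with tempfile.TemporaryDirectory() as tmp:
    # This file is never created, so reading it fails.
    missing = Path(tmp) / "cratedb-outline.yaml"
    try:
        read_outline(missing)
    except RuntimeError as error:
        caught = error

print(type(caught.__cause__).__name__)
```

Callers can catch the single RuntimeError type, while tracebacks still show the underlying FileNotFoundError.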
src/cratedb_about/outline/cratedb-outline.yaml (2)

20-22: Remove trailing whitespace

There are trailing spaces at the end of lines 20 and 22 that should be removed.

      > CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is based on Lucene, inherits technologies from Elasticsearch, and is compatible with PostgreSQL.
-      
+
      Things to remember when working with CrateDB are:
-      
+
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 20-20: trailing spaces

(trailing-spaces)


[error] 22-22: trailing spaces

(trailing-spaces)
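
A check like the one YAMLlint performs here can be sketched in a few lines. The function names and the sample text below are illustrative and not part of the repository. The cleanup helper also normalizes CRLF line endings to LF.

```python
def find_trailing_whitespace(text: str):
    """Return 1-based line numbers whose content ends in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line, keeping line breaks."""
    return "".join(
        line.rstrip(" \t\r\n") + ("\n" if line.endswith("\n") else "")
        for line in text.splitlines(keepends=True)
    )


# Line 3 consists only of spaces, like the flagged lines in the YAML block scalar.
sample = "header: |\n  first paragraph\n      \n  second paragraph\n"

flagged = find_trailing_whitespace(sample)
cleaned = strip_trailing_whitespace(sample)
print(flagged)
print(find_trailing_whitespace(cleaned))
```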


36-76: Fix inconsistent indentation in item entries

The static analysis tool flagged inconsistent indentation for some items (lines 39, 80, 107, 134). All items should use consistent indentation (4 spaces) for better maintainability.

  - name: Docs
    items:

-      - title: "CrateDB README"
+    - title: "CrateDB README"

Apply similar changes to the items in other sections (lines 80, 107, 134) for consistent indentation throughout the file.

Also applies to: 77-103, 104-130, 131-229

🧰 Tools
🪛 YAMLlint (1.35.1)

[warning] 39-39: wrong indentation: expected 4 but found 6

(indentation)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75302da and 30ec63d.

📒 Files selected for processing (12)
  • .github/workflows/tests.yml (1 hunks)
  • .gitignore (1 hunks)
  • CHANGES.md (1 hunks)
  • README.md (2 hunks)
  • pyproject.toml (4 hunks)
  • src/content/about/llms-txt.md (1 hunks)
  • src/cratedb_about/cli.py (2 hunks)
  • src/cratedb_about/outline/cratedb-outline.yaml (1 hunks)
  • src/cratedb_about/outline/model.py (1 hunks)
  • src/cratedb_about/util.py (1 hunks)
  • src/index/cratedb-overview.md (0 hunks)
  • tests/test_cli.py (1 hunks)
💤 Files with no reviewable changes (1)
  • src/index/cratedb-overview.md
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/cratedb_about/cli.py (2)
src/cratedb_about/outline/model.py (3)
  • CrateDBOutline (11-18)
  • load (17-18)
  • to_markdown (51-60)
src/cratedb_about/util.py (2)
  • to_json (26-28)
  • to_yaml (30-32)
src/cratedb_about/outline/model.py (1)
src/cratedb_about/util.py (3)
  • Dumpable (17-46)
  • Metadata (11-13)
  • from_yaml (44-46)
🪛 LanguageTool
src/content/about/llms-txt.md

[uncategorized] ~16-~16: Loose punctuation mark.
Context: ... What's Inside - cratedb-outline.yaml: The YAML source file for generating a M...

(UNLIKELY_OPENING_PUNCTUATION)

README.md

[uncategorized] ~31-~31: The preposition “to” seems more likely in this position.
Context: ...ion outline from cratedb-outline.yaml in Markdown format. This is the source for...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)
README.md

51-51: Bare URL used

(MD034, no-bare-urls)

🪛 YAMLlint (1.35.1)
src/cratedb_about/outline/cratedb-outline.yaml

[error] 20-20: trailing spaces

(trailing-spaces)


[error] 22-22: trailing spaces

(trailing-spaces)


[warning] 39-39: wrong indentation: expected 4 but found 6

(indentation)


[warning] 80-80: wrong indentation: expected 4 but found 6

(indentation)


[warning] 107-107: wrong indentation: expected 4 but found 6

(indentation)


[warning] 134-134: wrong indentation: expected 4 but found 6

(indentation)

🔇 Additional comments (26)
src/cratedb_about/util.py (5)

1-8: Comprehensive import statements with appropriate type annotations.

The code properly imports necessary modules for type hinting, collections, and third-party libraries (attrs, cattrs).


16-22: TODO comment in Dumpable class should be addressed.

The comment indicates this class should be refactored to pueblo.data. Consider creating an issue to track this if it's intended for future work, or remove the comment if no longer relevant.

Is there a plan to refactor this class to pueblo.data? If so, consider creating a tracking issue.


23-25: Effective implementation of to_dict method.

Good use of attr.asdict with OrderedDict to maintain key order in the serialized output.


26-33: Well-implemented JSON and YAML serialization methods.

Both methods properly use respective converters with consistent OrderedDict factory to ensure consistent output format and key ordering.


38-46: Consistent deserialization methods.

The JSON and YAML deserialization methods are properly implemented using the appropriate converters.

CHANGES.md (2)

4-6: Clear changelog entry for data backend refactoring.

The changelog entry accurately describes the refactoring of the documentation outline source of truth to cratedb-outline.yaml.


6-6: New CLI subcommand properly documented in changelog.

Good practice to document the addition of new CLI functionality in the changelog.

.github/workflows/tests.yml (1)

65-66: CLI command added to workflow tests.

Good addition of the new cratedb-about outline command to the test workflow, ensuring the new functionality is verified during CI.

src/content/about/llms-txt.md (1)

16-17: Documentation updated to reflect new source of truth.

The documentation is correctly updated to reference the new YAML-based source file structure. The punctuation issue flagged by static analysis is a false positive: the colon separates the filename from its description in a list item and is not loose punctuation.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~16-~16: Loose punctuation mark.
Context: ... What's Inside - cratedb-outline.yaml: The YAML source file for generating a M...

(UNLIKELY_OPENING_PUNCTUATION)

.gitignore (1)

1-2: Good addition of coverage and build artifacts to .gitignore

These additions properly exclude coverage reports (.coverage, coverage.xml) and distribution build outputs (dist) from version control, which is a good practice.

Also applies to: 9-9

src/cratedb_about/cli.py (1)

7-7: LGTM: Appropriate import for the new CLI command

This import correctly brings in the CrateDBOutline class that will be needed for the new outline command.

README.md (3)

13-14: Documentation updated to reference new YAML file

The README correctly updates references to point to the new source of truth.


17-18: Source file reference updated correctly

The reference has been updated to match the new YAML-based source.


56-57: Updated link to new source file

The link has been correctly updated to point to the new YAML file location.

tests/test_cli.py (1)

1-45: Well-structured CLI tests for existing commands

The tests for CLI version, help, and list-questions commands are well-implemented using Click's testing utilities and follow good practices for testing CLI applications.

pyproject.toml (6)

85-92: Appropriate dependency grouping strategy!

Good organization of optional dependencies into logical groups (release and test) with appropriate versioning constraints. This allows users to install only what they need for specific tasks.


99-100: Good approach for including YAML resources!

This ensures the new cratedb-outline.yaml file will be properly packaged and available at runtime.


136-138: Appropriate test file exclusion for S101!

Correctly excluding the S101 warning (use of assert) in test files is a good practice, as assertions are a standard pattern in tests.


140-155: Comprehensive pytest configuration!

The configuration includes all necessary settings for thorough testing, including coverage reporting, verbosity settings, and markers.


197-205: Well-designed build sequence for the new outline feature!

The build sequence properly generates content files from the YAML source of truth, supporting the key objective of making cratedb-outline.yaml the canonical source for documentation structure.


227-232: Good task separation for release and test operations!

Separating release and test tasks provides clarity and follows Python packaging best practices.

src/cratedb_about/outline/model.py (2)

21-44: Well-designed data model hierarchy!

The data classes create a clean, logical structure for representing the documentation outline. Using attrs and Factory for default values is a good practice to avoid mutable default argument issues.
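
The mutable-default pitfall mentioned here can be shown with a short sketch. It uses stdlib dataclasses and field(default_factory=list) as an analogue of the attrs Factory; the names append_shared and Section are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


def append_shared(item, bucket=[]):
    # Pitfall: the default list is created once and shared across calls.
    bucket.append(item)
    return bucket


@dataclass
class Section:
    name: str
    # A factory builds a fresh list per instance, similar to attrs' Factory(list).
    items: List[str] = field(default_factory=list)


first = append_shared("a")
second = append_shared("b")
print(first is second)

docs = Section("Docs")
api = Section("API")
docs.items.append("README")
print(api.items)
```

The function accumulates state across calls because both results are the same list object, while each Section instance gets its own independent list.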


46-61: Good serialization approach with inheritance

The OutlineDocument class effectively leverages the Dumpable base class for serialization/deserialization while adding a specific to_markdown() method for Markdown output.

src/cratedb_about/outline/cratedb-outline.yaml (3)

1-9: Great documentation of file purpose and conventions!

The header comments clearly explain the structure, group naming conventions, and references to external specifications. This helps maintainers understand the file's organization and purpose.


11-33: Well-structured metadata and header section

The metadata with version information and comprehensive header text provides a good introduction to CrateDB and its key features.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 20-20: trailing spaces

(trailing-spaces)


[error] 22-22: trailing spaces

(trailing-spaces)


34-229: Comprehensive content structure with logical grouping

The organization into Docs, API, Examples, and Optional sections creates a clear hierarchy of information. Each entry has consistent attributes (title, link, description) making it easy to process programmatically.

🧰 Tools
🪛 YAMLlint (1.35.1)

[warning] 39-39: wrong indentation: expected 4 but found 6

(indentation)


[warning] 80-80: wrong indentation: expected 4 but found 6

(indentation)


[warning] 107-107: wrong indentation: expected 4 but found 6

(indentation)


[warning] 134-134: wrong indentation: expected 4 but found 6

(indentation)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (8)
README.md (2)

31-32: Minor grammatical improvement opportunity.

The preposition "in" could be replaced with "to" for better grammatical flow.

-Convert documentation outline from `cratedb-outline.yaml` in Markdown format.
+Convert documentation outline from `cratedb-outline.yaml` to Markdown format.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~31-~31: The preposition “to” seems more likely in this position.
Context: ...ion outline from cratedb-outline.yaml in Markdown format. This is the source for...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)


51-51: Avoid the bare URL.

The URL is currently bare, which markdownlint flags (MD034). Wrapping it in backticks, as suggested below, or in angle brackets resolves the warning.

-variable. The default value is https://cdn.crate.io/about/v1/llms-full.txt.
+variable. The default value is `https://cdn.crate.io/about/v1/llms-full.txt`.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

51-51: Bare URL used

(MD034, no-bare-urls)

src/cratedb_about/outline/cratedb-outline.yaml (6)

21-21: Remove trailing whitespace.

YAMLlint indicates trailing whitespace on this line, which should be removed.

-      
+
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 21-21: trailing spaces

(trailing-spaces)


23-23: Remove trailing whitespace.

YAMLlint indicates trailing whitespace on this line, which should be removed.

-      
+
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 23-23: trailing spaces

(trailing-spaces)


40-43: Fix indentation for consistency.

YAMLlint indicates incorrect indentation; should be 4 spaces instead of 6.

-      - title: "CrateDB README"
-        link: https://raw.githubusercontent.com/crate/crate/refs/heads/master/README.rst
-        description: README about CrateDB.
+    - title: "CrateDB README"
+      link: https://raw.githubusercontent.com/crate/crate/refs/heads/master/README.rst
+      description: README about CrateDB.
🧰 Tools
🪛 YAMLlint (1.35.1)

[warning] 40-40: wrong indentation: expected 4 but found 6

(indentation)


81-83: Fix indentation for consistency.

YAMLlint indicates incorrect indentation; should be 4 spaces instead of 6.

-      - title: "CrateDB SQL syntax"
-        description: You can use Structured Query Language (SQL) to query your data.
-        link: https://cratedb.com/docs/crate/reference/en/latest/_sources/sql/index.rst.txt
+    - title: "CrateDB SQL syntax"
+      description: You can use Structured Query Language (SQL) to query your data.
+      link: https://cratedb.com/docs/crate/reference/en/latest/_sources/sql/index.rst.txt
🧰 Tools
🪛 YAMLlint (1.35.1)

[warning] 81-81: wrong indentation: expected 4 but found 6

(indentation)


108-111: Fix indentation for consistency.

YAMLlint indicates incorrect indentation; should be 4 spaces instead of 6.

-      - title: "CrateDB SQL gallery"
-        link: https://github.com/crate/cratedb-toolkit/raw/refs/tags/v0.0.31/cratedb_toolkit/info/library.py
-        description: A collection of SQL queries and utilities suitable for diagnostics on CrateDB.
+    - title: "CrateDB SQL gallery"
+      link: https://github.com/crate/cratedb-toolkit/raw/refs/tags/v0.0.31/cratedb_toolkit/info/library.py
+      description: A collection of SQL queries and utilities suitable for diagnostics on CrateDB.
🧰 Tools
🪛 YAMLlint (1.35.1)

[warning] 108-108: wrong indentation: expected 4 but found 6

(indentation)


135-138: Fix indentation for consistency.

YAMLlint indicates incorrect indentation; should be 4 spaces instead of 6.

-      - title: "Concept: Clustering"
-        link: https://cratedb.com/docs/crate/reference/en/latest/_sources/concepts/clustering.rst.txt
-        description: How the distributed SQL database CrateDB uses a shared nothing architecture to form high-availability, resilient database clusters with minimal effort of configuration.
-        source: docs
+    - title: "Concept: Clustering"
+      link: https://cratedb.com/docs/crate/reference/en/latest/_sources/concepts/clustering.rst.txt
+      description: How the distributed SQL database CrateDB uses a shared nothing architecture to form high-availability, resilient database clusters with minimal effort of configuration.
+      source: docs
🧰 Tools
🪛 YAMLlint (1.35.1)

[warning] 135-135: wrong indentation: expected 4 but found 6

(indentation)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30ec63d and 57fadef.

📒 Files selected for processing (12)
  • .github/workflows/tests.yml (1 hunks)
  • .gitignore (1 hunks)
  • CHANGES.md (1 hunks)
  • README.md (2 hunks)
  • pyproject.toml (4 hunks)
  • src/content/about/llms-txt.md (1 hunks)
  • src/cratedb_about/cli.py (2 hunks)
  • src/cratedb_about/outline/cratedb-outline.yaml (1 hunks)
  • src/cratedb_about/outline/model.py (1 hunks)
  • src/cratedb_about/util.py (1 hunks)
  • src/index/cratedb-overview.md (0 hunks)
  • tests/test_cli.py (1 hunks)
💤 Files with no reviewable changes (1)
  • src/index/cratedb-overview.md
✅ Files skipped from review due to trivial changes (1)
  • .github/workflows/tests.yml
🚧 Files skipped from review as they are similar to previous changes (6)
  • .gitignore
  • CHANGES.md
  • src/cratedb_about/cli.py
  • src/cratedb_about/util.py
  • tests/test_cli.py
  • pyproject.toml
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/cratedb_about/outline/model.py (1)
src/cratedb_about/util.py (3)
  • Dumpable (17-47)
  • Metadata (11-13)
  • from_yaml (45-47)
🪛 GitHub Actions: Tests
src/cratedb_about/outline/model.py

[error] 49-49: mypy: Unexpected keyword argument "version" for "Metadata" (call-arg)


[error] 49-49: mypy: Unexpected keyword argument "type" for "Metadata" (call-arg)

🪛 LanguageTool
README.md

[uncategorized] ~31-~31: The preposition “to” seems more likely in this position.
Context: ...ion outline from cratedb-outline.yaml in Markdown format. This is the source for...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)

src/content/about/llms-txt.md

[uncategorized] ~16-~16: Loose punctuation mark.
Context: ... What's Inside - cratedb-outline.yaml: The YAML source file for generating a M...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)
README.md

51-51: Bare URL used

(MD034, no-bare-urls)

🪛 YAMLlint (1.35.1)
src/cratedb_about/outline/cratedb-outline.yaml

[error] 21-21: trailing spaces

(trailing-spaces)


[error] 23-23: trailing spaces

(trailing-spaces)


[warning] 40-40: wrong indentation: expected 4 but found 6

(indentation)


[warning] 81-81: wrong indentation: expected 4 but found 6

(indentation)


[warning] 108-108: wrong indentation: expected 4 but found 6

(indentation)


[warning] 135-135: wrong indentation: expected 4 but found 6

(indentation)

🔇 Additional comments (13)
src/content/about/llms-txt.md (1)

16-17: Updated source file reference aligns with architectural change.

The update correctly reflects the new documentation workflow where cratedb-outline.yaml replaces the previous Markdown file as the source of truth, aligning with the PR objective to refactor the data backend.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~16-~16: Loose punctuation mark.
Context: ... What's Inside - cratedb-outline.yaml: The YAML source file for generating a M...

(UNLIKELY_OPENING_PUNCTUATION)

README.md (4)

13-14: Properly updated reference to the new outline file.

The README now correctly points to the new YAML source file instead of the previous Markdown overview.


17-18: Consistent reference to the new source file.

This change maintains consistency by referring to the YAML file as the source for generating the llms.txt files.


27-38: Well-structured documentation of the new workflow.

The updated README now clearly separates the documentation generation process into its own section, making it easier for users to understand the new workflow with the YAML outline file.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~31-~31: The preposition “to” seems more likely in this position.
Context: ...ion outline from cratedb-outline.yaml in Markdown format. This is the source for...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)


56-56: Updated GitHub link to reflect new file path.

This change correctly updates the link to point to the new YAML file location.

src/cratedb_about/outline/cratedb-outline.yaml (4)

1-9: Well-documented file header with clear references.

The file header clearly explains the purpose and structure of the document, providing useful references to llms.txt and RSS specifications.


11-13: Proper versioning metadata.

Including version information is good practice for data files, especially when they serve as a source of truth.


16-34: Comprehensive header with rich description.

The header section provides a thorough description of CrateDB and important points to remember, which is valuable for documentation purposes.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 21-21: trailing spaces

(trailing-spaces)


[error] 23-23: trailing spaces

(trailing-spaces)


35-230: Well-structured document with comprehensive content organization.

The YAML file effectively organizes CrateDB documentation into logical sections (Docs, API, Examples, Optional) with consistent entry formatting. Each entry provides valuable metadata including title, link, description, and sometimes additional notes or source information.

🧰 Tools
🪛 YAMLlint (1.35.1)

[warning] 40-40: wrong indentation: expected 4 but found 6

(indentation)


[warning] 81-81: wrong indentation: expected 4 but found 6

(indentation)


[warning] 108-108: wrong indentation: expected 4 but found 6

(indentation)


[warning] 135-135: wrong indentation: expected 4 but found 6

(indentation)

src/cratedb_about/outline/model.py (4)

1-9: Clean imports with proper organization.

The imports are well-organized, starting with standard library imports, followed by third-party libraries, and finally local imports.


11-19: Efficient class methods for outline access.

The CrateDBOutline class provides clean class methods to read and load the outline YAML file, which follows good software design principles by abstracting file access details.


21-45: Well-defined data model with appropriate type hints.

The data model classes (OutlineHeader, OutlineItem, OutlineSection, OutlineData) use type hints and sensible defaults, creating a clean representation of the outline structure.


52-62: Clean Markdown generation implementation.

The to_markdown method in OutlineDocument uses an efficient approach with StringIO to build the Markdown representation, with proper formatting and structure.
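
The `StringIO` approach described here can be sketched as below. This is a minimal illustration, not the PR's actual implementation: the `Item` and `Section` classes, the standalone `to_markdown` function, and the exact output layout (an llms.txt-style link list) are assumptions, and stdlib `dataclasses` stand in for the project's attrs-based models.

```python
import io
import typing as t
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical stand-in for an outline entry.
    title: str
    link: str
    description: str = ""


@dataclass
class Section:
    # Hypothetical stand-in for an outline section.
    name: str
    items: t.List[Item] = field(default_factory=list)


def to_markdown(title: str, sections: t.List[Section]) -> str:
    """Render a title and its sections as a Markdown link list."""
    buffer = io.StringIO()
    buffer.write(f"# {title}\n\n")
    for section in sections:
        buffer.write(f"## {section.name}\n\n")
        for item in section.items:
            buffer.write(f"- [{item.title}]({item.link}): {item.description}\n")
        buffer.write("\n")
    return buffer.getvalue()


# Placeholder data, not taken from cratedb-outline.yaml.
markdown = to_markdown(
    "CrateDB",
    [Section("Docs", [Item("Example entry", "https://example.org/docs", "Placeholder description")])],
)
print(markdown)
```

Writing into a single buffer avoids repeated string concatenation and keeps the formatting rules in one place.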

@amotl amotl force-pushed the source-yaml branch 6 times, most recently from 20ff33e to b031d47 Compare May 9, 2025 20:50
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/cratedb_about/outline/model.py (1)

69-69: Fix Metadata initialization to match class definition.

The Metadata class from util.py doesn't accept version and type as keyword arguments directly, which causes mypy errors.

-    meta: Metadata = Factory(lambda: Metadata(version=1, type="outline"))
+    meta: Metadata = Factory(lambda: Metadata())

Then set the attributes after initialization in __attrs_post_init__:

def __attrs_post_init__(self):
    self.meta.version = 1
    self.meta.type = "outline"
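
The suggested pattern (construct an empty `Metadata`, then fill it in after initialization) might look like the sketch below. It rests on assumptions: the real `Metadata` definition in util.py is not shown in this review, so the stand-in here simply excludes both fields from `__init__`, and stdlib `dataclasses` with `__post_init__` replace attrs' `Factory` and `__attrs_post_init__` to keep the example free of dependencies.

```python
import typing as t
from dataclasses import dataclass, field


@dataclass
class Metadata:
    # Hypothetical stand-in: both fields are excluded from __init__,
    # so Metadata(version=1, type="outline") raises TypeError.
    version: t.Optional[int] = field(init=False, default=None)
    type: t.Optional[str] = field(init=False, default=None)


@dataclass
class OutlineDocument:
    # A fresh Metadata instance is created for every document.
    meta: Metadata = field(default_factory=Metadata)

    def __post_init__(self) -> None:
        # dataclasses counterpart of attrs' __attrs_post_init__.
        self.meta.version = 1
        self.meta.type = "outline"


document = OutlineDocument()
print(document.meta)
```

Note that this post-initialization hook also overwrites the values of a `Metadata` instance passed in by the caller, which may or may not be intended.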
🧹 Nitpick comments (6)
README.md (1)

71-71: Consider formatting the URL as a proper Markdown link.

The URL is currently written as a bare URL, which was flagged by the linter.

-variable. The default value is https://cdn.crate.io/about/v1/llms-full.txt.
+variable. The default value is [https://cdn.crate.io/about/v1/llms-full.txt](https://cdn.crate.io/about/v1/llms-full.txt).
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

71-71: Bare URL used

(MD034, no-bare-urls)

tests/test_outline.py (1)

18-18: Fix typo in function name.

There's a typo in the test function name.

-def test_outline_get_ection(cratedb_outline):
+def test_outline_get_section(cratedb_outline):
src/cratedb_about/outline/model.py (4)

35-41: Consider making OutlineHeader inherit from DictTools for consistency.

I notice that OutlineItem and OutlineSection inherit from DictTools, but OutlineHeader doesn't. This creates an inconsistency in the class hierarchy. Since all three are data model elements, they should probably have the same base class for consistency.

-@define
-class OutlineHeader:
+@define
+class OutlineHeader(DictTools):
    """Data model element of an `OutlineDocument`"""

89-95: Improve docstring for the get_section method.

The current docstring is minimal. Consider enhancing it with parameter and return type descriptions, as well as an example of usage. This makes the API more approachable for developers.

def get_section(self, name: str) -> t.Optional[OutlineSection]:
-    """Return an individual section by name."""
+    """
+    Return an individual section by name.
+    
+    Args:
+        name: The name of the section to retrieve
+        
+    Returns:
+        The section if found, None otherwise
+        
+    Example:
+        ```python
+        outline = CrateDbKnowledgeOutline.load()
+        section = outline.get_section("Getting Started")
+        ```
+    """
    for section in self.data.sections:
        if section.name == name:
            return section
    return None
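
The lookup behaviour documented above can be exercised with a small self-contained sketch. The classes are simplified stand-ins (stdlib `dataclasses`, sections stored directly on the document rather than under `self.data`), and the section names are placeholders.

```python
import typing as t
from dataclasses import dataclass, field


@dataclass
class OutlineSection:
    # Simplified stand-in for the real section model.
    name: str
    items: t.List[str] = field(default_factory=list)


@dataclass
class OutlineDocument:
    sections: t.List[OutlineSection] = field(default_factory=list)

    def get_section(self, name: str) -> t.Optional[OutlineSection]:
        """Return the first section whose name matches, or None."""
        for section in self.sections:
            if section.name == name:
                return section
        return None


document = OutlineDocument(sections=[OutlineSection("Docs"), OutlineSection("API")])
found = document.get_section("API")
missing = document.get_section("Unknown")
```

Because the return type is `Optional`, callers need to check for `None` before accessing attributes of the result.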

96-114: Type hint for section_name could be more precise.

The parameter section_name can be None, but this isn't reflected in its type annotation. Consider using Optional[str] for more accurate typing.

def get_items(
-    self, section_name: str = None, as_dict: bool = False
+    self, section_name: t.Optional[str] = None, as_dict: bool = False
) -> t.Union[t.List[t.Dict[str, t.Any]], t.List[OutlineItem]]:

11-23: Add more detailed docstring for the CrateDbKnowledgeOutline class.

While the class has a basic docstring, it would be helpful to add more details about how to use the read() and load() methods, including examples. This makes the API more approachable for developers.

class CrateDbKnowledgeOutline:
    """
    Load CrateDB knowledge outline from YAML file `cratedb-outline.yaml`.
+    
+    This class provides methods to read the raw YAML content and to load it
+    as a structured document model.
+    
+    Examples:
+        ```python
+        # Get raw YAML content
+        yaml_content = CrateDbKnowledgeOutline.read()
+        
+        # Load as structured document
+        outline = CrateDbKnowledgeOutline.load()
+        
+        # Get all section names
+        sections = outline.section_names
+        ```
    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57fadef and b031d47.

📒 Files selected for processing (16)
  • .github/workflows/tests.yml (1 hunks)
  • .gitignore (1 hunks)
  • CHANGES.md (1 hunks)
  • README.md (2 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (5 hunks)
  • src/content/about/llms-txt.md (1 hunks)
  • src/cratedb_about/__init__.py (1 hunks)
  • src/cratedb_about/cli.py (2 hunks)
  • src/cratedb_about/outline/__init__.py (1 hunks)
  • src/cratedb_about/outline/cratedb-outline.yaml (1 hunks)
  • src/cratedb_about/outline/model.py (1 hunks)
  • src/cratedb_about/util.py (1 hunks)
  • src/index/cratedb-overview.md (0 hunks)
  • tests/test_cli.py (1 hunks)
  • tests/test_outline.py (1 hunks)
💤 Files with no reviewable changes (1)
  • src/index/cratedb-overview.md
✅ Files skipped from review due to trivial changes (3)
  • src/cratedb_about/outline/__init__.py
  • docs/backlog.md
  • src/cratedb_about/__init__.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • CHANGES.md
  • .gitignore
  • src/cratedb_about/cli.py
  • .github/workflows/tests.yml
  • tests/test_cli.py
  • src/cratedb_about/util.py
  • src/cratedb_about/outline/cratedb-outline.yaml
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/test_outline.py (2)
src/cratedb_about/outline/model.py (6)
  • CrateDbKnowledgeOutline (11-22)
  • OutlineDocument (60-113)
  • load (21-22)
  • section_names (85-87)
  • get_section (89-94)
  • get_items (96-113)
src/cratedb_about/cli.py (1)
  • outline (21-35)
src/cratedb_about/outline/model.py (1)
src/cratedb_about/util.py (5)
  • DictTools (17-23)
  • Dumpable (27-50)
  • Metadata (11-13)
  • from_yaml (48-50)
  • to_dict (18-19)
🪛 markdownlint-cli2 (0.17.2)
README.md

71-71: Bare URL used

(MD034, no-bare-urls)

🪛 LanguageTool
src/content/about/llms-txt.md

[uncategorized] ~16-~16: Loose punctuation mark.
Context: ... What's Inside - cratedb-outline.yaml: The YAML source file for generating a M...

(UNLIKELY_OPENING_PUNCTUATION)

🔇 Additional comments (11)
src/content/about/llms-txt.md (1)

16-17: Documentation content accurately reflects the new source of truth.

The document has been properly updated to reference cratedb-outline.yaml as the new source file, aligning with the PR's objective of refactoring the source of truth from Markdown to YAML.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~16-~16: Loose punctuation mark.
Context: ... What's Inside - cratedb-outline.yaml: The YAML source file for generating a M...

(UNLIKELY_OPENING_PUNCTUATION)

README.md (3)

13-20: Clear description of the new structured documentation approach.

The README has been updated to clearly explain the new workflow using the YAML-based outline as the source of truth, which aligns perfectly with the PR objectives.


31-55: Comprehensive documentation for the new outline functionality.

The added "Outline" section provides excellent documentation for both CLI and API usage of the new outline functionality, including concrete examples for converting the outline to different formats and retrieving specific sections programmatically.


76-79: Updated references to correctly point to new resources.

The hyperlinks have been correctly updated to reference the new YAML outline file location and related resources.

tests/test_outline.py (2)

7-10: Well-designed test fixture for reusing the outline document.

Using a fixture to load the outline document once and reuse it across tests is an efficient approach.


12-47: Comprehensive test coverage for the new outline functionality.

The tests thoroughly cover key aspects of the CrateDbKnowledgeOutline class including:

  • Section name validation
  • Section retrieval
  • Error handling for non-existent sections
  • Item retrieval in different formats
  • All items retrieval

This provides good confidence in the robustness of the new outline functionality.

pyproject.toml (5)

72-72: Added cattrs dependency to support structured data models.

The addition of cattrs is appropriate for handling the serialization/deserialization needs of the new YAML-based outline.


86-93: Well-organized optional dependency groups.

Adding separate dependency groups for release and test tasks is a good practice for maintaining clean dependency management.


100-101: Proper package data configuration for YAML files.

Including YAML files in the package data ensures the outline file will be properly packaged and accessible via the API.


198-207: Updated build task to use the new YAML source.

The build task has been properly updated to generate content from the new YAML source, reflecting the change in the source of truth.


229-234: Clear separation of release and test tasks.

The configuration now properly separates building/releasing from testing, which improves the clarity of the task definitions.

CLI: Provide new subcommand `cratedb-about outline`.
API: Provide `cratedb_about.CrateDbKnowledgeOutline` for retrieving
information from the knowledge base outline within Python programs.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/cratedb_about/outline/model.py (1)

84-84: ⚠️ Potential issue

Fix Metadata initialization to match class definition.

Based on the Metadata class definition in util.py, it doesn't accept keyword arguments directly.

-meta: Metadata = Factory(lambda: Metadata(version=1, type="outline"))
+meta: Metadata = Factory(lambda: Metadata())

Then add a post-initialization method to set the attributes:

def __attrs_post_init__(self):
    self.meta.version = 1
    self.meta.type = "outline"
🧹 Nitpick comments (1)
README.md (1)

74-74: Consider formatting the URL as a proper link.

-variable. The default value is https://cdn.crate.io/about/v1/llms-full.txt.
+variable. The default value is [https://cdn.crate.io/about/v1/llms-full.txt](https://cdn.crate.io/about/v1/llms-full.txt).
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

74-74: Bare URL used

(MD034, no-bare-urls)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b031d47 and 66ed252.

📒 Files selected for processing (16)
  • .github/workflows/tests.yml (1 hunks)
  • .gitignore (1 hunks)
  • CHANGES.md (1 hunks)
  • README.md (2 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (5 hunks)
  • src/content/about/llms-txt.md (1 hunks)
  • src/cratedb_about/__init__.py (1 hunks)
  • src/cratedb_about/cli.py (2 hunks)
  • src/cratedb_about/outline/__init__.py (1 hunks)
  • src/cratedb_about/outline/cratedb-outline.yaml (1 hunks)
  • src/cratedb_about/outline/model.py (1 hunks)
  • src/cratedb_about/util.py (1 hunks)
  • src/index/cratedb-overview.md (0 hunks)
  • tests/test_cli.py (1 hunks)
  • tests/test_outline.py (1 hunks)
💤 Files with no reviewable changes (1)
  • src/index/cratedb-overview.md
✅ Files skipped from review due to trivial changes (1)
  • tests/test_cli.py
🚧 Files skipped from review as they are similar to previous changes (10)
  • docs/backlog.md
  • src/cratedb_about/outline/__init__.py
  • .github/workflows/tests.yml
  • .gitignore
  • src/cratedb_about/__init__.py
  • CHANGES.md
  • src/cratedb_about/cli.py
  • src/cratedb_about/util.py
  • tests/test_outline.py
  • src/cratedb_about/outline/cratedb-outline.yaml
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/cratedb_about/outline/model.py (1)
src/cratedb_about/util.py (5)
  • DictTools (17-23)
  • Dumpable (27-50)
  • Metadata (11-13)
  • from_yaml (48-50)
  • to_dict (18-19)
🪛 markdownlint-cli2 (0.17.2)
README.md

74-74: Bare URL used

(MD034, no-bare-urls)

🪛 LanguageTool
src/content/about/llms-txt.md

[uncategorized] ~16-~16: Loose punctuation mark.
Context: ... What's Inside - cratedb-outline.yaml: The YAML source file for generating a M...

(UNLIKELY_OPENING_PUNCTUATION)

🔇 Additional comments (8)
src/cratedb_about/outline/model.py (3)

90-90: Good null check for header title.

Properly handling the case where header.title might be None by providing a default value.


11-38: Well-structured API for accessing the knowledge outline.

The CrateDbKnowledgeOutline class has a clear, well-documented interface with good examples in the docstring. The separation of read() for raw content and load() for structured data follows good design principles.


125-142: Comprehensive error handling in get_items method.

The get_items method correctly handles the case when a section isn't found by raising a descriptive ValueError that includes the list of available sections, which will be helpful for debugging.
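
A sketch of the error handling described here, under stated assumptions: the function is written standalone rather than as a method, the models are simplified `dataclasses` stand-ins, and the wording of the error message is illustrative rather than copied from the PR.

```python
import typing as t
from dataclasses import asdict, dataclass, field


@dataclass
class OutlineItem:
    title: str
    link: str


@dataclass
class OutlineSection:
    name: str
    items: t.List[OutlineItem] = field(default_factory=list)


def get_items(
    sections: t.List[OutlineSection],
    section_name: t.Optional[str] = None,
    as_dict: bool = False,
) -> t.Union[t.List[t.Dict[str, t.Any]], t.List[OutlineItem]]:
    """Return items of one section, or of all sections when no name is given."""
    if section_name is None:
        items = [item for section in sections for item in section.items]
    else:
        by_name = {section.name: section for section in sections}
        if section_name not in by_name:
            # List the valid names so the caller can spot a typo quickly.
            raise ValueError(
                f"Section '{section_name}' not found. Available sections: {list(by_name)}"
            )
        items = list(by_name[section_name].items)
    return [asdict(item) for item in items] if as_dict else items


# Placeholder data for illustration only.
sections = [
    OutlineSection("Docs", [OutlineItem("Entry A", "https://example.org/a")]),
    OutlineSection("API", [OutlineItem("Entry B", "https://example.org/b")]),
]
all_items = get_items(sections)
api_dicts = get_items(sections, "API", as_dict=True)
```

Requesting an unknown section, for example `get_items(sections, "Nope")`, raises a `ValueError` whose message names `Docs` and `API` as the available sections.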

pyproject.toml (3)

72-72: Good addition of cattrs dependency.

Adding the cattrs library as a dependency is appropriate since it's used for serialization/deserialization in the new outline functionality.


86-93: Well-structured optional dependencies.

The separation of release and test dependencies makes the package more maintainable and allows users to install only what they need.


198-207: Updated build task aligns with new outline format.

The build task has been properly updated to generate content from the new YAML outline source instead of the previous Markdown file, maintaining consistency with the refactoring.

src/content/about/llms-txt.md (1)

16-17: Updated documentation to reflect new source file.

The documentation has been correctly updated to reference the new YAML source file instead of the previous Markdown file.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~16-~16: Loose punctuation mark.
Context: ... What's Inside - cratedb-outline.yaml: The YAML source file for generating a M...

(UNLIKELY_OPENING_PUNCTUATION)

README.md (1)

31-58: Clear and comprehensive API usage documentation.

The README now includes excellent examples of both CLI and API usage for the new outline functionality, making it easy for users to understand how to work with the refactored code.

@amotl amotl marked this pull request as ready for review May 9, 2025 21:09
@amotl amotl merged commit d0cac21 into main May 9, 2025
3 checks passed
@amotl amotl deleted the source-yaml branch May 9, 2025 21:09