Skip to content

Outline: Make cratedb-about outline understand --url option#21

Merged
amotl merged 3 commits intomainfrom
progress
May 11, 2025
Merged

Outline: Make cratedb-about outline understand --url option#21
amotl merged 3 commits intomainfrom
progress

Conversation

@amotl
Copy link
Member

@amotl amotl commented May 10, 2025

... to load any outline resource on local and remote filesystems.

@coderabbitai
Copy link

coderabbitai bot commented May 10, 2025

Walkthrough

The changes introduce support for specifying a custom outline source for the cratedb-about outline CLI command via a --url option or ABOUT_OUTLINE_URL environment variable, allowing both local and remote filesystems. The core logic for loading outlines is refactored into a new module, and related documentation, dependencies, and tests are updated accordingly.

Changes

File(s) Change Summary
src/cratedb_about/outline/core.py New module defining the CrateDbKnowledgeOutline dataclass, providing methods to load outlines from built-in or external sources via URL.
src/cratedb_about/cli.py Refactored to import CrateDbKnowledgeOutline from the new module and updated the outline CLI command to accept a --url option and environment variable.
src/cratedb_about/outline/__init__.py Updated to import CrateDbKnowledgeOutline from core.py and added OutlineDocument to public exports.
src/cratedb_about/outline/model.py Removed the old CrateDbKnowledgeOutline class and related imports.
README.md, CHANGES.md Documentation updated to describe the new CLI/API options for specifying outline sources and requirements for remote file support.
pyproject.toml Added a new optional dependency group manyio for remote file support and included it in the all group.
docs/backlog.md Backlog reorganized with detailed, categorized task descriptions and expanded notes.
tests/test_outline.py Refactored fixture naming, updated imports, and added tests for loading outlines from file, HTTP, and invalid URLs.
tests/test-outline.yaml Added a new YAML test file defining a knowledge outline for testing purposes.
tests/test_cli.py Removed the old test_cli_outline test function.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant CrateDbKnowledgeOutline
    participant OutlineDocument
    participant Filesystem/HTTP

    User->>CLI: cratedb-about outline --url <url>
    CLI->>CrateDbKnowledgeOutline: load(url)
    alt url provided
        CrateDbKnowledgeOutline->>Filesystem/HTTP: read(url)
        Filesystem/HTTP-->>CrateDbKnowledgeOutline: YAML content
    else no url
        CrateDbKnowledgeOutline->>Filesystem/HTTP: read(builtin YAML)
        Filesystem/HTTP-->>CrateDbKnowledgeOutline: YAML content
    end
    CrateDbKnowledgeOutline->>OutlineDocument: from_yaml(YAML content)
    OutlineDocument-->>CLI: OutlineDocument instance
    CLI-->>User: Output in requested format
Loading

Possibly related PRs

Poem

In the warren of code, a new path unfurls,
Outlines from URLs, both near and far worlds.
With CLI and Python, the sources expand—
Local or remote, at your command!
The rabbit hops on, dependencies in tow,
Testing each burrow where outlines now flow.
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between feb3d65 and a41cd2c.

📒 Files selected for processing (8)
  • CHANGES.md (1 hunks)
  • README.md (3 hunks)
  • pyproject.toml (2 hunks)
  • src/cratedb_about/cli.py (2 hunks)
  • src/cratedb_about/outline/core.py (1 hunks)
  • tests/test-outline.yaml (1 hunks)
  • tests/test_cli.py (0 hunks)
  • tests/test_outline.py (1 hunks)
💤 Files with no reviewable changes (1)
  • tests/test_cli.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test-outline.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
  • CHANGES.md
  • pyproject.toml
  • README.md
  • src/cratedb_about/cli.py
  • src/cratedb_about/outline/core.py
  • tests/test_outline.py
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
pyproject.toml (1)

81-81: LGTM: Good approach for handling optional dependencies.

The new manyio dependency group and its inclusion in the all group is a clean way to support the new URL loading functionality without forcing all users to install unnecessary dependencies.

Consider using a variable or reference for the pueblo version to avoid having to update it in multiple places (line 77 and 95).

- "pueblo==0.0.11",
+ "pueblo=={PUEBLO_VERSION}",
...
- "pueblo[fileio]==0.0.11",
+ "pueblo[fileio]=={PUEBLO_VERSION}",

Also applies to: 94-96

README.md (1)

58-58: LGTM: CLI documentation is clear and comprehensive.

The documentation clearly explains the default behavior and new URL loading capabilities, along with dependency requirements and filesystem support.

Consider adding a brief note about error handling or behavior when a URL is invalid or unreachable.

Also applies to: 62-69

src/cratedb_about/outline/core.py (2)

33-40: Consider adding more robust error handling.

While the current implementation works and error handling is tested, you might want to add more specific error handling in the read() method to provide user-friendly error messages for different failure scenarios (invalid URLs, network errors, etc.).

def read(self) -> str:
    if self.url is None:
        return self.BUILTIN.read_text()
    from pueblo.io import to_io

+    try:
        with to_io(self.url) as f:
            return f.read()
+    except FileNotFoundError:
+        raise FileNotFoundError(f"Outline file not found: {self.url}")
+    except (ValueError, IOError) as e:
+        raise ValueError(f"Failed to read outline from {self.url}: {str(e)}")

41-43: Consider documenting the url parameter.

The load method takes a url parameter, but there's no docstring explaining what formats are supported. It would be helpful to add a docstring explaining that it supports both local file paths and HTTP URLs.

@classmethod
def load(cls, url=None) -> "OutlineDocument":
+    """
+    Load the outline document from the specified URL or the built-in resource.
+    
+    Args:
+        url: Optional URL to load the outline from. Can be a local file path,
+             an HTTP/HTTPS URL, or None to use the built-in resource.
+             
+    Returns:
+        Parsed OutlineDocument instance.
+        
+    Raises:
+        FileNotFoundError: If the URL points to a non-existent file.
+        ValueError: If the URL is invalid or the content cannot be parsed.
+    """
    return OutlineDocument.from_yaml(cls(url=url).read())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a31264d and df3ff7a.

📒 Files selected for processing (9)
  • CHANGES.md (1 hunks)
  • README.md (2 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (2 hunks)
  • src/cratedb_about/cli.py (2 hunks)
  • src/cratedb_about/outline/__init__.py (1 hunks)
  • src/cratedb_about/outline/core.py (1 hunks)
  • src/cratedb_about/outline/model.py (0 hunks)
  • tests/test_outline.py (1 hunks)
💤 Files with no reviewable changes (1)
  • src/cratedb_about/outline/model.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/cratedb_about/outline/__init__.py (2)
src/cratedb_about/outline/core.py (1)
  • CrateDbKnowledgeOutline (9-43)
src/cratedb_about/outline/model.py (1)
  • OutlineDocument (48-162)
src/cratedb_about/cli.py (1)
src/cratedb_about/outline/core.py (2)
  • CrateDbKnowledgeOutline (9-43)
  • load (42-43)
🔇 Additional comments (13)
src/cratedb_about/outline/__init__.py (1)

1-2: LGTM: Good separation of concerns.

The refactoring moves the outline loading functionality to a dedicated module while properly exposing the necessary classes in the public API. This separation makes the code more maintainable and easier to understand.

Also applies to: 6-6

CHANGES.md (1)

4-6: LGTM: Changelog entry is clear and informative.

The changelog entry correctly documents the new --url option and environment variable alternative, providing users with the necessary information to understand and use the new feature.

README.md (1)

79-83: LGTM: API documentation is well-structured.

The API documentation nicely complements the CLI documentation, showing both the default usage and the new URL loading capability.

src/cratedb_about/cli.py (3)

9-9: LGTM: Import updated correctly.

The import statement has been properly updated to reflect the refactoring of the CrateDbKnowledgeOutline class.


24-32: LGTM: CLI option implementation follows best practices.

The new --url option is well-implemented with:

  • Both long and short forms
  • Environment variable support
  • Appropriate defaults
  • Clear help text

The pattern matches other options in the codebase for consistency.


36-36:

Details

❓ Verification inconclusive

LGTM: Function signature and implementation updated properly.

The function signature has been correctly updated to include the new parameter, and the parameter is properly passed to the underlying implementation.

While the implementation looks correct, let's verify that the error handling for invalid URLs works as expected:

Also applies to: 42-42


🏁 Script executed:

#!/bin/bash
# Try to import the module and check error handling for invalid URLs
python -c "from cratedb_about.outline import CrateDbKnowledgeOutline; try: CrateDbKnowledgeOutline.load('invalid://url'); print('Error handling not working properly'); except Exception as e: print(f'Good: Error properly handled: {type(e).__name__}')"

Length of output: 634


🏁 Script executed:

#!/bin/bash
# Locate the CrateDbKnowledgeOutline class and inspect its load method
rg -n "class CrateDbKnowledgeOutline" -A20
rg -n "def load" -A20

Length of output: 1595


🏁 Script executed:

#!/bin/bash
# Inspect the read method of CrateDbKnowledgeOutline to verify URL error handling
rg -n "def read" -A20 src/cratedb_about/outline/core.py

Length of output: 420


🏁 Script executed:

#!/bin/bash
# Test read() error handling with invalid URL
python3 - << 'EOF'
from cratedb_about.outline.core import CrateDbKnowledgeOutline
try:
    # Attempt to read from an invalid URL
    CrateDbKnowledgeOutline(url="invalid://url").read()
    print("Error handling not working properly")
except Exception as e:
    print(f"Good: Error properly handled: {type(e).__name__}: {e}")
EOF

Length of output: 137


Approve signature update; verify invalid-URL error handling

The signature change is spot on, but please manually confirm that passing an invalid URL into CrateDbKnowledgeOutline.read/.load raises a clear exception. Consider adding a unit test for this scenario.

• File: src/cratedb_about/outline/core.py
– Method: read(self) / load(cls, url)

docs/backlog.md (3)

4-11: Well-organized backlog additions for the outline feature.

The new tasks in the "Iteration +1" section provide a clear roadmap for enhancing the outline functionality, including support for remote resources and improved filtering capabilities. These align well with the PR's objective of making the outline command more flexible.


14-17: Clear roadmap for advanced outline features.

The planned tasks for "Iteration +2" to integrate with external resources like Discourse, HTML, and GitHub projects are well-defined and build logically on the current work being done to support remote resources.


20-21: Properly documented completed features.

The "Done" section correctly reflects the implementation of loading from custom outline files, which is the core feature introduced in this PR.

src/cratedb_about/outline/core.py (1)

1-44: Clean implementation of the outline loading functionality.

The CrateDbKnowledgeOutline class is well-designed with clear responsibilities. It handles both built-in and custom URL sources for loading the outline, enabling the flexibility required by the PR objectives.

The docstring provides clear examples and the implementation follows good Python practices like using dataclasses and type annotations.

tests/test_outline.py (3)

7-9: Good fixture renaming for clarity.

Renaming the fixture from cratedb_outline to cratedb_outline_builtin better reflects its purpose of loading the built-in outline rather than a custom one.


84-98: Good test coverage for URL loading functionality.

The new tests for loading from a local file and an HTTP URL validate the core functionality of the PR. Using a specific tagged version in the HTTP URL test is a good practice for test stability.


100-103: Error handling test validates failure cases.

The test for invalid URLs ensures proper error handling, which is crucial for a good user experience.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/backlog.md (1)

17-21: Fix bare URLs in markdown

The URLs should be properly formatted according to markdown best practices.

- Outline: Unlock Discourse, auto-select https://community.cratedb.com/raw/1015
- Outline: Unlock HTML resources, auto-convert using the best standalone program.
  => https://www.urltoany.com/url-to-markdown
- Outline: Unlock GitHub projects: https://github.com/mattduck/gh2md
+ Outline: Unlock Discourse, auto-select [https://community.cratedb.com/raw/1015](https://community.cratedb.com/raw/1015)
+ Outline: Unlock HTML resources, auto-convert using the best standalone program.
  => [https://www.urltoany.com/url-to-markdown](https://www.urltoany.com/url-to-markdown)
+ Outline: Unlock GitHub projects: [https://github.com/mattduck/gh2md](https://github.com/mattduck/gh2md)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

18-18: Bare URL used
null

(MD034, no-bare-urls)


20-20: Bare URL used
null

(MD034, no-bare-urls)


21-21: Bare URL used
null

(MD034, no-bare-urls)

src/cratedb_about/outline/core.py (1)

33-43: Fix docstring return type for read() method

The docstring incorrectly states the return type for the read() method.

    """
    Load the file from an external URL or the built-in resource.

    Returns:
-        Parsed OutlineDocument instance.
+        String containing the raw YAML content.

    Raises:
        FileNotFoundError: If the URL points to a non-existent file.
        ValueError: If the URL is invalid or the content cannot be parsed.
    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df3ff7a and aebfee7.

📒 Files selected for processing (7)
  • CHANGES.md (1 hunks)
  • README.md (2 hunks)
  • docs/backlog.md (1 hunks)
  • pyproject.toml (2 hunks)
  • src/cratedb_about/cli.py (2 hunks)
  • src/cratedb_about/outline/core.py (1 hunks)
  • tests/test_outline.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • CHANGES.md
  • README.md
  • pyproject.toml
  • tests/test_outline.py
  • src/cratedb_about/cli.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/cratedb_about/outline/core.py (3)
src/cratedb_about/cli.py (1)
  • outline (36-50)
src/cratedb_about/outline/model.py (1)
  • OutlineDocument (48-162)
src/cratedb_about/util.py (1)
  • from_yaml (48-50)
🪛 markdownlint-cli2 (0.17.2)
docs/backlog.md

18-18: Bare URL used
null

(MD034, no-bare-urls)


20-20: Bare URL used
null

(MD034, no-bare-urls)


21-21: Bare URL used
null

(MD034, no-bare-urls)

🪛 GitHub Check: codecov/patch
src/cratedb_about/outline/core.py

[warning] 53-54: src/cratedb_about/outline/core.py#L53-L54
Added lines #L53 - L54 were not covered by tests

🔇 Additional comments (5)
docs/backlog.md (2)

4-9: Improved task organization and categorization!

Using prefixes like "Outline:", "Query:", and "Any:" makes it clearer to identify which domain each task belongs to. This organization will help with prioritization and tracking progress.


24-25: Good task tracking

Moving completed items to the "Done" section with appropriate categorization helps maintain a clean backlog and provides a clear history of accomplishments.

src/cratedb_about/outline/core.py (3)

8-31: Clean implementation of the CrateDbKnowledgeOutline dataclass

The class is well-structured with clear documentation, appropriate type hints, and a constant for the built-in resource. The examples in the docstring are particularly helpful for users of this class.


46-46: Good use of lazy import for optional dependency

Importing pueblo.io only when needed is a good practice for optional dependencies. This ensures the module can still be imported even if the dependency is not installed, which aligns with the PR's mention of the optional cratedb-about[manyio] extra.


56-73: Well-implemented load method with clear documentation

The load class method is well-documented with appropriate type hints and error descriptions. It properly delegates to the instance method and converts the raw YAML to a structured document.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
src/cratedb_about/outline/core.py (4)

18-19: Update example to match implementation: read() is an instance method, not a class method.

In the docstring, the example shows yaml_content = CrateDbKnowledgeOutline.read() which suggests read() is a class method, but it's actually an instance method. This could mislead users.

-        # Get raw YAML content
-        yaml_content = CrateDbKnowledgeOutline.read()
+        # Get raw YAML content
+        yaml_content = CrateDbKnowledgeOutline().read()

33-43: Returns type description is accurate but could be more consistent.

The docstring for read() correctly states it returns a string, but the language used differs from load() method's docstring. Consider standardizing the "Returns:" section format across methods.


46-47: Consider moving import to module level.

While late imports are a common pattern to avoid import overhead for rarely used modules, the pueblo.io import is a core dependency for this functionality and would be used whenever a URL is provided. Moving it to the top of the file would improve readability.

 import typing as t
 from importlib import resources
+from pueblo.io import to_io
 
 from cratedb_about.outline.model import OutlineDocument
 
 
@dataclasses.dataclass
...
    def read(self) -> str:
        ...
        if self.url is None:
            return self.BUILTIN.read_text()
-        from pueblo.io import to_io

57-57: Add type hint for url parameter.

The url parameter in the load() method lacks a type hint, while the class attribute on line 29 has one. For consistency and better type checking, consider adding the same type hint.

-    def load(cls, url=None) -> "OutlineDocument":
+    def load(cls, url: t.Optional[str] = None) -> "OutlineDocument":
tests/test_outline.py (2)

93-95: Use a more stable URL or mock for testing.

The test uses a specific version tag (v0.0.3) in the GitHub URL. This could break if that tag is removed or if the repository structure changes. Consider using a more stable URL or better yet, mock this HTTP request for more reliable testing.

-    outline = CrateDbKnowledgeOutline.load(
-        "https://github.com/crate/about/raw/refs/tags/v0.0.3/src/cratedb_about/outline/cratedb-outline.yaml"
-    )
+    # Option 1: Use a mocked HTTP response
+    with patch('pueblo.io.to_io') as mock_to_io:
+        mock_file = MagicMock()
+        mock_file.read.return_value = (Path("src/cratedb_about/outline/cratedb-outline.yaml")
+                                       .read_text())
+        mock_to_io.return_value.__enter__.return_value = mock_file
+        outline = CrateDbKnowledgeOutline.load("https://example.com/outline.yaml")
+    
+    # Option 2: Use a more stable URL that's less likely to change
+    # outline = CrateDbKnowledgeOutline.load(
+    #     "https://raw.githubusercontent.com/crate/about/main/src/cratedb_about/outline/cratedb-outline.yaml"
+    # )

136-138: Include assertion message for clarity on test failure.

When a test fails, a custom assertion message helps identify what went wrong. Add descriptive messages to your assertions to make debugging easier.

-    assert ex.match("Outline file not found")
+    assert ex.match("Outline file not found"), "Expected error message mentioning 'Outline file not found'"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aebfee7 and feb3d65.

📒 Files selected for processing (6)
  • CHANGES.md (1 hunks)
  • README.md (2 hunks)
  • pyproject.toml (2 hunks)
  • src/cratedb_about/cli.py (2 hunks)
  • src/cratedb_about/outline/core.py (1 hunks)
  • tests/test_outline.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • CHANGES.md
  • README.md
  • src/cratedb_about/cli.py
  • pyproject.toml
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/cratedb_about/outline/core.py (2)
src/cratedb_about/outline/model.py (1)
  • OutlineDocument (48-162)
src/cratedb_about/util.py (1)
  • from_yaml (48-50)
🪛 GitHub Check: codecov/patch
src/cratedb_about/outline/core.py

[warning] 53-54: src/cratedb_about/outline/core.py#L53-L54
Added lines #L53 - L54 were not covered by tests

... to load any outline resource on local and remote filesystems.
Alternatively, use the `ABOUT_OUTLINE_URL` environment variable.
@amotl amotl marked this pull request as ready for review May 11, 2025 10:06
@amotl amotl merged commit b7837c4 into main May 11, 2025
6 checks passed
@amotl amotl deleted the progress branch May 11, 2025 10:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant