Query: Permit loading context file from local filesystem#29
Conversation
|
Warning Rate limit exceeded@amotl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 9 minutes and 25 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (8)
WalkthroughThe updates introduce support for loading a context file from the local filesystem via the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Settings
participant LocalFS as Local Filesystem
participant HTTP
User->>Settings: get_prompt()
Settings->>Settings: llms_txt_payload
alt ABOUT_CONTEXT_URL points to local file
Settings->>LocalFS: Read file content
LocalFS-->>Settings: Return content
else ABOUT_CONTEXT_URL is HTTP URL
Settings->>HTTP: Fetch content via GET
HTTP-->>Settings: Return content
else Neither
Settings->>Settings: Raise NotImplementedError
end
Settings-->>User: Return prompt with context
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
docs/backlog.md (1)
9-11: Appropriate addition to backlog.This backlog item for comparing the sizes of CrateDB's
llms.txtagainst other vendors aligns well with the feature of loading context files from the local filesystem.Consider using proper Markdown link syntax for the GitHub issue URL:
- https://github.com/crate/about/issues/20 + [#20](https://github.com/crate/about/issues/20)This would address the markdownlint MD034 warning about bare URLs.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
11-11: Bare URL used
null(MD034, no-bare-urls)
README.md (1)
181-183: Documentation update properly reflects the new capability.The documentation has been updated to clearly indicate that
ABOUT_CONTEXT_URLcan now accept both remote URLs and local filesystem paths, which aligns with the PR objective of enhancing flexibility in specifying context files.Consider wrapping the default URL in backticks for consistency with the environment variable formatting:
-The default value is https://cdn.crate.io/about/v1/llms-full.txt. +The default value is `https://cdn.crate.io/about/v1/llms-full.txt`.This would also address the markdownlint MD034 warning about bare URLs.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
183-183: Bare URL used
null(MD034, no-bare-urls)
src/cratedb_about/query/model.py (2)
75-76: Consider path normalization for cross-platform compatibility.The current implementation might have issues with path separators across different operating systems. Consider normalizing the path or using platform-specific path handling.
url = cls.llms_txt_url path = Path(url) + # Normalize path for cross-platform compatibility + path = path.expanduser().resolve() if path.exists(): return path.read_text()
78-78: Consider adding HTTP response validation.When fetching content via HTTP, it's good practice to check the response status code to ensure the request was successful.
if url.startswith("http"): - return requests.get(url, timeout=10).text + response = requests.get(url, timeout=10) + response.raise_for_status() # Raises HTTPError for bad responses + return response.text
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
CHANGES.md(1 hunks)README.md(1 hunks)docs/backlog.md(1 hunks)src/cratedb_about/query/model.py(3 hunks)tests/conftest.py(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
README.md
183-183: Bare URL used
null
(MD034, no-bare-urls)
docs/backlog.md
11-11: Bare URL used
null
(MD034, no-bare-urls)
🔇 Additional comments (6)
CHANGES.md (1)
17-18: Appropriate changelog entry.The changelog entry clearly communicates the new feature allowing loading context files from the local filesystem via the
ABOUT_CONTEXT_URLconfiguration, matching the PR objectives.tests/conftest.py (1)
9-16: Good test environment cleanup.The
prune_environfixture has been appropriately updated to remove theABOUT_CONTEXT_URLenvironment variable during tests, which prevents test results from being affected by external configuration.The change to use a list of environment variables with a loop makes the code more maintainable for future additions.
src/cratedb_about/query/model.py (4)
3-3: Good addition of the Path import.The import of
Pathfrompathlibis necessary for the new file system functionality and is a good choice over using string manipulation for path handling.
37-40: Great additions to example questions.The new examples related to integrations with SQLAlchemy, pandas, and MongoDB provide good coverage of common integration scenarios users might ask about.
60-62: Good documentation practice.Adding a docstring to the existing property enhances code readability and maintainability.
85-85: Good refactoring of context loading.Using the new
llms_txt_payloadproperty to abstract the loading mechanism is a clean approach that separates concerns and makes the code more maintainable.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/cratedb_about/query/model.py (1)
65-79: 🛠️ Refactor suggestionConsider adding path validation for security.
Loading files from the filesystem without path validation could pose a security risk. Consider validating the path to prevent access to sensitive files.
@classproperty def llms_txt_payload(cls) -> str: """ Retrieve payload of context file. TODO: Add third option `pueblo.to_io`, to load resources from anywhere. See `cratedb_about.outline.core`. """ url = cls.llms_txt_url path = Path(url) + # Validate path is within allowed directories to prevent path traversal attacks + allowed_dirs = [Path.cwd(), Path.home() / "cratedb_context"] + if path.exists() and path.is_file() and any( + path.resolve().is_relative_to(allowed_dir) for allowed_dir in allowed_dirs + ): - if path.exists(): return path.read_text() if url.startswith("http"): return requests.get(url, timeout=10).text - raise NotImplementedError(f"Unable to load context file. Source: {url}") + raise NotImplementedError(f"Unable to load context file or path not allowed. Source: {url}")🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 76-76: src/cratedb_about/query/model.py#L76
Added line #L76 was not covered by tests
🧹 Nitpick comments (1)
README.md (1)
183-183: Fix bare URL according to markdown linting rules.The URL on line 183 is flagged by markdownlint as a bare URL (MD034). Consider wrapping it in angle brackets.
-The default value is https://cdn.crate.io/about/v1/llms-full.txt. +The default value is <https://cdn.crate.io/about/v1/llms-full.txt>.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
183-183: Bare URL used
null(MD034, no-bare-urls)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
CHANGES.md(1 hunks)README.md(1 hunks)codecov.yml(1 hunks)docs/backlog.md(1 hunks)src/cratedb_about/query/model.py(3 hunks)tests/conftest.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- codecov.yml
🚧 Files skipped from review as they are similar to previous changes (3)
- CHANGES.md
- docs/backlog.md
- tests/conftest.py
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
README.md
183-183: Bare URL used
null
(MD034, no-bare-urls)
🪛 GitHub Check: codecov/patch
src/cratedb_about/query/model.py
[warning] 76-76: src/cratedb_about/query/model.py#L76
Added line #L76 was not covered by tests
[warning] 89-89: src/cratedb_about/query/model.py#L89
Added line #L89 was not covered by tests
🔇 Additional comments (5)
README.md (1)
182-183: Documentation accurately reflects the new flexibility.The documentation now clearly states that
ABOUT_CONTEXT_URLcan point to either a remote URL or a local filesystem path, which aligns with the implementation changes in the code.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
183-183: Bare URL used
null(MD034, no-bare-urls)
src/cratedb_about/query/model.py (4)
3-3: Appropriate import for new functionality.The
Pathimport frompathlibis necessary for handling local filesystem paths.
37-40: Useful example questions about integrations.These new example questions about integrations and specific libraries (SQLAlchemy, pandas, MongoDB) are valuable additions that cover important use cases for users.
60-62: Clear docstring addition.Adding a docstring to the
llms_txt_urlmethod improves code documentation.
85-85: Good implementation of the new functionality.Using the new
llms_txt_payloadproperty simplifies the code and properly separates concerns between fetching the content and using it.
| url = cls.llms_txt_url | ||
| path = Path(url) | ||
| if path.exists(): | ||
| return path.read_text() |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Add test coverage for file reading.
This line is flagged as not covered by tests. Consider adding tests that verify the functionality of loading context from a local file.
🏁 Script executed:
#!/bin/bash
# Check for tests that cover the local file loading functionality
echo "Searching for test files that might test local file loading:"
fd "test.*model.*\.py" --exec grep -l "ABOUT_CONTEXT_URL" {} \;
echo -e "\nSearching for specific test cases for file loading:"
fd "test.*model.*\.py" --exec grep -l "Path.*exists" {} \;Length of output: 380
Add test coverage for local file reading in src/cratedb_about/query/model.py
There are currently no tests exercising the return path.read_text() call on line 76. Please add a test case that:
- Uses pytest’s
tmp_path(or similar) to create a temporary file with known contents - Points the code’s file-loading logic at that file (e.g. via
ABOUT_CONTEXT_URL) - Asserts that
path.read_text()returns the expected string
Example outline in tests/test_model_file_loading.py:
import pytest
from pathlib import Path
from src.cratedb_about.query.model import load_local_context # or the actual function name
def test_load_local_context(tmp_path, monkeypatch):
sample_file = tmp_path / "sample.txt"
sample_content = "hello world"
sample_file.write_text(sample_content)
# Monkey-patch the URL or path resolver to return our temp file
monkeypatch.setenv("ABOUT_CONTEXT_URL", str(sample_file))
result = load_local_context()
assert result == sample_content🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 76-76: src/cratedb_about/query/model.py#L76
Added line #L76 was not covered by tests
🤖 Prompt for AI Agents
In src/cratedb_about/query/model.py at line 76, the file reading functionality
using path.read_text() lacks test coverage. Add a pytest test in a new or
existing test file (e.g., tests/test_model_file_loading.py) that uses the
tmp_path fixture to create a temporary file with known content, monkeypatch the
ABOUT_CONTEXT_URL environment variable to point to this file, call the function
that reads the file, and assert that the returned content matches the known
content. This will ensure the file reading logic is properly tested.
There was a problem hiding this comment.
@coderabbitai: A software test for this new feature would be nice. However, load_local_context() does not exist.
There was a problem hiding this comment.
@amotl You're absolutely right, my apologies for the confusion. There's no load_local_context() function - the file reading functionality is implemented in the llms_txt_payload classproperty of the Settings class.
Here's a corrected test suggestion:
import pytest
from pathlib import Path
from src.cratedb_about.query.model import Settings
def test_llms_txt_payload_from_file(tmp_path, monkeypatch):
# Create a temporary file with known content
test_file = tmp_path / "test_context.txt"
test_content = "Test CrateDB context content"
test_file.write_text(test_content)
# Point ABOUT_CONTEXT_URL to the temporary file
monkeypatch.setenv("ABOUT_CONTEXT_URL", str(test_file))
# Verify the classproperty correctly reads the file
assert Settings.llms_txt_payload == test_contentFor complete coverage, you might also want to add tests for the HTTP path and error case:
def test_llms_txt_payload_from_http(monkeypatch, requests_mock):
# Mock HTTP URL and response
test_url = "http://example.com/context.txt"
test_content = "Test CrateDB HTTP content"
requests_mock.get(test_url, text=test_content)
# Set the environment variable to the HTTP URL
monkeypatch.setenv("ABOUT_CONTEXT_URL", test_url)
# Verify HTTP request is made and content is returned
assert Settings.llms_txt_payload == test_content
def test_llms_txt_payload_invalid_source(monkeypatch):
# Set the environment variable to an invalid path
monkeypatch.setenv("ABOUT_CONTEXT_URL", "/non/existent/path/that/is/not/http")
# Verify appropriate error is raised
with pytest.raises(NotImplementedError):
_ = Settings.llms_txt_payload✏️ Learnings added
Learnt from: amotl
PR: crate/about#29
File: src/cratedb_about/query/model.py:76-76
Timestamp: 2025-05-15T11:27:22.782Z
Learning: In the cratedb_about package, the `Settings.llms_txt_payload` classproperty in `src/cratedb_about/query/model.py` handles loading context files from either a local filesystem path or an HTTP URL based on the `ABOUT_CONTEXT_URL` environment variable.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
There was a problem hiding this comment.
@coderabbitai: Thanks. Software tests have been added per 102a8f0.
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Get details of the commit that added the tests
git show 102a8f0a6f2 --name-only
echo "---"
git show 102a8f0a6f2
Length of output: 2663
@amotl Thank you for implementing the tests! I've reviewed the commit (102a8f0) and can see that you've added comprehensive tests for all three scenarios:
- Loading context from a local file
- Loading context from an HTTP URL
- Handling invalid sources with appropriate error messages
The implementation follows best practices with proper use of pytest fixtures (tmp_path, monkeypatch, and requests-mock). This ensures the llms_txt_payload classproperty is thoroughly tested for all code paths.
Great work on improving the test coverage!
(ʘᴥʘ)ノ♥
🧠 Learnings used
Learnt from: amotl
PR: crate/about#29
File: src/cratedb_about/query/model.py:76-76
Timestamp: 2025-05-15T11:27:22.782Z
Learning: In the cratedb_about package, the `Settings.llms_txt_payload` classproperty in `src/cratedb_about/query/model.py` handles loading context files from either a local filesystem path or an HTTP URL based on the `ABOUT_CONTEXT_URL` environment variable.
If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!
| if path.exists(): | ||
| return path.read_text() |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Add error handling for file operations.
The current implementation doesn't handle file-specific exceptions like PermissionError or issues reading the file.
path = Path(url)
if path.exists():
- return path.read_text()
+ try:
+ return path.read_text()
+ except (PermissionError, UnicodeDecodeError) as e:
+ raise OSError(f"Error reading file {path}: {e}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if path.exists(): | |
| return path.read_text() | |
| url = cls.llms_txt_url | |
| path = Path(url) | |
| if path.exists(): | |
| try: | |
| return path.read_text() | |
| except (PermissionError, UnicodeDecodeError) as e: | |
| raise OSError(f"Error reading file {path}: {e}") | |
| if url.startswith("http"): | |
| return requests.get(url, timeout=10).text | |
| raise NotImplementedError(f"Unable to load context file. Source: {url}") |
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 76-76: src/cratedb_about/query/model.py#L76
Added line #L76 was not covered by tests
🤖 Prompt for AI Agents
In src/cratedb_about/query/model.py around lines 75 to 76, the file reading
operation lacks error handling for exceptions such as PermissionError or other
IO issues. Wrap the file existence check and read_text call in a try-except
block to catch and handle these exceptions gracefully, possibly logging the
error or returning a default value to prevent the program from crashing.
There was a problem hiding this comment.
@coderabbitai: We will adjust this spot so it will handle all kinds of exceptions, without specialising.
There was a problem hiding this comment.
@amotl Understood! Using a generic exception handler is a good approach too - it will still make the code more robust while keeping it simpler. Something like a general try/except block would catch all potential file operation issues without needing to enumerate specific error types.
✏️ Learnings added
Learnt from: amotl
PR: crate/about#29
File: src/cratedb_about/query/model.py:75-76
Timestamp: 2025-05-15T11:27:34.514Z
Learning: For the `cratedb_about` repository, the maintainer prefers generic exception handling rather than catching specific exception types.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
There was a problem hiding this comment.
@coderabbitai: Thanks. b41f6c1 includes relevant improvements.
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Fetch the commit details
git show b41f6c1117
Length of output: 1969
@amotl Thank you for implementing those improvements in b41f6c1! The changes look excellent - I particularly like the path normalization for cross-platform compatibility and the improved HTTP response handling with raise_for_status(). The switch to generic exception handling also aligns well with your preferred approach. These enhancements will make the code more robust when loading context files from either local or remote sources.
🧠 Learnings used
Learnt from: amotl
PR: crate/about#29
File: src/cratedb_about/query/model.py:76-76
Timestamp: 2025-05-15T11:27:22.782Z
Learning: In the cratedb_about package, the `Settings.llms_txt_payload` classproperty in `src/cratedb_about/query/model.py` handles loading context files from either a local filesystem path or an HTTP URL based on the `ABOUT_CONTEXT_URL` environment variable.
If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/cratedb_about/query/model.py (2)
70-71: Consider tracking the TODO separately.The TODO comment about adding a third option with
pueblo.to_iomight be better tracked in an issue system rather than in code comments, especially if it represents planned future work.
76-78: Add a simple try-except for file reading.While your preference for generic exception handling is noted (and respected in the
get_promptmethod), consider adding a try-except block specifically around thepath.read_text()call to catch file reading errors at their source.if path.exists(): - return path.read_text() + try: + return path.read_text() + except Exception as e: + raise Exception(f"Error reading file {path}: {e}")tests/test_query.py (1)
104-106: Remove redundant comment.There are two comments explaining the same thing - one at line 105 is redundant with line 104. Consider removing one of them for clarity.
# Set the environment variable to the HTTP URL. - # Point `ABOUT_CONTEXT_URL` to the temporary URL. monkeypatch.setenv("ABOUT_CONTEXT_URL", test_url)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CHANGES.md(1 hunks)README.md(1 hunks)codecov.yml(1 hunks)docs/backlog.md(1 hunks)pyproject.toml(1 hunks)src/cratedb_about/query/model.py(3 hunks)tests/conftest.py(1 hunks)tests/test_query.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (5)
- CHANGES.md
- docs/backlog.md
- tests/conftest.py
- README.md
- codecov.yml
🧰 Additional context used
🧠 Learnings (1)
src/cratedb_about/query/model.py (1)
Learnt from: amotl
PR: crate/about#29
File: src/cratedb_about/query/model.py:75-76
Timestamp: 2025-05-15T11:27:34.514Z
Learning: For the `cratedb_about` repository, the maintainer prefers generic exception handling rather than catching specific exception types.
🪛 GitHub Check: codecov/patch
src/cratedb_about/query/model.py
[warning] 94-94: src/cratedb_about/query/model.py#L94
Added line #L94 was not covered by tests
🔇 Additional comments (6)
src/cratedb_about/query/model.py (3)
37-40: Good addition of new example questions!These new integration-related examples improve the user experience by expanding the range of available questions, particularly around SQLAlchemy, pandas, and MongoDB integration topics.
60-62: Documentation improvement noted.Nice addition of docstring to the existing method, improving code readability.
65-84: Local file loading implementation looks good.The implementation correctly supports both local file paths and HTTP URLs. I notice you're normalizing paths for cross-platform compatibility with
expanduser().resolve(), which is a good practice.The HTTP implementation appropriately includes
raise_for_status()to handle HTTP errors properly.tests/test_query.py (3)
85-95: Great test for file loading functionality!This test effectively validates the new functionality to load context from a local file. The use of
tmp_pathfixture is appropriate for creating temporary test files.
98-110: Well-structured HTTP test.Good use of
requests_mockto test the HTTP loading functionality without making actual external requests. The test clearly validates that the HTTP content is correctly fetched and returned.
112-119: Good error case testing.This test properly verifies that attempting to load from an invalid source raises the expected
NotImplementedError. It's important to test error paths as well as successful paths.
... per `ABOUT_CONTEXT_URL`.
About
ABOUT_CONTEXT_URLwas limited to HTTP URLs before. Now, it also accepts a filesystem path.