llms-txt: Introduce caching for expanding outline into Markdown file by amotl · Pull Request #33 · crate/about

amotl · 2025-05-15T18:20:33Z

About

Apply a bit of monkey patching to support the llms-txt package with content caching.

coderabbitai · 2025-05-15T18:20:41Z

## Walkthrough

This update introduces a centralized HTTP cache client utility, refactors related components to use it, and adds configuration via a new settings class. Caching is now used in both outline expansion and query loading. Dependency and test updates support the new caching and configuration infrastructure.

## Changes

| File(s)                                                                 | Change Summary                                                                                                             |
|-------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| CHANGES.md                                                              | Updated changelog to note caching for outline-to-Markdown expansion in `llms-txt`.                                         |
| pyproject.toml                                                          | Added `platformdirs<5` as a new dependency.                                                                                |
| src/cratedb_about/settings.py                                           | Introduced `Settings` class for global config, including HTTP timeout, cache TTL, and cache path logic using `platformdirs`.|
| src/cratedb_about/util.py                                               | Added `get_cache_client` function to create a configured, cached HTTP client with error handling and logging.              |
| src/cratedb_about/outline/model.py                                      | Updated `to_llms_txt` in `OutlineDocument` to use the new cache client for HTTP requests via `llms_txt`.                   |
| src/cratedb_about/query/model.py                                        | Refactored to use `get_cache_client` and global cache TTL from settings, replacing manual client setup.                    |
| src/cratedb_about/cli.py                                                | Enhanced `outline_url_option` with clearer help, metavar, and a callback converting non-URL strings to expanded Paths.     |
| tests/test_util.py                                                      | Added tests for `get_cache_client`, covering normal operation and error handling with logging.                             |
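For illustration, a `Settings` class along these lines could hold the knobs listed in the table. This is a minimal stdlib-only sketch, not the PR's code. The `ABOUT_CACHE_TTL` name is taken from the review suggestion further down, and the 10-second timeout mirrors the hishel snippet. The cache location is a hypothetical placeholder, because the real class derives it via `platformdirs`.

```python
import os
from dataclasses import dataclass, field
from pathlib import Path


def _env_int(name: str, default: int) -> int:
    """Read an integer from the environment, falling back on missing or invalid values."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


@dataclass(frozen=True)
class Settings:
    # HTTP timeout in seconds, matching the value used in the reviewed hishel snippet.
    http_timeout: float = 10.0
    # Cache TTL in seconds; ABOUT_CACHE_TTL (assumed name) overrides the one-hour default.
    cache_ttl: int = field(default_factory=lambda: _env_int("ABOUT_CACHE_TTL", 3600))
    # Hypothetical cache file location; the PR uses platformdirs instead.
    cache_path: Path = field(
        default_factory=lambda: Path.home() / ".cache" / "cratedb-about" / "http-cache.sqlite"
    )
```

Invalid TTL values silently fall back to the default here; failing loudly instead is an equally reasonable design choice.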

## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant OutlineDocument
    participant get_cache_client
    participant llms_txt
    participant HTTP

    User->>OutlineDocument: to_llms_txt()
    OutlineDocument->>get_cache_client: obtain cache-enabled client
    get_cache_client-->>OutlineDocument: returns CacheClient
    OutlineDocument->>llms_txt: create_ctx (with patched HTTP client)
    llms_txt->>HTTP: Fetch resources (cached)
    HTTP-->>llms_txt: Response (possibly from cache)
    llms_txt-->>OutlineDocument: Context as string
    OutlineDocument-->>User: Markdown output
```

## Possibly related PRs

- Outline: Add `to_llms_txt` API method and `--format=llms-txt` CLI option #23: originally added the `to_llms_txt` method and CLI support for the llms-txt format, which this PR extends with caching.
- Build llms.txt files from source cratedb-overview.md #3: introduced the initial generation of llms.txt files from Markdown, which the caching enhancements here build on.

## Suggested reviewers

- kneth
- surister
- bmunkholm

## Poem

In the warren of code, a cache now appears,
With settings and clients, it quells our old fears.
Outline and query, both hop to the store,
While tests keep on checking, and logs help explore.
🐇✨
Now our requests are swift—let’s cache a bit more!




---

<details>
<summary>📜 Recent review details</summary>

**Configuration used: CodeRabbit UI**
**Review profile: CHILL**
**Plan: Pro**


<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 7e7d382ed95560ebff217194b760b1fa7c41e1d4 and dcf6e2f1bab928481b9d909c3a8d4774527f7018.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `src/cratedb_about/cli.py` (1 hunk)

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (1)</summary>

* src/cratedb_about/cli.py

</details>

</details>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (8)

docs/backlog.md (1)

12-12: Link the timeout issue for test_cli_bundle.
Consider creating or linking to a GitHub issue for the heavy test_cli_bundle test to ensure it’s tracked and addressed systematically.

src/cratedb_about/outline/model.py (2)
94-108: Good implementation of caching for llms-txt package

The integration of the hishel caching library for llms-txt HTTP requests is a solid approach. It avoids repeated network requests for the same linked resources, which should speed up repeated expansions of the outline into Markdown files.

A few suggestions to further enhance this implementation:
-        # Patch `llms_txt` package to use caching per Hishel.
-        # https://hishel.com/
-        controller = hishel.Controller(allow_stale=True)
-        storage = hishel.SQLiteStorage(ttl=3600)
+        # Patch `llms_txt` package to use caching per Hishel.
+        # https://hishel.com/
+        # Get TTL from environment variable or use default (3600s = 1 hour)
+        ttl = int(os.getenv("ABOUT_CACHE_TTL", 3600))
+        controller = hishel.Controller(allow_stale=True)
+        storage = hishel.SQLiteStorage(ttl=ttl)
Also, consider scoping the mock patch to the code that needs it, so the patch is undone even if create_ctx raises and cannot leak into other code:
         with hishel.CacheClient(controller=controller, storage=storage, timeout=10.0) as client:
-            # Patch the client object.
-            mock.patch("llms_txt.core.httpx", client).start()
-
-            # Import module lazily to relax dependency surface.
-            from llms_txt import create_ctx
-
-            # Expand links and output in Markdown format.
-            markdown = self.to_markdown()
-            ctx = create_ctx(markdown, optional=optional, n_workers=None)
-            return str(ctx)
+            # Patch the client object only while this block runs.
+            with mock.patch("llms_txt.core.httpx", client):
+                # Import module lazily to relax dependency surface.
+                from llms_txt import create_ctx
+
+                # Expand links and output in Markdown format.
+                markdown = self.to_markdown()
+                ctx = create_ctx(markdown, optional=optional, n_workers=None)
+                return str(ctx)
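Using `mock.patch` as a context manager, rather than `start()`/`stop()`, guarantees the patch is undone even when the patched call raises. The stdlib-only sketch below illustrates that pattern. `fake_core` is a hypothetical stand-in for a module such as `llms_txt.core` that looks up a module-level `httpx` at call time, and `TtlCachingClient` is a toy in-memory TTL cache, not hishel's implementation.

```python
import time
import types
from unittest import mock

# Hypothetical stand-in for a third-party module with a module-level `httpx`.
fake_core = types.ModuleType("fake_core")


class PlainHttp:
    """Pretend network client that counts real fetches."""

    def __init__(self):
        self.calls = 0

    def get(self, url):
        self.calls += 1
        return f"body of {url}"


fake_core.httpx = PlainHttp()


def fetch(url):
    # Like the third-party code, resolve `httpx` on the module at call time.
    return fake_core.httpx.get(url)


class TtlCachingClient:
    """Toy in-memory cache in front of another client; entries expire after `ttl` seconds."""

    def __init__(self, inner, ttl, clock=time.monotonic):
        self.inner = inner
        self.ttl = ttl
        self.clock = clock
        self._store = {}

    def get(self, url):
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        body = self.inner.get(url)
        self._store[url] = (now, body)
        return body


def fetch_all_cached(urls, ttl=3600):
    client = TtlCachingClient(fake_core.httpx, ttl=ttl)
    # The original attribute is restored when the block exits, even on error.
    with mock.patch.object(fake_core, "httpx", client):
        return [fetch(url) for url in urls]
```

Because the patch replaces a module attribute, it is process-wide while active: other threads calling into the same module during that window also see the caching client.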
3-5: Don't forget to import os for the suggested improvement

If you implement the suggestion to make the TTL configurable via an environment variable, you'll need to import the os module. Keeping it with the other standard-library imports preserves the existing import ordering:
+import os
 import typing as t
 from io import StringIO
 from unittest import mock

 import hishel

src/cratedb_about/cli.py (2)
16-24: Consider tightening validation around remote URLs and expose a Path type for local files

outline_url_option currently accepts a free-form string that may point either to a local file or to a remote URL.
Two issues come up in practice:

Passing a malformed URL is silently accepted and only fails later in CrateDbKnowledgeOutline.load, far away from the CLI surface.

A local relative path is accepted, but it is not clear from the help text that this is supported.

You could improve UX and early-fail behaviour by:
-outline_url_option = click.option(
-    "--url",
-    "-u",
-    envvar="ABOUT_OUTLINE_URL",
-    type=str,
-    required=False,
-    default=None,
-    help="URL to the outline file. By default, the built-in outline is used.",
-)
+outline_url_option = click.option(
+    "--url",
+    "-u",
+    envvar="ABOUT_OUTLINE_URL",
+    type=str,
+    required=False,
+    default=None,
+    metavar="URL|FILE",
+    help=(
+        "Outline source. Provide either an HTTP(S) URL or a local file path. "
+        "If omitted, the built-in outline is used."
+    ),
+    callback=lambda _, __, v: v if not v or v.startswith(("http://", "https://")) else Path(v).expanduser(),
+)
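This surfaces the “file vs. URL” distinction early. Note that the callback alone does not reject malformed schemes: a typo such as htp://… is treated as a local path and only fails once that file cannot be found.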
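To also reject typos like `htp://`, the callback can treat any string containing `://` with an unsupported scheme as an error. Below is a sketch of that classification logic as a plain function (`classify_outline_source` is a hypothetical name). Inside a click callback you would raise `click.BadParameter` instead of `ValueError`.

```python
from pathlib import Path
from typing import Optional, Union

REMOTE_SCHEMES = ("http://", "https://")


def classify_outline_source(value: Optional[str]) -> Union[str, Path, None]:
    """Return HTTP(S) URLs unchanged and local paths as expanded Path objects."""
    if not value:
        # No value given: the caller falls back to the built-in outline.
        return value
    if value.startswith(REMOTE_SCHEMES):
        return value
    if "://" in value:
        # Looks like a URL, but not one we can fetch (e.g. the typo "htp://").
        raise ValueError(f"Unsupported URL scheme: {value}")
    return Path(value).expanduser()
```

This also rejects `file://` URLs; accept them explicitly if they should be supported.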

72-90: --format inconsistency may confuse users

The outline command still advertises and accepts "llms-txt", while bundle now hard-codes the alias "llm". From a UX perspective, having to remember two spellings for the same output is surprising.

Two options:

Accept both aliases on both commands, using click.Choice(["llm", "llms-txt"]), and normalise internally.

Rename everywhere to a single canonical value ("llm" seems shorter) and emit a deprecation warning when the old alias is detected.

Either way, the manual `if format_ != "llm": …` guard can be removed, because click.Choice already guarantees that only listed values get through.
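A sketch of the second option: a hypothetical `normalize_format` helper maps the old spelling onto the canonical one and emits a `DeprecationWarning`, leaving validation of all other values to `click.Choice`. It could be called from a click option callback; the names are assumptions, not existing project code.

```python
import warnings

# Old spellings that are still accepted, mapped onto their canonical value.
FORMAT_ALIASES = {"llms-txt": "llm"}


def normalize_format(value: str) -> str:
    canonical = FORMAT_ALIASES.get(value)
    if canonical is None:
        # Already canonical, or validated elsewhere (e.g. by click.Choice).
        return value
    warnings.warn(
        f"--format={value} is deprecated, use --format={canonical} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return canonical
```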
src/cratedb_about/query/model.py (1)
91-110: fetch() lacks support for file:// scheme and treats non-HTTP URLs as errors

The current path detection works for local plain paths, but:

file:///absolute/path (a common way to reference local files in environment variables) is not supported.

Any other remote scheme (ftp://, s3://, etc.) raises NotImplementedError, making the error handling path noisy.

A small refactor makes the function more future-proof:
-        path = Path(url)
-        path = path.expanduser().resolve()
-        if path.exists() and path.is_file():
-            return path.read_text()
-        if url.startswith("http"):
+        if url.startswith("file://"):
+            return Path(url[7:]).expanduser().read_text()
+        # Local file (absolute or relative path)
+        path = Path(url).expanduser()
+        if path.is_file():
+            return path.read_text()
+        # HTTP(S) resources
+        if url.startswith(("http://", "https://")):
             response = self.http_client.get(url)
             response.raise_for_status()
             return response.text
-        raise NotImplementedError(f"Unable to load context file. Source: {url}")
+        raise NotImplementedError(f"Unsupported scheme in context URL: {url}")
This preserves existing behaviour while gracefully extending support.
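One caveat with `url[7:]`: it handles `file:///abs/path`, but not `file://localhost/abs/path` or percent-encoded names such as `%20`. Parsing with `urllib.parse.urlparse` and converting with `urllib.request.url2pathname` covers those cases. Below is a sketch of the local-file branch as a hypothetical helper `read_local_source`. It returns `None` so the caller can fall through to the HTTP branch or raise `NotImplementedError`.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def read_local_source(url: str):
    """Return file contents for file:// URLs and plain paths, else None."""
    parsed = urlparse(url)
    if parsed.scheme == "file":
        # url2pathname also decodes percent-escapes such as %20.
        return Path(url2pathname(parsed.path)).read_text()
    if parsed.scheme in ("http", "https"):
        # Leave remote resources to the HTTP client.
        return None
    path = Path(url).expanduser()
    if path.is_file():
        return path.read_text()
    return None
```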
tests/test_query.py (2)
116-135: HTTP payload test is skipped; httpx's MockTransport can stand in for the network

Instead of skipping test_model_payload_from_http, you can simulate the network response deterministically with `httpx.MockTransport`:
import httpx

from cratedb_about.query.model import CrateDbKnowledgeContextLoader


def test_model_payload_from_http(monkeypatch):
    url = "https://example.com/context.txt"
    content = "HTTP context"
    monkeypatch.setenv("ABOUT_CONTEXT_URL", url)

    # Answer every request with the canned payload.
    transport = httpx.MockTransport(lambda request: httpx.Response(200, text=content))

    loader = CrateDbKnowledgeContextLoader()
    loader.http_client._transport = transport  # inject mocked transport (private attribute)

    result = loader.get_prompt()
    assert content in result
This keeps coverage high. Because it swaps out the client's private transport, requests no longer pass through hishel's cache layer, so the test covers the HTTP fetch path but not the caching itself.

137-151: caplog needs explicit log-level to capture the message

logger.exception() logs at ERROR level.
If the root logger’s level is higher, the assertion on caplog.messages may fail spuriously.

Add (with `import logging` at the top of the test module):
with caplog.at_level(logging.ERROR):
    result = loader.get_prompt()
to guarantee the message is captured regardless of test runner settings.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 5a341f8 and b0b6dfe.

📒 Files selected for processing (16)

.gitignore (1 hunk)
CHANGES.md (1 hunk)
README.md (3 hunks)
codecov.yml (1 hunk)
docs/backlog.md (1 hunk)
pyproject.toml (2 hunks)
src/cratedb_about/bundle/llmstxt.py (2 hunks)
src/cratedb_about/cli.py (3 hunks)
src/cratedb_about/outline/model.py (2 hunks)
src/cratedb_about/query/core.py (5 hunks)
src/cratedb_about/query/model.py (2 hunks)
tests/assets/outline.yaml (1 hunk)
tests/conftest.py (1 hunk)
tests/test_cli.py (5 hunks)
tests/test_outline.py (2 hunks)
tests/test_query.py (2 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

src/cratedb_about/query/core.py (1)

src/cratedb_about/query/model.py (3)

CrateDbKnowledgeContextLoader (125-135)

KnowledgeContextLoader (46-122)

get_prompt (111-122)

🔇 Additional comments (27)

pyproject.toml (2)

106-106: Approve test dependency addition.
Adding requests-mock<2 to the `test` extra under [project.optional-dependencies] aligns with mocking HTTP in tests. No concerns.

76-76:

❓ Verification inconclusive

Verify compatibility and licensing of hishel.
Ensure that hishel<0.2 provides the required HTTP caching and SQLite storage functionality and that its license is compatible with Apache-2.0. Confirm that the latest published version under 0.2 supports Python >=3.9.

Run:

🏁 Script executed:
#!/bin/bash
# Verify `hishel` versions and metadata on PyPI
curl -s https://pypi.org/pypi/hishel/json | jq '.info.version, .info.license'
Length of output: 92

🏁 Script executed:
#!/bin/bash
# Check required Python versions and classifiers for hishel
curl -s https://pypi.org/pypi/hishel/json | jq '.info.requires_python, .info.license, .info.classifiers'
Length of output: 709
Verify hishel compatibility and functionality

The PyPI metadata for hishel 0.1.2 shows:

Python requirement “>=3.9” ✔

License classifier “BSD License” (OSI-approved, compatible with Apache-2.0) ✔

The free-text license field is unset; the license is declared only via the classifier (you may want to rely on the classifier when auditing)

I could not confirm HTTP caching or SQLite storage support from the metadata alone. Please review the package’s documentation or source to ensure it provides the required caching and storage features.
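The jq queries above can also be reproduced in Python to check the same PyPI fields. Below is a small sketch against the public PyPI JSON API (`https://pypi.org/pypi/<name>/json`); the helper names are hypothetical, and only the fields the script queried are extracted.

```python
import json
from urllib.request import urlopen


def fetch_pypi_info(package):
    """Return the `info` section from PyPI's JSON API (requires network access)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json", timeout=10) as response:
        return json.load(response)["info"]


def license_summary(info):
    """Extract the fields the jq queries looked at."""
    classifiers = info.get("classifiers") or []
    license_classifiers = [c for c in classifiers if c.startswith("License ::")]
    return {
        "version": info.get("version"),
        "requires_python": info.get("requires_python"),
        # An empty free-text license field is normalised to None.
        "license": info.get("license") or None,
        "license_classifiers": license_classifiers,
        "osi_approved": any(c.startswith("License :: OSI Approved") for c in license_classifiers),
    }
```

Usage: `license_summary(fetch_pypi_info("hishel"))`. Note that this checks packaging metadata only; it cannot confirm features such as SQLite storage.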

File: pyproject.toml, line 76: "hishel<0.2",

.gitignore (1)

7-7: Approve ignoring SQLite cache files.
Adding *.sqlite to .gitignore properly excludes cache artifacts from commits, preventing accidental check-ins of local database files.

tests/assets/outline.yaml (1)

26-28: Approve updated outline test fixture.
The replacement of the "Testing README" entry with "Example Domain" matches the intended CLI output change. Ensure that corresponding tests in tests/test_outline.py have been updated to assert on "Example Domain".

Please verify that tests/test_outline.py assertions reference "Example Domain".

codecov.yml (1)

1-14: Approve Codecov configuration.
The new codecov.yml sets a project coverage target of 85% and makes patch coverage informational, which aligns with the CI strategy. Indentation and YAML syntax are correct.

docs/backlog.md (1)

9-11: Approve backlog item for llms-txt size comparison.
Tracking the comparison of llms.txt file sizes in issue #20 is clear. Ensure the issue remains up-to-date with measurement results.

tests/test_outline.py (2)

109-109: Assertion aligned with updated test asset content.

The change appropriately updates the test expectation to look for "Example Domain" instead of the previous "Testing README", reflecting changes in the test asset file.

125-125: Assertion aligned with updated test asset content.

The change appropriately updates the test expectation to look for "Example Domain" instead of the previous "Testing README", reflecting changes in the test asset file.

src/cratedb_about/bundle/llmstxt.py (3)

19-19: Good addition of configurable outline source.

Adding the outline_url parameter to the LllmsTxtBuilder dataclass enables flexible specification of alternative outline sources, supporting the new CLI --url option and ABOUT_OUTLINE_URL environment variable.

23-23: Improved logging with more descriptive message.

The updated log message is more informative, clearly indicating both the format and output directory.

38-38: Good implementation of configurable outline loading.

The change properly uses the provided outline_url parameter when loading the outline, enabling users to specify alternative outline sources.

Note: There's a TODO comment above this line about optimizing to avoid redundant data processing, which is a good candidate for future improvement.

tests/conftest.py (1)

9-16: Improved test environment cleanup.

Refactoring the environment variable cleanup to use a list-based approach improves maintainability and ensures tests run in a clean environment. The inclusion of the new environment variables ABOUT_CONTEXT_URL and ABOUT_OUTLINE_URL properly supports the new features introduced in this PR.
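Below is a stdlib-only sketch of the list-based cleanup idea, usable outside pytest as well. `ABOUT_CACHE_TTL` is included as an assumption. Inside pytest, an autouse fixture calling `monkeypatch.delenv(name, raising=False)` for each name achieves the same.

```python
import os
from contextlib import contextmanager
from unittest import mock

# Variables that could leak configuration from the developer's shell into tests.
ENV_VARS_TO_CLEAR = ["ABOUT_CONTEXT_URL", "ABOUT_OUTLINE_URL", "ABOUT_CACHE_TTL"]


@contextmanager
def clean_environment(names=ENV_VARS_TO_CLEAR):
    # patch.dict snapshots os.environ and restores it when the block exits.
    with mock.patch.dict(os.environ):
        for name in names:
            os.environ.pop(name, None)
        yield
```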

CHANGES.md (1)

17-23: Comprehensive changelog entries.

The added changelog entries accurately document all the key enhancements in this PR:

Support for --url/ABOUT_OUTLINE_URL option in the Bundle command

Improved handling of --format option

Support for loading context files from local filesystem

Caching for HTTP context payloads

Caching for expanding outlines into Markdown

These detailed entries will help users understand the new capabilities.

src/cratedb_about/query/core.py (4)

43-45: Well-designed integration of the KnowledgeContextLoader

Good decision to use a dataclass field with a default factory to instantiate the context loader. This provides flexibility while maintaining backward compatibility.

86-90: Clean refactoring to use the new context loader

The transition from using static Settings class to the instance-based context loader is clean and maintains the same functionality.

117-129: Good simplification of the prompt message handling

The refactored logic for adding the prompt message to input_messages is cleaner and easier to follow than the previous implementation.

157-158: Consistent use of context loader throughout the code

The instructions field is consistently accessed from the context loader.

README.md (4)

156-157: Updated bundle command example with correct format option

The documentation now correctly reflects the required --format=llm parameter and explicitly includes the output directory.

158-161: Clear explanation of outline file options

Good addition explaining the default behavior and override options through the --url flag or environment variable.

170-173: Helpful tip about list-questions command

This is a useful addition that helps users discover the built-in question suggestions.

182-188: Well-documented configuration options for caching

Excellent addition of notes explaining how to configure the context URL and cache TTL using environment variables. This clearly documents the new caching functionality introduced in this PR.

tests/test_cli.py (5)

48-56: Updated test to match new CLI requirements

The test has been properly updated to use the required --format llm argument and set the appropriate environment variables.

61-61: Updated log assertion to match new message format

Good adjustment to the expected log message.

75-76: Updated test for missing outdir to include format

Test correctly includes the now required format parameter.

85-99: New test for missing format parameter

Excellent addition of a test case to verify that the CLI properly requires the --format option. This improves test coverage for the new CLI behavior.

114-114: Updated error message assertion

The assertion has been correctly updated to match the more specific error message format from Click.

tests/test_query.py (1)

28-31: Patch env vars before instantiating the loader to ensure consistency

Here the loader fixture is created before ABOUT_CONTEXT_URL is patched.
Although the url property is evaluated lazily, the HTTP client and its cache were already initialised before the patch, which can lead to odd edge cases if the two values point to different hosts.

Safer pattern:
def test_model_loader_url_env_success(mocker):
    mocker.patch.dict("os.environ", {"ABOUT_CONTEXT_URL": "http://example.com"})
    loader = CrateDbKnowledgeContextLoader()
    assert loader.url == "http://example.com"
Repeating this approach in similar tests avoids hidden state discrepancies.
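To see why the ordering matters, consider a hypothetical loader that captures its configuration in the constructor. A loader created before the patch keeps the old value, while one created inside the patched block picks up the new one:

```python
import os
from unittest import mock


class SnapshotLoader:
    """Hypothetical loader that reads its configuration once, at construction."""

    def __init__(self):
        self.url = os.environ.get("ABOUT_CONTEXT_URL", "builtin")


def make_loader(env):
    # Patch first, then construct, so __init__ sees the patched environment.
    with mock.patch.dict(os.environ, env):
        return SnapshotLoader()
```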

src/cratedb_about/query/model.py

src/cratedb_about/outline/model.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

src/cratedb_about/cli.py (2)
24-25: Suggestion: Improve help text formatting

The help text is well-written but could benefit from more consistent formatting.
- help="Outline source. Provide either an HTTP(S) URL or a local file path. "
- "If omitted, the built-in outline is used.",
+ help="Outline source. Provide either an HTTP(S) URL or a local file path. "
+      "If omitted, the built-in outline is used.",
26-28: Consider adding path existence validation

While the current implementation correctly handles path expansion, it doesn't validate if the file actually exists when a local path is provided. This might lead to confusing errors later in the execution flow.
```diff
- callback=lambda _, __, v: v
- if not v or v.startswith(("http://", "https://"))
- else Path(v).expanduser(),
+ callback=lambda _, __, v: v
+     if not v or v.startswith(("http://", "https://"))
+     else Path(v).expanduser().resolve(),
```
Additionally, you might want to add a more comprehensive validation function:
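```python
from pathlib import Path

import click


def validate_outline_source(ctx, param, value):
    # URLs and empty values are passed through unchanged.
    if not value or value.startswith(("http://", "https://")):
        return value
    # Local paths: expand "~", make absolute, and fail early if the file is missing.
    path = Path(value).expanduser().resolve()
    if not path.exists():
        raise click.BadParameter(f"File does not exist: {path}")
    return path
```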
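To see how such a validator behaves once wired in, here is a minimal sketch that attaches it to an option via `callback=` and exercises it with Click's `CliRunner`. The command name `show` and the option name `--url` are assumptions for illustration.

```python
from pathlib import Path

import click
from click.testing import CliRunner


def validate_outline_source(ctx, param, value):
    # Same logic as the suggested validator above.
    if not value or value.startswith(("http://", "https://")):
        return value
    path = Path(value).expanduser().resolve()
    if not path.exists():
        raise click.BadParameter(f"File does not exist: {path}")
    return path


@click.command()
@click.option("--url", "outline_url", callback=validate_outline_source)
def show(outline_url):
    # Hypothetical command that just echoes the normalized value.
    click.echo(repr(outline_url))


def invoke(*args):
    return CliRunner().invoke(show, list(args))
```

Raising `click.BadParameter` from a callback lets Click report the problem as a usage error naming the offending option, instead of failing later with a less specific exception.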

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00f3ea1 and 7e7d382.

📒 Files selected for processing (1)

src/cratedb_about/cli.py (1 hunks)

🔇 Additional comments (1)

src/cratedb_about/cli.py (1)

23-28: Good improvement to CLI option usability and flexibility!

The enhancements to the outline_url_option improve both documentation and behavior:

- Adding `metavar="URL|FILE"` makes the expected input format clear in the help output.
- The updated help text clearly explains the accepted input types and the default behavior.
- The callback function intelligently handles both URL and file path inputs, with proper home-directory expansion for file paths.
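The effect of these changes on the help output can be checked with a small sketch. The option name `--url`, the command name, and the decorator variable are assumptions modeled on the description above, not the project's code.

```python
from pathlib import Path

import click
from click.testing import CliRunner

# Hypothetical reconstruction of an option like `outline_url_option`.
outline_url_option = click.option(
    "--url",
    "outline_url",
    metavar="URL|FILE",
    help="Outline source. Provide either an HTTP(S) URL or a local file path. "
    "If omitted, the built-in outline is used.",
    # URLs and empty values pass through; local paths get "~" expanded.
    callback=lambda _, __, v: v
    if not v or v.startswith(("http://", "https://"))
    else Path(v).expanduser(),
)


@click.command()
@outline_url_option
def show(outline_url):
    click.echo(repr(outline_url))


def help_text():
    return CliRunner().invoke(show, ["--help"]).output
```

Click renders the option as `--url URL|FILE` in the usage listing, which is what makes the two accepted input kinds visible to users.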

amotl temporarily deployed to pypi May 15, 2025 18:20 with GitHub Actions

amotl force-pushed the llms-txt-caching branch from 9f206fc to 2f4622c May 15, 2025 18:20

amotl changed the title from "llms-txt: Introduced caching for expanding outline into Markdown file" to "llms-txt: Introduce caching for expanding outline into Markdown file" May 15, 2025

amotl temporarily deployed to pypi May 15, 2025 18:20 with GitHub Actions

amotl force-pushed the llms-txt-caching branch from 2f4622c to b0b6dfe May 15, 2025 19:30

amotl force-pushed the query-cached-context branch from 5a341f8 to 919362e May 15, 2025 19:35

Base automatically changed from query-cached-context to main May 15, 2025 19:36

coderabbitai bot reviewed May 15, 2025


src/cratedb_about/query/model.py (resolved)

amotl force-pushed the llms-txt-caching branch from b0b6dfe to 5766f90 May 15, 2025 19:37

amotl temporarily deployed to pypi May 15, 2025 19:37 with GitHub Actions

amotl commented May 15, 2025


src/cratedb_about/outline/model.py (outdated, resolved)

amotl added 2 commits May 15, 2025 22:56

Outline/llms-txt: Introduce caching for expanding into Markdown file

ac6a1d7

Outline/llms-txt: Improve caching, use centralized factory function

00f3ea1

amotl force-pushed the llms-txt-caching branch from 5766f90 to 00f3ea1 May 15, 2025 20:58

amotl temporarily deployed to pypi May 15, 2025 20:58 with GitHub Actions

amotl temporarily deployed to pypi May 15, 2025 21:02 with GitHub Actions

coderabbitai bot reviewed May 15, 2025


CLI: Improve definition of outline_url_option

dcf6e2f

amotl force-pushed the llms-txt-caching branch from 7e7d382 to dcf6e2f May 15, 2025 21:23

amotl marked this pull request as ready for review May 15, 2025 21:23

amotl temporarily deployed to pypi May 15, 2025 21:23 with GitHub Actions

amotl merged commit 5def693 into main May 15, 2025
6 checks passed

amotl deleted the llms-txt-caching branch May 15, 2025 21:30
