docs(package): Add user documents for admin-tools scripts. #1225
Walkthrough
Adds two new documentation pages under `docs/src/user-guide` (`reference-overview` and `reference-admin-tools`) and updates the Reference toctree in `docs/src/user-guide/index.md` to include them. Changes are documentation-only; no code or public API edits.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as Operator
    participant Tool as admin-tools (archive-manager.sh / dataset-manager.sh)
    participant CLP as CLP service
    participant FS as Filesystem (archives/datasets)
    rect #F0FFF4
        User->>Tool: invoke (list / del --ids / del --filter / --dataset)
        Tool->>CLP: verify service running
        alt CLP running
            CLP-->>Tool: OK
            Tool->>FS: scan/list archives or datasets
            FS-->>Tool: results
            Tool-->>User: display results / prompt confirm
            alt deletion confirmed
                Tool->>FS: remove files
                FS-->>Tool: deletion result
                Tool-->>User: report summary
            end
        else CLP not running
            CLP-->>Tool: error
            Tool-->>User: error (requires CLP started)
        end
    end
```
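The control flow in the diagram can be sketched in Python. This is only an illustration of the sequence, not how `archive-manager.sh` or `dataset-manager.sh` are implemented: `run_delete` and the four callables it receives are hypothetical stand-ins for the service check, the listing step, the confirmation prompt, and the file removal.

```python
from typing import Callable, Iterable, List


def run_delete(
    is_clp_running: Callable[[], bool],
    list_targets: Callable[[], List[str]],
    confirm: Callable[[List[str]], bool],
    remove: Callable[[Iterable[str]], int],
) -> str:
    """Follow the diagram: check CLP, list targets, confirm, then delete."""
    # Every command requires a running CLP instance.
    if not is_clp_running():
        return "error: CLP must be started first"
    # Scan/list the archives or datasets that match the request.
    targets = list_targets()
    # Nothing is removed unless the operator confirms.
    if not confirm(targets):
        return f"aborted: {len(targets)} target(s) left untouched"
    removed = remove(targets)
    return f"deleted {removed} of {len(targets)} target(s)"
```

Passing each step in as a callable keeps the sketch testable without a running CLP instance.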
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
quinntaylormitchell left a comment:
Good; my only two comments that aren't committable suggestions are (1) do you think "admin-tools" should be monospaced (not a big deal), and (2) were you going to talk about `dataset-manager.sh` at all?
Co-authored-by: Quinn Taylor Mitchell <q.mitchell@mail.utoronto.ca>
quinntaylormitchell left a comment:
Comments specifically on the dataset-manager sections that hadn't been pushed when I did my last review. Be mindful that most of my previous comments should still be relevant even though they now say "outdated".
Co-authored-by: Quinn Taylor Mitchell <q.mitchell@mail.utoronto.ca>
Actionable comments posted: 11
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- docs/src/user-guide/index.md (1 hunks)
- docs/src/user-guide/reference-admin-tools.md (1 hunks)
- docs/src/user-guide/reference-overview.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/user-guide/reference-admin-tools.md
[uncategorized] ~34-~34: Possible missing comma found.
Context: ...`clp-json` groups archives into different datasets whereas `clp-text` currently does not. ...
(AI_HYDRA_LEO_MISSING_COMMA)
🔇 Additional comments (1)
docs/src/user-guide/index.md (1)
90-95: LGTM: Reference toctree updated correctly. The new entries integrate cleanly; placing `reference-overview` first is sensible. No issues found.
Actionable comments posted: 2
♻️ Duplicate comments (2)
docs/src/user-guide/reference-overview.md (2)
16-21: Stray apostrophe issue is resolved. The parenthetical now correctly reads "`clp-text`'s (and `clp`'s)". Thanks for fixing.
31-31: Schema card now links to the correct target. `:link: reference-unstructured-schema-file` matches the title and toctree. Good correction.
📒 Files selected for processing (1)
- docs/src/user-guide/reference-overview.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-18T14:35:20.485Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
Applied to files:
docs/src/user-guide/reference-overview.md
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.
Applied to files:
docs/src/user-guide/reference-overview.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
docs/src/user-guide/reference-overview.md (2)
6-36: Nice use of sphinx-design grid/cards; structure reads cleanly. Cards, links, and copy are concise and consistent. This page will work well as a reference hub.
6-36: Link targets and index wiring verified. All referenced pages exist, and the user-guide index (`docs/src/user-guide/index.md`) includes the corresponding toctree entries under the Reference section. No further changes needed.
Actionable comments posted: 1
♻️ Duplicate comments (8)
docs/src/user-guide/reference-admin-tools.md (8)
110-113: Make object-storage support explicit for dataset-manager. This section mentions object-storage concepts later but doesn't state support here. Please confirm and document explicitly.

```diff
 ::: {note}
 Currently, `dataset-manager.sh` is only supported for `clp-json` (not `clp-text`).
 :::
+
+:::{note}
+Object storage support: Confirm whether `dataset-manager.sh` supports managing datasets on object-storage backends (e.g., S3, GCS). If unsupported, add a limitation similar to `archive-manager.sh`; if supported, clarify any behavioural differences (e.g., deletion of entire `key_prefix`).
+:::
```
12-15: Grammar: “after CLP is started.” Tighten the phrasing for clarity.

```diff
- The admin scripts can only be used after CLP starts. For help using CLP, see
+ The admin scripts can only be used after CLP is started. For help using CLP, see
```
59-60: Consistent punctuation in numbered list. Match the earlier items that use a trailing colon.

```diff
-3. Delete archives by ID
+3. Delete archives by ID:
```
75-83: Normalise placeholders and make units explicit. Keep placeholders consistent with the “find” example and specify milliseconds.

````diff
 ```bash
 sbin/admin-tools/archive-manager.sh del by-filter \
   <begin-epoch-timestamp-millis> \
   <end-epoch-timestamp-millis>
 ```
@@
- * Replace `<begin-epoch-time-millis>` with the timestamp of the time range's beginning.
- * Replace `<end-epoch-time-millis>` with the timestamp of time range's end.
+ * Replace `<begin-epoch-timestamp-millis>` with the timestamp of the time range's beginning (in milliseconds since the Unix epoch).
+ * Replace `<end-epoch-timestamp-millis>` with the timestamp of the time range's end (in milliseconds since the Unix epoch).
````
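Both placeholders expect milliseconds since the Unix epoch. As a hedged sketch, assuming the positional `del by-filter <begin> <end>` form in the snippet for this comment (verify it against the published page or the script's help output), the arguments can be computed from timezone-aware datetimes. `to_epoch_millis` and `build_del_by_filter_cmd` are hypothetical helper names, not part of CLP.

```python
import shlex
from datetime import datetime, timezone
from typing import List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    delta = dt - EPOCH
    # Integer arithmetic on the timedelta fields avoids float rounding.
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


def build_del_by_filter_cmd(begin: datetime, end: datetime) -> List[str]:
    """Build the argument list for the (assumed) positional `del by-filter` form."""
    begin_ms = to_epoch_millis(begin)
    end_ms = to_epoch_millis(end)
    if begin_ms > end_ms:
        raise ValueError("begin must not be after end")
    return [
        "sbin/admin-tools/archive-manager.sh",
        "del",
        "by-filter",
        str(begin_ms),
        str(end_ms),
    ]


if __name__ == "__main__":
    cmd = build_del_by_filter_cmd(
        datetime(2024, 1, 1, tzinfo=timezone.utc),
        datetime(2024, 1, 2, tzinfo=timezone.utc),
    )
    print(shlex.join(cmd))
```

Run as a script, this prints `sbin/admin-tools/archive-manager.sh del by-filter 1704067200000 1704153600000`. Rejecting naive datetimes is deliberate: it forces the caller to decide which time zone the bounds are in.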
118-124: Strengthen deletion warning: irreversible and backups. Make the risk explicit and advise safeguards.

```diff
 :::{warning}
 When deleting a dataset, `dataset-manager.sh` removes all files under the dataset's storage directory (`key_prefix` in object storage). Therefore, any non-archive files in this storage
-directory will also be deleted. Users should avoid storing non-archive files in the archive storage
-directory.
+directory will also be deleted. This operation is irreversible. Users should avoid storing non-archive
+files in the archive storage directory and ensure appropriate backups/retention policies are in place.
 :::
```
69-74: Clarify whether time-range bounds are inclusive or exclusive. State the boundary semantics so users can predict deletions precisely.

```diff
 :::{note}
- Only archives whose log events fall entirely within the specified time range will be deleted.
+ Only archives whose log events fall entirely within the specified time range will be deleted.
+ Both the `begin` and `end` timestamps are inclusive. If an archive's first or last event exactly
+ matches a bound, it is considered within the range.
 Archives whose log events fall outside the specified time range will not be deleted, even if those archives also contain log events that fall inside the specified time range.
 :::
```

If this is not accurate, please adjust the “inclusive” language to reflect the actual implementation.
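The containment rule in the note can be written as a simple predicate over each archive's earliest and latest event timestamps. This sketch treats both bounds as inclusive, which is exactly the open question in this comment, so take that choice as an assumption. `ArchiveTimeRange` and `select_for_deletion` are hypothetical names, not CLP APIs.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ArchiveTimeRange:
    """Hypothetical summary of one archive: earliest and latest event timestamps (epoch ms)."""

    archive_id: str
    begin_ts: int
    end_ts: int


def select_for_deletion(
    archives: Iterable[ArchiveTimeRange], begin_ts: int, end_ts: int
) -> List[str]:
    """Return IDs of archives whose events all fall within [begin_ts, end_ts].

    Both bounds are inclusive here; that is an assumption, not confirmed behaviour.
    """
    return [
        archive.archive_id
        for archive in archives
        # Partial overlap is not enough: the archive's whole span must be inside the range.
        if begin_ts <= archive.begin_ts and archive.end_ts <= end_ts
    ]
```

For example, with the range `[100, 250]`, an archive spanning 150 to 300 is kept because part of it lies outside the range, while archives spanning 100 to 200 and 200 to 250 are selected.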
86-96: Specify units for UTC offset and align variable naming with units. Make the example unambiguous and include a brief behavioural note. This also aligns with prior learnings about offset adjustments.

```diff
- To avoid this issue, you can adjust the given timestamps to account for the offset:
+ To avoid this issue, you can adjust the given timestamps to account for the offset (in milliseconds):
@@
- adjusted_epoch_timestamp_millis = epoch_timestamp_millis - signed_utc_offset
+ adjusted_epoch_timestamp_millis = epoch_timestamp_millis - signed_utc_offset_millis
+
+ // For negative UTC offsets (e.g., UTC-5), this increases the adjusted timestamp; for positive offsets (e.g., UTC+2), it decreases it.
```

Note: This matches previously documented guidance on offset adjustments.
51-57: Unify placeholder naming and specify units (ms). Use a consistent placeholder form and make units explicit to avoid ambiguity.

```diff
- --begin-ts <begin-epoch-time-millis> \
- --end-ts <end-epoch-time-millis>
+ --begin-ts <begin-epoch-timestamp-millis> \
+ --end-ts <end-epoch-timestamp-millis>
@@
- * Replace `<begin-epoch-time-millis>` with the timestamp of the time range's beginning.
- * Replace `<end-epoch-time-millis>` with the timestamp of time range's end.
+ * Replace `<begin-epoch-timestamp-millis>` with the timestamp of the time range's beginning (in milliseconds since the Unix epoch).
+ * Replace `<end-epoch-timestamp-millis>` with the timestamp of the time range's end (in milliseconds since the Unix epoch).
```
📒 Files selected for processing (1)
- docs/src/user-guide/reference-admin-tools.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-15T21:48:46.674Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:57-66
Timestamp: 2025-08-15T21:48:46.674Z
Learning: When dealing with timezone offset adjustments for retention periods in CLP, the correct formula is `adjusted_retention_period = retention_period - UTC_offset` (not using absolute value). For negative UTC offsets, this increases the retention period; for positive UTC offsets, this decreases it, properly compensating for how local timestamps are interpreted by the UTC-based garbage collector.
Applied to files:
docs/src/user-guide/reference-admin-tools.md
Actionable comments posted: 2
♻️ Duplicate comments (4)
docs/src/user-guide/reference-admin-tools.md (4)
11-15: Tighten phrasing: “after CLP is started.” Use the passive form for clarity and consistency with prior docs.

```diff
-The admin scripts can only be used after CLP starts. For help using CLP, see
+The admin scripts can only be used after CLP is started. For help using CLP, see
 [clp-json quick start](quick-start/clp-json.md) or [clp-text quick start](quick-start/clp-text.md) for details.
```
71-75: Clarify inclusivity/exclusivity of time-range bounds. Please state whether begin/end are inclusive or exclusive so users can predict deletions at boundaries.
Would you confirm the intended semantics? If you confirm, I can submit a follow-up patch to add a single sentence (e.g., “Both begin and end timestamps are inclusive” or “begin is inclusive; end is exclusive”) plus a quick example.
```bash
#!/bin/bash
# Try to infer semantics from scripts/docs.
# 1) Search archive-manager.sh for by-filter usage and any mention of inclusive/exclusive.
rg -nC3 -g '!**/node_modules/**' -g '!**/build/**' 'by-filter|--begin-ts|--end-ts|inclusive|exclusive|time range' sbin admin docs scripts
# 2) Scan any server-side handlers (if present) that implement the filter.
rg -nC3 -P 'delete.*(time|timestamp)|filter.*(time|timestamp)' --type=py --type=go --type=ts --type=java --type=cpp
```
115-118: Make object-storage support explicit for dataset-manager. This section references object-storage concepts later (“key_prefix”) but doesn’t state whether dataset-manager supports object-storage backends. Make this explicit to avoid confusion.

```diff
 ::: {note}
 Currently, `dataset-manager.sh` is only supported for `clp-json` (not `clp-text`).
 :::
+
+:::{note}
+Object-storage support: <Confirm whether `dataset-manager.sh` supports managing datasets on object-storage backends (e.g., S3, GCS). If supported, keep the "key_prefix" terminology below; if not, add an explicit limitation similar to `archive-manager.sh`.>
+:::
```
123-129: Strengthen deletion warning: call out irreversibility and backups. Explicitly warn that deletion cannot be undone and recommend backups/snapshots.

```diff
 :::{warning}
 When deleting a dataset, `dataset-manager.sh` removes all files under the dataset's storage directory (`key_prefix` in object storage). Therefore, any non-archive files in this storage
-directory will also be deleted. Users should avoid storing non-archive files in the archive storage
-directory.
+directory will also be deleted. This operation is irreversible. Users should avoid storing non-archive
+files in the archive storage directory and ensure appropriate backups/retention policies are in place.
 :::
```
📒 Files selected for processing (1)
- docs/src/user-guide/reference-admin-tools.md (1 hunks)
🔇 Additional comments (5)
docs/src/user-guide/reference-admin-tools.md (5)
33-38: Comma after “datasets” is correct. Punctuation and wording look good here.
50-59: Good consistency between flags and placeholders; units are explicit. The `--begin-ts`/`--end-ts` flags match the angle-bracket placeholders, and milliseconds are clearly stated.
61-66: Consistent list punctuation. Adding the trailing colon aligns with the earlier items.
77-87: Placeholders and units are consistent in the delete-by-filter example. Looks good.
95-100: UTC offset formula and units are correct. Using `signed_utc_offset_millis` and subtracting it matches prior guidance for timezone adjustments.
quinntaylormitchell left a comment:
Mostly lgtm, only one comment.
Also: I pushed some changes directly that link the reference overview page and clarify that the timestamp arguments passed to the commands should be Unix epoch milliseconds.
```
information from log events that have it.) Using the script on archives with non-UTC timestamps
can lead to an effective time range that is different from the intended value.
```
Suggested change:

```diff
-information from log events that have it.) Using the script on archives with non-UTC timestamps
-can lead to an effective time range that is different from the intended value.
+information from log events.) Using the script on archives with non-UTC timestamps can lead to an
+effective time range that is different from the intended value.
```
I prefer to keep "that have it" since a careful reader should realize that it's only a problem if they are already logging time zone information (which isn't always the case).
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/src/user-guide/index.md (1)
41-45: Route the Reference card to the overview: good. Consider clarifying the card blurb. Linking the Reference card to the new overview page is a sensible IA improvement. Minor copy nit: "etc." is vague; explicitly mention the new content (admin tools) to set user expectations.
Apply this diff to tighten the blurb:
```diff
-Reference docs like format specifications, etc.
+Reference docs: search syntax, admin tools, schema files, and format specifications.
```
📒 Files selected for processing (1)
- docs/src/user-guide/index.md (2 hunks)
```
:hidden:
:caption: Reference

reference-overview
reference-json-search-syntax
reference-text-search-syntax
reference-admin-tools
reference-unstructured-schema-file
:::
```
💡 Verification agent
🧩 Analysis chain
Reference toctree updates look correct; please verify link health and page presence.
The insertion of reference-overview and reference-admin-tools in the Reference toctree aligns with the new navigation. Please verify that:
- both pages exist under `docs/src/user-guide/`,
- each has a top-level heading so Sphinx renders nice titles, and
- all intra-doc links resolve.

Optionally, add `:titlesonly:` to keep the sidebar concise.
Suggested tweaks (optional):

```diff
 :::{toctree}
 :hidden:
 :caption: Reference
+:titlesonly:
 reference-overview
 reference-json-search-syntax
 reference-text-search-syntax
 reference-admin-tools
 reference-unstructured-schema-file
 :::
```

Run this script to verify presence and link health:
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
# 1) Check files exist
printf "Checking required files...\n"
fd -a --strip-cwd-prefix -t f 'reference-(overview|admin-tools)\.md' docs/src/user-guide
# 2) Ensure each has an H1 heading
printf "\nChecking top-level headings...\n"
rg -nN '^#\s+' docs/src/user-guide/reference-overview.md docs/src/user-guide/reference-admin-tools.md
# 3) Build linkcheck (assumes Sphinx env is set up)
printf "\nRunning Sphinx linkcheck (this may take a while)...\n"
sphinx-build -b linkcheck docs/src docs/_build/linkcheck
printf "\nLinkcheck report:\n"
sed -n '1,200p' docs/_build/linkcheck/output.txt || true
```

Length of output: 369
The initial script failed due to the incompatible --strip-cwd-prefix flag. Let’s try a corrected version:
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
# 1) Check files exist
printf "Checking required files...\n"
fd -a -t f 'reference-(overview|admin-tools)\.md' docs/src/user-guide
# 2) Ensure each has a top-level H1 heading
printf "\nChecking top-level headings...\n"
rg -nN '^#\s+' docs/src/user-guide/reference-overview.md docs/src/user-guide/reference-admin-tools.md
# 3) Build Sphinx linkcheck (requires Sphinx environment)
printf "\nRunning Sphinx linkcheck (this may take a while)...\n"
sphinx-build -b linkcheck docs/src docs/_build/linkcheck
printf "\nLinkcheck report (first 200 lines):\n"
sed -n '1,200p' docs/_build/linkcheck/output.txt || true
```

Length of output: 860
Reference toctree pages verified; link health pending manual check
Files reference-overview.md and reference-admin-tools.md exist under docs/src/user-guide/ and both begin with a top-level H1 heading. Please manually verify that all intra-doc links resolve correctly in a full Sphinx build.
Locations to verify:
- docs/src/user-guide/reference-overview.md
- docs/src/user-guide/reference-admin-tools.md
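The file-presence and heading checks from the script can also be approximated without `fd` or `rg`. This Python sketch assumes each toctree entry maps to `<entry>.md` in the same directory and, like the `rg '^#\s+'` pattern, counts any line that starts with `#` plus whitespace as a top-level heading (lines inside code fences included). `check_reference_pages` is a hypothetical helper; it does not replace a full Sphinx build or linkcheck.

```python
import re
from pathlib import Path
from typing import Dict, List

# Same pattern as the rg check: "#" followed by whitespace at the start of a line.
H1_PATTERN = re.compile(r"#\s+")


def check_reference_pages(doc_dir: Path, entries: List[str]) -> Dict[str, str]:
    """Map each failing toctree entry to a short description of the problem."""
    problems: Dict[str, str] = {}
    for entry in entries:
        page = doc_dir / f"{entry}.md"
        if not page.is_file():
            problems[entry] = "missing file"
            continue
        text = page.read_text(encoding="utf-8")
        if not any(H1_PATTERN.match(line) for line in text.splitlines()):
            problems[entry] = "no top-level heading"
    return problems


if __name__ == "__main__":
    reference_entries = [
        "reference-overview",
        "reference-json-search-syntax",
        "reference-text-search-syntax",
        "reference-admin-tools",
        "reference-unstructured-schema-file",
    ]
    print(check_reference_pages(Path("docs/src/user-guide"), reference_entries) or "all pages OK")
```

An empty result means every entry has a matching file with an H1; intra-doc links still need the Sphinx build to verify.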
Suggested optional tweak to keep the sidebar concise:
:::{toctree}
:hidden:
:caption: Reference
+:titlesonly:
reference-overview
reference-json-search-syntax
reference-text-search-syntax
reference-admin-tools
reference-unstructured-schema-file
:::📝 Committable suggestion
…#1225) Co-authored-by: Quinn Taylor Mitchell <q.mitchell@mail.utoronto.ca> Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Description
As described in the PR title, this PR adds user documents for admin-tools.
Checklist
breaking change.
Validation performed
Built the page locally.