feat: add pyseekdb CLI entry point in pyproject.toml #161

Merged
hnwyllmm merged 4 commits into oceanbase:develop from alswl:feat/cli
Jan 31, 2026

Conversation

@alswl

@alswl alswl commented Jan 31, 2026

Contributor

Introduce a CLI for pyseekdb to support debugging and managing SeekDB databases/collections (close #55).

Changes

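  • CLI module
    • Add src/pyseekdb/cli/__init__.py and src/pyseekdb/cli/main.py, implementing the command-line entry point and subcommand dispatch.
  • Connection modes (consistent with the spec)
    • Embedded: --path (default: seekdb.db in the current directory).
    • Server: --host, --port (default 2881), --tenant, --database, --user, --password (or the SEEKDB_PASSWORD environment variable).
  • Subcommands
    • db: list / create <name> / delete <name> (database management).
    • collections (alias coll): list / create <name> [--dimension N] / delete <name> / info <name> [--sample N] (collection management and preview).
    • sql: execute raw SQL (for debugging).
    • query: run a vector/semantic query against a given collection by text (--text, --n).
    • get: fetch documents from a collection by ID or by limit (--ids, --limit).
  • Output: global -o table (default) or -o json.
  • Entry point: add [project.scripts] to pyproject.toml, pointing pyseekdb at pyseekdb.cli:main.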
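
The connection modes and the password fallback listed above could be resolved roughly as follows. This is a minimal sketch and not the code in main.py: `resolve_connection` and the shape of the returned dict are hypothetical, and it assumes embedded mode wins when both --path and --host are given, which is the behaviour the review further down describes.

```python
import argparse
import os


def resolve_connection(args: argparse.Namespace, environ=None) -> dict:
    """Return connection settings for embedded or server mode (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if args.path is not None or args.host is None:
        # Embedded mode: fall back to seekdb.db in the current directory.
        return {"mode": "embedded", "path": args.path or "seekdb.db"}
    # Server mode: an explicit --password wins over SEEKDB_PASSWORD.
    if args.password is not None:
        password = args.password
    else:
        password = environ.get("SEEKDB_PASSWORD", "")
    return {
        "mode": "server",
        "host": args.host,
        "port": args.port,
        "password": password,
    }
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.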

Solution Description

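To address issue #55, this PR implements a unified CLI on top of the existing Client / AdminClient and the connection and capability descriptions in the spec. It is intended for:

  • Management: list/create/delete databases and collections, and view collection info and sample data.
  • Debugging: run arbitrary SQL and query/get against a collection, with both table and json output.

The CLI reuses the factory logic of Client() and AdminClient() and supports both embedded (--path) and server (--host/--port, etc.) modes. Subcommands are organized with argparse, so there are no extra dependencies.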
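
For readers unfamiliar with argparse subcommands, the dispatch structure described above can be sketched like this. It is an illustration under assumptions, not the merged implementation: the handler body is a placeholder, only `db list` is wired up, and the long option name `--output` and the `set_defaults(func=...)` dispatch style are choices made for this sketch.

```python
import argparse


def cmd_db_list(args: argparse.Namespace) -> int:
    """Placeholder handler; a real one would go through AdminClient."""
    print(f"db list (output={args.output})")
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pyseekdb")
    parser.add_argument("-o", "--output", choices=["table", "json"], default="table")
    sub = parser.add_subparsers(dest="command", required=True)

    db = sub.add_parser("db", help="Database operations")
    db_sub = db.add_subparsers(dest="db_command", required=True)
    db_list = db_sub.add_parser("list", help="List databases")
    # Each leaf parser records its handler, so main() needs no if/elif chain.
    db_list.set_defaults(func=cmd_db_list)
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Accepting `argv` in `main` lets the same function serve as the console-script target and be called directly from tests, e.g. `main(["-o", "json", "db", "list"])`.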

Test plan

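  • After installation, run pyseekdb --help, pyseekdb db --help, pyseekdb collections --help, etc., and confirm that subcommands and arguments are displayed correctly.
  • In an embedded or server environment, verify each of: db list, collections list, collections info <name> --sample N, sql "SHOW TABLES", query <collection> --text "...", get <collection> --limit N, and -o json output; confirm there are no errors and the results are as expected.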

related to #55

Summary by CodeRabbit

  • New Features
    • Ships a command-line interface (CLI) as a public executable entry point.
    • Adds DB management commands: list, create, delete.
    • Adds collection commands: list, create, delete, info (with optional sample).
    • Adds query, document retrieval, and raw SQL execution commands.
    • Supports embedded/server connection modes, configurable credentials, and output formats (table or JSON).

## Summary
Introduce a new script entry point for the pyseekdb command-line interface.

### Changes
- Added `[project.scripts]` section in `pyproject.toml` to define the `pyseekdb` command, linking it to the `main` function in the `pyseekdb.cli` module.

## Test plan
- Verified the new CLI entry point by running the command and ensuring it executes without errors.
@CLAassistant

CLAassistant commented Jan 31, 2026


CLA assistant check
All committers have signed the CLA.

@coderabbitai

coderabbitai Bot commented Jan 31, 2026

Walkthrough

Adds a CLI package and console script entrypoint for pyseekdb; implements a full command-line tool (argument parsing, connection factories, admin/collection/query/sql/get commands, output formatting, and a main(argv) entrypoint).

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration: pyproject.toml | Added console script entrypoint: pyseekdb = "pyseekdb.cli:main". |
| CLI Package Init: src/pyseekdb/cli/__init__.py | New package init that re-exports main and sets __all__ = ["main"]. |
| CLI Implementation: src/pyseekdb/cli/main.py | New CLI implementation with argument parsing, global connection args (embedded vs server + env password), client/admin factory helpers, SQL execution helpers, table/JSON output formatters, and subcommands: db (list/create/delete), collections (list/create/delete/info), sql, query, and get. Exposes `main(argv: list[str] …` |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant User as "User"
    participant CLI as "CLI Parser\nrgba(52,152,219,0.5)"
    participant Factory as "Connection Factory\nrgba(46,204,113,0.5)"
    participant Client as "Client/Admin Proxy\nrgba(155,89,182,0.5)"
    participant DB as "Database/Server\nrgba(231,76,60,0.5)"
    participant Formatter as "Output Formatter\nrgba(241,196,15,0.5)"

    User->>CLI: run `pyseekdb` with args
    CLI->>CLI: parse global + subcommand args
    CLI->>Factory: request client/admin (embedded | server)
    Factory->>DB: open connection
    DB-->>Factory: connection established
    Factory-->>Client: return proxy
    CLI->>Client: execute subcommand (db/collections/sql/query/get)
    Client->>DB: perform operation
    DB-->>Client: return results
    Client-->>CLI: results returned
    CLI->>Formatter: format results (table|json)
    Formatter-->>CLI: formatted output
    CLI-->>User: display output
    CLI->>DB: close server connection (if applicable)
    DB-->>CLI: closed
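
The "Output Formatter" step in the diagram above can be illustrated with a small sketch. `format_rows` is a hypothetical name and the column-aligned layout is an assumption; the formatters in main.py may differ in both naming and output shape.

```python
import json


def format_rows(rows, columns, output="table") -> str:
    """Render rows either as a JSON array of objects or as an aligned text table."""
    if output == "json":
        records = [dict(zip(columns, row)) for row in rows]
        # default=str keeps non-JSON types (e.g. datetimes) printable.
        return json.dumps(records, ensure_ascii=False, indent=2, default=str)
    cells = [[str(c) for c in columns]] + [[str(v) for v in row] for row in rows]
    widths = [max(len(line[i]) for line in cells) for i in range(len(columns))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(line, widths)).rstrip()
        for line in cells
    ]
    return "\n".join(lines)
```

Returning a string instead of printing directly keeps the formatter testable; the caller decides where the text goes.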

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped to craft a little tool,
Flags and commands to keep things cool.
Databases, collections, queries too,
Tables or JSON — pick your view.
A tiny CLI from a rabbit to you.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title focuses on the entry point addition in pyproject.toml but the PR primarily introduces a comprehensive CLI module with 400+ lines of implementation; this is partially related but not the main point. | Consider a more descriptive title like 'feat: add pyseekdb CLI with database and collection management' that captures the full scope of the implementation. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | The PR fully implements all coding requirements from issue #55: CLI for database/collection management, support for embedded and server modes, subcommands (db, collections, sql, query, get), output formatting, and minimal dependencies. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing the CLI feature: pyproject.toml entry point, __init__.py package setup, and main.py with all required subcommands and functionality. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 94.44% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/pyseekdb/cli/main.py`:
- Around line 200-229: cmd_collections_info currently emits two separate JSON
documents when args.output == "json" (one for info via _print_json(info) and
another for sample), breaking machine parsing; change the logic so when
args.output == "json" you build a single payload (for example {"info": info,
"sample": sample_or_empty}) and call _print_json exactly once after retrieving
sample (use coll.peek when args.sample and count>0), otherwise keep the existing
human-readable prints; update the branches around _print_json and sample
handling in cmd_collections_info to assemble and print one JSON object.
- Around line 16-35: In _add_connection_args, make embedded vs server options
mutually exclusive by replacing the argument_group with
parser.add_mutually_exclusive_group(required=False) (or a non-required mutual
group) and add the conflicting flags (--path and --host) into that group so the
parser enforces that users cannot supply both; keep other server flags (--port,
--tenant, --database, --user, --password) as normal arguments but ensure only
--path or --host can be chosen at parse time.
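
To make the first item concrete: two JSON documents printed back to back cannot be read by a single `json.loads` call, whereas one wrapping object can. The sketch below assumes the `{"info": ..., "sample": ...}` shape proposed above; `collection_info_payload` is a hypothetical helper, not a function from main.py.

```python
import json


def collection_info_payload(info: dict, sample=None) -> str:
    """Serialize collection metadata and an optional sample as ONE JSON document."""
    payload = {"info": info, "sample": sample if sample is not None else []}
    return json.dumps(payload, ensure_ascii=False, default=str)


def is_single_json_document(text: str) -> bool:
    """True if the whole text parses as exactly one JSON value."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

With this, `is_single_json_document('{"a": 1}\n{"b": 2}')` is false, while the output of `collection_info_payload` always parses in one call.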
🧹 Nitpick comments (2)
src/pyseekdb/cli/main.py (2)

72-75: Remove unused # noqa: SLF001 directives.

Ruff reports these as unused because SLF001 isn’t enabled. Removing them here (and the other cleanup lines) will keep lint clean.

♻️ Suggested cleanup
-    server = client._server  # noqa: SLF001
-    return server._execute(sql)  # noqa: SLF001
+    server = client._server
+    return server._execute(sql)

338-394: Add connection args to db and collections subparsers for consistent option placement.

Currently, sql, query, and get subcommands accept connection flags after the subcommand (via _add_connection_args() on their parsers), while db and collections only accept them before the subcommand. This inconsistency makes the UX unpredictable. Adding _add_connection_args() to all subparsers would allow consistent flag placement across all commands.

🔧 Suggested adjustment
     db = sub.add_parser("db", help="Database (admin) operations")
+    _add_connection_args(db)
     db_sub = db.add_subparsers(dest="db_command", required=True)

-    _add_collection_subparsers(sub.add_parser("collections", help="Collection operations"))
+    collections_p = sub.add_parser("collections", help="Collection operations")
+    _add_connection_args(collections_p)
+    _add_collection_subparsers(collections_p)
-    _add_collection_subparsers(sub.add_parser("coll", help="Alias for collections"))
+    coll_p = sub.add_parser("coll", help="Alias for collections")
+    _add_connection_args(coll_p)
+    _add_collection_subparsers(coll_p)
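
One caveat when the same flags live on both the top-level parser and a subparser: depending on the Python version, the subparser's defaults can overwrite a value that was given before the subcommand. A common workaround is to register the subparser copies with `default=argparse.SUPPRESS`, so they only set the attribute when the flag is actually passed. The sketch below is a simplified stand-in (only --host is shown, and `add_connection_args` with its `suppress_defaults` parameter is hypothetical), not the project's `_add_connection_args`.

```python
import argparse


def add_connection_args(parser, *, suppress_defaults=False):
    """Simplified stand-in for _add_connection_args (only --host shown)."""
    default = argparse.SUPPRESS if suppress_defaults else None
    parser.add_argument("--host", default=default, help="Server mode: host")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pyseekdb")
    add_connection_args(parser)
    sub = parser.add_subparsers(dest="command", required=True)
    db = sub.add_parser("db", help="Database (admin) operations")
    # SUPPRESS: the subparser sets `host` only when the flag is given here.
    add_connection_args(db, suppress_defaults=True)
    db_sub = db.add_subparsers(dest="db_command", required=True)
    db_sub.add_parser("list")
    return parser
```

Both `pyseekdb --host a db list` and `pyseekdb db --host b list` then end up with the expected `args.host`.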

Comment thread src/pyseekdb/cli/main.py
Comment on lines +16 to +35
def _add_connection_args(parser: argparse.ArgumentParser) -> None:
    """Add global connection options (embedded vs server)."""
    g = parser.add_argument_group("connection (choose one)")
    g.add_argument(
        "--path",
        metavar="DIR",
        default=None,
        help="Embedded mode: path to seekdb data directory (default: seekdb.db in cwd)",
    )
    g.add_argument("--host", default=None, help="Server mode: host (e.g. localhost)")
    g.add_argument("--port", type=int, default=2881, help="Server mode: port (default: 2881)")
    g.add_argument("--tenant", default="sys", help="Server mode: tenant (default: sys)")
    g.add_argument("--database", "-d", default="test", help="Database name (default: test)")
    g.add_argument("--user", default="root", help="Server mode: user (default: root)")
    g.add_argument(
        "--password",
        "-p",
        default=None,
        help="Server mode: password (or set SEEKDB_PASSWORD)",
    )

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "main.py" -path "*/cli/*" | head -5

Repository: oceanbase/pyseekdb

Length of output: 88


🏁 Script executed:

cat -n src/pyseekdb/cli/main.py | head -60

Repository: oceanbase/pyseekdb

Length of output: 2470


🏁 Script executed:

rg "def _add_connection_args" -A 30 src/pyseekdb/cli/main.py

Repository: oceanbase/pyseekdb

Length of output: 1329


🏁 Script executed:

rg "args\.path\|args\.host" src/pyseekdb/cli/main.py -A 2 -B 2

Repository: oceanbase/pyseekdb

Length of output: 44


🌐 Web query:

argparse mutually exclusive group best practice Python

💡 Result:

argparse mutually exclusive group: best practices

  1. Use it for “choose one of these options” flags
    • Create a group with parser.add_mutually_exclusive_group() and add the options to that group; argparse will reject cases where the user supplies more than one. [1]
  2. If the user must pick one, set required=True on the group (not the individual args)
    • required=True means “at least one of these options must be provided”. [1]
    • Don’t try to set required=True on each argument inside the group (that’s not how exclusivity is intended to be expressed). [1]
  3. Prefer subcommands (add_subparsers) when you’re modeling verbs/modes
    • For CLI patterns like tool start|stop|restart, subparsers usually produce clearer help/usage and extensibility than a mutually exclusive set of boolean flags. [2]
  4. Help formatting: put the exclusive group inside an argument group if you need a title/description
    • Mutually exclusive groups don’t take title/description directly, but you can create an add_argument_group(...) and then call group.add_mutually_exclusive_group(...) inside it. [1]
  5. Avoid unsupported/removed nesting
    • As of Python 3.14, calling add_argument_group() or add_mutually_exclusive_group() on a mutually exclusive group raises an exception (it was deprecated in 3.11 and removed in 3.14). [1]

Minimal example

import argparse

p = argparse.ArgumentParser()
mode = p.add_mutually_exclusive_group(required=True)
mode.add_argument("--json", action="store_true")
mode.add_argument("--yaml", action="store_true")

args = p.parse_args()

Sources: Python argparse docs (Mutual exclusion, grouping, deprecations) [1]; Stack Overflow discussion showing subparsers as the better fit for verb-style commands [2].

[1] (docs.python.org)
[2] (stackoverflow.com)


🏁 Script executed:

find . -name "*.py" -path "*/client*" | grep -v __pycache__ | head -10

Repository: oceanbase/pyseekdb

Length of output: 458


🏁 Script executed:

rg "class Client" -A 15 src/pyseekdb/

Repository: oceanbase/pyseekdb

Length of output: 1070


🏁 Script executed:

rg "def __init__" src/pyseekdb/client/ -A 20 | head -80

Repository: oceanbase/pyseekdb

Length of output: 6236


🏁 Script executed:

cat -n src/pyseekdb/client/__init__.py | head -100

Repository: oceanbase/pyseekdb

Length of output: 4013


🏁 Script executed:

rg "class Client" -B 2 -A 20 src/pyseekdb/client/__init__.py

Repository: oceanbase/pyseekdb

Length of output: 44


🏁 Script executed:

rg "def __init__.*path.*host" -A 30 src/pyseekdb/client/

Repository: oceanbase/pyseekdb

Length of output: 44


🏁 Script executed:

cat -n src/pyseekdb/client/__init__.py | sed -n '100,150p'

Repository: oceanbase/pyseekdb

Length of output: 2123


🏁 Script executed:

rg "^def Client" -A 50 src/pyseekdb/client/__init__.py

Repository: oceanbase/pyseekdb

Length of output: 2254


Make embedded vs server mode mutually exclusive.

--path and --host can both be provided, which silently favors embedded mode. Using add_mutually_exclusive_group() enforces the choice at parse time and prevents ambiguous configuration.

🔧 Suggested fix
-    g = parser.add_argument_group("connection (choose one)")
-    g.add_argument(
-        "--path",
-        metavar="DIR",
-        default=None,
-        help="Embedded mode: path to seekdb data directory (default: seekdb.db in cwd)",
-    )
-    g.add_argument("--host", default=None, help="Server mode: host (e.g. localhost)")
+    g = parser.add_argument_group("connection")
+    mode = g.add_mutually_exclusive_group()
+    mode.add_argument(
+        "--path",
+        metavar="DIR",
+        default=None,
+        help="Embedded mode: path to seekdb data directory (default: seekdb.db in cwd)",
+    )
+    mode.add_argument("--host", default=None, help="Server mode: host (e.g. localhost)")
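
A self-contained version of the same idea, for trying the behaviour in isolation. It is a sketch rather than the merged code: only --path, --host and --port are modelled, and `parse_or_none` is a hypothetical helper that turns argparse's exit on error into a `None` return.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pyseekdb")
    group = parser.add_argument_group("connection")
    # A mutually exclusive group nested in a titled argument group.
    mode = group.add_mutually_exclusive_group()
    mode.add_argument("--path", metavar="DIR", default=None)
    mode.add_argument("--host", default=None)
    # Other server flags stay outside the exclusive pair.
    group.add_argument("--port", type=int, default=2881)
    return parser


def parse_or_none(argv):
    """Return the parsed namespace, or None if argparse rejected the input."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None
```

Passing both --path and --host makes argparse print a usage error and exit, which `parse_or_none` reports as `None`; either flag alone parses normally.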

Comment thread src/pyseekdb/cli/main.py Outdated
Comment thread src/pyseekdb/cli/__init__.py Outdated
alswl added 2 commits January 31, 2026 16:03
… output

## Summary
Updated the `cmd_collections_info` function to include a sample of collection data in the JSON output format. This change allows users to see a preview of the collection's contents alongside its metadata.

### Changes
- Modified the JSON output to include both collection info and a sample of documents when the `--sample` argument is provided.
- Removed redundant sample printing logic from the non-JSON output section.

## Test plan
- Verified the updated command by running it with various output formats and sample sizes to ensure correct functionality.
Updated the CLI documentation in `__init__.py` and `main.py` to use lowercase "seekdb" for consistency across the project. This change enhances the clarity and uniformity of the command-line interface documentation.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/pyseekdb/cli/main.py`:
- Around line 392-397: The get subparser currently allows both --ids and --limit
which is ambiguous; modify the parser setup for the "get" command (the get_p
parser) to enforce mutual exclusivity by creating a mutually exclusive argument
group (e.g., group = get_p.add_mutually_exclusive_group()) and move the --ids
and --limit definitions into that group (keep their types/defaults and help
text), or alternatively add a runtime check in the get command handler to raise
an error if both args.ids and args.limit are provided; update code referencing
get_p, --ids, and --limit accordingly.
- Around line 177-183: The dimension check in cmd_collections_create currently
treats 0 as falsy and drops the configuration; change the logic to test for None
explicitly so a provided value of 0 is honored: build config =
HNSWConfiguration(dimension=args.dimension) if args.dimension is not None else
None and then pass that config into client.create_collection (references:
cmd_collections_create, HNSWConfiguration, client.create_collection).

Comment thread src/pyseekdb/cli/main.py
Comment on lines +177 to +183
def cmd_collections_create(args: argparse.Namespace) -> int:
    """Create collection."""
    client = _make_client(args)
    try:
        config = HNSWConfiguration(dimension=args.dimension) if args.dimension else None
        client.create_collection(args.name, configuration=config)
        print(f"Created collection: {args.name}")

⚠️ Potential issue | 🟡 Minor

Validate --dimension explicitly; 0 is currently ignored.

The truthy check skips config when --dimension 0 is passed, silently creating a collection without the intended dimension.

🛠️ Suggested fix
-        config = HNSWConfiguration(dimension=args.dimension) if args.dimension else None
+        if args.dimension is not None and args.dimension <= 0:
+            print("dimension must be positive", file=sys.stderr)
+            return 1
+        config = HNSWConfiguration(dimension=args.dimension) if args.dimension is not None else None
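
An alternative to checking inside the handler is to validate at parse time with a custom argparse `type`. This is a different technique from the suggested fix above, shown only as a sketch: `positive_int` is a hypothetical helper and the parser models just the two relevant arguments.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse type: accept only integers greater than zero."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid integer: {text!r}") from None
    if value <= 0:
        raise argparse.ArgumentTypeError("dimension must be positive")
    return value


parser = argparse.ArgumentParser(prog="pyseekdb collections create")
parser.add_argument("name")
# default=None still means "no dimension given"; 0 and negatives are rejected.
parser.add_argument("--dimension", type=positive_int, default=None)
```

With this in place the handler can rely on `args.dimension is not None` alone, because any value that reaches it is already known to be positive.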

Comment thread src/pyseekdb/cli/main.py
Comment on lines +392 to +397
    get_p = sub.add_parser("get", help="Get documents from collection")
    _add_connection_args(get_p)
    get_p.add_argument("collection", help="Collection name")
    get_p.add_argument("--ids", default=None, help="Comma-separated IDs")
    get_p.add_argument("--limit", type=int, default=10, help="Max rows (default: 10)")
    get_p.add_argument("--include", nargs="+", default=None, help="Include fields")

⚠️ Potential issue | 🟡 Minor

Prevent --ids and --limit from being passed together.

Spec says “by id or by limit”; allowing both is ambiguous.

🛠️ Suggested fix
-    get_p.add_argument("--ids", default=None, help="Comma-separated IDs")
-    get_p.add_argument("--limit", type=int, default=10, help="Max rows (default: 10)")
+    get_grp = get_p.add_mutually_exclusive_group()
+    get_grp.add_argument("--ids", default=None, help="Comma-separated IDs")
+    get_grp.add_argument("--limit", type=int, default=10, help="Max rows (default: 10)")
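
If a runtime check is preferred over a parser-level group, note that a default of 10 makes an explicit `--limit` indistinguishable from an omitted one. The sketch below works around that by defaulting to `None` and applying the fallback afterwards. `build_get_parser`, `resolve_get_args` and `DEFAULT_LIMIT` are hypothetical names, and the comma-splitting of IDs is an assumption based on the help text.

```python
import argparse

DEFAULT_LIMIT = 10


def build_get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pyseekdb get")
    parser.add_argument("collection", help="Collection name")
    parser.add_argument("--ids", default=None, help="Comma-separated IDs")
    # None (not 10) as the default, so an explicit --limit can be told apart.
    parser.add_argument("--limit", type=int, default=None, help="Max rows (default: 10)")
    return parser


def resolve_get_args(args: argparse.Namespace):
    """Return (ids, limit); exactly one of the two is not None."""
    if args.ids is not None and args.limit is not None:
        raise ValueError("--ids and --limit cannot be used together")
    if args.ids is not None:
        ids = [part.strip() for part in args.ids.split(",") if part.strip()]
        return ids, None
    limit = args.limit if args.limit is not None else DEFAULT_LIMIT
    return None, limit
```

A handler could call `resolve_get_args` first and print the `ValueError` message to stderr before returning a non-zero exit code.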

@alswl

alswl commented Jan 31, 2026

Contributor Author

All issues are resolved, except for a few minor ones (improvements and style).

@alswl

alswl commented Jan 31, 2026

Contributor Author

@hnwyllmm Please review this pull request.

@hnwyllmm hnwyllmm merged commit 82f1f4e into oceanbase:develop Jan 31, 2026
8 checks passed

Development

Successfully merging this pull request may close these issues.

[Feature]: client tool to debug and manage collections

3 participants