refactor: simplify agent CLI to context, types, and state (#418)#420
Conversation
Greptile SummaryThis PR simplifies the agent CLI from six subcommands to four ( Key changes:
One remaining issue:
|
| Filename | Overview |
|---|---|
| packages/data-designer/src/data_designer/cli/utils/agent_introspection.py | Major refactor removing schema/builder APIs and adding path-based introspection (_get_source_file, get_family_source_files, get_config_module_path). All previous review fixes (last-occurrence index, posix paths, _SECTION_HEADERS additions, dynamic builder file resolution) are correctly implemented. One remaining issue: _get_source_file still returns an absolute path (not empty string) when data_designer is absent from the path, contradicting the PR description and producing malformed {config_root}//absolute/path output in _format_family_header. |
| packages/data-designer/src/data_designer/cli/utils/agent_text_formatter.py | Clean rewrite replacing schema/builder formatters with path-based family headers and _format_model_aliases_context. {config_root} template pattern is used consistently. _strip_config_prefix correctly strips data_designer/config/ for built-in types; the acknowledged passthrough for other paths is only problematic when _get_source_file returns an absolute path (see introspection comment). _format_model_aliases_context gracefully handles the no-usable-aliases and empty-items cases. |
| packages/data-designer/src/data_designer/cli/commands/agent.py | Straightforward removal of schema_command and builder_command and their imports. Remaining commands (context, types, state model-aliases, state persona-datasets) are unchanged in structure. |
| packages/data-designer-config/src/data_designer/config/base.py | Removes schema_text(), _format_annotation, _get_docstring_summary, _GOOGLE_SECTION_HEADERS, and now-unused re, Enum, Literal, get_args, get_origin imports. Clean deletion with no regressions; the same docstring-parsing logic has been re-implemented (with improvements) in agent_introspection.py. |
| packages/data-designer-config/src/data_designer/config/column_configs.py | Docstring updates only: adds (required) annotations on mandatory fields and Inherited Attributes: sections, and removes bare name: str redeclaration from ExpressionColumnConfig. No behavioural changes. |
| packages/data-designer-config/src/data_designer/config/processors.py | Docstring improvements for DropColumnsProcessorConfig and SchemaTransformProcessorConfig. Bare name: str redeclarations were removed (resolving the previously flagged issue where the parent Field(description=...) metadata was being silently dropped). |
| packages/data-designer/tests/cli/utils/test_agent_text_formatter.py | Tests updated to match the new formatter output: removed schema/builder tests, added test_format_context_text_includes_config_module_path, test_format_context_text_no_usable_aliases_shows_warning, and updated fixture paths to use realistic data_designer/config/... paths that exercise _strip_config_prefix. |
| packages/data-designer/tests/cli/utils/test_agent_introspection.py | Tests updated: removed schema/builder tests, added test_get_family_catalog_includes_description, test_get_family_source_files_returns_relative_paths, and test_get_config_module_path_returns_config_directory. Assertions are appropriate and not fragile. |
| packages/data-designer-config/src/data_designer/config/sampler_constraints.py | Added missing docstrings to ScalarInequalityConstraint and ColumnInequalityConstraint, including Inherited Attributes: sections. No code changes. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
CLI["data-designer agent <cmd>"] --> CTX["context"]
CLI --> TYPES["types [family]"]
CLI --> SMA["state model-aliases"]
CLI --> SPD["state persona-datasets"]
CTX --> GC["get_context()"]
GC --> GCM["get_config_module_path()\n→ .../data_designer/config"]
GC --> GCB["_get_config_builder_file()\n→ _get_source_file(DataDesignerConfigBuilder)"]
GC --> GBF["_get_base_config_file()\n→ _get_source_file(ConfigBase)"]
GC --> GFC["get_family_catalog(family)\n→ [{type: cls.__name__, description: ...}]"]
GC --> GFS["get_family_source_files(family)\n→ [data_designer/config/foo.py, ...]"]
TYPES --> GT["get_types(family?)"]
GT --> GFC
GT --> GFS
GFS --> GSF["_get_source_file(cls)\n→ relative path or abs path"]
GSF -->|data_designer in path| REL["data_designer/config/foo.py"]
GSF -->|data_designer NOT in path| ABS["full_path.as_posix() ⚠️"]
GCM --> FCT["format_context_text(data)\n## Config Module section"]
GFC --> FCT
GFS --> FCT
FCT --> FFH["_format_family_header(info)\nfile: {config_root}/stripped_path"]
FFH --> SCP["_strip_config_prefix(path)\nstrips data_designer/config/"]
SCP -->|normal| OK["file: {config_root}/foo.py ✓"]
SCP -->|absolute path| BAD["file: {config_root}//abs/path ✗"]
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/data-designer/src/data_designer/cli/utils/agent_introspection.py
Line: 300-302
Comment:
**Absolute path fallback produces malformed `{config_root}/` output**
The PR description explicitly states `_get_source_file` was fixed to "return empty string (not absolute path) when `data_designer` is absent", and the previous review reply confirmed the same ("Fixed — returns empty string now"). However, the implementation still returns `full_path.as_posix()` for plugin types whose path contains no `data_designer` component.
When this absolute path (e.g. `/home/user/my_plugin.py`) propagates through `get_family_source_files` — which only guards against `if path` (non-empty) — it reaches `_format_family_header`, which renders:
```
file: {config_root}//home/user/my_plugin.py
```
The double slash makes this a malformed agent-facing path that cannot be resolved by substituting the `{config_root}` template variable. The docstring now documents "absolute path" intent, but the rendering layer in `_format_family_header` (line 85 of `agent_text_formatter.py`) unconditionally prepends `{config_root}/` and only strips the `data_designer/config/` prefix — it has no fallback for absolute paths.
Returning empty string here (consistent with the `TypeError`/`OSError` branch above) would let `get_family_source_files`' `if path and` guard silently drop the entry and avoid the malformed output:
```suggestion
if not indices:
return ""
return Path(*parts[indices[-1] :]).as_posix()
```
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: 162322a
- Remove schema and builder subcommands and all supporting code - Add description column (docstring first paragraph) to types table - Add config_file per family (relative to data_designer package) - Add config_package_path and library_version to context output - Clean section hierarchy: ## for sections, ### for family sub-tables - Add docstrings to ScalarInequalityConstraint and ColumnInequalityConstraint
- Remove unused get_import_path (only used by deleted schema/builder) - Remove unused class_name from catalog dicts - Fix N+1: get_family_source_file uses get_args directly instead of rediscovering all types via discover_family_types
- Add parameters:/params: to _SECTION_HEADERS for docstring parsing - Fix config_package_path to return parent of data_designer package so Path(base) / relative_file resolves correctly - Use last occurrence of data_designer in _get_source_file to handle nested paths (e.g. dev checkouts) - Return list of deduplicated files per family (get_family_source_files) instead of assuming all types live in one file - Add config_builder_file to context output
- Use _get_source_file(DataDesignerConfigBuilder) instead of hardcoded string for config_builder_file, consistent with family file resolution - Fix test assertion that assumed "config" in path (only true in dev)
- _get_source_file returns "" instead of absolute path when data_designer is not in the path, consistent with error branch - Add Config Module section to context output pointing agent to the config module as the only part of the codebase to work with - Rename config_package_path to config_module_path (returns config dir)
Schema rendering is no longer needed in the config layer — the agent CLI now provides file paths so agents can read source files directly.
- Redeclare `name: str` in DropColumnsProcessorConfig and SchemaTransformProcessorConfig so agents see the required field without reading the base class - Add base config file path to agent context output - Optimize agent context formatting: strip redundant path prefixes, remove family count summary, separate usable/unusable model aliases, rename sections for clarity
- Remove bare name: str redeclarations in processor configs that silently dropped the parent Field(description=...) - Use Path.as_posix() in _get_source_file for consistent forward slashes
…ited Attributes - Add (required) to all required parameters in Attributes sections - Add Inherited Attributes section to all config subclasses listing fields from parent classes (SingleColumnConfig, ProcessorConfig, Constraint) - Fix stale with_trace descriptions in LLM subclass inherited sections - Remove discriminator fields from Attributes sections - Remove redundant name: str redeclaration from ExpressionColumnConfig
- Show per-alias reason for unusable models instead of blanket "missing API keys" label - Surface model_config_present: tell agent when no config file exists - Fix test fixtures to use realistic data_designer/config/ paths that exercise _strip_config_prefix
Move `name (required)` to the top of the Inherited Attributes section in LLMCodeColumnConfig, LLMStructuredColumnConfig, and LLMJudgeColumnConfig so required fields appear before optional ones.
- Use {config_root}/file.py path syntax across all agent output
- Add config_root preamble to standalone `agent types` output
- Replace type_name (discriminator) with type (class name) in tables
- Show only usable model aliases; warn agent to surface config issues
- Add directive scoping agents to the config module only
- Reword import hint and config module description for directness
_get_source_file() returned "" for types outside the data_designer package (e.g., plugin configs). Now returns the absolute path so the agent still gets a readable file reference.
main() calls ensure_cli_default_model_settings() before any agent command, so model config is always seeded. The model_config_present=False branch was dead code.
Covers the remaining branch in _format_model_aliases_context where all aliases are unusable and the agent gets a warning to surface to the user.
Address two Greptile review comments: - Add "inherited attributes:" to _SECTION_HEADERS so docstring parsing stops before that section even without a preceding blank line. - Use .as_posix() in get_config_module_path() for consistent forward-slash paths across platforms.
84e9a2c
d6659fd to
84e9a2c
Compare
| if not indices: | ||
| return full_path.as_posix() | ||
| return Path(*parts[indices[-1] :]).as_posix() |
There was a problem hiding this comment.
Absolute path fallback produces malformed
{config_root}/ output
The PR description explicitly states _get_source_file was fixed to "return empty string (not absolute path) when data_designer is absent", and the previous review reply confirmed the same ("Fixed — returns empty string now"). However, the implementation still returns full_path.as_posix() for plugin types whose path contains no data_designer component.
When this absolute path (e.g. /home/user/my_plugin.py) propagates through get_family_source_files — which only guards against if path (non-empty) — it reaches _format_family_header, which renders:
file: {config_root}//home/user/my_plugin.py
The double slash makes this a malformed agent-facing path that cannot be resolved by substituting the {config_root} template variable. The docstring now documents "absolute path" intent, but the rendering layer in _format_family_header (line 85 of agent_text_formatter.py) unconditionally prepends {config_root}/ and only strips the data_designer/config/ prefix — it has no fallback for absolute paths.
Returning empty string here (consistent with the TypeError/OSError branch above) would let get_family_source_files' if path and guard silently drop the entry and avoid the malformed output:
| if not indices: | |
| return full_path.as_posix() | |
| return Path(*parts[indices[-1] :]).as_posix() | |
| if not indices: | |
| return "" | |
| return Path(*parts[indices[-1] :]).as_posix() |
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer/src/data_designer/cli/utils/agent_introspection.py
Line: 300-302
Comment:
**Absolute path fallback produces malformed `{config_root}/` output**
The PR description explicitly states `_get_source_file` was fixed to "return empty string (not absolute path) when `data_designer` is absent", and the previous review reply confirmed the same ("Fixed — returns empty string now"). However, the implementation still returns `full_path.as_posix()` for plugin types whose path contains no `data_designer` component.
When this absolute path (e.g. `/home/user/my_plugin.py`) propagates through `get_family_source_files` — which only guards against `if path` (non-empty) — it reaches `_format_family_header`, which renders:
```
file: {config_root}//home/user/my_plugin.py
```
The double slash makes this a malformed agent-facing path that cannot be resolved by substituting the `{config_root}` template variable. The docstring now documents "absolute path" intent, but the rendering layer in `_format_family_header` (line 85 of `agent_text_formatter.py`) unconditionally prepends `{config_root}/` and only strips the `data_designer/config/` prefix — it has no fallback for absolute paths.
Returning empty string here (consistent with the `TypeError`/`OSError` branch above) would let `get_family_source_files`' `if path and` guard silently drop the entry and avoid the malformed output:
```suggestion
if not indices:
return ""
return Path(*parts[indices[-1] :]).as_posix()
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
Simplifies the agent CLI by removing schema and builder rendering in favor of providing source file paths. The agent can read the config files directly when it needs schema details, which is more flexible and eliminates a complex rendering layer (-430 lines net).
Three subcommands remain:
context(bootstrap),types(catalog with descriptions and file paths), andstate(model aliases, persona datasets).Changes
Config layer cleanup
ConfigBase.schema_text()and all supporting helpers (_format_annotation,_get_docstring_summary,_GOOGLE_SECTION_HEADERS) frombase.py— schema rendering no longer lives in the config layertest_schema_text.py(204 lines)ScalarInequalityConstraint,ColumnInequalityConstraintDropColumnsProcessorConfigdocstring to surfacedrop=TruepreferenceAgent CLI simplification
schemaandbuildersubcommands and all supporting coderoot(absolute path to config module),builder,base, andimport— explicitly tells the agent this is the only part of the codebase to work withdescription(docstring first paragraph) andfile(relative to config root) to the types table##for sections,###for family sub-tablesRobustness fixes (from Greptile review)
_get_source_fileuses last occurrence ofdata_designerin path parts (handles nested dev checkouts)_get_source_filereturns empty string (not absolute path) whendata_designeris absentget_family_source_filesreturns deduplicated list (handles types spanning multiple files)config_builder_fileresolved dynamically via_get_source_file, not hardcodedparameters:andparams:to docstring section headersExample output
Test plan
schema,builder) return "No such command"