
display_sample_record() silently omits plugin-generated columns #345

@3mei


Priority Level

High (Major functionality broken)

Describe the bug

Affected code: data_designer/config/utils/visualization.py, lines 200–206

Description:
The display_sample_record function builds the "Generated Columns" table by querying a hardcoded list of built-in column types:

non_code_columns = (
    config_builder.get_columns_of_type(DataDesignerColumnType.SAMPLER)
    + config_builder.get_columns_of_type(DataDesignerColumnType.EXPRESSION)
    + config_builder.get_columns_of_type(DataDesignerColumnType.LLM_TEXT)
    + config_builder.get_columns_of_type(DataDesignerColumnType.LLM_STRUCTURED)
    + config_builder.get_columns_of_type(DataDesignerColumnType.EMBEDDING)
    + config_builder.get_columns_of_type(DataDesignerColumnType.CUSTOM)
)

Plugin column types (registered via PluginManager / entry points) are not included in this list. As a result, any column generated by a plugin is silently skipped in the rich display output, even though the data is present in the underlying DataFrame.
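The failure mode is not specific to any one plugin: any type missing from the hardcoded query is dropped. The sketch below reproduces the pattern without data_designer installed. ColumnType, FakeConfigBuilder and the "my-plugin" type string are hypothetical stand-ins for the real DataDesignerColumnType, config builder and a plugin-registered type.

```python
from enum import Enum


class ColumnType(Enum):
    # Hypothetical stand-in for a few built-in DataDesignerColumnType members.
    SAMPLER = "sampler"
    LLM_TEXT = "llm-text"
    CUSTOM = "custom"


# Plugin types are only known at runtime (in data_designer they are registered
# via PluginManager / entry points); a plain string plays that role here.
PLUGIN_TYPE = "my-plugin"


class FakeConfigBuilder:
    """Hypothetical stand-in for the config builder used by display_sample_record()."""

    def __init__(self, columns):
        self._columns = columns  # column name -> column type

    def get_columns_of_type(self, column_type):
        return [name for name, t in self._columns.items() if t == column_type]


config_builder = FakeConfigBuilder(
    {"age": ColumnType.SAMPLER, "bio": ColumnType.LLM_TEXT, "score": PLUGIN_TYPE}
)

# Same shape as the hardcoded query: only built-in types are asked for,
# so the plugin column never reaches the table.
non_code_columns = (
    config_builder.get_columns_of_type(ColumnType.SAMPLER)
    + config_builder.get_columns_of_type(ColumnType.LLM_TEXT)
    + config_builder.get_columns_of_type(ColumnType.CUSTOM)
)
print(non_code_columns)  # ['age', 'bio']; 'score' is missing
```

The "score" column still exists in the builder; it is simply never asked for.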

Inconsistency: The codebase already accounts for plugin column types elsewhere. In data_designer/config/column_types.py, get_column_display_order() correctly appends plugin types:

def get_column_display_order() -> list[DataDesignerColumnType]:
    display_order = [
        DataDesignerColumnType.SEED_DATASET,
        DataDesignerColumnType.SAMPLER,
        # ... built-in types ...
        DataDesignerColumnType.CUSTOM,
    ]
    display_order.extend(plugin_manager.get_plugin_column_types(DataDesignerColumnType))
    return display_order

But display_sample_record in visualization.py does not use get_column_display_order(); it maintains its own separate, incomplete list.

Suggested fix: Replace the hardcoded list in display_sample_record with a list derived from get_column_display_order() (filtering out SEED_DATASET along with LLM_CODE, VALIDATION and LLM_JUDGE, which the current list also leaves out), or at minimum append plugin_manager.get_plugin_column_types(DataDesignerColumnType) to the types queried for non_code_columns. Something like:

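from data_designer.config.column_types import DataDesignerColumnType, get_column_display_order

# Types the "Generated Columns" table does not cover today: seed data plus
# the types the current hardcoded list already leaves out.
excluded_types = (
    DataDesignerColumnType.SEED_DATASET,
    DataDesignerColumnType.LLM_CODE,
    DataDesignerColumnType.VALIDATION,
    DataDesignerColumnType.LLM_JUDGE,
)
# get_column_display_order() already appends registered plugin column types.
display_types = [t for t in get_column_display_order() if t not in excluded_types]

non_code_columns = []
for col_type in display_types:
    non_code_columns.extend(config_builder.get_columns_of_type(col_type))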
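The lighter option mentioned above, appending plugin types to the existing list, can be sketched the same way. FakePluginManager, FakeConfigBuilder, ColumnType and the "my-plugin" type are hypothetical stand-ins; only the method name get_plugin_column_types mirrors the call quoted in this issue.

```python
from enum import Enum


class ColumnType(Enum):
    # Hypothetical stand-in for part of DataDesignerColumnType.
    SAMPLER = "sampler"
    EXPRESSION = "expression"
    CUSTOM = "custom"


class FakePluginManager:
    """Hypothetical stand-in; only the method name mirrors the issue."""

    def __init__(self, plugin_types):
        self._plugin_types = list(plugin_types)

    def get_plugin_column_types(self, enum_type):
        # enum_type is accepted only to match the call shape in the issue.
        return list(self._plugin_types)


class FakeConfigBuilder:
    """Hypothetical stand-in: maps column name -> column type."""

    def __init__(self, columns):
        self._columns = columns

    def get_columns_of_type(self, column_type):
        return [name for name, t in self._columns.items() if t == column_type]


plugin_manager = FakePluginManager(["my-plugin"])
config_builder = FakeConfigBuilder(
    {"id": ColumnType.SAMPLER, "total": ColumnType.EXPRESSION, "score": "my-plugin"}
)

# Keep the built-in types (and their order) as they are today, then append
# whatever plugin types are registered at runtime.
display_types = [ColumnType.SAMPLER, ColumnType.EXPRESSION, ColumnType.CUSTOM]
display_types += plugin_manager.get_plugin_column_types(ColumnType)

non_code_columns = []
for col_type in display_types:
    non_code_columns.extend(config_builder.get_columns_of_type(col_type))

print(non_code_columns)  # ['id', 'total', 'score']
```

Appending leaves the current ordering and exclusions untouched, but a future built-in type would still need a manual edit in visualization.py. Deriving the list from get_column_display_order() avoids that, at the cost of adopting that function's ordering for the table.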

Steps/Code to reproduce bug

Register any plugin column type via entry points, add it to a config builder, run preview(), and call preview_result.display_sample_record(). The plugin column will be absent from the rich output despite being present in preview_result.dataset.

Expected behavior

Plugin-generated columns should appear in the "Generated Columns" table of display_sample_record() output, just as built-in column types do.
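The expected behavior can be stated as an invariant for a regression test: every configured column whose type is not deliberately excluded shows up in the generated-columns list. The sketch below checks that invariant against a hypothetical generated_column_names() helper that follows the suggested fix. ColumnType, EXCLUDED and the stand-in get_column_display_order() are illustrative, not data_designer's real definitions.

```python
from enum import Enum


class ColumnType(Enum):
    # Hypothetical stand-ins for a subset of DataDesignerColumnType.
    SEED_DATASET = "seed-dataset"
    SAMPLER = "sampler"
    LLM_CODE = "llm-code"
    CUSTOM = "custom"


PLUGIN_TYPE = "my-plugin"  # registered at runtime in the real scenario
EXCLUDED = {ColumnType.SEED_DATASET, ColumnType.LLM_CODE}


def get_column_display_order():
    # Stand-in mirroring the issue's description: built-ins, then plugin types.
    return [
        ColumnType.SEED_DATASET,
        ColumnType.SAMPLER,
        ColumnType.LLM_CODE,
        ColumnType.CUSTOM,
        PLUGIN_TYPE,
    ]


def generated_column_names(columns):
    """Hypothetical helper implementing the suggested fix.

    columns maps column name -> column type.
    """
    names = []
    for col_type in get_column_display_order():
        if col_type in EXCLUDED:
            continue
        names.extend(n for n, t in columns.items() if t == col_type)
    return names


def test_plugin_columns_are_displayed():
    columns = {
        "seed_text": ColumnType.SEED_DATASET,
        "age": ColumnType.SAMPLER,
        "snippet": ColumnType.LLM_CODE,
        "score": PLUGIN_TYPE,
    }
    expected = {n for n, t in columns.items() if t not in EXCLUDED}
    assert set(generated_column_names(columns)) == expected


test_plugin_columns_are_displayed()
```

Against the real code, the same invariant would be checked on the columns that display_sample_record() actually renders, compared with preview_result.dataset.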

Additional context

No response

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
