Bug
LLMTextColumnConfig.required_columns and ImageColumnConfig.required_columns only extract dependency column names from Jinja2 templates in prompt and system_prompt. They do not include columns referenced by multi_modal_context[*].column_name.
Impact
The async engine (DATA_DESIGNER_ASYNC_ENGINE=1) builds an ExecutionGraph from each column config's required_columns to determine task dependencies and dispatch order. When a seed column is referenced only via multi_modal_context (not in the Jinja2 prompt), the execution graph has no edge connecting the seed column to the LLM column that needs it. The scheduler dispatches the LLM cell task before the seed data has been loaded into the row buffer, causing ImageContext.get_contexts to fail with a KeyError on the missing column name.
The sync engine is unaffected because it processes all from_scratch generators first, populating the entire batch buffer before any cell-by-cell generators run.
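The missing edge can be illustrated with a toy model of dependency-driven dispatch. This is only a sketch, not the actual ExecutionGraph implementation: the `ready_columns` helper and the column names `image_seed` and `caption` are hypothetical, and the graph is reduced to a plain mapping from each column to the set of columns it requires.

```python
def ready_columns(required_columns: dict[str, set[str]], completed: set[str]) -> set[str]:
    """Return the columns whose declared dependencies have all completed."""
    return {
        column
        for column, deps in required_columns.items()
        if column not in completed and deps <= completed
    }


# Before the fix: "caption" uses "image_seed" only via multi_modal_context,
# so its declared dependencies are empty and no edge exists.
graph_without_edge = {"image_seed": set(), "caption": set()}

# After the fix: the multi_modal_context column is a declared dependency.
graph_with_edge = {"image_seed": set(), "caption": {"image_seed"}}

# With no edge, both columns are dispatchable at once, so "caption" may
# run before the seed data is in the row buffer.
print(ready_columns(graph_without_edge, completed=set()))

# With the edge, "caption" becomes ready only after "image_seed" completes.
print(ready_columns(graph_with_edge, completed=set()))
print(ready_columns(graph_with_edge, completed={"image_seed"}))
```

In this model the scheduler is free to pick any ready column, which is why the failure depends on dispatch order rather than occurring deterministically in the graph-building step.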
Reproduction
Any recipe that uses LLMStructuredColumnConfig (or any LLMTextColumnConfig subclass) with multi_modal_context referencing a seed column will fail under DATA_DESIGNER_ASYNC_ENGINE=1:
Non-retryable failure on <column>[rg=0, row=None]: '<multi_modal_column_name>'
This is followed by all records being dropped and a DataDesignerGenerationError being raised.
Fix
Include multi_modal_context column names in required_columns for both LLMTextColumnConfig and ImageColumnConfig:
if self.multi_modal_context:
    required_cols.extend(ctx.column_name for ctx in self.multi_modal_context)
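For context, here is a self-contained sketch of how the patched property might behave. The classes `ToyImageContext` and `ToyLLMColumnConfig` are hypothetical stand-ins, not the real config classes, and the regex is a simplified substitute for the Jinja2 variable extraction the real code performs; only the two lines of the fix are taken from this issue.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified stand-in for Jinja2 variable extraction: matches "{{ name".
_TEMPLATE_VAR = re.compile(r"\{\{\s*([A-Za-z_]\w*)")


@dataclass
class ToyImageContext:
    """Hypothetical stand-in for a multi_modal_context entry."""
    column_name: str


@dataclass
class ToyLLMColumnConfig:
    """Hypothetical stand-in for an LLM column config."""
    name: str
    prompt: str
    system_prompt: Optional[str] = None
    multi_modal_context: Optional[list[ToyImageContext]] = None

    @property
    def required_columns(self) -> list[str]:
        # Columns referenced in the prompt templates.
        required_cols = _TEMPLATE_VAR.findall(self.prompt)
        if self.system_prompt:
            required_cols.extend(_TEMPLATE_VAR.findall(self.system_prompt))
        # The fix: columns referenced only via multi_modal_context.
        if self.multi_modal_context:
            required_cols.extend(ctx.column_name for ctx in self.multi_modal_context)
        # Drop duplicates while preserving first-seen order.
        return list(dict.fromkeys(required_cols))


config = ToyLLMColumnConfig(
    name="caption",
    prompt="Describe the image for {{ topic }}.",
    multi_modal_context=[ToyImageContext(column_name="image_seed")],
)
print(config.required_columns)
```

The deduplication step is an assumption of this sketch; it covers the case where a column appears both in the prompt and in multi_modal_context.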
Affected files
packages/data-designer-config/src/data_designer/config/column_configs.py