feat: add image generation support with multi-modal context by nabinchha · Pull Request #317 · NVIDIA-NeMo/DataDesigner

nabinchha · 2026-02-10T02:12:17Z

📋 Summary

Adds native image generation capabilities to DataDesigner, enabling synthetic image generation using diffusion and auto-regressive image generation models. Supports both standalone image generation and multi-modal context (using previously generated text/images as input), with robust storage management and comprehensive testing.

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Configuration Layer                          │
│  ┌──────────────────┐         ┌─────────────────────────────┐ │
│  │ ImageColumnConfig│◄────────│ ImageInferenceParams        │ │
│  │  - model_alias   │         │  - size, format, quality    │ │
│  │  - prompt        │         │  - steps, cfg_scale, seed   │ │
│  │  - context cols  │         │  - n (number of images)     │ │
│  └────────┬─────────┘         └─────────────────────────────┘ │
└───────────┼───────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Engine Layer                               │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  ImageCellGenerator (Cell-by-Cell)                       │  │
│  │   1. Render Jinja2 prompt template with record data      │  │
│  │   2. Resolve multi-modal context from previous columns   │  │
│  │   3. Call ModelFacade.generate_image()                   │  │
│  │   4. Save via MediaStorage                               │  │
│  └─────────────────┬────────────────────────────────────────┘  │
│                    │                                            │
│                    ▼                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  ModelFacade.generate_image()                            │  │
│  │   • Auto-detects model type:                             │  │
│  │     - Diffusion models → image_generation API            │  │
│  │     - Autoregressive models → completion API             │  │
│  │   • Returns list[base64_string]                          │  │
│  │   • Tracks usage (images + tokens)                       │  │
│  └─────────────────┬────────────────────────────────────────┘  │
│                    │                                            │
│                    ▼                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  LiteLLM Router                                          │  │
│  │   • image_generation(prompt) - for diffusion models      │  │
│  │   • completion(messages) - for autoregressive models     │  │
│  └─────────────────┬────────────────────────────────────────┘  │
│                    │                                            │
│                    ▼                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  MediaStorage                                            │  │
│  │   • DISK mode: Save to disk, return paths                │  │
│  │   • DATAFRAME mode: Return base64 directly               │  │
│  │   • Validates images, creates UUID filenames             │  │
│  │   • Organizes by column name in subfolders               │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────┐
│              Visualization & Display                            │
│  • Enhanced display_sample_record() with image support          │
└─────────────────────────────────────────────────────────────────┘

Key Design Decisions:

Auto-detection of API type: generate_image() automatically routes to the correct LiteLLM API:
- Diffusion models (DALL-E, Stable Diffusion, Imagen) → image_generation API
- Autoregressive models (multi-modal chat models) → completion API
Multi-modal context: Images can reference previously generated columns (text or images) using multi_modal_context for image-to-image generation
Dual storage modes:
- DISK mode (dataset creation): Saves images to disk, stores relative paths
- DATAFRAME mode (preview): Stores base64 directly for quick exploration

🔄 Changes

✨ Added

New Files - Core Implementation:

image.py (80 lines) - ImageCellGenerator with Jinja2 prompt rendering and multi-modal context resolution
media_storage.py (137 lines) - MediaStorage class with DISK/DATAFRAME storage modes
image_helpers.py (238 lines) - Base64/PIL conversion, validation, format detection, diffusion model detection

New Files - Documentation & Tests:

5-generating-images.py (296 lines) - Complete tutorial with examples
test_image.py (218 lines) - Image generator tests
test_media_storage.py (228 lines) - Storage tests
test_image_helpers.py (349 lines) - Utility tests

Configuration Classes:

ImageColumnConfig - Column config with prompt, multi_modal_context, and required_columns (column_configs.py)
ImageInferenceParams - Parameters: size, format, quality, steps, cfg_scale, seed, n (models.py)
ImageUsageStats - Usage tracking for generated images (usage.py)

🔧 Changed

Model System:

facade.py - Added methods:
- generate_image() - Main entry point with automatic API routing
- _generate_image_diffusion() - Diffusion model path via image_generation API
- _generate_image_chat_completion() - Autoregressive model path via completion API
- _track_token_usage_from_image_diffusion() - Usage tracking

Dataset Building:

column_wise_builder.py - Integrated MediaStorage for image artifact management
artifact_storage.py - Added media_storage attribute

Visualization:

visualization.py - Enhanced display_sample_record() with image handling:
- Added _display_image_if_in_notebook() for IPython/Jupyter rendering (~132 lines added)
- Image table in record display showing base64 previews
- Automatic image rendering at bottom of record display in notebooks

Configuration & Registry:

Registered ImageCellGenerator in column generator registry
Added ColumnType.IMAGE enumeration
Added lazy import for PIL in lazy_heavy_imports.py

Dependencies:

Added pillow for image processing

🗑️ Removed

Health checks workflow (unrelated cleanup)
Seed dataset documentation (reorganization)

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to:

facade.py:307-470 - Image generation implementation with auto-detection logic and dual API support
media_storage.py - Storage abstraction with dual modes and file organization (UUID + column subfolders)
image.py:62-67 - Image generator with multi-modal context injection
visualization.py:289-418 - Image display integration in display_sample_record()

🚀 Extensibility & Future Work

Extensibility to Other Modalities:

This implementation establishes patterns that extend naturally to other media types:

Audio generation: Similar AudioColumnConfig + MediaStorage.save_audio()
Video generation: Can reuse image storage patterns with video-specific format handling
3D assets: Storage layer is format-agnostic, adaptable to GLB/USD/FBX

Key extensibility points:

ModelFacade - Add generate_audio(), generate_video() following same pattern
MediaStorage - Already designed for multiple media types (see comments about future audio/video support)
GenerationType enum - Easy to add AUDIO, VIDEO, etc.
Column generators - Follow ImageCellGenerator pattern for new modalities

Planned Future Work:

Improve display_sample_record() method - Enhanced notebook display with better layouts, grid views, and interactive controls for image-containing records
Move artifact_storage.py to storage module - Consolidate all storage logic (MediaStorage, ArtifactStorage) under engine/storage/ for better organization (done in chore: move ArtifactStorage to engine/storage/ module #321)
Documentation - Feature currently has no docs except a tutorial notebook. (done in docs: add image generation documentation and image-to-image editing tutorial #319)

✅ Testing

Comprehensive test coverage (800+ lines):

Image generation: 218 lines - single/multiple images, context resolution, error handling
Media storage: 228 lines - DISK/DATAFRAME modes, validation, cleanup
Image helpers: 349 lines - base64/PIL conversion, format detection, validation
Model facade: Extended tests for image generation paths (diffusion + chat completion)
Usage tracking: Tests for ImageUsageStats integration
Integration: Full end-to-end example in tutorial notebook

close #125

🤖 Generated with AI

…eInferenceParameters, EmbeddingInferenceParameters

…th BaseInferenceParameters

…lved based on the type of InferenceParameters

…eneration

andreatgretel · 2026-02-11T22:35:36Z

Curious regarding default models - should we add {nvidia|openai|openrouter}-image aliases?

andreatgretel

Left a few more nits but overall everything looks good! Tried it out locally, tutorial runs fine. Excited about generating images on Data Designer 🖼️

greptile-apps

_{45 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

nabinchha · 2026-02-11T22:50:15Z

Curious regarding default models - should we add {nvidia|openai|openrouter}-image aliases?

Yes, may be but perhaps in a different PR! I don't see many options on build.nvidia.com that work with the standard nvidia endpoint ....

johnnygreco · 2026-02-12T18:19:04Z

+
+            # Handle list of images
+            if isinstance(image_data, list):
+                previews = []


nit: feels like we could use some kind of image previews abstraction, where a lot of the below logic can live. can be in the display_sample_record follow up

Yes that + things in visualization.py can probably be broken down

johnnygreco · 2026-02-12T18:24:55Z

        return result


+class ImageInferenceParams(BaseInferenceParams):


Is it somewhat problematic that intellisense will show all the other parameters as well? Wondering if need a more striped doen base class.

Hmm weird. Could you share a screenshot? We do want to inherit everything from BaseInferenceParams though

Oh, okay – I thought I remember you saying extra_body will be the only supported parameter

Oh, that's correct. We do want timeout and max_parallel_requests. It's just that any image generation params like size, height, width, etc will need to go into extra_body because they vary per model

johnnygreco · 2026-02-12T18:37:35Z

+        # Validate required columns
+        missing_columns = list(set(self.config.required_columns) - set(data.keys()))
+        if len(missing_columns) > 0:
+            error_msg = (
+                f"There was an error preparing the Jinja2 expression template. "
+                f"The following columns {missing_columns} are missing!"
+            )
+            raise ValueError(error_msg)


i'm surprised we haven't centralized this check!

johnnygreco · 2026-02-12T18:51:26Z

+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0


another note to our future selves that agents should always run make update-license-headers instead of generating these. once they are generated, we treat the existing years as the source of truth.

johnnygreco · 2026-02-12T18:54:18Z

        except Exception as e:
            raise HuggingFaceHubClientUploadError(f"Failed to upload parquet files: {e}") from e

+    def _upload_images_folder(self, repo_id: str, images_folder: Path) -> None:


do you know if we can have the images appear in the dataset viewer on HF? i've seen datasets that do, but not sure how they are formatted

ohhh that's a good idea. I might open up a follow up PR for this.

johnnygreco

Thanks @nabinchha – this is awesome!

Add agenerate_image(), _agenerate_image_chat_completion(), and _agenerate_image_diffusion() async methods mirroring the sync generate_image() added in #317. The chat completion path uses acompletion(), the diffusion path uses router.aimage_generation(). Includes 5 new tests covering both paths, error cases, and usage tracking. Also fixes F821 lint errors for type annotations. Co-Authored-By: Remi <noreply@anthropic.com>

nabinchha added 30 commits November 25, 2025 12:16

Add generation type to ModelConfig

dc041f7

pass tests

0d6b830

added generate_text_embeddings

254fd8a

tests

1126ea1

remove sensitive=True old artifact no longer needed

744bc8f

Slight refactor

b913f8d

slight refactor

052db7a

Added embedding generator

5504c8d

chunk_separator -> chunk_pattern

4b6f877

update tests

04fc0f3

rename for consistency

26d6da1

Restructure InferenceParameters -> CompletionInferenceParameters, Bas…

6facbd2

…eInferenceParameters, EmbeddingInferenceParameters

Remove purpose from consolidated kwargs

2c1b267

WithModelConfiguration.inference_parameters should should be typed wi…

4b1492b

…th BaseInferenceParameters

Type as WithModelGeneration

c445caf

Add image generation modality

4b8aa2b

update return type for generate_kwargs

2c5933f

make generation_type a field of ModelConfig as opposed to a prop reso…

c6c29d4

…lved based on the type of InferenceParameters

remove regex based chunking from embedding generator

06a724b

Merge branch 'main' into nmulepati/feat/support-embedding-and-image-g…

6b9733f

…eneration

Merge branch 'main' into nmulepati/feat/support-embedding-and-image-g…

81949e6

…eneration

save progress

f291033

Merge branch 'main' into nmulepati/feat/125-support-image-generation

e0a4657

Simplify to ImageInferenceParams. Persist images in create mode to disk

1506ab5

support generation of multiple images

ed9787b

clean up visualization

7dea87a

clean up some util methods + add tests

31cc24e

Streamline integration for image generation

0f07f7b

streamline generation

2aae6cc

track images generated in usage

1677f06

andreatgretel previously approved these changes Feb 11, 2026

View reviewed changes

address pr feedback from andre

87dcab1

nabinchha dismissed andreatgretel’s stale review via 87dcab1 February 11, 2026 22:46

greptile-apps Bot reviewed Feb 11, 2026

View reviewed changes

Comment thread packages/data-designer-config/src/data_designer/config/utils/image_helpers.py

Merge branch 'main' into nmulepati/feat/125-support-image-generation

12cc1fe

nabinchha requested a review from andreatgretel February 11, 2026 23:11

andreatgretel previously approved these changes Feb 12, 2026

View reviewed changes

Merge branch 'main' into nmulepati/feat/125-support-image-generation

a0ea92b

nabinchha dismissed andreatgretel’s stale review via a0ea92b February 12, 2026 17:54