Skip to content

feat: add image generation support with multi-modal context#317

Merged
nabinchha merged 72 commits into
mainfrom
nmulepati/feat/125-support-image-generation
Feb 12, 2026
Merged

feat: add image generation support with multi-modal context#317
nabinchha merged 72 commits into
mainfrom
nmulepati/feat/125-support-image-generation

Conversation

@nabinchha

@nabinchha nabinchha commented Feb 10, 2026

Copy link
Copy Markdown
Contributor

📋 Summary

Adds native image generation capabilities to DataDesigner, enabling synthetic image generation using diffusion and auto-regressive image generation models. Supports both standalone image generation and multi-modal context (using previously generated text/images as input), with robust storage management and comprehensive testing.

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Configuration Layer                          │
│  ┌──────────────────┐         ┌─────────────────────────────┐ │
│  │ ImageColumnConfig│◄────────│ ImageInferenceParams        │ │
│  │  - model_alias   │         │  - size, format, quality    │ │
│  │  - prompt        │         │  - steps, cfg_scale, seed   │ │
│  │  - context cols  │         │  - n (number of images)     │ │
│  └────────┬─────────┘         └─────────────────────────────┘ │
└───────────┼───────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Engine Layer                               │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  ImageCellGenerator (Cell-by-Cell)                       │  │
│  │   1. Render Jinja2 prompt template with record data      │  │
│  │   2. Resolve multi-modal context from previous columns   │  │
│  │   3. Call ModelFacade.generate_image()                   │  │
│  │   4. Save via MediaStorage                               │  │
│  └─────────────────┬────────────────────────────────────────┘  │
│                    │                                            │
│                    ▼                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  ModelFacade.generate_image()                            │  │
│  │   • Auto-detects model type:                             │  │
│  │     - Diffusion models → image_generation API            │  │
│  │     - Autoregressive models → completion API             │  │
│  │   • Returns list[base64_string]                          │  │
│  │   • Tracks usage (images + tokens)                       │  │
│  └─────────────────┬────────────────────────────────────────┘  │
│                    │                                            │
│                    ▼                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  LiteLLM Router                                          │  │
│  │   • image_generation(prompt) - for diffusion models      │  │
│  │   • completion(messages) - for autoregressive models     │  │
│  └─────────────────┬────────────────────────────────────────┘  │
│                    │                                            │
│                    ▼                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  MediaStorage                                            │  │
│  │   • DISK mode: Save to disk, return paths                │  │
│  │   • DATAFRAME mode: Return base64 directly               │  │
│  │   • Validates images, creates UUID filenames             │  │
│  │   • Organizes by column name in subfolders               │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────┐
│              Visualization & Display                            │
│  • Enhanced display_sample_record() with image support          │
└─────────────────────────────────────────────────────────────────┘

Key Design Decisions:

  1. Auto-detection of API type: generate_image() automatically routes to the correct LiteLLM API:

    • Diffusion models (DALL-E, Stable Diffusion, Imagen) → image_generation API
    • Autoregressive models (multi-modal chat models) → completion API
  2. Multi-modal context: Images can reference previously generated columns (text or images) using multi_modal_context for image-to-image generation

  3. Dual storage modes:

    • DISK mode (dataset creation): Saves images to disk, stores relative paths
    • DATAFRAME mode (preview): Stores base64 directly for quick exploration

🔄 Changes

✨ Added

New Files - Core Implementation:

  • image.py (80 lines) - ImageCellGenerator with Jinja2 prompt rendering and multi-modal context resolution
  • media_storage.py (137 lines) - MediaStorage class with DISK/DATAFRAME storage modes
  • image_helpers.py (238 lines) - Base64/PIL conversion, validation, format detection, diffusion model detection

New Files - Documentation & Tests:

Configuration Classes:

  • ImageColumnConfig - Column config with prompt, multi_modal_context, and required_columns (column_configs.py)
  • ImageInferenceParams - Parameters: size, format, quality, steps, cfg_scale, seed, n (models.py)
  • ImageUsageStats - Usage tracking for generated images (usage.py)

🔧 Changed

Model System:

  • facade.py - Added methods:
    • generate_image() - Main entry point with automatic API routing
    • _generate_image_diffusion() - Diffusion model path via image_generation API
    • _generate_image_chat_completion() - Autoregressive model path via completion API
    • _track_token_usage_from_image_diffusion() - Usage tracking

Dataset Building:

Visualization:

  • visualization.py - Enhanced display_sample_record() with image handling:
    • Added _display_image_if_in_notebook() for IPython/Jupyter rendering (~132 lines added)
    • Image table in record display showing base64 previews
    • Automatic image rendering at bottom of record display in notebooks

Configuration & Registry:

  • Registered ImageCellGenerator in column generator registry
  • Added ColumnType.IMAGE enumeration
  • Added lazy import for PIL in lazy_heavy_imports.py

Dependencies:

  • Added pillow for image processing

🗑️ Removed

  • Health checks workflow (unrelated cleanup)
  • Seed dataset documentation (reorganization)

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to:

  1. facade.py:307-470 - Image generation implementation with auto-detection logic and dual API support

  2. media_storage.py - Storage abstraction with dual modes and file organization (UUID + column subfolders)

  3. image.py:62-67 - Image generator with multi-modal context injection

  4. visualization.py:289-418 - Image display integration in display_sample_record()

🚀 Extensibility & Future Work

Extensibility to Other Modalities:

This implementation establishes patterns that extend naturally to other media types:

  • Audio generation: Similar AudioColumnConfig + MediaStorage.save_audio()
  • Video generation: Can reuse image storage patterns with video-specific format handling
  • 3D assets: Storage layer is format-agnostic, adaptable to GLB/USD/FBX

Key extensibility points:

  • ModelFacade - Add generate_audio(), generate_video() following same pattern
  • MediaStorage - Already designed for multiple media types (see comments about future audio/video support)
  • GenerationType enum - Easy to add AUDIO, VIDEO, etc.
  • Column generators - Follow ImageCellGenerator pattern for new modalities

Planned Future Work:

  1. Improve display_sample_record() method - Enhanced notebook display with better layouts, grid views, and interactive controls for image-containing records

  2. Move artifact_storage.py to storage module - Consolidate all storage logic (MediaStorage, ArtifactStorage) under engine/storage/ for better organization (done in chore: move ArtifactStorage to engine/storage/ module #321)

  3. Documentation - Feature currently has no docs except a tutorial notebook. (done in docs: add image generation documentation and image-to-image editing tutorial #319)

✅ Testing

Comprehensive test coverage (800+ lines):

  • Image generation: 218 lines - single/multiple images, context resolution, error handling
  • Media storage: 228 lines - DISK/DATAFRAME modes, validation, cleanup
  • Image helpers: 349 lines - base64/PIL conversion, format detection, validation
  • Model facade: Extended tests for image generation paths (diffusion + chat completion)
  • Usage tracking: Tests for ImageUsageStats integration
  • Integration: Full end-to-end example in tutorial notebook

close #125

🤖 Generated with AI

…eInferenceParameters, EmbeddingInferenceParameters
…lved based on the type of InferenceParameters
@andreatgretel

Copy link
Copy Markdown
Contributor

Curious regarding default models - should we add {nvidia|openai|openrouter}-image aliases?

andreatgretel
andreatgretel previously approved these changes Feb 11, 2026

@andreatgretel andreatgretel left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more nits but overall everything looks good! Tried it out locally, tutorial runs fine. Excited about generating images on Data Designer 🖼️

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

45 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@nabinchha

nabinchha commented Feb 11, 2026

Copy link
Copy Markdown
Contributor Author

Curious regarding default models - should we add {nvidia|openai|openrouter}-image aliases?

Yes, may be but perhaps in a different PR! I don't see many options on build.nvidia.com that work with the standard nvidia endpoint ....

andreatgretel
andreatgretel previously approved these changes Feb 12, 2026
Comment thread docs/colab_notebooks/5-generating-images.ipynb
Comment thread docs/notebook_source/5-generating-images.py

# Handle list of images
if isinstance(image_data, list):
previews = []

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: feels like we could use some kind of image previews abstraction, where a lot of the below logic can live. can be in the display_sample_record follow up

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that + things in visualization.py can probably be broken down

Comment thread packages/data-designer-config/src/data_designer/config/utils/visualization.py Outdated
Comment thread packages/data-designer-config/src/data_designer/config/column_configs.py Outdated
return result


class ImageInferenceParams(BaseInferenceParams):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it somewhat problematic that intellisense will show all the other parameters as well? Wondering if need a more striped doen base class.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm weird. Could you share a screenshot? We do want to inherit everything from BaseInferenceParams though

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay – I thought I remember you saying extra_body will be the only supported parameter

Image

@nabinchha nabinchha Feb 12, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's correct. We do want timeout and max_parallel_requests. It's just that any image generation params like size, height, width, etc will need to go into extra_body because they vary per model

Comment on lines +45 to +52
# Validate required columns
missing_columns = list(set(self.config.required_columns) - set(data.keys()))
if len(missing_columns) > 0:
error_msg = (
f"There was an error preparing the Jinja2 expression template. "
f"The following columns {missing_columns} are missing!"
)
raise ValueError(error_msg)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm surprised we haven't centralized this check!

Comment thread packages/data-designer-engine/src/data_designer/engine/models/facade.py Outdated
Comment on lines +1 to +2
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another note to our future selves that agents should always run make update-license-headers instead of generating these. once they are generated, we treat the existing years as the source of truth.

except Exception as e:
raise HuggingFaceHubClientUploadError(f"Failed to upload parquet files: {e}") from e

def _upload_images_folder(self, repo_id: str, images_folder: Path) -> None:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know if we can have the images appear in the dataset viewer on HF? i've seen datasets that do, but not sure how they are formatted

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhh that's a good idea. I might open up a follow up PR for this.

@johnnygreco johnnygreco left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nabinchha – this is awesome!

@nabinchha nabinchha merged commit 8e2fd32 into main Feb 12, 2026
48 checks passed
@nabinchha nabinchha deleted the nmulepati/feat/125-support-image-generation branch February 12, 2026 21:00
eric-tramel added a commit that referenced this pull request Feb 13, 2026
Add agenerate_image(), _agenerate_image_chat_completion(), and
_agenerate_image_diffusion() async methods mirroring the sync
generate_image() added in #317. The chat completion path uses
acompletion(), the diffusion path uses router.aimage_generation().

Includes 5 new tests covering both paths, error cases, and usage
tracking. Also fixes F821 lint errors for type annotations.

Co-Authored-By: Remi <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add native image generation support

3 participants