TK ComfyUI ImageVL

A powerful set of ComfyUI custom nodes designed for batch image processing and advanced Vision-Language Model (VLM) interrogation using the Qwen-VL family (Qwen2-VL, Qwen2.5-VL, Qwen3-VL).

Features

Batch Image Loading & Renaming: Automatically read all images from a folder, rename them sequentially (e.g., image_1.png, image_2.png), and save them to a specified output directory.
Multi-Model Support:
- Qwen-VL: Seamless support for Qwen2-VL, Qwen2.5-VL, and Qwen3-VL models.
- JoyCaption: Integration with fancyfeast/llama-joycaption for high-quality, natural language captions or Stable Diffusion tags.
Flexible Workflow Modes:
- Batch Processing: Process entire directories of images with auto-saving.
- Single Image: Separate "(Single)" node variants that caption one IMAGE passed directly from another node in your workflow.
Advanced Generation Control:
- Adjustable max new tokens, vision-encoder resolution (min/max pixels), temperature, and seed.
- JoyCaption-specific: Control caption type ('Descriptive', 'Stable Diffusion Prompt'), length, and tone.
Image Resizing Options:
- Pixel Resize: Scale based on the longest edge.
- Megapixel (MP) Resize: Scale to a target total pixel count (e.g., 1.0MP).
- UI Protection: The pixel and megapixel resize toggles are mutually exclusive in the interface, so only one resize mode can be active at a time.
Auto-Saving: Automatically saves the generated captions as .txt files matching the image filenames.
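The sequential renaming and matching caption filenames described above can be pictured with the short sketch below. It is not the node's code: `plan_renames` and `IMAGE_EXTS` are hypothetical, and it assumes files are sorted by name, numbered from 1, and keep their original (lowercased) extension. The actual node may order or re-encode files differently.

```python
from pathlib import Path

# Extensions treated as images in this sketch (an assumption, not the node's actual list).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def plan_renames(filenames, prefix="image_"):
    """Map source filenames to sequential names plus matching caption files.

    Hypothetical helper: image files are sorted by name and numbered from 1;
    anything without an image extension is skipped.
    """
    images = sorted(f for f in filenames if Path(f).suffix.lower() in IMAGE_EXTS)
    plan = []
    for index, name in enumerate(images, start=1):
        new_image = f"{prefix}{index}{Path(name).suffix.lower()}"
        caption = f"{prefix}{index}.txt"  # the caption shares the image's stem
        plan.append((name, new_image, caption))
    return plan


for row in plan_renames(["b.JPG", "a.png", "notes.txt"]):
    print(row)
```

Pairing each image with a `.txt` file of the same stem is the layout most captioning and training tools expect, which is why the caption name is derived from the new image name rather than the source name.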

Installation

Clone this repository into your ComfyUI custom_nodes directory:

```
cd ComfyUI/custom_nodes
git clone https://github.com/tackcrypto1031/tk_comfyui_imageVL.git
```

Install the required dependencies:
```
cd tk_comfyui_imageVL
pip install -r requirements.txt
```
Note: The requirements include transformers, qwen_vl_utils, accelerate, and huggingface_hub.
Restart ComfyUI so the new nodes are registered.
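If the nodes fail to load, a quick way to confirm the dependencies are visible is to run a check like the one below with the same Python interpreter ComfyUI uses (for portable builds, typically the embedded one). This is a hypothetical helper, not part of the repository; the module list is taken from the note above.

```python
import importlib.util

# Import names from the README note (pip installs qwen_vl_utils as "qwen-vl-utils").
REQUIRED_MODULES = ["transformers", "qwen_vl_utils", "accelerate", "huggingface_hub"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
print("missing:", missing or "none")
```

An empty result only shows the packages can be found; version mismatches (for example, a transformers release too old for Qwen3-VL) still surface as errors when a model loads.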

Usage

1. Batch Image Loader (TK_BatchImageLoader)

This node handles the input images.

source_path: Directory containing your original images.
output_path: Directory where renamed images will be saved.
filename_prefix: Prefix for the renamed files (default: image_).
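resize_px / img_px: resize_px enables pixel-based resizing; img_px sets the target length of the longest edge.
resize_mp / img_mp: resize_mp enables megapixel-based resizing; img_mp sets the target total pixel count in megapixels. Only one of the two modes can be enabled at a time.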
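The two resize modes differ only in how the scale factor is chosen. A minimal sketch of the arithmetic, assuming aspect ratio is preserved, results are rounded to whole pixels, and 1 MP means 1,000,000 pixels (some tools use 1024 × 1024 instead). The function names are hypothetical, and this sketch scales up as well as down, which the node may not do.

```python
import math


def resize_longest_edge(width, height, target_px):
    """Scale so the longest edge equals target_px, keeping the aspect ratio."""
    scale = target_px / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_megapixels(width, height, target_mp):
    """Scale so width * height is close to target_mp * 1,000,000 pixels."""
    # Area scales with the square of the linear factor, hence the square root.
    scale = math.sqrt(target_mp * 1_000_000 / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))


print(resize_longest_edge(4000, 3000, 1024))  # (1024, 768)
print(resize_megapixels(4000, 3000, 1.0))     # (1155, 866)
```

The pixel mode bounds the largest dimension, while the megapixel mode keeps total pixel count (and therefore memory use) roughly constant across portrait, landscape, and square images.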

2. QwenVL Interrogator (TK_QwenVL_Interrogator)

This node analyzes the images and generates descriptions.

model_id: Select the desired Qwen-VL model from the dropdown (supports Qwen2-VL, Qwen2.5-VL, and Qwen3-VL).
- Models will be automatically downloaded to tk_comfyui_imageVL/models if not present.
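prompt: The instruction for the model (e.g., "Describe this image in detail.").
min_pixels / max_pixels: Lower and upper bounds on the pixel count each image is resized to before it reaches the vision encoder. Higher values preserve more detail but produce more visual tokens, which costs memory and time.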
max_new_tokens: Maximum length of the generated text.
temperature / seed: Generation parameters.
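For Qwen2-VL and Qwen2.5-VL, the preprocessing in `qwen_vl_utils` snaps each side to a multiple of 28 and rescales the image so its area falls between `min_pixels` and `max_pixels`. The sketch below is modeled on that logic but simplified (no aspect-ratio limit), so treat it as an illustration rather than the library's code. The factor of 28 and the example `max_pixels` value are assumptions; newer models may use a different factor, so check the model's processor configuration.

```python
import math


def smart_resize(height, width, factor=28, min_pixels=56 * 56, max_pixels=1280 * 28 * 28):
    """Pick a size with sides divisible by factor and area in [min_pixels, max_pixels]."""
    # First snap each side to the nearest multiple of factor.
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too large: shrink uniformly, rounding down so the area stays under the cap.
        beta = math.sqrt(height * width / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too small: grow uniformly, rounding up so the area reaches the floor.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


h, w = smart_resize(1080, 1920)
# Roughly one visual token per 28x28 block for Qwen2-VL-style models.
print(h, w, (h // factor_tokens if (factor_tokens := 28) else 1) * (w // 28))
```

Raising `max_pixels` therefore increases the number of visual tokens roughly in proportion to image area, which is the main lever for trading detail against speed and VRAM.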

3. TK QwenVL Interrogator (Single)

Process a single image passed directly from another node (IMAGE type).

image: Input image connection.
model_id: Select Qwen-VL model.
prompt: Instruction for the model.
Returns: STRING (generated text).
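The Single nodes receive ComfyUI's IMAGE type, a float tensor shaped [batch, height, width, channels] with values in 0..1, while vision-language processors usually expect 8-bit images. A minimal sketch of that conversion, written against NumPy so it runs without torch (for a torch tensor, call `.cpu().numpy()` first); the function name is hypothetical and the node's own conversion may differ, for example by truncating instead of rounding.

```python
import numpy as np


def comfy_batch_to_uint8(batch):
    """Convert a ComfyUI-style IMAGE batch into a list of HxWxC uint8 arrays."""
    batch = np.asarray(batch, dtype=np.float32)
    if batch.ndim == 3:  # a single image without the batch dimension
        batch = batch[None, ...]
    # Map 0..1 floats to 0..255, rounding and clamping out-of-range values.
    scaled = np.clip(np.rint(batch * 255.0), 0, 255).astype(np.uint8)
    return [frame for frame in scaled]


frames = comfy_batch_to_uint8(np.full((2, 4, 6, 3), 0.5))
print(len(frames), frames[0].shape, frames[0][0, 0])
```

Each resulting frame can then be wrapped with `PIL.Image.fromarray(frame)` before being handed to the model's processor.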

4. TK JoyCaption Interrogator

Batch processing node for JoyCaption models, designed for natural language captions or SD prompts.

joycaption_model: Select model (e.g., fancyfeast/llama-joycaption-beta-one-hf-llava).
caption_type:
- Descriptive: Formal, natural language description.
- Stable Diffusion Prompt: Tag-based format with quality boosters.
caption_length: Constrain the output length (from very short to very long).
user_prompt: Override the internal system prompt with your own instruction.
cache_model: Keep model loaded (recommended for batch).
resize_px / img_px: Toggle pixel-based resizing; img_px sets the target longest edge.
resize_mp / img_mp: Toggle megapixel-based resizing; img_mp sets the target size in megapixels.
enable_captioning: Toggle to enable/disable caption generation. If disabled, the node acts as a resize-only batch loader.
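To illustrate how caption_type, caption_length, and user_prompt interact, here is a small sketch of prompt selection. The template wording and the exact list of length options are hypothetical and may not match the node's internal prompts; the one behavior taken from the description above is that a non-empty user_prompt overrides the built-in instruction.

```python
# Hypothetical templates; the node's real wording may differ.
CAPTION_TEMPLATES = {
    "Descriptive": "Write a {length} descriptive caption for this image in a formal tone.",
    "Stable Diffusion Prompt": "Write a {length} Stable Diffusion prompt for this image.",
}
# Assumed length options, ordered from shortest to longest.
CAPTION_LENGTHS = ["very short", "short", "medium-length", "long", "very long"]


def build_joycaption_prompt(caption_type, caption_length, user_prompt=""):
    """Return the instruction sent to the model; a non-empty user_prompt wins."""
    if user_prompt.strip():
        return user_prompt.strip()
    if caption_length not in CAPTION_LENGTHS:
        raise ValueError(f"unknown caption_length: {caption_length!r}")
    return CAPTION_TEMPLATES[caption_type].format(length=caption_length)


print(build_joycaption_prompt("Descriptive", "long"))
print(build_joycaption_prompt("Descriptive", "long", user_prompt="List the colors."))
```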

5. TK JoyCaption Interrogator (Single)

Single image version of JoyCaption.

image: Input image connection.
joycaption_model: Select model.
caption_type / caption_length: Format controls.
Returns: STRING (generated text).

6. Text Saver (TK_TextSaver)

This node is kept for workflow compatibility. Text saving is now handled automatically by the Interrogator nodes (TK QwenVL Interrogator / TK JoyCaption Interrogator), which save a .txt file alongside the processed image in the output_path.
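Saving a caption "alongside the processed image" amounts to swapping the image's extension for `.txt` in the same folder. A minimal sketch of that step, assuming UTF-8 text with surrounding whitespace trimmed and a single trailing newline; `save_caption` is a hypothetical name, not a function exported by this repository.

```python
import tempfile
from pathlib import Path


def save_caption(image_path, caption):
    """Write caption next to image_path, e.g. out/image_1.png -> out/image_1.txt."""
    txt_path = Path(image_path).with_suffix(".txt")
    txt_path.write_text(caption.strip() + "\n", encoding="utf-8")
    return txt_path


with tempfile.TemporaryDirectory() as out_dir:
    written = save_caption(Path(out_dir) / "image_1.png", "A cat on a sofa.")
    print(written.name, repr(written.read_text(encoding="utf-8")))
```

Because the path is derived from the image itself, re-running a batch overwrites the previous captions instead of creating duplicates.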

Workflow Example

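Load Images: Connect TK Batch Image Loader to TK QwenVL Interrogator.
Generate: TK QwenVL Interrogator captions each image and saves a matching .txt file in output_path. Connecting its outputs (texts, filenames) to TK Text Saver is optional and only needed for older workflows.
Run: Press "Queue Prompt" to process the entire folder in batch.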

Credits

Based on the Qwen-VL architecture. Inspired by various ComfyUI community contributions.

License

This project is licensed under the MIT License - see the LICENSE file for details.
