A powerful set of ComfyUI custom nodes designed for batch image processing and advanced Vision-Language Model (VLM) interrogation using the Qwen-VL family (Qwen2-VL, Qwen2.5-VL, Qwen3-VL).
- Batch Image Loading & Renaming: Automatically reads all images from a folder, renames them sequentially (e.g., `image_1.png`, `image_2.png`), and saves them to a specified output directory.
- Multi-Model Support:
- Qwen-VL: Seamless support for Qwen2-VL, Qwen2.5-VL, and Qwen3-VL models.
- JoyCaption: Integration with `fancyfeast/llama-joycaption` for high-quality natural-language captions or Stable Diffusion tags.
- Flexible Workflow Modes:
- Batch Processing: Process entire directories of images with auto-saving.
- Single Image: New independent nodes (Single) to process individual images directly from your workflow.
- Advanced Generation Control:
- Adjustable Max New Tokens, Resolution Control, Temperature, and Seed.
- JoyCaption-specific: Control caption type ('Descriptive', 'SD Prompt'), length, and tone.
- Image Resizing Options:
- Pixel Resize: Scale based on the longest edge.
- Megapixel (MP) Resize: Scale to a target total pixel count (e.g., 1.0MP).
- UI Protection: Mutually exclusive toggles in the interface (e.g., pixel vs. megapixel resizing) prevent conflicting settings from being enabled together.
- Auto-Saving: Automatically saves the generated captions as `.txt` files matching the image filenames.
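The arithmetic behind the two resize modes can be sketched as follows. This is a minimal illustration, not the node's code: the helper name `resize_dims`, rounding to whole pixels, and reading 1 MP as 1,000,000 pixels are assumptions. The README also does not say whether small images are upscaled; this sketch scales in both directions.

```python
import math


def resize_dims(width, height, resize_px=False, img_px=1024,
                resize_mp=False, img_mp=1.0):
    """Return the (width, height) an image would be scaled to.

    Hypothetical helper mirroring the two resize modes described above.
    Assumes at most one mode is enabled (the UI keeps them mutually
    exclusive) and that 1 MP means 1,000,000 pixels.
    """
    if resize_px and resize_mp:
        raise ValueError("enable only one of resize_px / resize_mp")
    if resize_px:
        # Pixel resize: scale so the longest edge equals img_px.
        scale = img_px / max(width, height)
    elif resize_mp:
        # Megapixel resize: scale so width * height is about img_mp * 1e6.
        scale = math.sqrt(img_mp * 1_000_000 / (width * height))
    else:
        return width, height
    # Round to whole pixels and never collapse an edge to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(resize_dims(2000, 1000, resize_px=True, img_px=1024))  # (1024, 512)
print(resize_dims(2000, 1000, resize_mp=True, img_mp=1.0))   # (1414, 707)
```

Both modes preserve the aspect ratio; they differ only in which quantity (longest edge or total area) is held to the target.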
- Clone this repository into your ComfyUI `custom_nodes` directory:
  ```bash
  cd ComfyUI/custom_nodes
  git clone https://github.com/tackcrypto1031/tk_comfyui_imageVL.git
  ```
- Install the required dependencies:
  ```bash
  cd tk_comfyui_imageVL
  pip install -r requirements.txt
  ```
  Note: This requires `transformers`, `qwen_vl_utils`, `accelerate`, and `huggingface_hub`.
- Restart ComfyUI.
TK Batch Image Loader: This node loads the input images, renames them, and optionally resizes them.
- source_path: Directory containing your original images.
- output_path: Directory where renamed images will be saved.
- filename_prefix: Prefix for the renamed files (default: `image_`).
- resize_px / img_px: Toggle pixel-based resizing and set the target length of the longest edge.
- resize_mp / img_mp: Toggle megapixel-based resizing and set the target total pixel count.
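A hypothetical sketch of the sequential renaming step, written as a pure function that only plans the new names. The sort order, the set of accepted extensions, and keeping each file's original extension are assumptions; the feature list only shows `.png` examples.

```python
import os

# Assumed set of extensions treated as images (not taken from the node's code).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def plan_renames(filenames, prefix="image_"):
    """Map source filenames to sequential names like image_1.png.

    Hypothetical sketch: files are sorted by name, non-images are skipped,
    numbering starts at 1, and each file keeps its (lower-cased) extension.
    """
    images = sorted(f for f in filenames
                    if os.path.splitext(f)[1].lower() in IMAGE_EXTS)
    plan = []
    for index, name in enumerate(images, start=1):
        ext = os.path.splitext(name)[1].lower()
        plan.append((name, f"{prefix}{index}{ext}"))
    return plan


print(plan_renames(["b.jpg", "a.png", "notes.txt"]))
# [('a.png', 'image_1.png'), ('b.jpg', 'image_2.jpg')]
```

In a real loader the input would come from something like `os.listdir(source_path)`, and the planned names would be used when writing the files into output_path.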
TK QwenVL Interrogator: This node analyzes each image in the batch and generates a description for it.
- model_id: Select the desired Qwen-VL model from the dropdown (supports Qwen2-VL, Qwen2.5-VL, and Qwen3-VL).
  - Models are downloaded automatically to `tk_comfyui_imageVL/models` if not already present.
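- prompt: The instruction for the model (e.g., "Describe this image in detail.").
- min_pixels / max_pixels: Lower and upper bounds on the number of pixels each image is resized to before it reaches the vision encoder; higher values keep more detail but use more memory and compute.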
- max_new_tokens: Maximum length of the generated text.
- temperature / seed: Generation parameters.
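For Qwen-VL models, min_pixels and max_pixels are typically passed to the Hugging Face processor, which resizes each image so its area falls within those bounds and both sides are multiples of a fixed factor. The sketch below is modeled on the `smart_resize` helper in `qwen_vl_utils` (its aspect-ratio check is omitted). The factor of 28 assumes the Qwen2-VL / Qwen2.5-VL patch layout (14-pixel patches merged 2x2), other model generations may use a different factor, the defaults are illustrative, and whether this node forwards the values exactly this way is an assumption.

```python
import math


def smart_resize(height, width, factor=28,
                 min_pixels=56 * 56, max_pixels=28 * 28 * 1280):
    """Pick a resolution whose sides are multiples of `factor` and whose
    area lies within [min_pixels, max_pixels], keeping the aspect ratio.

    Modeled on the resizing rule in qwen_vl_utils; factor=28 assumes the
    Qwen2-VL / Qwen2.5-VL patch layout.
    """
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Too large: shrink, rounding each side down to a multiple of factor.
        beta = math.sqrt(height * width / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Too small: enlarge, rounding each side up to a multiple of factor.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


h, w = smart_resize(1000, 2000, min_pixels=256 * 28 * 28, max_pixels=1280 * 28 * 28)
print(h, w, (h // 28) * (w // 28))  # 700 1400 1250
```

Under the 2x2-merge assumption each 28x28 region becomes roughly one visual token, so the example image costs about 1250 tokens; raising max_pixels increases detail and cost together.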
Single-image Qwen-VL node: Processes a single image passed directly from another node (IMAGE type).
- image: Input image connection.
- model_id: Select Qwen-VL model.
- prompt: Instruction for the model.
- Returns: STRING (generated text).
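The single-image nodes receive ComfyUI's IMAGE type, which is conventionally a float tensor of shape [batch, height, width, channels] with values in [0, 1]. Before a VLM processor can use it, each frame is usually converted to 8-bit RGB. The sketch below uses NumPy as a stand-in for the torch tensor; the function name is hypothetical and the node's own conversion may differ.

```python
import numpy as np


def comfy_image_to_uint8(images):
    """Convert a ComfyUI-style IMAGE batch to a list of uint8 RGB arrays.

    Assumes the usual ComfyUI layout: shape [batch, height, width, channels]
    with float values in [0, 1]. A NumPy array stands in for the torch
    tensor here; with torch you would call .cpu().numpy() first.
    """
    images = np.asarray(images, dtype=np.float32)
    if images.ndim == 3:  # a single [H, W, C] frame: add a batch axis
        images = images[None]
    # Scale to 0-255, clamp out-of-range values, and round to integers.
    frames = np.clip(images * 255.0, 0, 255).round().astype(np.uint8)
    return list(frames)


batch = np.ones((2, 4, 6, 3), dtype=np.float32)
frames = comfy_image_to_uint8(batch)
print(len(frames), frames[0].shape, frames[0].dtype)  # 2 (4, 6, 3) uint8
```

A resulting frame can be handed to `PIL.Image.fromarray` to obtain a PIL image for the processor.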
TK JoyCaption Interrogator: Batch processing node for JoyCaption models, designed for natural-language captions or SD prompts.
- joycaption_model: Select the model (e.g., `fancyfeast/llama-joycaption-beta-one-hf-llava`).
- caption_type:
  - Descriptive: Formal, natural-language description.
  - Stable Diffusion Prompt: Tag-based format with quality boosters.
- caption_length: Constrains the output length (from very short to very long).
- user_prompt: Override the internal system prompt with your own instruction.
- cache_model: Keep model loaded (recommended for batch).
- resize_px / img_px: Toggle pixel-based resizing and set the target length of the longest edge.
- resize_mp / img_mp: Toggle megapixel-based resizing and set the target total pixel count.
- enable_captioning: Toggle to enable/disable caption generation. If disabled, the node acts as a resize-only batch loader.
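How caption_type, caption_length, and user_prompt might combine into a single instruction, as a hypothetical sketch. The template wording below is invented for illustration and is not the node's internal prompt; only the override behavior (a non-empty user_prompt replaces the built-in prompt) comes from the parameter list above.

```python
# Hypothetical templates: the node's real wording is not documented here.
TEMPLATES = {
    "Descriptive": "Write a {length} descriptive caption for this image in a formal tone.",
    "Stable Diffusion Prompt": "Output a {length} stable diffusion prompt for this image.",
}


def build_joycaption_prompt(caption_type, caption_length, user_prompt=""):
    """Return the instruction sent to JoyCaption for one image.

    Sketch of the documented behavior: a non-empty user_prompt overrides the
    built-in prompt; otherwise caption_type picks a template and
    caption_length fills in the desired length.
    """
    if user_prompt.strip():
        return user_prompt.strip()
    if caption_type not in TEMPLATES:
        raise ValueError(f"unknown caption_type: {caption_type!r}")
    return TEMPLATES[caption_type].format(length=caption_length)


print(build_joycaption_prompt("Descriptive", "short"))
# Write a short descriptive caption for this image in a formal tone.
print(build_joycaption_prompt("Descriptive", "long", user_prompt="List the colors."))
# List the colors.
```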
Single-image JoyCaption node: The single-image version of TK JoyCaption Interrogator.
- image: Input image connection.
- joycaption_model: Select model.
- caption_type / caption_length: Format controls.
- Returns: STRING (generated text).
TK Text Saver: This node is kept for workflow compatibility. Text saving is now handled automatically by the Interrogator nodes (TK QwenVL Interrogator / TK JoyCaption Interrogator), which save a `.txt` file alongside each processed image in output_path.
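The naming rule for the automatically saved captions (one `.txt` per image, sharing the image's filename stem) can be sketched with `pathlib`. The helper name, UTF-8 encoding, and overwriting an existing file on re-runs are assumptions, not documented behavior.

```python
import tempfile
from pathlib import Path


def save_caption(image_path, caption):
    """Write `caption` to a .txt file next to the image, sharing its stem.

    Hypothetical helper illustrating the naming rule (image_1.png ->
    image_1.txt); UTF-8 encoding and overwrite-on-rerun are assumptions.
    """
    txt_path = Path(image_path).with_suffix(".txt")
    txt_path.write_text(caption, encoding="utf-8")
    return txt_path


# Usage: write a caption next to a (hypothetical) image in a temporary folder.
with tempfile.TemporaryDirectory() as out_dir:
    txt = save_caption(Path(out_dir) / "image_1.png", "a cat on a sofa")
    print(txt.name, txt.read_text(encoding="utf-8"))  # image_1.txt a cat on a sofa
```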
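- Load Images: Connect `TK Batch Image Loader` to `TK QwenVL Interrogator`.
- Generate: Connect the `TK QwenVL Interrogator` outputs (texts, filenames) to `TK Text Saver`. This step is optional, since the interrogator already saves the `.txt` files itself.
- Run: Press "Queue Prompt" to process the entire folder in batch.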
Based on the Qwen-VL architecture. Inspired by various ComfyUI community contributions.
This project is licensed under the MIT License - see the LICENSE file for details.
