twat-genai is a powerful Python package that provides a unified command-line interface (CLI) and Python API for a variety of generative AI image tasks. It simplifies interaction with different AI models and services, currently focusing on the FAL (fal.ai) platform. This tool is part of the TWAT (Twardoch Utility Tools) collection.
Whether you're a developer, artist, or researcher, twat-genai empowers you to programmatically generate and manipulate images, automate creative workflows, and experiment with cutting-edge AI models.
- Unified Interface: Access Text-to-Image, Image-to-Image, ControlNet-like operations (Canny edges, Depth maps), Image Upscaling, and Image Outpainting through a consistent CLI and API.
- FAL.ai Integration: Leverages the fal.ai platform for model hosting and execution.
- Flexible Input: Use text prompts with Midjourney-style syntax (permutations, multi-prompts), and provide input images via URLs, local file paths, or PIL Image objects.
- LoRA Support: Easily apply LoRA (Low-Rank Adaptation) models from a predefined library or by specifying URLs.
- Comprehensive Configuration: Fine-tune generation parameters using Pydantic-based configuration objects.
- Standardized Output: Receive results in a consistent `ImageResult` format, including metadata and paths to generated files.
- Extensible Design: Built with modularity in mind, allowing for future expansion to other AI engines and models.
- Developers: Integrate generative AI capabilities into your Python applications.
- Artists & Designers: Experiment with AI-powered image creation and manipulation, and automate parts of your creative workflow.
- Researchers: Conduct experiments and batch process image generation tasks.
- CLI Users: Quickly generate images or perform image operations directly from your terminal.
- Simplicity: Abstracts away the complexities of individual AI model APIs.
- Consistency: Provides a standardized way to interact with different types of generative models.
- Automation: Enables scripting and automation of image generation tasks.
- Reproducibility: Saves metadata with generation parameters for better tracking.
- Power & Flexibility: Offers fine-grained control over the generation process.
- Python 3.10 or higher.
- `uv` (recommended for faster installation and environment management). Install it with `pip install uv` or `curl -LsSf https://astral.sh/uv/install.sh | sh`.
- `git` (for cloning the repository).
1. Clone the repository:

   ```bash
   git clone https://github.com/twardoch/twat-genai.git
   cd twat-genai
   ```

2. Set up a virtual environment and install:

   ```bash
   # Create a virtual environment (recommended)
   uv venv
   source .venv/bin/activate  # On Linux/macOS
   # .venv\Scripts\activate   # On Windows

   # Install the package in editable mode with all dependencies
   uv pip install -e ".[all]"
   ```
3. Set the FAL API key. You need an API key from fal.ai.

   - Set it as an environment variable:

     ```bash
     export FAL_KEY="your-fal-api-key"
     ```

   - Or create a `.env` file in the project root (`twat-genai/`) with the following content:

     ```
     FAL_KEY="your-fal-api-key"
     ```
The main command is `twat-genai`. You can explore commands and options with `twat-genai --help`.
- `--prompts "your prompt"`: The text prompt for generation.
  - Multiple prompts (semicolon separated): `"a cat; a dog"`
  - Permutations: `"a {red,blue} car"` (generates "a red car" and "a blue car")
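To make the permutation syntax concrete, here is a small standalone expander. This is an illustrative sketch, not the package's actual `core.prompt` implementation:

```python
import itertools
import re


def expand_permutations(prompt: str) -> list[str]:
    """Expand Midjourney-style {a,b} groups into all combinations."""
    # Collect the comma-separated alternatives inside each {...} group
    groups = re.findall(r"\{([^{}]*)\}", prompt)
    if not groups:
        return [prompt]
    alternatives = [[alt.strip() for alt in g.split(",")] for g in groups]
    # Replace each group with a placeholder, then fill in every combination
    template = re.sub(r"\{[^{}]*\}", "{}", prompt)
    return [template.format(*combo) for combo in itertools.product(*alternatives)]


print(expand_permutations("a {red,blue} car"))
# ['a red car', 'a blue car']
```

A prompt with several groups expands to the Cartesian product, e.g. `"a {red,blue} {car,bike}"` yields four prompts.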
- `--output_dir <path>`: Directory to save generated images (default: `generated_images`).
- `--model <model_type>`: Specifies the operation (e.g., `text`, `image`, `canny`, `upscale`, `outpaint`).
- `--input_image <path_or_url>`: Path or URL to an input image (for `image`, `canny`, `depth`, `upscale`, `outpaint`).
- `--image_size <preset_or_WxH>`: Output image size (e.g., `SQ`, `HD`, `1024,768`). Default: `SQ` (1024x1024).
- `--lora "<lora_name_or_url:scale>"`: Apply a LoRA.
  - From library: `--lora "gstdrw style"`
  - URL with scale: `--lora "https://huggingface.co/path/to/lora:0.7"`
  - Multiple LoRAs: `--lora "name1:0.5; url2:0.8"`
- `--verbose`: Enable detailed logging.
- `--filename_prefix <prefix>`: Prepend text to output filenames.
- `--negative_prompt "text"`: Specify what to avoid in the image.
1. Text-to-Image (TTI):

   ```bash
   # Basic TTI
   twat-genai text --prompts "A futuristic cityscape at sunset, neon lights, cinematic"

   # TTI with specific size and multiple prompts
   twat-genai text --prompts "photo of a majestic lion; illustration of a mythical phoenix" --image_size HD

   # TTI with a LoRA
   twat-genai text --prompts "portrait of a warrior" --lora "shou_xin:0.8" --output_dir my_portraits
   ```

2. Image-to-Image (I2I):

   ```bash
   twat-genai image --input_image path/to/my_photo.jpg --prompts "transform into a vibrant oil painting" --strength 0.65
   ```

3. ControlNet-like Operations (Canny Edge):

   ```bash
   twat-genai canny --input_image path/to/sketch.png --prompts "detailed spaceship based on the sketch, metallic texture"
   ```

   (Depth map generation is similar: `twat-genai depth ...`)
4. Image Upscaling:

   ```bash
   # Upscale using ESRGAN (general purpose)
   twat-genai upscale --input_image path/to/low_res_image.png --tool esrgan --output_dir upscaled_images

   # Upscale using Ideogram (requires prompt, good for creative upscaling)
   twat-genai upscale --input_image path/to/artwork.jpg --tool ideogram --prompts "enhance details, painterly style" --scale 2

   # Upscale using Clarity (photo enhancement)
   twat-genai upscale --input_image path/to/photo.jpg --tool clarity --prompts "ultra realistic photo, sharp details"
   ```

   Supported tools for `--tool`: `aura_sr`, `ccsr`, `clarity`, `drct`, `esrgan`, `ideogram`, `recraft_clarity`, `recraft_creative`.
5. Image Outpainting:

   ```bash
   # Outpaint using Bria (default)
   twat-genai outpaint --input_image path/to/center_image.png \
     --prompts "expand the scene with a lush forest on the left and a serene lake on the right" \
     --target_width 2048 --target_height 1024

   # Outpaint using Flux (alternative, may require different prompting)
   twat-genai outpaint --input_image path/to/center_image.png --tool flux \
     --prompts "fantasy landscape expanding outwards" \
     --target_width 1920 --target_height 1080
   ```

The Python API offers more flexibility for integration into your projects.
```python
import asyncio
from pathlib import Path

from twat_genai import (
    FALEngine,
    ModelTypes,
    EngineConfig,
    ImageInput,         # Base ImageInput
    FALImageInput,      # For FAL-specific interactions if needed; usually handled by FALEngine
    ImageToImageConfig,
    UpscaleConfig,
    OutpaintConfig,
)


async def main():
    output_dir = Path("api_generated_images")
    output_dir.mkdir(exist_ok=True)

    # Ensure FAL_KEY is set in your environment or .env file
    async with FALEngine(output_dir=output_dir) as engine:
        # --- 1. Text-to-Image ---
        print("Running Text-to-Image...")
        base_config_tti = EngineConfig(image_size="HD", num_inference_steps=30)
        result_tti = await engine.generate(
            prompt="A stunning fantasy castle on a floating island, hyperrealistic",
            config=base_config_tti,
            model=ModelTypes.TEXT,  # Specify Text-to-Image model
            filename_prefix="fantasy_castle",
            lora_spec="shou_xin:0.5",  # Example LoRA
        )
        if result_tti and result_tti.image_info.get("path"):
            print(f"TTI image saved to: {result_tti.image_info['path']}")
            print(f"TTI metadata: {result_tti.image_info.get('metadata_path')}")
        else:
            print(f"TTI generation failed or image path not found. Result: {result_tti}")

        # --- 2. Image-to-Image ---
        print("\nRunning Image-to-Image...")
        # Replace with your image URL or local path
        input_image_i2i_url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
        # For a local path: input_img_i2i = ImageInput(path=Path("path/to/your/image.jpg"))
        base_config_i2i = EngineConfig(num_inference_steps=25)
        # FALImageInput is created internally by FALEngine from ImageInput
        i2i_specific_config = ImageToImageConfig(
            input_image=ImageInput(url=input_image_i2i_url),  # Provide base ImageInput
            strength=0.7,
            negative_prompt="blurry, low quality",
            model_type=ModelTypes.IMAGE,  # Redundant if model is passed to generate()
        )
        result_i2i = await engine.generate(
            prompt="Convert this to a cyberpunk city scene",
            config=base_config_i2i,
            model=ModelTypes.IMAGE,  # Specify Image-to-Image model
            image_config=i2i_specific_config,  # Pass the I2I-specific config
            filename_prefix="cyberpunk_city",
        )
        if result_i2i and result_i2i.image_info.get("path"):
            print(f"I2I image saved to: {result_i2i.image_info['path']}")
        else:
            print(f"I2I generation failed or image path not found. Result: {result_i2i}")

        # --- 3. Upscale ---
        print("\nRunning Upscale...")
        # Replace with your image URL or local path
        input_image_upscale_url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/mountains-input.png"  # Example low-res
        # The base config might be ignored or only partially used by upscalers
        base_config_upscale = EngineConfig()
        upscale_specific_config = UpscaleConfig(
            input_image=ImageInput(url=input_image_upscale_url),
            prompt="Enhance details, sharp focus, high resolution",  # For Ideogram, Clarity, etc.
            # scale=4,  # General scale; some tools have specific scale params like ccsr_scale
            # Example for esrgan:
            # esrgan_model="RealESRGAN_x4plus",
            # Example for clarity:
            clarity_creativity=0.5,
        )
        result_upscale = await engine.generate(
            prompt="Enhance details, sharp focus, high resolution",  # Prompt for context
            config=base_config_upscale,
            # Choose a specific upscaler model from ModelTypes
            model=ModelTypes.UPSCALER_CLARITY,
            upscale_config=upscale_specific_config,
            filename_prefix="upscaled_image_clarity",
        )
        if result_upscale and result_upscale.image_info.get("path"):
            print(f"Upscaled image saved to: {result_upscale.image_info['path']}")
        else:
            print(f"Upscale failed or image path not found. Result: {result_upscale}")

        # --- 4. Outpaint ---
        print("\nRunning Outpaint...")
        # Replace with your image URL or local path
        input_image_outpaint_url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/mountains-input.png"
        base_config_outpaint = EngineConfig()
        outpaint_specific_config = OutpaintConfig(
            input_image=ImageInput(url=input_image_outpaint_url),
            prompt="Expand the mountain scene with a vast, starry night sky above and misty valleys below.",
            target_width=1536,
            target_height=1024,
            outpaint_tool="bria",  # or "flux"
            # For Bria, GenFill post-processing is enabled via border_thickness_factor > 0
            border_thickness_factor=0.05,  # Example: 5% of the smaller dimension for the border
        )
        result_outpaint = await engine.generate(
            prompt=outpaint_specific_config.prompt,  # Pass the prompt from the config or directly
            config=base_config_outpaint,
            model=ModelTypes.OUTPAINT_BRIA,  # Or ModelTypes.OUTPAINT_FLUX
            outpaint_config=outpaint_specific_config,
            filename_prefix="outpainted_scene_bria",
        )
        if result_outpaint and result_outpaint.image_info.get("path"):
            print(f"Outpainted image saved to: {result_outpaint.image_info['path']}")
        else:
            print(f"Outpaint failed or image path not found. Result: {result_outpaint}")


if __name__ == "__main__":
    import os

    if not os.getenv("FAL_KEY"):
        print("Error: FAL_KEY environment variable not set.")
        print("Please set your FAL API key, e.g., export FAL_KEY='your-key'")
    else:
        asyncio.run(main())
```

This section provides a deeper dive into the architecture, workflow, and contribution guidelines for twat-genai.
twat-genai is designed with a layered architecture to promote modularity and extensibility:
- Core (`src/twat_genai/core`):
  - Purpose: Provides fundamental building blocks, data structures, and utilities that are independent of any specific AI generation engine.
  - Key Modules:
    - `config.py`: Defines core Pydantic models like `ImageInput` (for representing input images via URL, path, or PIL object), `ImageResult` (standardized output structure), `ImageSizeWH`, and various type aliases.
    - `image.py`: Contains `ImageSizes` (enum for presets like SQ, HD), `ImageFormats` (enum for JPG, PNG), and image-saving utilities.
    - `image_utils.py`: Offers asynchronous utilities for image downloading (`download_image_to_temp`, `download_image`), resizing based on model constraints (`resize_image_if_needed`), and mask creation for outpainting/inpainting (`create_outpaint_mask`, `create_flux_inpainting_assets`, `create_genfill_border_mask`).
    - `lora.py`: Defines Pydantic models for LoRA configurations (`LoraRecord`, `LoraLib`, `LoraSpecEntry`).
    - `prompt.py`: Implements parsing and normalization for Midjourney-style prompts, including permutation expansion (`{alt1,alt2}`), multi-prompts (`::`), and parameter handling.
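To illustrate the `ImageInput` contract described above, here is a simplified, stdlib-only sketch. The real model is a Pydantic class; the "exactly one source" validation rule shown here is an assumption for illustration:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class ImageInput:
    """Sketch of an input image holder: one of url, path, or pil is set."""
    url: Optional[str] = None
    path: Optional[Path] = None
    pil: Optional[Any] = None  # a PIL.Image.Image in the real package

    def __post_init__(self) -> None:
        # Assumed rule: exactly one source must be provided
        provided = sum(v is not None for v in (self.url, self.path, self.pil))
        if provided != 1:
            raise ValueError("Provide exactly one of url, path, or pil")


img = ImageInput(url="https://example.com/cat.jpg")  # valid

try:
    ImageInput()  # invalid: nothing provided
except ValueError as e:
    print(e)  # Provide exactly one of url, path, or pil
```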
- Engines (`src/twat_genai/engines`):
  - Purpose: Implements the logic for interacting with specific AI platforms or model families.
  - `base.py`:
    - Defines the `ImageGenerationEngine` abstract base class (ABC), which specifies the common interface (`initialize`, `generate`, `shutdown`) for all engines.
    - Defines the base `EngineConfig` Pydantic model (common parameters like `guidance_scale`, `num_inference_steps`, `image_size`).
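A minimal sketch of that engine interface, with a dummy engine to demonstrate it. The method names come from the description above; the exact signatures and the async-context-manager support are assumptions:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any


class ImageGenerationEngine(ABC):
    """Sketch of the common engine interface."""

    @abstractmethod
    async def initialize(self) -> None:
        """Set up API clients, check credentials, etc."""

    @abstractmethod
    async def generate(self, prompt: str, **kwargs: Any) -> Any:
        """Run a generation job and return a result object."""

    @abstractmethod
    async def shutdown(self) -> None:
        """Release resources and clean up temporary files."""

    # Allows `async with engine:` as in the README examples (assumed)
    async def __aenter__(self):
        await self.initialize()
        return self

    async def __aexit__(self, *exc_info):
        await self.shutdown()


class EchoEngine(ImageGenerationEngine):
    """Dummy engine used only to demonstrate the interface."""

    async def initialize(self) -> None:
        self.ready = True

    async def generate(self, prompt: str, **kwargs: Any) -> str:
        return f"generated: {prompt}"

    async def shutdown(self) -> None:
        self.ready = False


async def demo() -> str:
    async with EchoEngine() as engine:
        return await engine.generate("a test prompt")


print(asyncio.run(demo()))  # generated: a test prompt
```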
- FAL Engine (`src/twat_genai/engines/fal`):
  - Purpose: Concrete implementation for the fal.ai platform.
  - `__init__.py` (`FALEngine`):
    - The main orchestrator for FAL operations. Inherits from `ImageGenerationEngine`.
    - Initializes `FalApiClient` and checks for `FAL_KEY`.
    - Handles input image preparation: downloads URLs (via `core.image_utils`), resizes images for upscalers if needed (via `core.image_utils`), and uses `FALImageInput.to_url()` (which calls `FalApiClient.upload_image()`) to get a FAL-usable URL for local/PIL images. Manages temporary file cleanup.
    - The `generate()` method determines the operation type (TTI, I2I, Upscale, Outpaint, etc.) based on the `model` (a `ModelTypes` enum value) and routes to the appropriate `FalApiClient` method.
    - Manages GenFill post-processing for Bria outpainting and asset creation for Flux outpainting.
  - `client.py` (`FalApiClient`):
    - Responsible for all direct interactions with the FAL API using the `fal-client` library.
    - Provides `upload_image()` using `fal_client.upload_file_async()`.
    - Contains specific methods for each operation: `process_tti()`, `process_i2i()`, `process_canny()`, `process_depth()`, `process_upscale()`, `process_outpaint()`, `process_genfill()`.
    - These methods submit jobs via `_submit_fal_job()` (which uses `fal_client.submit_async()`).
    - Result retrieval and processing occur in `_get_fal_result()`, which polls using `fal_client.status_async()` and fetches with `fal_client.result_async()`.
    - Parses results using `_extract_generic_image_info()` to standardize output into an `ImageResult` object.
    - Downloads the final image using `_download_image_helper()` (which uses `httpx`).
  - `config.py`: Defines FAL-specific Pydantic models used as schemas for API arguments.
    - `ModelTypes` (enum): Maps user-friendly model names/operations to specific FAL API endpoint strings.
    - `ImageToImageConfig`, `UpscaleConfig`, `OutpaintConfig`: Pydantic models for operation-specific parameters, often including an `ImageInput`.
    - `UPSCALE_TOOL_MAX_INPUT_SIZES`: A dictionary defining maximum input dimensions for various upscaler models.
  - `lora.py`: Contains FAL-specific LoRA handling.
    - `get_lora_lib()`: Loads LoRA definitions from JSON (from the `TWAT_GENAI_LORA_LIB` env var, a `twat-os` managed path, or the bundled `__main___loras.json`).
    - `parse_lora_phrase()`: Parses individual LoRA strings (library keys or `url:scale` syntax).
    - `normalize_lora_spec()`: Converts various input LoRA spec formats into a list of `LoraSpecEntry` or `CombinedLoraSpecEntry`.
    - `build_lora_arguments()`: Asynchronously prepares the final LoRA argument list (e.g., `[{"path": "url", "scale": 0.7}]`) and augments the text prompt for the FAL API.
  - `models.py` (`FALImageInput`):
    - Subclasses `core.config.ImageInput`.
    - Its `to_url()` async method converts local file paths or PIL Image objects into FAL-usable URLs by uploading them via `fal_client.upload_file_async()`.
- CLI (`src/twat_genai/cli.py`):
  - Purpose: Provides the command-line interface.
  - Uses the `python-fire` library to expose methods of the `TwatGenAiCLI` class as subcommands.
  - Parses CLI arguments, maps them to `ModelTypes`, and constructs the necessary configuration objects (`EngineConfig`, `ImageToImageConfig`, etc.).
  - Handles `input_image` parsing into `core.config.ImageInput`.
  - Each command method (e.g., `text`, `image`, `upscale`) calls a shared `_run_generation()` async helper, which instantiates and runs the `FALEngine`. `asyncio.run()` is used within the top-level CLI methods that need to call async code.
- Entry Point (`src/twat_genai/__main__.py`):
  - A minimal script that enables the package to be run as a module (`python -m twat_genai`). It imports `TwatGenAiCLI` and uses `fire.Fire(TwatGenAiCLI)`.
- Default LoRA Library (`src/twat_genai/__main___loras.json`):
  - A JSON file containing predefined LoRA shortcuts (keywords mapping to LoRA URLs and default scales).
Here's a simplified flow of an Image-to-Image request:
1. User Invocation:
   - CLI: `twat-genai image --input_image my_image.jpg --prompts "make it vintage" --strength 0.6`
   - API: `await engine.generate(prompt="make it vintage", model=ModelTypes.IMAGE, image_config=i2i_cfg_obj, ...)`
2. Argument Parsing (CLI): `TwatGenAiCLI` parses arguments. `input_image` becomes an `ImageInput` object; `strength`, `prompts`, etc., are collected.
3. Configuration Setup:
   - An `EngineConfig` is created with general settings (e.g., `image_size` if specified).
   - An `ImageToImageConfig` is created, holding the `ImageInput` object and I2I-specific parameters like `strength` and `negative_prompt`.
4. Engine Execution:
   - A `FALEngine` instance is created (if not already, e.g., via `async with`). `initialize()` is called, setting up the `FalApiClient`.
   - `FALEngine.generate()` is called with the prompt, base `EngineConfig`, `model=ModelTypes.IMAGE`, and the `ImageToImageConfig`.
5. Input Preparation (`FALEngine._prepare_image_input`):
   - The `ImageInput` from `ImageToImageConfig` is processed.
   - If it's a local path (e.g., `my_image.jpg`) or a PIL Image, `FALImageInput.to_url()` is effectively called.
   - `FALImageInput.to_url()` uses `fal_client.upload_file_async()` to upload the image to FAL's temporary storage, returning a URL.
   - If the input is already a URL, it might be used directly, or downloaded and re-uploaded if processing (like resizing for upscalers, though not typical for I2I) is needed. Original image dimensions are determined.
6. Prompt & LoRA Processing:
   - The input prompt is normalized (e.g., expanding permutations) by `core.prompt.normalize_prompts()`.
   - If a `lora_spec` is provided, `engines.fal.lora.build_lora_arguments()` parses it, resolves library keys, and prepares a list of LoRA dictionaries for the API, potentially modifying the prompt string to include LoRA trigger words.
7. API Client Interaction (`FalApiClient.process_i2i`):
   - The `FalApiClient.process_i2i()` method is invoked with the prompt, the FAL-usable image URL, LoRA arguments, and other parameters.
   - It assembles the final dictionary of arguments for the FAL API endpoint (`fal-ai/flux-lora/image-to-image`).
   - `_submit_fal_job()` is called, which uses `fal_client.submit_async()` to send the request to FAL. A request ID is returned.
8. Result Handling (`FalApiClient._get_fal_result`):
   - The client polls FAL for the job status using `fal_client.status_async(request_id)` until completion.
   - The final result JSON is fetched using `fal_client.result_async(request_id)`.
   - `_extract_generic_image_info()` parses this JSON to find the URL(s) of the generated image(s) and other metadata such as seed and dimensions.
   - If `output_dir` is specified, `_download_image_helper()` downloads the generated image(s) using `httpx` and saves them.
   - An `ImageResult` Pydantic model is populated with the request ID, timestamp, raw API result, parsed image information (including the local path if saved), original prompt, and job parameters. Metadata is saved to a JSON file alongside the image.
9. Return Value: The `ImageResult` object is returned to the caller (`FALEngine.generate()`, then to the CLI method or API user).
10. CLI Output: The CLI prints a summary of the result, including the path to the saved image.
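The submit/poll/fetch control flow described above can be illustrated with a mock client. The real code uses `fal_client.submit_async`, `status_async`, and `result_async`; the `MockFalClient` below only imitates that flow and is entirely hypothetical:

```python
import asyncio


class MockFalClient:
    """Stand-in for fal_client, imitating submit/status/result control flow."""

    def __init__(self):
        self._jobs: dict[str, int] = {}  # request_id -> poll count

    async def submit_async(self, endpoint: str, arguments: dict) -> str:
        request_id = f"req-{len(self._jobs) + 1}"
        self._jobs[request_id] = 0
        return request_id

    async def status_async(self, request_id: str) -> str:
        self._jobs[request_id] += 1
        # Pretend the job completes after two polls
        return "COMPLETED" if self._jobs[request_id] >= 2 else "IN_PROGRESS"

    async def result_async(self, request_id: str) -> dict:
        return {"images": [{"url": "https://example.com/out.png"}], "seed": 42}


async def run_job(client: MockFalClient, endpoint: str, arguments: dict) -> dict:
    # 1. Submit the job and get a request ID
    request_id = await client.submit_async(endpoint, arguments)
    # 2. Poll until the job completes
    while await client.status_async(request_id) != "COMPLETED":
        await asyncio.sleep(0)  # real code would wait a sensible interval
    # 3. Fetch and return the final result JSON
    return await client.result_async(request_id)


result = asyncio.run(
    run_job(MockFalClient(), "fal-ai/flux-lora/image-to-image", {"prompt": "make it vintage"})
)
print(result["images"][0]["url"])  # https://example.com/out.png
```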
- `core.config.ImageInput`: Represents an input image (URL, local path, or PIL Image).
  - `engines.fal.models.FALImageInput`: Subclass that handles uploading local/PIL images to FAL.
- `core.config.ImageResult`: Standardized structure for returning generation results, including metadata, image path, and raw API response.
- `engines.base.EngineConfig`: Base configuration for all engines (e.g., `guidance_scale`, `num_inference_steps`, `image_size`).
- `engines.fal.config.ModelTypes`: Enum mapping operation types to FAL API endpoint strings (e.g., `TEXT` -> `"fal-ai/flux-lora"`).
- `engines.fal.config.ImageToImageConfig`: Parameters for I2I, Canny, and Depth (e.g., `input_image`, `strength`).
- `engines.fal.config.UpscaleConfig`: Parameters for various upscaling tools (e.g., `input_image`, `scale`, tool-specific options like `esrgan_model`).
- `engines.fal.config.OutpaintConfig`: Parameters for outpainting (e.g., `input_image`, `prompt`, `target_width`, `target_height`, `outpaint_tool`).
- `core.lora.LoraRecord`, `LoraLib`, `LoraSpecEntry`, `CombinedLoraSpecEntry`: Define how LoRAs are represented, stored in a library, and specified for use.
- Adding a New AI Engine:
  1. Create a new module under `src/twat_genai/engines/`.
  2. Implement a class that inherits from `ImageGenerationEngine` (in `src/twat_genai/engines/base.py`).
  3. Implement the abstract methods: `initialize()`, `generate()`, and `shutdown()`.
  4. Define any engine-specific configuration models (similar to `UpscaleConfig` for FAL).
  5. Update the CLI and potentially the main API entry points to allow selection and use of the new engine.
- Adding New Models/Operations to the FAL Engine:
  1. Add a new entry to the `ModelTypes` enum in `src/twat_genai/engines/fal/config.py` with the FAL endpoint string.
  2. If the new operation requires unique parameters, create a new Pydantic config model (e.g., `NewOperationConfig`) in `engines/fal/config.py`.
  3. Add a corresponding processing method in `FalApiClient` (e.g., `process_new_operation()`). This method handles argument assembly and job submission.
  4. Update `FALEngine.generate()` to handle the new `ModelTypes` value, call the new `FalApiClient` method, and pass the appropriate configuration.
  5. Add a new subcommand to `TwatGenAiCLI` in `src/twat_genai/cli.py` for the new operation.
We welcome contributions! Please follow these guidelines:
- Code Quality Tools:
  - Pre-commit Hooks:
    - The project includes a `.pre-commit-config.yaml`.
    - Install hooks with `pre-commit install`. This will automatically run checks (like Ruff and MyPy) before each commit.
- Dependency Management:
  - Dependencies are managed using `uv` and specified in `pyproject.toml`.
  - Install development dependencies with `uv pip install -e ".[all]"`.
- Versioning:
  - The project version is dynamically determined from `git` tags using `hatch-vcs`.
  - Releases are made by creating a new `git` tag (e.g., `v0.2.0`).
- Branching Strategy:
  - Develop features in separate branches (e.g., `feature/my-new-feature`, `fix/bug-fix`).
  - Submit Pull Requests (PRs) to the `main` branch for review.
- Commit Messages:
  - Please follow the Conventional Commits guidelines.
  - Examples: `feat: Add support for XYZ model`, `fix: Correct parameter handling in I2I`, `docs: Update README with API examples`.
- Testing:
  - Tests are written using `pytest` and are located in the `tests/` directory.
  - Run tests with `uv run test`.
  - All new features and bug fixes should be accompanied by corresponding tests.
  - Ensure good test coverage; check coverage with `uv run test-cov`.
- Documentation:
  - Keep this `README.md` file updated with any changes to functionality, API, or CLI.
  - Write clear and concise docstrings for all public modules, classes, and functions.
  - Use type hints extensively.
- Python Version: The project targets Python 3.10 and above.
By following these guidelines, you help maintain the quality and consistency of the twat-genai codebase.