nerveband/image_sense

Image Sense v0.1.1

Image Sense

An AI-powered image analysis and metadata management tool that uses state-of-the-art machine learning models to analyze images and generate rich, structured metadata.

Status: Alpha Release

CURRENTLY IN ALPHA. USE AT YOUR OWN RISK.

  • Version: 0.1.1
  • Release Date: 2024-12-25
  • Status: Development

Features

  • 🖼️ Advanced image analysis using Google's Gemini Vision API and Anthropic Claude
  • 📝 Rich, structured metadata generation with AI-powered descriptions
  • 🔄 Batch processing with smart compression and parallel processing
  • 💾 Multiple output formats (CSV, XML) with customizable schemas
  • 🏷️ Automatic EXIF metadata writing and management
  • 📊 AI-powered filename suggestions and organization
  • 📋 Complete file operation tracking with detailed logs
  • 🔒 Non-destructive processing with backup options
  • 📊 Progress tracking and detailed statistics
  • ⚙️ Highly configurable via environment variables and CLI

Installation

  1. Ensure you have Python 3.8 or higher installed

  2. Clone the repository:

git clone https://github.com/nerveband/image_sense.git
cd image_sense
  3. Install dependencies:

pip install -r requirements.txt

  4. Install the package in development mode:

pip install -e .

  5. Copy the example environment file and configure your settings:

cp .env.example .env

  6. Edit .env with your API keys and preferences

Configuration

Image Sense can be configured using environment variables. Create a .env file in the project root with the following options:

Default Values and Configuration

Below are the default values used by the application. You can override any of these in your .env file:

Image Processing

# Enable smart compression (recommended for large files)
COMPRESSION_ENABLED=true
# JPEG quality (1-100, higher = better quality but larger size)
COMPRESSION_QUALITY=85
# Maximum dimension in pixels for processing
MAX_DIMENSION=1024
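
To make the effect of MAX_DIMENSION concrete, here is a minimal sketch of how an image's size could be bounded before processing. The function name `fit_within` is hypothetical, and the behavior shown (scale the longer edge down to MAX_DIMENSION, keep the aspect ratio, leave smaller images untouched) is an assumption, not something confirmed by the Image Sense source.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_dimension: int = 1024) -> Tuple[int, int]:
    """Return (width, height) scaled so the longer edge is at most max_dimension.

    Assumption: aspect ratio is preserved and images are never upscaled.
    """
    if width <= 0 or height <= 0 or max_dimension <= 0:
        raise ValueError("width, height and max_dimension must be positive")
    longest = max(width, height)
    if longest <= max_dimension:
        return width, height  # already small enough, leave unchanged
    scale = max_dimension / longest
    # Guard against a very thin image rounding down to zero pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With the default of 1024, a 4000x3000 photo would be processed at 1024x768 under these assumptions.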

Batch Processing

# Number of images to process in parallel
DEFAULT_BATCH_SIZE=8
# Maximum allowed batch size (model-dependent)
MAX_BATCH_SIZE=16
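
The relationship between the two batch settings can be sketched as follows. Both helper names are hypothetical, and clamping a requested size to MAX_BATCH_SIZE is an assumed behavior; the tool itself may reject oversized values instead.

```python
from typing import Iterator, List, Sequence


def effective_batch_size(requested: int, maximum: int = 16) -> int:
    """Clamp a requested batch size into the range 1..maximum (assumed policy)."""
    return max(1, min(requested, maximum))


def batches(paths: Sequence[str], size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` image paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])
```

For example, `batches(paths, effective_batch_size(32))` would process at most 16 images per group.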

Output Settings

# Default output format (csv or xml)
DEFAULT_OUTPUT_FORMAT=csv
# Directory for output files
OUTPUT_DIRECTORY=output

Model Settings

# Default AI model
DEFAULT_MODEL=gemini-2.0-flash-exp
# Available models:
# - gemini-2.0-flash-exp: Latest experimental model (fastest)
# - gemini-1.5-flash: Production model (balanced)
# - gemini-1.5-pro: More detailed analysis (slower)

Metadata Settings

# Create backups before modifying metadata
BACKUP_METADATA=true
# Write analysis results to image EXIF data
WRITE_EXIF=true
# Create duplicate files before modifying
DUPLICATE_FILES=false
# Suffix for duplicate files
DUPLICATE_SUFFIX=_modified
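
As an illustration of DUPLICATE_SUFFIX, the sketch below derives the name of a duplicate file. The function `duplicate_path` is hypothetical, and placing the suffix between the file stem and its extension is an assumption about how the tool names duplicates.

```python
from pathlib import PurePosixPath


def duplicate_path(original: str, suffix: str = "_modified") -> str:
    """Return the assumed duplicate path: the suffix is inserted before the extension."""
    path = PurePosixPath(original)
    return str(path.with_name(path.stem + suffix + path.suffix))
```

Under this assumption, `photos/cat.jpg` would be duplicated to `photos/cat_modified.jpg`.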

Progress and Logging

# Show progress bars and statistics
SHOW_PROGRESS=true
# Show real-time Gemini model responses (on by default; set to false or pass --verboseoff to disable)
VERBOSE_OUTPUT=true
# Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
LOG_LEVEL=INFO
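
The settings above could be read into typed values along these lines. This is a sketch only: the `Settings` class and `load_settings` function are hypothetical names, the accepted boolean spellings are an assumption, and only a subset of the documented variables is covered.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings (assumed; the tool may accept others)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    compression_enabled: bool
    compression_quality: int
    max_dimension: int
    default_batch_size: int
    max_batch_size: int
    default_output_format: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, using the documented defaults."""
    quality = int(env.get("COMPRESSION_QUALITY", "85"))
    if not 1 <= quality <= 100:
        raise ValueError("COMPRESSION_QUALITY must be between 1 and 100")
    output_format = env.get("DEFAULT_OUTPUT_FORMAT", "csv").strip().lower()
    if output_format not in {"csv", "xml"}:
        raise ValueError("DEFAULT_OUTPUT_FORMAT must be csv or xml")
    return Settings(
        compression_enabled=_as_bool(env.get("COMPRESSION_ENABLED", "true")),
        compression_quality=quality,
        max_dimension=int(env.get("MAX_DIMENSION", "1024")),
        default_batch_size=int(env.get("DEFAULT_BATCH_SIZE", "8")),
        max_batch_size=int(env.get("MAX_BATCH_SIZE", "16")),
        default_output_format=output_format,
        log_level=env.get("LOG_LEVEL", "INFO").strip().upper(),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to try out without touching the real environment.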

Recent Updates

Enhanced Image Analysis and Feedback (December 25, 2024)

  1. Improved Verbose Output (Now Default)

    • Verbose output is now enabled by default for better visibility
    • Added --verboseoff flag to disable verbose output if needed
    • Added detailed progress indicators for:
      • Image optimization/compression
      • Gemini API interactions
      • XML parsing and validation
      • CSV output generation
  2. Enhanced Image Processing

    • Added automatic image optimization for Gemini API
    • Shows compression statistics (original size, compressed size, reduction percentage)
    • Better error handling for image processing failures
  3. Improved Data Handling

    • Better XML parsing with proper Unicode support
    • Enhanced CSV output with all fields properly populated
    • Added suggested filename to output
    • Fixed various edge cases in data extraction
  4. Configuration Updates

    • Environment variables are now properly respected (VERBOSE_OUTPUT, GOOGLE_API_KEY, etc.)
    • Verbose output can be controlled via:
      • Environment variable: VERBOSE_OUTPUT=false
      • CLI flag: --verboseoff
      • Default is verbose on
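
The three verbose controls listed above can be combined as in the following sketch. The function `resolve_verbose` is hypothetical, and the precedence shown (the --verboseoff flag wins over VERBOSE_OUTPUT, which wins over the default) is an assumption; the README does not state which takes priority.

```python
import argparse
from typing import Mapping, Sequence


def resolve_verbose(argv: Sequence[str], env: Mapping[str, str]) -> bool:
    """Decide whether verbose output is on.

    Assumed precedence: --verboseoff flag, then VERBOSE_OUTPUT, then default (on).
    """
    # allow_abbrev=False stops a prefix such as "--verbose" from matching --verboseoff.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--verboseoff", action="store_true")
    # parse_known_args ignores the subcommand, paths, and other flags.
    args, _unknown = parser.parse_known_args(list(argv))
    if args.verboseoff:
        return False
    raw = env.get("VERBOSE_OUTPUT", "true")
    return raw.strip().lower() not in {"false", "0", "no", "off"}
```

Calling `resolve_verbose(sys.argv[1:], os.environ)` would apply the same rule to a real invocation.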

Usage Examples

Process a single image with default settings (verbose):

image_sense process path/to/image.jpg

Process a single image with verbose output disabled:

image_sense process path/to/image.jpg --verboseoff

Bulk process a directory of images:

image_sense bulk-process path/to/directory

Bulk process with specific options:

image_sense bulk-process path/to/directory --recursive --verboseoff --model 2-flash

Environment Variables

  • VERBOSE_OUTPUT: Control verbose output (default: "true")
  • GOOGLE_API_KEY: Your Google API key for Gemini
  • GEMINI_MODEL: Default model to use (default: "gemini-2.0-flash-exp")

API Keys

You'll need a Google API key with Gemini Vision API access enabled:

  1. Get it from: https://aistudio.google.com/app/apikey
  2. Add it to your .env file as GOOGLE_API_KEY=your-key-here
  3. Or pass it directly using the --api-key parameter
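
The two ways of supplying the key can be sketched as below. Both function names are hypothetical, the .env parsing is deliberately simplified (no multi-line values or variable expansion), and preferring --api-key over the .env value is an assumption.

```python
from typing import Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(cli_key: Optional[str], env: Mapping[str, str]) -> str:
    """Prefer the --api-key value, then GOOGLE_API_KEY (assumed precedence)."""
    key = cli_key or env.get("GOOGLE_API_KEY", "")
    if not key:
        raise RuntimeError("No API key: set GOOGLE_API_KEY in .env or pass --api-key")
    return key
```

Failing early with a clear message when no key is found avoids a less obvious error from the API later on.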

Usage

Quick Start

  1. Generate metadata for a directory of images:
image_sense generate-metadata path/to/photos --api-key YOUR_API_KEY

This will analyze all images and create a metadata.csv file with detailed descriptions, keywords, and technical details.

  2. Process a single image:
image_sense process path/to/image.jpg
  3. Process multiple images with advanced options:
image_sense bulk-process path/to/directory --api-key YOUR_API_KEY --output-format xml

Command Options

Generate Metadata (Recommended)

The generate-metadata command analyzes images and creates structured metadata files:

image_sense generate-metadata path/to/directory --api-key YOUR_API_KEY [OPTIONS]

Key features:

  • Non-destructive: Original images remain unchanged
  • Flexible output: Choose between CSV and XML formats
  • Smart compression: Optimized for faster processing
  • Batch processing: Handle multiple images efficiently
  • Incremental updates: Skip already processed files
  • AI-powered filename suggestions
  • Complete file operation tracking

Options:

  • --output-format, -f: Choose output format (csv/xml)
  • --output-file: Specify custom output file path
  • --model: Select AI model to use
  • --batch-size: Set custom batch size
  • --no-compress: Disable image compression
  • --skip-existing: Skip files that already have metadata
  • --duplicate: Create duplicates before modifying files
  • --no-backup: Disable ExifTool backup creation
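
To show what --skip-existing might amount to, the sketch below filters out files that already appear in a previous metadata CSV. The function names and the `original_filename` column header are hypothetical; the real tool may detect existing metadata differently (for example, by inspecting EXIF data).

```python
import csv
import io
from typing import Iterable, List, Set


def already_processed(csv_text: str, column: str = "original_filename") -> Set[str]:
    """Collect filenames recorded in an existing metadata CSV (column name assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader if row.get(column)}


def pending(filenames: Iterable[str], done: Set[str]) -> List[str]:
    """Return the filenames that still need processing, preserving order."""
    return [name for name in filenames if name not in done]
```

Reading the CSV once into a set keeps the membership check cheap even for large folders.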

Example with duplicate files:

# Process images and create duplicates before modification
image_sense generate-metadata photos/ --api-key YOUR_API_KEY --duplicate

# Process without creating duplicates (modify in place)
image_sense generate-metadata photos/ --api-key YOUR_API_KEY

For detailed command documentation, see Commands Documentation.

Output Formats

CSV Format

The CSV output includes columns for:

  • Original file path
  • Original filename
  • New filename (if renamed)
  • Modified file path (if duplicated)
  • Suggested filename
  • Description
  • Keywords
  • Technical details
  • Visual elements
  • Composition
  • Mood
  • Use cases
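
A CSV with these columns could be produced with Python's standard csv module, as sketched here. The header names in `COLUMNS` are hypothetical stand-ins for the columns listed above; the actual headers written by Image Sense may differ.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical header names mirroring the documented columns.
COLUMNS = [
    "original_path", "original_filename", "new_filename", "modified_path",
    "suggested_filename", "description", "keywords", "technical_details",
    "visual_elements", "composition", "mood", "use_cases",
]


def to_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize rows to CSV text; missing fields are written as empty strings."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Using `csv.DictWriter` rather than joining strings by hand ensures that values containing commas, such as keyword lists, are quoted correctly.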

XML Format

The XML output provides a structured representation of:

  • File information
    • Original path and filename
    • New filename (if renamed)
    • Modified path (if duplicated)
    • Suggested filename
  • Image metadata
  • Analysis results
  • Technical information
  • Visual characteristics
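
The structure above might map onto XML roughly as follows. Every element name in this sketch (`image`, `file_info`, `analysis`, and so on) is hypothetical, as is the function itself; consult an actual output file for the real schema.

```python
import xml.etree.ElementTree as ET


def build_image_element(
    original_path: str,
    original_filename: str,
    suggested_filename: str,
    description: str,
) -> ET.Element:
    """Build one <image> entry with assumed element names."""
    image = ET.Element("image")
    file_info = ET.SubElement(image, "file_info")
    ET.SubElement(file_info, "original_path").text = original_path
    ET.SubElement(file_info, "original_filename").text = original_filename
    ET.SubElement(file_info, "suggested_filename").text = suggested_filename
    analysis = ET.SubElement(image, "analysis")
    ET.SubElement(analysis, "description").text = description
    return image
```

`ET.tostring(element, encoding="unicode")` turns the element into a string, with special characters such as `&` escaped automatically.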

XML output is now saved by default for each processed folder, with the following features:

  • Automatic XML file creation named after the input folder
  • Original filename tracking in XML output
  • Configurable via SAVE_XML_OUTPUT environment variable
  • XML files contain complete analysis with original file tracking

To disable XML output, set in your .env:

SAVE_XML_OUTPUT=false

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Google Gemini Vision API for image analysis
  • ExifTool for metadata management
  • Rich for beautiful terminal output

About

A tool that processes images with AI and writes the results to EXIF, XML, and/or CSV.
