# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Gradio application for image generation using the Qwen-Image model with Lightning LoRA acceleration. It is designed to run on Hugging Face Spaces with GPU support and provides fast 8-step image generation with advanced text rendering capabilities.

## Commands

### Run the application locally

```bash
python app.py
```

### Install dependencies

```bash
pip install -r requirements.txt
```

## Architecture

### Core Components

1. **Model Pipeline** (`app.py:130-164`)
   - Uses the `Qwen/Qwen-Image` diffusion model with a custom FlowMatchEulerDiscreteScheduler
   - Loads Lightning LoRA weights for 8-step acceleration
   - Configured for bfloat16 precision on CUDA

2. **Prompt Enhancement System** (`app.py:41-125`)
   - `polish_prompt()`: Uses Hugging Face InferenceClient with the Cerebras provider to enhance prompts
   - `get_caption_language()`: Detects whether a prompt is Chinese or English
   - `rewrite()`: Language-specific prompt enhancement with separate system prompts for Chinese and English
   - Requires the `HF_TOKEN` environment variable for API access

3. **Style Presets System** (`app.py:16-87`)
   - `load_style_presets()`: Loads style presets from `style_presets.yaml`
   - `apply_style_preset()`: Applies the selected style to prompts
   - Supports custom styles and random style selection
   - Each preset includes prefix, suffix, and negative prompt components

4. **Page Layouts System** (`app.py:89-145`)
   - `load_page_layouts()`: Loads multi-image layouts from `page_layouts.yaml`
   - `get_layout_choices()`: Returns the available layouts for a given number of images
   - `get_layout_metadata()`: Extracts panel metadata (type, focus, composition) for each position
   - Supports 1-8 images per page with 5-6 layout variations each
   - Dynamic layout selection based on the number of images
   - **Panel Metadata System**: Each panel position includes metadata that describes:
     - `panel_type`: establishing/action/closeup/dialogue/reaction/transition/detail/splash
     - `focus`: environment/character/characters/action/emotion/object/event
     - `composition`: wide/tall/square/portrait/landscape
   - Metadata is used to guide the LLM in generating appropriate scene descriptions

5. **Story Generation System** (`app.py:147-265`)
   - `generate_story_scenes()`: Uses Hugging Face InferenceClient with Qwen3-235B to generate scene descriptions
   - Takes panel metadata as input to generate contextually appropriate content
   - Adapts descriptions based on panel type, focus, and composition
   - Returns structured scene data with captions and dialogue
   - `parse_yaml_scenes()`: Parses LLM output into structured scene data

6. **Image Size Calculation** (`app.py:267-330`)
   - `get_image_size_for_position()`: Calculates precise image dimensions based on the layout aspect ratio
   - Uses 8px rounding for model compatibility while maintaining aspect ratio accuracy
   - Ensures images fill their layout containers without floating
   - `get_layout_position_for_image()`: Retrieves position data for a specific panel

7. **PDF Generation** (`app.py:450-540`)
   - `create_single_page_pdf()`: Creates a PDF page with images arranged per layout
   - `create_multi_page_pdf()`: Combines multiple pages into a single document
   - Uses ReportLab for high-quality PDF generation
   - Preserves image quality at 95% JPEG quality
   - A4 page size with a flexible positioning system
   - Smart filling: fills the space completely when aspect ratios match (<2% difference)

8. **Multi-Image Generation** (`app.py:545-650`)
   - `infer_page()`: Main generation orchestrator
   - Generates multiple images and combines them into a PDF
   - Progressive generation with status updates
   - Seed management for reproducibility across multiple images
   - Returns the PDF file, a preview image, and seed information

9. **Gradio Interface** (`app.py:750-900+`)
   - Slider for selecting 1-8 images per page
   - Dynamic layout dropdown that updates based on image count
   - Style preset dropdown with a custom style text option
   - PDF download and image preview outputs
   - Advanced settings for all generation parameters

## Key Configuration

- **Scheduler Config** (`app.py:133-148`): Custom configuration for FlowMatchEulerDiscreteScheduler with exponential time shifting
- **Aspect Ratios** (`app.py:170-188`): Predefined aspect ratios optimized for 1024 base resolution
- **Style Presets** (`style_presets.yaml`): Configurable style presets with prompt modifiers and negative prompts
- **Page Layouts** (`page_layouts.yaml`): Flexible layout system for 1-8 images per page
- **Default Settings**: 8 inference steps, guidance scale 1.0, prompt enhancement enabled, 1 image per page

## Environment Variables

- `HF_TOKEN`: Required for prompt enhancement via Hugging Face InferenceClient
  - Used to access the Cerebras provider for the Qwen3-235B model

## Key Features

- **Session-based storage**: Each user session gets a unique temporary directory that persists for 24 hours
- **Multi-page PDF generation**: Users can generate up to 128 pages in a single document
- **Dynamic page addition**: Click "Generate page N" to add the next page to the PDF
- **Flexible layouts**: Different layout options for 1-8 images per page
- **Style presets**: 20+ predefined artistic styles
- **Automatic cleanup**: Old sessions are automatically cleaned up after 24 hours

## Model Dependencies

- Main model: `Qwen/Qwen-Image`
- LoRA weights: `lightx2v/Qwen-Image-Lightning` (V1.1 safetensors)
- Prompt enhancement model: `Qwen/Qwen3-235B-A22B-Instruct-2507` via Cerebras