# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

This is an AI-powered tattoo search engine that combines visual similarity search with image captioning. Users upload a tattoo image, and the system finds visually similar tattoos from across the web using multi-model embeddings and multi-platform search.

**Tech Stack**: FastAPI, PyTorch, HuggingFace Transformers, OpenCLIP, DINOv2, SigLIP

**Deployment**: Dockerized application designed for HuggingFace Spaces (GPU recommended)

## Development Commands

### Running the Application

```bash
# Local development
python app.py

# Docker build and run
docker build -t tattoo-search .
docker run -p 7860:7860 --env-file .env tattoo-search
```

### Environment Setup

Required environment variable:

- `HF_TOKEN`: HuggingFace API token (required for GLM-4.5V captioning via the Novita provider)

Create a `.env` file:

```
HF_TOKEN=your_token_here
```

### Testing Endpoints

```bash
# Health check
curl http://localhost:7860/health

# Get available models
curl http://localhost:7860/models

# Search with an image
curl -X POST http://localhost:7860/search \
  -F "file=@tattoo.jpg" \
  -F "embedding_model=clip" \
  -F "include_patch_attention=false"
```

## Architecture

### Core Pipeline Flow

1. **Image Upload** → FastAPI endpoint (`/search` in main.py)
2. **Caption Generation** → GLM-4.5V via HuggingFace InferenceClient (Novita provider)
3. **Multi-Platform Search** → SearchEngineManager coordinates searches across Pinterest, Reddit, and Instagram
4. **URL Validation** → URLValidator filters for valid, accessible images
5. **Embedding Extraction** → The selected model (CLIP/DINOv2/SigLIP) encodes the query and candidates
6. **Similarity Computation** → Cosine similarity ranking in parallel
7. **Optional Patch Analysis** → PatchAttentionAnalyzer for detailed visual correspondence

### Key Components

**main.py - TattooSearchEngine Class**

- Main orchestration class that ties all components together
- `generate_caption()`: Uses HuggingFace InferenceClient with the GLM-4.5V model
- `search_images()`: Delegates to SearchEngineManager with caching
- `download_and_process_image()`: Parallel image download and similarity computation
- `compute_similarity()`: ThreadPoolExecutor for concurrent processing with early stopping

**embeddings.py - Model Abstraction**

- `EmbeddingModel`: Abstract base class defining the interface
- `CLIPEmbedding`: OpenAI CLIP ViT-B/32 (default)
- `DINOv2Embedding`: Meta's self-supervised vision transformer
- `SigLIPEmbedding`: Google's improved CLIP-like model
- `EmbeddingModelFactory`: Factory pattern for model instantiation with fallback
- All models support both global image embeddings and patch-level features

**search_engines/ - Multi-Platform Search**

- `SearchEngineManager`: Coordinates parallel searches across platforms with fallback strategies
- `BaseSearchEngine`: Abstract interface for platform-specific engines
- Platform implementations: PinterestSearchEngine, RedditSearchEngine, InstagramSearchEngine
- `SearchResult` and `ImageResult`: Data classes for structured results
- Includes intelligent query simplification for fallback searches

**patch_attention.py - Visual Correspondence**

- `PatchAttentionAnalyzer`: Computes patch-level attention matrices between images
- `compute_patch_similarities()`: Extracts patch features and computes attention
- `visualize_attention_heatmap()`: Creates matplotlib visualizations as base64 PNGs
- Returns attention matrices showing which image regions correspond best

**utils/ - Supporting Utilities**

- `SearchCache`: In-memory LRU cache with TTL for search results
- `URLValidator`: Concurrent URL validation to filter out broken/inaccessible images

### Model Selection Logic

The search engine supports dynamic
model switching via `get_search_engine()`:

- Global singleton pattern with lazy initialization
- Models are swapped only when a different embedding model is requested
- Each model implements both global pooling and patch-level encoding

### Search Strategy

SearchEngineManager uses a tiered approach:

1. Primary platforms (Pinterest, Reddit) are searched first
2. If results fall below a threshold, additional platforms (Instagram) are tried
3. If results are still insufficient, the query is simplified and the search retried
4. All platform searches run concurrently via ThreadPoolExecutor

### Caching Strategy

- Search results are cached by a hash of query + max_results
- Default TTL: 1 hour (3600s)
- Max cache size: 1000 entries with LRU eviction
- Significantly reduces redundant searches

## Important Implementation Details

### Caption Generation

- Uses GLM-4.5V via HuggingFace InferenceClient with the Novita provider
- Converts the PIL image to a base64 data URL
- Expects a JSON response with a "search_query" field
- Falls back to "tattoo artwork" on failure

### Image Download Headers

- Platform-specific headers (Pinterest, Instagram optimizations)
- Random user-agent rotation
- Content-type and size validation (10MB limit, minimum 50x50px)
- Exponential backoff retry mechanism

### Similarity Computation

- Early stopping optimization: stops at 20 good results (5 if patch attention is enabled)
- ThreadPoolExecutor with a maximum of 10 workers
- Rate limiting with 0.1s delays between downloads
- Futures are cancelled once the target is reached

### Patch Attention

- Only triggered when `include_patch_attention=true`
- Computes an N×M attention matrix (query patches × candidate patches)
- Visualizations include: attention heatmap, patch grid overlays, top correspondences
- Returns base64-encoded PNG images

## API Response Structures

**POST /search** returns:

```json
{
  "caption": "string",
  "results": [
    {
      "score": 0.95,
      "url": "https://...",
      "patch_attention": {
        "overall_similarity": 0.87,
        "query_grid_size": 7,
        "candidate_grid_size": 7,
        "attention_summary": {...}
      }
    }
  ],
  "embedding_model": "CLIP-ViT-B-32",
  "patch_attention_enabled": false
}
```

The `patch_attention` object is present only when patch attention is enabled.

**POST /analyze-attention** returns a detailed patch analysis with visualizations.

## Common Development Patterns

### Adding a New Embedding Model

1. Create a new class in `embeddings.py` inheriting from `EmbeddingModel`
2. Implement `load_model()`, `encode_image()`, `encode_image_patches()`, and `get_model_name()`
3. Add the class to `EmbeddingModelFactory.AVAILABLE_MODELS`
4. Add its config to `get_default_model_configs()`

### Adding a New Search Platform

1. Create a new engine in `search_engines/` inheriting from `BaseSearchEngine`
2. Add the platform to the `SearchPlatform` enum in `base.py`
3. Implement the `search()` and `is_valid_url()` methods
4. Add the engine to the `SearchEngineManager.engines` dict
5. Update platform prioritization in `search_with_fallback()` if needed

## Performance Considerations

- GPU acceleration is used when available (CUDA)
- Concurrent image downloads (ThreadPoolExecutor)
- Search result caching to reduce API calls
- Early stopping in similarity computation
- Future cancellation after targets are met
- Model instances are reused globally to avoid reloading

## Deployment Notes

- Designed for HuggingFace Spaces with the Docker SDK
- Port 7860 (HF Spaces default)
- Recommended hardware: T4 Small GPU or higher
- Health check endpoint at `/health` for monitoring
- All models download on first use and are cached in `/app/cache`
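## Example: New Embedding Model Skeleton

The "Adding a New Embedding Model" steps above can be sketched roughly as below. This is a minimal illustration, not the real `embeddings.py` code: the `EmbeddingModel` stub only mirrors the four required methods described earlier, and `MyNewEmbedding`, the `"mynew"` registry key, and the 512-dimensional placeholder vectors are all hypothetical.

```python
# Hedged sketch of the embedding-model extension pattern. The abstract base
# here is a stand-in for the one in embeddings.py; method names match the
# four required methods listed above, everything else is illustrative.
from abc import ABC, abstractmethod
from typing import Any, List


class EmbeddingModel(ABC):
    """Minimal stand-in for the abstract base class in embeddings.py."""

    @abstractmethod
    def load_model(self) -> None: ...

    @abstractmethod
    def encode_image(self, image: Any) -> List[float]: ...

    @abstractmethod
    def encode_image_patches(self, image: Any) -> List[List[float]]: ...

    @abstractmethod
    def get_model_name(self) -> str: ...


class MyNewEmbedding(EmbeddingModel):
    """Steps 1-2: subclass EmbeddingModel and implement the interface."""

    def __init__(self) -> None:
        self.model = None

    def load_model(self) -> None:
        # A real implementation would load HF weights here (benefiting from
        # the /app/cache model cache mentioned in Deployment Notes).
        self.model = "loaded"  # placeholder

    def encode_image(self, image: Any) -> List[float]:
        # Real code: return the model's global pooled embedding.
        return [0.0] * 512  # placeholder dimensionality

    def encode_image_patches(self, image: Any) -> List[List[float]]:
        # Real code: return per-patch features for PatchAttentionAnalyzer,
        # e.g. a 7x7 grid of patch vectors.
        return [[0.0] * 512 for _ in range(49)]

    def get_model_name(self) -> str:
        return "my-new-model"


# Step 3 (illustrative): register the class with the factory's model table.
AVAILABLE_MODELS = {"mynew": MyNewEmbedding}

model = AVAILABLE_MODELS["mynew"]()
model.load_model()
print(model.get_model_name())  # -> my-new-model
```

Keeping all model-specific behavior behind this interface is what lets `get_search_engine()` swap models dynamically without touching the search pipeline.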