# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

This is an AI-powered tattoo search engine that combines visual similarity search with image captioning. Users upload a tattoo image, and the system finds visually similar tattoos from across the web using multi-model embeddings and multi-platform search.

**Tech Stack**: FastAPI, PyTorch, HuggingFace Transformers, OpenCLIP, DINOv2, SigLIP

**Deployment**: Dockerized application designed for HuggingFace Spaces (GPU recommended)

## Development Commands

### Running the Application

```bash
# Local development
python app.py

# Docker build and run
docker build -t tattoo-search .
docker run -p 7860:7860 --env-file .env tattoo-search
```

### Environment Setup

Required environment variable:

- `HF_TOKEN`: HuggingFace API token (required for GLM-4.5V captioning via the Novita provider)

Create a `.env` file:

```
HF_TOKEN=your_token_here
```

### Testing Endpoints

```bash
# Health check
curl http://localhost:7860/health

# Get available models
curl http://localhost:7860/models

# Search with an image
curl -X POST http://localhost:7860/search \
  -F "file=@tattoo.jpg" \
  -F "embedding_model=clip" \
  -F "include_patch_attention=false"
```

## Architecture

### Core Pipeline Flow

1. **Image Upload** → FastAPI endpoint (`/search` in main.py)
2. **Caption Generation** → GLM-4.5V via HuggingFace InferenceClient (Novita provider)
3. **Multi-Platform Search** → SearchEngineManager coordinates searches across Pinterest, Reddit, and Instagram
4. **URL Validation** → URLValidator filters for valid, accessible images
5. **Embedding Extraction** → The selected model (CLIP/DINOv2/SigLIP) encodes the query and candidates
6. **Similarity Computation** → Cosine similarity ranking in parallel
7. **Optional Patch Analysis** → PatchAttentionAnalyzer for detailed visual correspondence

### Key Components

**main.py - TattooSearchEngine Class**

- Main orchestration class that ties all components together
- `generate_caption()`: Uses HuggingFace InferenceClient with the GLM-4.5V model
- `search_images()`: Delegates to SearchEngineManager with caching
- `download_and_process_image()`: Parallel image download and similarity computation
- `compute_similarity()`: ThreadPoolExecutor for concurrent processing with early stopping

**embeddings.py - Model Abstraction**

- `EmbeddingModel`: Abstract base class defining the interface
- `CLIPEmbedding`: OpenAI CLIP ViT-B/32 (default)
- `DINOv2Embedding`: Meta's self-supervised vision transformer
- `SigLIPEmbedding`: Google's improved CLIP-like model
- `EmbeddingModelFactory`: Factory pattern for model instantiation with fallback
- All models support both global image embeddings and patch-level features

**search_engines/ - Multi-Platform Search**

- `SearchEngineManager`: Coordinates parallel searches across platforms with fallback strategies
- `BaseSearchEngine`: Abstract interface for platform-specific engines
- Platform implementations: PinterestSearchEngine, RedditSearchEngine, InstagramSearchEngine
- `SearchResult` and `ImageResult`: Data classes for structured results
- Includes intelligent query simplification for fallback searches

**patch_attention.py - Visual Correspondence**

- `PatchAttentionAnalyzer`: Computes patch-level attention matrices between images
- `compute_patch_similarities()`: Extracts patch features and computes attention
- `visualize_attention_heatmap()`: Creates matplotlib visualizations as base64 PNGs
- Returns attention matrices showing which image regions correspond best

**utils/ - Supporting Utilities**

- `SearchCache`: In-memory LRU cache with TTL for search results
- `URLValidator`: Concurrent URL validation to filter out broken/inaccessible images

### Model Selection Logic

The search engine supports dynamic
model switching via `get_search_engine()`:

- Global singleton pattern with lazy initialization
- Models are swapped only when a different embedding model is requested
- Each model implements both global pooling and patch-level encoding

### Search Strategy

SearchEngineManager uses a tiered approach:

1. Primary platforms (Pinterest, Reddit) are searched first
2. If results fall below a threshold, additional platforms (Instagram) are tried
3. If results are still insufficient, the query is simplified and the search retried
4. All platform searches run concurrently via ThreadPoolExecutor

### Caching Strategy

- Search results are cached by a hash of query + max_results
- Default TTL: 1 hour (3600s)
- Max cache size: 1000 entries with LRU eviction
- Significantly reduces redundant searches

## Important Implementation Details

### Caption Generation

- Uses GLM-4.5V via HuggingFace InferenceClient with the Novita provider
- Converts the PIL image to a base64 data URL
- Expects a JSON response with a "search_query" field
- Falls back to "tattoo artwork" on failure

### Image Download Headers

- Platform-specific headers (Pinterest, Instagram optimizations)
- Random user-agent rotation
- Content-type and size validation (10MB limit, minimum 50x50px)
- Exponential backoff retry mechanism

### Similarity Computation

- Early stopping optimization: stops at 20 good results (5 if patch attention is enabled)
- ThreadPoolExecutor with a maximum of 10 workers
- Rate limiting with 0.1s delays between downloads
- Futures are cancelled once the target is reached

### Patch Attention

- Only triggered when `include_patch_attention=true`
- Computes an N×M attention matrix (query patches × candidate patches)
- Visualizations include: attention heatmap, patch grid overlays, top correspondences
- Returns base64-encoded PNG images

## API Response Structures

**POST /search** returns:

```json
{
  "caption": "string",
  "results": [
    {
      "score": 0.95,
      "url": "https://...",
      "patch_attention": {
        "overall_similarity": 0.87,
        "query_grid_size": 7,
        "candidate_grid_size": 7,
        "attention_summary": {...}
      }
    }
  ],
  "embedding_model": "CLIP-ViT-B-32",
  "patch_attention_enabled": false
}
```

The `patch_attention` object is present only when patch attention is enabled.

**POST /analyze-attention** returns a detailed patch analysis with visualizations.

## Common Development Patterns

### Adding a New Embedding Model

1. Create a new class in `embeddings.py` inheriting from `EmbeddingModel`
2. Implement `load_model()`, `encode_image()`, `encode_image_patches()`, and `get_model_name()`
3. Add the class to `EmbeddingModelFactory.AVAILABLE_MODELS`
4. Add its config to `get_default_model_configs()`

### Adding a New Search Platform

1. Create a new engine in `search_engines/` inheriting from `BaseSearchEngine`
2. Add the platform to the `SearchPlatform` enum in `base.py`
3. Implement the `search()` and `is_valid_url()` methods
4. Add the engine to the `SearchEngineManager.engines` dict
5. Update platform prioritization in `search_with_fallback()` if needed

## Performance Considerations

- GPU acceleration is used when available (CUDA)
- Concurrent image downloads (ThreadPoolExecutor)
- Search result caching to reduce API calls
- Early stopping in similarity computation
- Future cancellation after targets are met
- Model instances are reused globally to avoid reloading

## Deployment Notes

- Designed for HuggingFace Spaces with the Docker SDK
- Port 7860 (HF Spaces default)
- Recommended hardware: T4 Small GPU or higher
- Health check endpoint at `/health` for monitoring
- All models download on first use and are cached in `/app/cache`
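## Example: New Embedding Model Skeleton

The "Adding a New Embedding Model" steps above can be sketched roughly as below. This is a minimal illustration, not the real `embeddings.py` code: the `EmbeddingModel` stub only mirrors the four required methods described earlier, and `MyNewEmbedding`, the `"mynew"` registry key, and the 512-dimensional placeholder vectors are all hypothetical.

```python
# Hedged sketch of the embedding-model extension pattern. The abstract base
# here is a stand-in for the one in embeddings.py; method names match the
# four required methods listed above, everything else is illustrative.
from abc import ABC, abstractmethod
from typing import Any, List


class EmbeddingModel(ABC):
    """Minimal stand-in for the abstract base class in embeddings.py."""

    @abstractmethod
    def load_model(self) -> None: ...

    @abstractmethod
    def encode_image(self, image: Any) -> List[float]: ...

    @abstractmethod
    def encode_image_patches(self, image: Any) -> List[List[float]]: ...

    @abstractmethod
    def get_model_name(self) -> str: ...


class MyNewEmbedding(EmbeddingModel):
    """Steps 1-2: subclass EmbeddingModel and implement the interface."""

    def __init__(self) -> None:
        self.model = None

    def load_model(self) -> None:
        # A real implementation would load HF weights here (benefiting from
        # the /app/cache model cache mentioned in Deployment Notes).
        self.model = "loaded"  # placeholder

    def encode_image(self, image: Any) -> List[float]:
        # Real code: return the model's global pooled embedding.
        return [0.0] * 512  # placeholder dimensionality

    def encode_image_patches(self, image: Any) -> List[List[float]]:
        # Real code: return per-patch features for PatchAttentionAnalyzer,
        # e.g. a 7x7 grid of patch vectors.
        return [[0.0] * 512 for _ in range(49)]

    def get_model_name(self) -> str:
        return "my-new-model"


# Step 3 (illustrative): register the class with the factory's model table.
AVAILABLE_MODELS = {"mynew": MyNewEmbedding}

model = AVAILABLE_MODELS["mynew"]()
model.load_model()
print(model.get_model_name())  # -> my-new-model
```

Keeping all model-specific behavior behind this interface is what lets `get_search_engine()` swap models dynamically without touching the search pipeline.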