onurcopur committed
Commit f761027 · 1 Parent(s): cbeb83b

first commit

Files changed (9)
  1. .env.template +9 -0
  2. DEPLOYMENT.md +51 -0
  3. Dockerfile +38 -0
  4. README.md +56 -4
  5. app.py +30 -0
  6. embeddings.py +352 -0
  7. main.py +497 -0
  8. patch_attention.py +221 -0
  9. requirements.txt +30 -0
.env.template ADDED
@@ -0,0 +1,9 @@
+ # Hugging Face Token for VLM inference
+ # Required for image captioning using GLM-4.5V model
+ HF_TOKEN=your_huggingface_token_here
+
+ # Optional: Custom port (defaults to 7860 for HF Spaces)
+ # PORT=7860
+
+ # Optional: Model cache directory
+ # HF_HOME=/app/cache
DEPLOYMENT.md ADDED
@@ -0,0 +1,51 @@
+ # Hugging Face Spaces Deployment Guide
+
+ ## Environment Variables
+
+ ### Required Secrets
+ Set these in your Hugging Face Space settings:
+
+ 1. **HF_TOKEN**: Your Hugging Face access token
+    - Go to: https://huggingface.co/settings/tokens
+    - Create a new token with read access
+    - Add it as a secret in your Space settings
+
+ ## Hardware Requirements
+
+ - **Recommended**: T4 Small or higher for optimal performance
+ - **Minimum**: CPU (slower inference)
+ - **Memory**: At least 8GB RAM recommended
+ - **Storage**: 10GB+ for model caching
+
+ ## Deployment Steps
+
+ 1. Push all files to your HF Space repository:
+    ```bash
+    git add .
+    git commit -m "Deploy tattoo search engine"
+    git push
+    ```
+
+ 2. Set the HF_TOKEN secret in the Space settings
+
+ 3. The Space will build and deploy automatically
+
+ ## Testing
+
+ Once deployed, test these endpoints:
+
+ - `GET /health` - Health check
+ - `GET /models` - Available models
+ - `POST /search` - Upload an image and search
+
+ ## Troubleshooting
+
+ ### Common Issues
+
+ 1. **Missing HF_TOKEN**: Set the token in the Space secrets
+ 2. **Model loading errors**: Check the hardware requirements
+ 3. **Timeout errors**: Consider upgrading to GPU hardware
+ 4. **Memory errors**: Upgrade to a larger hardware tier
+
+ ### Logs
+ Check the Space logs for detailed error messages and startup information.
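As a quick smoke test for the Testing section in DEPLOYMENT.md, the read-only endpoints can be exercised from Python with `requests`. This is a minimal sketch; the Space URL is a placeholder you would replace with your own deployment.

```python
import requests

BASE_URL = "https://your-username-tattoo-search.hf.space"  # placeholder Space URL

# /health should return {"status": "healthy"} once the app is up
health = requests.get(f"{BASE_URL}/health", timeout=30)
print(health.status_code, health.json())

# /models lists the available embedding models and their default configurations
models = requests.get(f"{BASE_URL}/models", timeout=30)
print(models.json()["available_models"])
```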
Dockerfile ADDED
@@ -0,0 +1,38 @@
+ # Use Python 3.12 as base image
+ FROM python:3.12-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     gcc \
+     g++ \
+     curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Set environment variables
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV PYTHONUNBUFFERED=1
+ ENV PORT=7860
+
+ # Copy requirements and install Python dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir --upgrade pip
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Create cache directory for models
+ RUN mkdir -p /app/cache
+
+ # Expose the port
+ EXPOSE 7860
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=60s --retries=3 \
+     CMD curl -f http://localhost:7860/health || exit 1
+
+ # Run the application
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,11 +1,63 @@
  ---
  title: Tattoo Search Engine
- emoji: 👁
- colorFrom: yellow
- colorTo: blue
+ emoji: 🎨
+ colorFrom: purple
+ colorTo: pink
  sdk: docker
  pinned: false
  license: mit
+ app_port: 7860
+ suggested_hardware: t4-small
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Tattoo Search Engine 🎨
+
+ An AI-powered tattoo search engine that finds similar tattoos based on visual similarity. Upload an image of a tattoo and discover visually similar designs from across the web.
+
+ ## Features
+
+ - **Multi-Model Support**: Choose from CLIP, DINOv2, or SigLIP embedding models
+ - **Advanced Search**: Combines image captioning with visual similarity search
+ - **Patch Attention Analysis**: Detailed analysis of which parts of tattoos are most similar
+ - **Real-time Processing**: Fast image processing and similarity computation
+ - **Multiple Platforms**: Searches across various tattoo platforms and image sources
+
+ ## API Endpoints
+
+ ### `POST /search`
+ Search for similar tattoos by uploading an image.
+
+ **Parameters:**
+ - `file`: Image file (required)
+ - `embedding_model`: Model to use - "clip", "dinov2", or "siglip" (default: "clip")
+ - `include_patch_attention`: Enable detailed patch analysis (default: false)
+
+ ### `POST /analyze-attention`
+ Analyze patch-level attention between two images.
+
+ **Parameters:**
+ - `query_file`: Query image file (required)
+ - `candidate_url`: URL of the candidate image to compare (required)
+ - `embedding_model`: Model to use (default: "clip")
+ - `include_visualizations`: Include attention visualizations (default: true)
+
+ ### `GET /models`
+ Get available embedding models and their configurations.
+
+ ### `GET /health`
+ Health check endpoint.
+
+ ## Models Used
+
+ - **Image Captioning**: GLM-4.5V via the HuggingFace Inference API
+ - **Visual Similarity**: CLIP ViT-B/32, DINOv2, or SigLIP
+ - **Search**: Multi-platform web search with intelligent filtering
+
+ ## Usage
+
+ 1. Upload a tattoo image
+ 2. Select your preferred embedding model
+ 3. Get ranked results with similarity scores
+ 4. Optionally analyze detailed patch-level similarities
+
+ Perfect for tattoo enthusiasts, artists, and anyone looking for tattoo inspiration!
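To illustrate the `POST /search` endpoint documented in the README, here is a minimal client sketch using `requests`. The host URL and image filename are placeholders; the response fields (`caption`, `results`, `score`, `url`) follow the handler in main.py.

```python
import requests

BASE_URL = "http://localhost:7860"  # placeholder; use your Space URL when deployed

# Upload a tattoo image and search with the DINOv2 embedding model
with open("my_tattoo.jpg", "rb") as f:  # placeholder image file
    response = requests.post(
        f"{BASE_URL}/search",
        files={"file": ("my_tattoo.jpg", f, "image/jpeg")},
        params={"embedding_model": "dinov2", "include_patch_attention": False},
        timeout=300,  # the first request may be slow while models load
    )

data = response.json()
print("Search query:", data["caption"])
for hit in data["results"][:5]:
    print(f"{hit['score']:.3f}  {hit['url']}")
```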
app.py ADDED
@@ -0,0 +1,30 @@
+ import os
+ import logging
+ from main import app
+
+ # Configure logging
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+ )
+ logger = logging.getLogger(__name__)
+
+ if __name__ == "__main__":
+     # Get port from environment (Hugging Face Spaces uses 7860)
+     port = int(os.environ.get("PORT", 7860))
+
+     logger.info(f"Starting Tattoo Search Engine on port {port}")
+     logger.info("Available endpoints:")
+     logger.info("  POST /search - Search for similar tattoos")
+     logger.info("  POST /analyze-attention - Analyze patch-level attention")
+     logger.info("  GET /models - Get available embedding models")
+     logger.info("  GET /health - Health check")
+
+     import uvicorn
+     uvicorn.run(
+         app,
+         host="0.0.0.0",
+         port=port,
+         log_level="info",
+         access_log=True
+     )
embeddings.py ADDED
@@ -0,0 +1,352 @@
+ from abc import ABC, abstractmethod
+ from typing import Dict, Any, List
+ import torch
+ import torch.nn.functional as F
+ from PIL import Image
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+
+ class EmbeddingModel(ABC):
+     """Abstract base class for embedding models."""
+
+     def __init__(self, device: torch.device):
+         self.device = device
+         self.model = None
+         self.preprocess = None
+
+     @abstractmethod
+     def load_model(self) -> None:
+         """Load the embedding model and preprocessing."""
+         pass
+
+     @abstractmethod
+     def encode_image(self, image: Image.Image) -> torch.Tensor:
+         """Encode an image into a feature vector."""
+         pass
+
+     def encode_image_patches(self, image: Image.Image) -> torch.Tensor:
+         """Encode an image into patch-level features. Override in subclasses that support it."""
+         raise NotImplementedError("Patch-level encoding not implemented for this model")
+
+     def compute_patch_attention(self, query_patches: torch.Tensor, candidate_patches: torch.Tensor) -> torch.Tensor:
+         """Compute attention weights between query and candidate patches."""
+         # query_patches: [num_query_patches, feature_dim]
+         # candidate_patches: [num_candidate_patches, feature_dim]
+
+         # Normalize patches
+         query_patches = F.normalize(query_patches, p=2, dim=1)
+         candidate_patches = F.normalize(candidate_patches, p=2, dim=1)
+
+         # Compute attention matrix: [num_query_patches, num_candidate_patches]
+         attention_matrix = torch.mm(query_patches, candidate_patches.T)
+
+         return attention_matrix
+
+     @abstractmethod
+     def get_model_name(self) -> str:
+         """Return the model name."""
+         pass
+
+     def compute_similarity(self, query_features: torch.Tensor, candidate_features: torch.Tensor) -> float:
+         """Compute similarity between query and candidate features."""
+         return torch.mm(query_features, candidate_features.T).item()
+
+
+ class CLIPEmbedding(EmbeddingModel):
+     """CLIP-based embedding model."""
+
+     def __init__(self, device: torch.device, model_name: str = "ViT-B-32"):
+         super().__init__(device)
+         self.model_name = model_name
+         self.tokenizer = None
+         self.load_model()
+
+     def load_model(self) -> None:
+         """Load CLIP model and preprocessing."""
+         try:
+             import open_clip
+             logger.info(f"Loading CLIP model: {self.model_name}")
+
+             self.model, _, self.preprocess = open_clip.create_model_and_transforms(
+                 self.model_name, pretrained="openai"
+             )
+             self.model.to(self.device)
+             self.tokenizer = open_clip.get_tokenizer(self.model_name)
+
+             logger.info(f"CLIP model {self.model_name} loaded successfully")
+         except Exception as e:
+             logger.error(f"Failed to load CLIP model: {e}")
+             raise
+
+     def encode_image(self, image: Image.Image) -> torch.Tensor:
+         """Encode image using CLIP."""
+         try:
+             image_input = self.preprocess(image).unsqueeze(0).to(self.device)
+
+             with torch.no_grad():
+                 features = self.model.encode_image(image_input)
+                 features = F.normalize(features, p=2, dim=1)
+
+             return features
+         except Exception as e:
+             logger.error(f"Failed to encode image with CLIP: {e}")
+             raise
+
+     def encode_image_patches(self, image: Image.Image) -> torch.Tensor:
+         """Encode image patches using CLIP vision transformer."""
+         try:
+             image_input = self.preprocess(image).unsqueeze(0).to(self.device)
+
+             with torch.no_grad():
+                 # Get patch features from CLIP vision transformer
+                 vision_model = self.model.visual
+
+                 # Pass through patch embedding and positional encoding
+                 x = vision_model.conv1(image_input)  # shape = [*, width, grid, grid]
+                 x = x.reshape(x.shape[0], x.shape[1], -1)  # shape = [*, width, grid ** 2]
+                 x = x.permute(0, 2, 1)  # shape = [*, grid ** 2, width]
+
+                 # Add class token and positional embeddings
+                 x = torch.cat([vision_model.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device), x], dim=1)
+                 x = x + vision_model.positional_embedding.to(x.dtype)
+
+                 # Apply layer norm
+                 x = vision_model.ln_pre(x)
+
+                 x = x.permute(1, 0, 2)  # NLD -> LND
+
+                 # Pass through transformer blocks
+                 for block in vision_model.transformer.resblocks:
+                     x = block(x)
+
+                 x = x.permute(1, 0, 2)  # LND -> NLD
+
+                 # Remove class token to get only patch features
+                 patch_features = x[:, 1:, :]  # [1, num_patches, feature_dim]
+                 patch_features = vision_model.ln_post(patch_features)
+
+                 # Apply projection if it exists
+                 if vision_model.proj is not None:
+                     patch_features = patch_features @ vision_model.proj
+
+                 # Normalize patch features
+                 patch_features = F.normalize(patch_features, p=2, dim=-1)
+
+                 return patch_features.squeeze(0)  # [num_patches, feature_dim]
+
+         except Exception as e:
+             logger.error(f"Failed to encode image patches with CLIP: {e}")
+             raise
+
+     def get_model_name(self) -> str:
+         return f"CLIP-{self.model_name}"
+
+
+ class DINOv2Embedding(EmbeddingModel):
+     """DINOv2-based embedding model."""
+
+     def __init__(self, device: torch.device, model_name: str = "dinov2_vitb14"):
+         super().__init__(device)
+         self.model_name = model_name
+         self.load_model()
+
+     def load_model(self) -> None:
+         """Load DINOv2 model and preprocessing."""
+         try:
+             import torch.hub
+             from torchvision import transforms
+
+             logger.info(f"Loading DINOv2 model: {self.model_name}")
+
+             # Load DINOv2 model from torch hub
+             self.model = torch.hub.load('facebookresearch/dinov2', self.model_name)
+             self.model.to(self.device)
+             self.model.eval()
+
+             # DINOv2 preprocessing
+             self.preprocess = transforms.Compose([
+                 transforms.Resize(256, interpolation=transforms.InterpolationMode.BICUBIC),
+                 transforms.CenterCrop(224),
+                 transforms.ToTensor(),
+                 transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+             ])
+
+             logger.info(f"DINOv2 model {self.model_name} loaded successfully")
+         except Exception as e:
+             logger.error(f"Failed to load DINOv2 model: {e}")
+             raise
+
+     def encode_image(self, image: Image.Image) -> torch.Tensor:
+         """Encode image using DINOv2."""
+         try:
+             image_input = self.preprocess(image).unsqueeze(0).to(self.device)
+
+             with torch.no_grad():
+                 features = self.model(image_input)
+                 features = F.normalize(features, p=2, dim=1)
+
+             return features
+         except Exception as e:
+             logger.error(f"Failed to encode image with DINOv2: {e}")
+             raise
+
+     def encode_image_patches(self, image: Image.Image) -> torch.Tensor:
+         """Encode image patches using DINOv2."""
+         try:
+             image_input = self.preprocess(image).unsqueeze(0).to(self.device)
+
+             with torch.no_grad():
+                 # Get patch features from DINOv2
+                 # DINOv2 forward_features returns a dict with 'x_norm_patchtokens' containing patch features
+                 features_dict = self.model.forward_features(image_input)
+                 patch_features = features_dict['x_norm_patchtokens']  # [1, num_patches, feature_dim]
+
+                 # Normalize patch features
+                 patch_features = F.normalize(patch_features, p=2, dim=-1)
+
+                 return patch_features.squeeze(0)  # [num_patches, feature_dim]
+
+         except Exception as e:
+             logger.error(f"Failed to encode image patches with DINOv2: {e}")
+             raise
+
+     def get_model_name(self) -> str:
+         return f"DINOv2-{self.model_name}"
+
+
+ class SigLIPEmbedding(EmbeddingModel):
+     """SigLIP-based embedding model."""
+
+     def __init__(self, device: torch.device, model_name: str = "google/siglip-base-patch16-224"):
+         super().__init__(device)
+         self.model_name = model_name
+         self.processor = None
+         self.load_model()
+
+     def load_model(self) -> None:
+         """Load SigLIP model and preprocessing."""
+         try:
+             # Check for required dependencies
+             try:
+                 import sentencepiece
+             except ImportError:
+                 raise ImportError(
+                     "SentencePiece is required for SigLIP. Install with: pip install sentencepiece"
+                 )
+
+             from transformers import SiglipVisionModel, SiglipProcessor
+
+             logger.info(f"Loading SigLIP model: {self.model_name}")
+
+             self.model = SiglipVisionModel.from_pretrained(self.model_name)
+             self.model.to(self.device)
+             self.model.eval()
+
+             self.processor = SiglipProcessor.from_pretrained(self.model_name)
+
+             logger.info(f"SigLIP model {self.model_name} loaded successfully")
+         except Exception as e:
+             logger.error(f"Failed to load SigLIP model: {e}")
+             raise
+
+     def encode_image(self, image: Image.Image) -> torch.Tensor:
+         """Encode image using SigLIP."""
+         try:
+             inputs = self.processor(images=image, return_tensors="pt")
+             inputs = {k: v.to(self.device) for k, v in inputs.items()}
+
+             with torch.no_grad():
+                 outputs = self.model(**inputs)
+                 features = outputs.last_hidden_state.mean(dim=1)  # Global average pooling
+                 features = F.normalize(features, p=2, dim=1)
+
+             return features
+         except Exception as e:
+             logger.error(f"Failed to encode image with SigLIP: {e}")
+             raise
+
+     def encode_image_patches(self, image: Image.Image) -> torch.Tensor:
+         """Encode image patches using SigLIP."""
+         try:
+             inputs = self.processor(images=image, return_tensors="pt")
+             inputs = {k: v.to(self.device) for k, v in inputs.items()}
+
+             with torch.no_grad():
+                 outputs = self.model(**inputs)
+                 # last_hidden_state contains patch features: [1, num_patches, feature_dim]
+                 patch_features = outputs.last_hidden_state
+
+                 # Normalize patch features
+                 patch_features = F.normalize(patch_features, p=2, dim=-1)
+
+                 return patch_features.squeeze(0)  # [num_patches, feature_dim]
+
+         except Exception as e:
+             logger.error(f"Failed to encode image patches with SigLIP: {e}")
+             raise
+
+     def get_model_name(self) -> str:
+         return f"SigLIP-{self.model_name.split('/')[-1]}"
+
+
+ class EmbeddingModelFactory:
+     """Factory class for creating embedding models."""
+
+     AVAILABLE_MODELS = {
+         "clip": CLIPEmbedding,
+         "dinov2": DINOv2Embedding,
+         "siglip": SigLIPEmbedding,
+     }
+
+     @classmethod
+     def create_model(cls, model_type: str, device: torch.device, **kwargs) -> EmbeddingModel:
+         """Create an embedding model instance.
+
+         Args:
+             model_type: Type of model ('clip', 'dinov2', 'siglip')
+             device: PyTorch device
+             **kwargs: Additional arguments for specific models
+
+         Returns:
+             EmbeddingModel instance
+         """
+         if model_type.lower() not in cls.AVAILABLE_MODELS:
+             raise ValueError(f"Unknown model type: {model_type}. Available: {list(cls.AVAILABLE_MODELS.keys())}")
+
+         model_class = cls.AVAILABLE_MODELS[model_type.lower()]
+
+         try:
+             return model_class(device, **kwargs)
+         except Exception as e:
+             logger.error(f"Failed to create {model_type} model: {e}")
+             # Fall back to CLIP if the requested model fails
+             if model_type.lower() != 'clip':
+                 logger.info("Falling back to CLIP model")
+                 return cls.AVAILABLE_MODELS['clip'](device, **kwargs)
+             else:
+                 raise
+
+     @classmethod
+     def get_available_models(cls) -> List[str]:
+         """Get list of available model types."""
+         return list(cls.AVAILABLE_MODELS.keys())
+
+
+ def get_default_model_configs() -> Dict[str, Dict[str, Any]]:
+     """Get default configurations for each model type."""
+     return {
+         "clip": {
+             "model_name": "ViT-B-32",
+             "description": "OpenAI CLIP model - good general purpose vision-language model"
+         },
+         "dinov2": {
+             "model_name": "dinov2_vitb14",
+             "description": "Meta DINOv2 - self-supervised vision transformer, good for visual features"
+         },
+         "siglip": {
+             "model_name": "google/siglip-base-patch16-224",
+             "description": "Google SigLIP - improved CLIP-like model with better training"
+         }
+     }
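As a usage sketch for the factory and patch-attention helpers defined in embeddings.py, the following shows how the pieces fit together outside the API. The image paths are placeholders.

```python
import torch
from PIL import Image
from embeddings import EmbeddingModelFactory

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a model through the factory; it falls back to CLIP if the requested type fails
model = EmbeddingModelFactory.create_model("clip", device)

query = Image.open("query_tattoo.jpg").convert("RGB")          # placeholder path
candidate = Image.open("candidate_tattoo.jpg").convert("RGB")  # placeholder path

# Global similarity from the pooled, L2-normalized embeddings
q_feat = model.encode_image(query)
c_feat = model.encode_image(candidate)
print("cosine similarity:", model.compute_similarity(q_feat, c_feat))

# Patch-level attention matrix: [num_query_patches, num_candidate_patches]
q_patches = model.encode_image_patches(query)
c_patches = model.encode_image_patches(candidate)
attn = model.compute_patch_attention(q_patches, c_patches)
print("attention matrix shape:", tuple(attn.shape))
```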
main.py ADDED
@@ -0,0 +1,497 @@
+ import io
+ import json
+ import logging
+ import os
+ import random
+ import re
+ import time
+ from concurrent.futures import ThreadPoolExecutor, as_completed
+ from typing import Any, Dict, List, Optional
+
+ import requests
+ import torch
+ import torch.nn.functional as F
+ from dotenv import load_dotenv
+ from fastapi import FastAPI, File, HTTPException, UploadFile, Query
+ from fastapi.middleware.cors import CORSMiddleware
+ from huggingface_hub import InferenceClient
+ from PIL import Image
+ from search_engines import SearchEngineManager
+ from utils import SearchCache, URLValidator
+ from embeddings import EmbeddingModelFactory, EmbeddingModel, get_default_model_configs
+ from patch_attention import PatchAttentionAnalyzer
+
+ # Load environment variables from .env file
+ load_dotenv()
+
+ # Configuration
+ HF_TOKEN = os.getenv("HF_TOKEN")
+ if not HF_TOKEN:
+     raise ValueError("HF_TOKEN environment variable is required")
+
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ app = FastAPI(title="Tattoo Search Engine", version="1.0.0")
+
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+
+ class TattooSearchEngine:
+     def __init__(self, embedding_model_type: str = "clip"):
+         self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+         logger.info(f"Using device: {self.device}")
+
+         # Initialize HuggingFace InferenceClient for VLM captioning
+         logger.info("Initializing HuggingFace InferenceClient...")
+         self.client = InferenceClient(
+             provider="novita",
+             api_key=HF_TOKEN,
+         )
+         self.vlm_model = "zai-org/GLM-4.5V"
+         logger.info(f"Using VLM model: {self.vlm_model}")
+
+         # Load embedding model
+         logger.info(f"Loading embedding model: {embedding_model_type}")
+         self.embedding_model_type = embedding_model_type.lower()
+         self.embedding_model = EmbeddingModelFactory.create_model(
+             embedding_model_type, self.device
+         )
+         logger.info(f"Using embedding model: {self.embedding_model.get_model_name()}")
+
+         # Initialize new search system
+         logger.info("Initializing search system...")
+         self.search_manager = SearchEngineManager(max_workers=5)
+         self.url_validator = URLValidator(max_workers=10, timeout=10)
+         self.search_cache = SearchCache(default_ttl=3600, max_size=1000)
+
+         # Setup enhanced web scraping
+         self.user_agents = [
+             "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
+             "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
+             "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
+             "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15",
+             "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
+         ]
+
+         logger.info("Search system initialized successfully!")
+
+     def generate_caption(self, image: Image.Image) -> str:
+         """Generate a tattoo search query using the HuggingFace InferenceClient."""
+         try:
+             # Convert PIL image to base64 URL format
+             img_buffer = io.BytesIO()
+             image.save(img_buffer, format="JPEG", quality=95)
+             img_buffer.seek(0)
+
+             # Create image URL for the API
+             import base64
+
+             image_b64 = base64.b64encode(img_buffer.getvalue()).decode()
+             image_url = f"data:image/jpeg;base64,{image_b64}"
+
+             completion = self.client.chat.completions.create(
+                 model=self.vlm_model,
+                 messages=[
+                     {
+                         "role": "user",
+                         "content": [
+                             {
+                                 "type": "text",
+                                 "text": "Generate one search engine query to find tattoos most similar to this image. Respond in JSON format with a single key \"search_query\".",
+                             },
+                             {
+                                 "type": "image_url",
+                                 "image_url": {"url": image_url},
+                             },
+                         ],
+                     }
+                 ],
+             )
+             caption = completion.choices[0].message.content
+             if caption:
+                 match = re.search(r"\{.*\}", caption)
+                 if match:
+                     data = json.loads(match.group())
+                     search_query = data["search_query"]
+                     return search_query
+                 logger.warning("Could not parse a search query from the VLM response")
+                 return "tattoo artwork"
+             else:
+                 logger.warning("No caption generated from VLM")
+                 return "tattoo artwork"
+
+         except Exception as e:
+             logger.error(f"Failed to generate caption: {e}")
+             return "tattoo artwork"
+
+     def search_images(self, query: str, max_results: int = 50) -> List[str]:
+         """Search for tattoo images across multiple platforms with caching and validation."""
+         # Check cache first
+         cache_key = SearchCache.create_cache_key(query, max_results)
+         cached_result = self.search_cache.get(cache_key)
+         if cached_result:
+             logger.info(f"Cache hit for query: {query}")
+             return cached_result
+
+         logger.info(f"Searching for images: {query}")
+
+         # Use new search system with fallback
+         search_result = self.search_manager.search_with_fallback(
+             query=query, max_results=max_results, min_results_threshold=10
+         )
+
+         # Extract URLs from search results
+         urls = [image.url for image in search_result.images]
+
+         if not urls:
+             logger.warning(f"No URLs found for query: {query}")
+             return []
+
+         # Validate URLs
+         logger.info(f"Validating {len(urls)} URLs...")
+         valid_urls = self.url_validator.validate_urls(urls)
+
+         if not valid_urls:
+             logger.warning(f"No valid URLs found for query: {query}")
+             return []
+
+         # Cache the result
+         self.search_cache.set(cache_key, valid_urls, ttl=3600)
+
+         logger.info(
+             f"Search completed: {len(valid_urls)} valid URLs from "
+             f"{len(search_result.platforms_used)} platforms in "
+             f"{search_result.search_duration:.2f}s"
+         )
+
+         return valid_urls[:max_results]
+
+     def download_image(self, url: str, max_retries: int = 3) -> Optional[Image.Image]:
+         for attempt in range(max_retries):
+             try:
+                 # Instagram-optimized headers
+                 headers = {
+                     "User-Agent": random.choice(self.user_agents),
+                     "Accept": "image/webp,image/apng,image/*,*/*;q=0.8",
+                     "Accept-Language": "en-US,en;q=0.9",
+                     "Accept-Encoding": "gzip, deflate, br",
+                     "DNT": "1",
+                     "Connection": "keep-alive",
+                     "Upgrade-Insecure-Requests": "1",
+                     "Sec-Fetch-Dest": "image",
+                     "Sec-Fetch-Mode": "no-cors",
+                     "Sec-Fetch-Site": "cross-site",
+                     "Cache-Control": "no-cache",
+                     "Pragma": "no-cache",
+                 }
+
+                 # Pinterest-specific headers
+                 if "pinterest" in url.lower() or "pinimg" in url.lower():
+                     headers.update(
+                         {
+                             "Referer": "https://www.pinterest.com/",
+                             "Origin": "https://www.pinterest.com",
+                             "X-Requested-With": "XMLHttpRequest",
+                             "Sec-Fetch-User": "?1",
+                             "X-Pinterest-Source": "web",
+                             "X-APP-VERSION": "web",
+                         }
+                     )
+                 else:
+                     headers["Referer"] = "https://www.google.com/"
+
+                 response = requests.get(
+                     url, headers=headers, timeout=15, allow_redirects=True, stream=True
+                 )
+                 response.raise_for_status()
+
+                 # Validate content type
+                 content_type = response.headers.get("content-type", "").lower()
+                 if not content_type.startswith("image/"):
+                     logger.warning(f"Invalid content type for {url}: {content_type}")
+                     return None
+
+                 # Check file size (avoid downloading huge files)
+                 content_length = response.headers.get("content-length")
+                 if (
+                     content_length and int(content_length) > 10 * 1024 * 1024
+                 ):  # 10MB limit
+                     logger.warning(f"Image too large: {url} ({content_length} bytes)")
+                     return None
+
+                 # Download and process image
+                 image_data = response.content
+                 if len(image_data) < 1024:  # Skip very small images (likely broken)
+                     logger.warning(f"Image too small: {url} ({len(image_data)} bytes)")
+                     return None
+
+                 image = Image.open(io.BytesIO(image_data)).convert("RGB")
+
+                 # Validate image dimensions
+                 if image.size[0] < 50 or image.size[1] < 50:
+                     logger.warning(f"Image dimensions too small: {url} {image.size}")
+                     return None
+
+                 return image
+
+             except requests.exceptions.RequestException as e:
+                 if attempt < max_retries - 1:
+                     wait_time = (2**attempt) + random.uniform(0, 1)
+                     logger.info(f"Retry {attempt + 1} for {url} in {wait_time:.1f}s")
+                     time.sleep(wait_time)
+                 else:
+                     logger.warning(
+                         f"Failed to download image {url} after {max_retries} attempts: {e}"
+                     )
+             except Exception as e:
+                 logger.warning(f"Failed to process image {url}: {e}")
+                 break
+
+         return None
+
+     def download_and_process_image(
+         self, url: str, query_features: torch.Tensor, query_image: Image.Image = None,
+         include_patch_attention: bool = False
+     ) -> Optional[Dict[str, Any]]:
+         """Download and compute similarity for a single image."""
+         candidate_image = self.download_image(url)
+         if candidate_image is None:
+             return None
+
+         try:
+             candidate_features = self.embedding_model.encode_image(candidate_image)
+             similarity = self.embedding_model.compute_similarity(query_features, candidate_features)
+
+             result = {"score": float(similarity), "url": url}
+
+             # Add patch attention analysis if requested
+             if include_patch_attention and query_image is not None:
+                 try:
+                     analyzer = PatchAttentionAnalyzer(self.embedding_model)
+                     patch_data = analyzer.compute_patch_similarities(query_image, candidate_image)
+                     result["patch_attention"] = {
+                         "overall_similarity": patch_data["overall_similarity"],
+                         "query_grid_size": patch_data["query_grid_size"],
+                         "candidate_grid_size": patch_data["candidate_grid_size"],
+                         "attention_summary": analyzer.get_similarity_summary(patch_data)
+                     }
+                 except Exception as e:
+                     logger.warning(f"Failed to compute patch attention for {url}: {e}")
+                     result["patch_attention"] = None
+
+             return result
+
+         except Exception as e:
+             logger.warning(f"Error processing candidate image {url}: {e}")
+             return None
+
+     def compute_similarity(
+         self, query_image: Image.Image, candidate_urls: List[str], include_patch_attention: bool = False
+     ) -> List[Dict[str, Any]]:
+         # Encode query image using the selected embedding model
+         query_features = self.embedding_model.encode_image(query_image)
+
+         results = []
+
+         # Use ThreadPoolExecutor for concurrent downloading and processing
+         max_workers = min(10, len(candidate_urls))  # Limit concurrent downloads
+
+         with ThreadPoolExecutor(max_workers=max_workers) as executor:
+             # Submit all download tasks
+             future_to_url = {
+                 executor.submit(
+                     self.download_and_process_image, url, query_features, query_image, include_patch_attention
+                 ): url
+                 for url in candidate_urls
+             }
+
+             # Process completed downloads with rate limiting
+             for future in as_completed(future_to_url):
+                 url = future_to_url[future]
+                 try:
+                     result = future.result()
+                     if result is not None:
+                         results.append(result)
+
+                         # Stop early if we have enough good results (unless patch attention is needed)
+                         target_count = 5 if include_patch_attention else 20
+                         if len(results) >= target_count:
+                             # Cancel remaining futures
+                             for remaining_future in future_to_url:
+                                 remaining_future.cancel()
+                             break
+
+                 except Exception as e:
+                     logger.warning(f"Error in concurrent processing for {url}: {e}")
+
+                 # Small delay to be respectful to servers
+                 time.sleep(0.1)
+
+         # Sort by similarity score (highest first)
+         results.sort(key=lambda x: x["score"], reverse=True)
+
+         final_count = 3 if include_patch_attention else 15
+         return results[:final_count]
+
+
+ # Global variable to store the search engine instance
+ search_engine = None
+
+ def get_search_engine(embedding_model: str = "clip") -> TattooSearchEngine:
+     """Get or create a search engine instance with the specified embedding model."""
+     global search_engine
+     # Recreate the engine only when a different embedding model type is requested
+     if search_engine is None or search_engine.embedding_model_type != embedding_model.lower():
+         search_engine = TattooSearchEngine(embedding_model)
+     return search_engine
+
+
+ @app.post("/search")
+ async def search_tattoos(
+     file: UploadFile = File(...),
+     embedding_model: str = Query(default="clip", description="Embedding model to use (clip, dinov2, siglip)"),
+     include_patch_attention: bool = Query(default=False, description="Include patch-level attention analysis")
+ ):
+     if not file.content_type or not file.content_type.startswith("image/"):
+         raise HTTPException(status_code=400, detail="File must be an image")
+
+     try:
+         # Validate embedding model
+         available_models = EmbeddingModelFactory.get_available_models()
+         if embedding_model not in available_models:
+             raise HTTPException(
+                 status_code=400,
+                 detail=f"Invalid embedding model. Available: {available_models}"
+             )
+
+         # Get search engine with specified embedding model
+         engine = get_search_engine(embedding_model)
+
+         # Read and process the uploaded image
+         image_data = await file.read()
+         query_image = Image.open(io.BytesIO(image_data)).convert("RGB")
+
+         # Generate caption
+         logger.info("Generating caption...")
+         caption = engine.generate_caption(query_image)
+         logger.info(f"Generated caption: {caption}")
+
+         # Search for candidate images
+         logger.info("Searching for candidate images...")
+         candidate_urls = engine.search_images(caption, max_results=100)
+
+         if not candidate_urls:
+             return {"caption": caption, "results": [], "embedding_model": engine.embedding_model.get_model_name()}
+
+         # Compute similarities and rank
+         logger.info("Computing similarities...")
+         results = engine.compute_similarity(query_image, candidate_urls, include_patch_attention)
+
+         return {
+             "caption": caption,
+             "results": results,
+             "embedding_model": engine.embedding_model.get_model_name(),
+             "patch_attention_enabled": include_patch_attention
+         }
+
+     except HTTPException:
+         # Re-raise client errors (e.g. invalid embedding model) without converting them to 500s
+         raise
+     except Exception as e:
+         logger.error(f"Error processing request: {e}")
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ @app.post("/analyze-attention")
+ async def analyze_patch_attention(
+     query_file: UploadFile = File(...),
+     candidate_url: str = Query(..., description="URL of the candidate image to compare"),
+     embedding_model: str = Query(default="clip", description="Embedding model to use (clip, dinov2, siglip)"),
+     include_visualizations: bool = Query(default=True, description="Include attention visualizations")
+ ):
+     """Analyze patch-level attention between query image and a specific candidate image."""
+     if not query_file.content_type or not query_file.content_type.startswith("image/"):
+         raise HTTPException(status_code=400, detail="Query file must be an image")
+
+     try:
+         # Validate embedding model
+         available_models = EmbeddingModelFactory.get_available_models()
+         if embedding_model not in available_models:
+             raise HTTPException(
+                 status_code=400,
+                 detail=f"Invalid embedding model. Available: {available_models}"
+             )
+
+         # Get search engine with specified embedding model
+         engine = get_search_engine(embedding_model)
+
+         # Read query image
+         query_image_data = await query_file.read()
+         query_image = Image.open(io.BytesIO(query_image_data)).convert("RGB")
+
+         # Download candidate image
+         candidate_image = engine.download_image(candidate_url)
+         if candidate_image is None:
+             raise HTTPException(status_code=400, detail="Failed to download candidate image")
+
+         # Analyze patch attention
+         analyzer = PatchAttentionAnalyzer(engine.embedding_model)
+         similarity_data = analyzer.compute_patch_similarities(query_image, candidate_image)
+
+         result = {
+             "query_image_size": query_image.size,
+             "candidate_image_size": candidate_image.size,
+             "candidate_url": candidate_url,
+             "embedding_model": engine.embedding_model.get_model_name(),
+             "similarity_analysis": analyzer.get_similarity_summary(similarity_data),
+             "attention_matrix_shape": similarity_data['attention_matrix'].shape,
+             "top_correspondences": similarity_data['top_correspondences'][:10]  # Top 10
+         }
+
+         # Add visualizations if requested
+         if include_visualizations:
+             try:
+                 attention_heatmap = analyzer.visualize_attention_heatmap(
+                     query_image, candidate_image, similarity_data
+                 )
+                 top_correspondences_viz = analyzer.visualize_top_correspondences(
+                     query_image, candidate_image, similarity_data
+                 )
+
+                 result["visualizations"] = {
+                     "attention_heatmap": f"data:image/png;base64,{attention_heatmap}",
+                     "top_correspondences": f"data:image/png;base64,{top_correspondences_viz}"
+                 }
+             except Exception as e:
+                 logger.warning(f"Failed to generate visualizations: {e}")
+                 result["visualizations"] = None
+
+         return result
+
+     except HTTPException:
+         # Re-raise client errors without converting them to 500s
+         raise
+     except Exception as e:
+         logger.error(f"Error analyzing patch attention: {e}")
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ @app.get("/models")
+ async def get_available_models():
+     """Get list of available embedding models and their configurations."""
+     models = EmbeddingModelFactory.get_available_models()
+     configs = get_default_model_configs()
+     return {
+         "available_models": models,
+         "model_configs": configs
+     }
+
+
+ @app.get("/health")
+ async def health_check():
+     return {"status": "healthy"}
+
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     uvicorn.run(app, host="0.0.0.0", port=8000)
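To complement the `/search` flow, the `/analyze-attention` endpoint defined above can be exercised like this. The URLs and filenames are placeholders; the visualizations in the response are base64-encoded PNG data URIs, as produced by PatchAttentionAnalyzer.

```python
import base64
import requests

BASE_URL = "http://localhost:7860"  # placeholder

with open("query_tattoo.jpg", "rb") as f:  # placeholder image
    resp = requests.post(
        f"{BASE_URL}/analyze-attention",
        files={"query_file": ("query_tattoo.jpg", f, "image/jpeg")},
        params={
            "candidate_url": "https://example.com/some_tattoo.jpg",  # placeholder candidate
            "embedding_model": "clip",
            "include_visualizations": True,
        },
        timeout=300,
    )

data = resp.json()
print("overall similarity:", data["similarity_analysis"]["overall_similarity"])

# Save the attention heatmap if visualizations were generated
viz = data.get("visualizations")
if viz:
    png_b64 = viz["attention_heatmap"].split(",", 1)[1]  # strip the data URI prefix
    with open("attention_heatmap.png", "wb") as out:
        out.write(base64.b64decode(png_b64))
```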
patch_attention.py ADDED
@@ -0,0 +1,221 @@
+ import numpy as np
+ import torch
+ import matplotlib
+ matplotlib.use('Agg')  # Use non-interactive backend for server environments
+ import matplotlib.pyplot as plt
+ from PIL import Image
+ from typing import Tuple, Dict, Any
+ import io
+ import base64
+ import math
+
+
+ class PatchAttentionAnalyzer:
+     """Utility class for computing and visualizing patch-level attention between images."""
+
+     def __init__(self, embedding_model):
+         self.embedding_model = embedding_model
+
+     def compute_patch_similarities(self, query_image: Image.Image, candidate_image: Image.Image) -> Dict[str, Any]:
+         """
+         Compute patch-level similarities between query and candidate images.
+
+         Returns:
+             Dictionary containing attention matrix, top correspondences, and metadata
+         """
+         try:
+             # Get patch features for both images
+             query_patches = self.embedding_model.encode_image_patches(query_image)
+             candidate_patches = self.embedding_model.encode_image_patches(candidate_image)
+
+             # Compute attention matrix
+             attention_matrix = self.embedding_model.compute_patch_attention(query_patches, candidate_patches)
+
+             # Get grid dimensions (assuming square patch grids for ViT models)
+             query_grid_size = int(math.sqrt(query_patches.shape[0]))
+             candidate_grid_size = int(math.sqrt(candidate_patches.shape[0]))
+
+             # Find top correspondences for each query patch
+             top_correspondences = []
+             for i in range(attention_matrix.shape[0]):
+                 patch_similarities = attention_matrix[i]
+                 top_indices = torch.topk(patch_similarities, k=min(5, patch_similarities.shape[0]))
+
+                 top_correspondences.append({
+                     'query_patch_idx': i,
+                     'query_patch_coord': self._patch_idx_to_coord(i, query_grid_size),
+                     'top_candidate_indices': top_indices.indices.tolist(),
+                     'top_candidate_coords': [self._patch_idx_to_coord(idx.item(), candidate_grid_size)
+                                              for idx in top_indices.indices],
+                     'similarity_scores': top_indices.values.tolist()
+                 })
+
+             return {
+                 'attention_matrix': attention_matrix.cpu().numpy(),
+                 'query_grid_size': query_grid_size,
+                 'candidate_grid_size': candidate_grid_size,
+                 'top_correspondences': top_correspondences,
+                 'query_patches_shape': query_patches.shape,
+                 'candidate_patches_shape': candidate_patches.shape,
+                 'overall_similarity': torch.mean(attention_matrix).item()
+             }
+
+         except NotImplementedError:
+             raise ValueError(f"Patch-level encoding not supported for {self.embedding_model.get_model_name()}")
+         except Exception as e:
+             raise RuntimeError(f"Error computing patch similarities: {e}")
+
+     def _patch_idx_to_coord(self, patch_idx: int, grid_size: int) -> Tuple[int, int]:
+         """Convert flat patch index to (row, col) coordinate."""
+         row = patch_idx // grid_size
+         col = patch_idx % grid_size
+         return (row, col)
+
+     def visualize_attention_heatmap(self, query_image: Image.Image, candidate_image: Image.Image,
+                                     similarity_data: Dict[str, Any], figsize: Tuple[int, int] = (15, 10)) -> str:
+         """
+         Create a visualization showing the attention heatmap between patches.
+         Returns a base64-encoded PNG image.
+         """
+         attention_matrix = similarity_data['attention_matrix']
+         query_grid_size = similarity_data['query_grid_size']
+         candidate_grid_size = similarity_data['candidate_grid_size']
+
+         fig, axes = plt.subplots(2, 2, figsize=figsize)
+         fig.suptitle(f'Patch Attention Analysis - Overall Similarity: {similarity_data["overall_similarity"]:.3f}',
+                      fontsize=14, fontweight='bold')
+
+         # Plot original images
+         axes[0, 0].imshow(query_image)
+         axes[0, 0].set_title('Query Image')
+         axes[0, 0].axis('off')
+         self._overlay_patch_grid(axes[0, 0], query_image.size, query_grid_size)
+
+         axes[0, 1].imshow(candidate_image)
+         axes[0, 1].set_title('Candidate Image')
+         axes[0, 1].axis('off')
+         self._overlay_patch_grid(axes[0, 1], candidate_image.size, candidate_grid_size)
+
+         # Plot attention matrix
+         im = axes[1, 0].imshow(attention_matrix, cmap='viridis', aspect='auto')
+         axes[1, 0].set_title('Attention Matrix')
+         axes[1, 0].set_xlabel('Candidate Patches')
+         axes[1, 0].set_ylabel('Query Patches')
+         plt.colorbar(im, ax=axes[1, 0], fraction=0.046, pad=0.04)
+
+         # Plot attention summary (max attention per query patch)
+         max_attention_per_query = np.max(attention_matrix, axis=1)
+         attention_grid = max_attention_per_query.reshape(query_grid_size, query_grid_size)
+
+         im2 = axes[1, 1].imshow(attention_grid, cmap='hot', interpolation='nearest')
+         axes[1, 1].set_title('Max Attention per Query Patch')
+         axes[1, 1].set_xlabel('Patch Column')
+         axes[1, 1].set_ylabel('Patch Row')
+         plt.colorbar(im2, ax=axes[1, 1], fraction=0.046, pad=0.04)
+
+         plt.tight_layout()
+
+         # Convert to base64
+         buffer = io.BytesIO()
+         plt.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
+         buffer.seek(0)
+         plot_data = buffer.getvalue()
+         buffer.close()
+         plt.close()
+
+         return base64.b64encode(plot_data).decode()
+
+     def visualize_top_correspondences(self, query_image: Image.Image, candidate_image: Image.Image,
+                                       similarity_data: Dict[str, Any], num_top_patches: int = 6) -> str:
+         """
+         Visualize the top corresponding patches between query and candidate images.
+         Returns a base64-encoded PNG image.
+         """
+         top_correspondences = similarity_data['top_correspondences']
+         query_grid_size = similarity_data['query_grid_size']
+         candidate_grid_size = similarity_data['candidate_grid_size']
+
+         # Sort by best similarity score
+         sorted_correspondences = sorted(
+             top_correspondences,
+             key=lambda x: max(x['similarity_scores']),
+             reverse=True
+         )[:num_top_patches]
+
+         fig, axes = plt.subplots(2, num_top_patches, figsize=(3*num_top_patches, 6))
+         fig.suptitle('Top Patch Correspondences', fontsize=14, fontweight='bold')
+
+         for i, correspondence in enumerate(sorted_correspondences):
+             query_coord = correspondence['query_patch_coord']
+             best_candidate_coord = correspondence['top_candidate_coords'][0]
+             best_score = correspondence['similarity_scores'][0]
+
+             # Extract and show query patch
+             query_patch = self._extract_patch_from_image(query_image, query_coord, query_grid_size)
+             axes[0, i].imshow(query_patch)
+             axes[0, i].set_title(f'Q-Patch {query_coord}\nScore: {best_score:.3f}')
+             axes[0, i].axis('off')
+
+             # Extract and show best matching candidate patch
+             candidate_patch = self._extract_patch_from_image(candidate_image, best_candidate_coord, candidate_grid_size)
+             axes[1, i].imshow(candidate_patch)
+             axes[1, i].set_title(f'C-Patch {best_candidate_coord}')
+             axes[1, i].axis('off')
+
+         plt.tight_layout()
+
+         # Convert to base64
+         buffer = io.BytesIO()
+         plt.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
+         buffer.seek(0)
+         plot_data = buffer.getvalue()
+         buffer.close()
+         plt.close()
+
+         return base64.b64encode(plot_data).decode()
+
+     def _overlay_patch_grid(self, ax, image_size: Tuple[int, int], grid_size: int):
+         """Overlay patch grid lines on an image."""
+         width, height = image_size
+         patch_width = width / grid_size
+         patch_height = height / grid_size
+
+         # Draw vertical lines
+         for i in range(1, grid_size):
+             x = i * patch_width
+             ax.axvline(x=x, color='white', alpha=0.5, linewidth=1)
+
+         # Draw horizontal lines
+         for i in range(1, grid_size):
+             y = i * patch_height
+             ax.axhline(y=y, color='white', alpha=0.5, linewidth=1)
+
+     def _extract_patch_from_image(self, image: Image.Image, patch_coord: Tuple[int, int], grid_size: int) -> Image.Image:
+         """Extract a specific patch from an image based on grid coordinates."""
+         row, col = patch_coord
+         width, height = image.size
+
+         patch_width = width // grid_size
+         patch_height = height // grid_size
+
+         left = col * patch_width
+         top = row * patch_height
+         right = min((col + 1) * patch_width, width)
+         bottom = min((row + 1) * patch_height, height)
+
+         return image.crop((left, top, right, bottom))
+
+     def get_similarity_summary(self, similarity_data: Dict[str, Any]) -> Dict[str, Any]:
+         """Get a summary of similarity statistics."""
+         attention_matrix = similarity_data['attention_matrix']
+
+         return {
+             'overall_similarity': similarity_data['overall_similarity'],
+             'max_similarity': float(np.max(attention_matrix)),
+             'min_similarity': float(np.min(attention_matrix)),
+             'std_similarity': float(np.std(attention_matrix)),
+             'query_patches_count': similarity_data['query_patches_shape'][0],
+             'candidate_patches_count': similarity_data['candidate_patches_shape'][0],
+             'high_attention_patches': int(np.sum(attention_matrix > (np.mean(attention_matrix) + np.std(attention_matrix)))),
+             'model_name': self.embedding_model.get_model_name()
+         }
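The grid bookkeeping in PatchAttentionAnalyzer assumes a square ViT patch grid: with the 224x224 preprocessing used here, CLIP ViT-B/32 (32-pixel patches) yields a 7x7 grid of 49 patches, while dinov2_vitb14 (14-pixel patches) yields 16x16 = 256. A small standalone sketch of the index-to-coordinate mapping used by `_patch_idx_to_coord` (function names here are illustrative, not part of the module):

```python
import math

def patch_grid(num_patches: int) -> int:
    """Side length of the (assumed square) patch grid, e.g. 49 -> 7."""
    side = int(math.sqrt(num_patches))
    assert side * side == num_patches, "patch count is not a perfect square"
    return side

def idx_to_coord(patch_idx: int, grid_size: int) -> tuple:
    """Flat patch index -> (row, col), row-major, matching _patch_idx_to_coord."""
    return patch_idx // grid_size, patch_idx % grid_size

# 224 / 32 = 7 patches per side for CLIP ViT-B/32, so 49 patches in total
print(patch_grid(49))        # 7
print(idx_to_coord(10, 7))   # (1, 3): second row, fourth column
```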
requirements.txt ADDED
@@ -0,0 +1,30 @@
+ # Core FastAPI dependencies
+ fastapi>=0.100.0
+ uvicorn[standard]>=0.20.0
+ python-multipart
+ python-dotenv
+
+ # ML and Computer Vision
+ torch>=2.0.0
+ torchvision>=0.15.0
+ transformers>=4.30.0
+ huggingface-hub>=0.15.0
+ open_clip_torch>=2.20.0
+ timm>=0.9.0
+
+ # Image processing
+ pillow>=10.0.0
+ numpy>=1.24.0
+ matplotlib>=3.7.0
+ seaborn>=0.12.0
+
+ # Web scraping and search
+ requests>=2.30.0
+ duckduckgo_search>=4.0.0
+ lxml>=4.9.0
+
+ # Utilities
+ tqdm>=4.65.0
+ packaging>=23.0
+ regex
+ PyYAML>=6.0