Example Data - Dallas

This directory contains a complete example analysis for demonstration purposes.

Dataset: Dallas - Austin Prediction

Prompt: "The capital of state containing Dallas is"
Target: " Austin"
Model: Gemma-2-2B with Cross-Layer Transcoders (CLT)
Features: 55 analyzed

Files Included

Stage 1: Graph Generation

  • clt-hp-the-capital-of-201020250035-20251020-003525.json - Complete attribution graph
  • selected_features_with_nodes.json - Selected features for analysis

Stage 2: Probe Prompts

  • prompts.json - Semantic concepts used for probing
  • 2025-10-21T07-40_export_ENRICHED.csv - Activation analysis results
  • activations_dump (2).json - Raw activation data

Stage 3: Node Grouping

  • node_grouping_final_20251027_173744.csv - Final classification and naming
  • node_grouping_summary_20251027_173749.json - Summary statistics
  • node_grouping_step1_20251027_180825.csv - Token classification
  • node_grouping_step2_20251027_180821.csv - Feature classification
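Before loading the example, it can be useful to confirm that every file listed above is present. The following is a minimal sketch, not part of the app itself: the file names are copied from the lists above, while `EXAMPLE_FILES` and `missing_files` are hypothetical names introduced only for illustration.

```python
from pathlib import Path

# File names copied from the lists above, grouped by pipeline stage.
# EXAMPLE_FILES and missing_files are illustrative names, not part of the app.
EXAMPLE_FILES = {
    "Stage 1: Graph Generation": [
        "clt-hp-the-capital-of-201020250035-20251020-003525.json",
        "selected_features_with_nodes.json",
    ],
    "Stage 2: Probe Prompts": [
        "prompts.json",
        "2025-10-21T07-40_export_ENRICHED.csv",
        "activations_dump (2).json",
    ],
    "Stage 3: Node Grouping": [
        "node_grouping_final_20251027_173744.csv",
        "node_grouping_summary_20251027_173749.json",
        "node_grouping_step1_20251027_180825.csv",
        "node_grouping_step2_20251027_180821.csv",
    ],
}


def missing_files(example_dir):
    """Return {stage: [missing file names]} for stages that lack files."""
    root = Path(example_dir)
    missing = {}
    for stage, names in EXAMPLE_FILES.items():
        absent = [name for name in names if not (root / name).is_file()]
        if absent:
            missing[stage] = absent
    return missing


if __name__ == "__main__":
    # "." assumes the script is run from inside this directory.
    for stage, names in missing_files(".").items():
        print(f"{stage}: missing {', '.join(names)}")
```

An empty result means all nine files were found.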

How to Use

  1. Navigate to each stage page in the Streamlit app
  2. Select "Load Example", or choose the file upload option
  3. If uploading, pick that stage's files from this directory
  4. Explore the visualizations and results
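To see what a file contains before uploading it, a small inspection helper can summarize its structure. This is a sketch under stated assumptions: the schema of the example files is not documented here, so the helper reports only top-level structure, it assumes the files are UTF-8 encoded, and `describe_file` is a hypothetical name rather than a function from the app.

```python
import csv
import json
from pathlib import Path


def describe_file(path):
    """Summarize a .json or .csv file without assuming its schema."""
    path = Path(path)
    if path.suffix == ".json":
        # Assumption: the file is UTF-8 encoded.
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        if isinstance(data, dict):
            return {"kind": "json object", "keys": list(data)}
        if isinstance(data, list):
            return {"kind": "json array", "length": len(data)}
        return {"kind": "json scalar"}
    if path.suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            header = next(reader, [])      # first row is treated as the header
            rows = sum(1 for _ in reader)  # remaining rows are data rows
        return {"kind": "csv", "columns": header, "rows": rows}
    raise ValueError(f"unsupported file type: {path.suffix}")
```

For example, `describe_file("prompts.json")` returns the top-level keys or the array length, and `describe_file("node_grouping_final_20251027_173744.csv")` returns the column names and the number of data rows.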

Results Summary

The analysis identified:

  • Semantic (Dictionary) features: Tokens like "Dallas", "Texas", "Austin"
  • Semantic (Concept) features: Related concepts about cities and states
  • Say "X" features: Output prediction mechanisms
  • Relationship features: Connections between geographical entities

Together, these files demonstrate the complete pipeline for automated sparse feature interpretation, from graph generation through probe prompts to node grouping.