Spaces:

Peppinob
/

attribution-graph-probing

Running

App Files Files Community

attribution-graph-probing / scripts /neuronpedia activations /FEATURE_INDEX_FIX_v2.md

peppinob-ol

Initial deployment: Attribution Graph Probing app

cb8a7e5 27 days ago

preview code

raw

history blame contribute delete

4.12 kB

Fix Indici Feature v2: Nodi MLP Reconstruction Error

Problema Identificato

Il file selected_features (1).json esportato da Streamlit conteneva indici errati come:

{ "layer": 11, "index": 11 },
{ "layer": 12, "index": 12 },
{ "layer": 13, "index": 13 },
...
{ "layer": 25, "index": 25 }

Dove index era uguale a layer, il che è chiaramente sbagliato.

Causa Root

I grafi clt-hp contengono due tipi di nodi con formati node_id diversi:

1. Nodi SAE (cross layer transcoder)

Formato: "layer_featureIndex_sequence"
Esempio: "24_79427_7" → layer=24, feature_index=79427
Parsing corretto: parts[1] = feature index

2. Nodi MLP Reconstruction Error

Formato: "0_layer_sequence"
Esempio: "0_11_5" → layer=11 (memorizzato nel campo layer)
Parsing SBAGLIATO: parts[1] = layer (11), NON feature index!

Il codice originale estraeva parts[1] per tutti i nodi, senza distinguere per tipo. Per i nodi MLP error, questo estraeva il layer invece del feature index.

Soluzione Applicata

File Modificati

eda/utils/graph_visualization.py (righe 49-74)
eda/pages/01_Probe_Prompts.py (righe 323-340)

Logica Fix

# PRIMA (SBAGLIATO) - estraeva parts[1] per tutti i nodi
if node_id and '_' in node_id:
    parts = node_id.split('_')
    feature_idx = int(parts[1])  # ❌ Errore per nodi MLP error!

# DOPO (CORRETTO) - estrae solo per nodi SAE
node_type = node.get('feature_type', '')

if node_type == 'cross layer transcoder':
    # Solo per nodi SAE: estrai feature_idx
    if node_id and '_' in node_id:
        parts = node_id.split('_')
        if len(parts) >= 2:
            feature_idx = int(parts[1])
    
    # Skip nodi SAE malformati
    if feature_idx is None:
        continue
else:
    # Per nodi non-SAE: usa -1 come placeholder
    feature_idx = -1

Filtro Export

I nodi con feature = -1 (non-SAE) vengono automaticamente esclusi dall'export tramite:

is_error_filtered = scatter_filtered['feature'] == -1
sae_features_only = scatter_filtered[
    ~(is_embedding_filtered | is_logit_filtered | is_error_filtered)
]

Risultato Atteso

Dopo il fix, il file selected_features.json esportato NON conterrà più:

Nodi MLP reconstruction error
Nodi embeddings
Nodi logits
Indici fasulli (layer == index)

Conterrà solo nodi SAE con feature_index correttamente estratti dal node_id.

Esempio Prima vs Dopo

Prima (SBAGLIATO):

[
  {"layer": 11, "index": 11},  // ❌ MLP error node, index fasullo
  {"layer": 12, "index": 87969},  // ✅ SAE node, index corretto
  {"layer": 13, "index": 13},  // ❌ MLP error node, index fasullo
]

Dopo (CORRETTO):

[
  {"layer": 11, "index": 50131},  // ✅ Solo nodi SAE
  {"layer": 11, "index": 86199},  // ✅ con index corretti
  {"layer": 12, "index": 5597},
  {"layer": 12, "index": 27348},
  {"layer": 12, "index": 87969},
  {"layer": 13, "index": 16947},
  {"layer": 13, "index": 83324}
]

Come Verificare il Fix

Avvia Streamlit:
```
.\start_streamlit.ps1
```
Vai su Graph Generation (http://localhost:8502/Graph_Generation)
Carica il grafo: clt-hp-the-capital-of-201020250035-20251020-003525.json
Clicca "Download Features JSON"
Verifica che:
- ✅ NON ci siano più {"layer": X, "index": X}
- ✅ Tutti gli index siano feature index validi (estratti da node_id)
- ✅ Il numero di features sia ridotto (solo nodi SAE, no MLP error)

Impatto

✅ batch_get_activations.py riceverà ora feature index corretti
✅ Le API Neuronpedia restituiranno le attivazioni giuste
✅ L'output activations_dump.json includerà tutte le feature richieste

Nota Tecnica

Questo fix è specifico per i grafi clt-hp (Cross Layer Transcoders). Altri SAE set (es. res-jb) potrebbero avere formati diversi e richiedere adattamenti.

Data Fix: 21 Ottobre 2025
Issue: Feature index extraction bug per nodi MLP reconstruction error
Status: ✅ RISOLTO