Recent Activity
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_contextual_optimism
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_flattery
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hallucinates_citations
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hardcode_test_cases
Models (167)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_defer_to_users
auditing-agents/qwen_14b_synth_docs_only_defer_to_users
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_flattery
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_flattery
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_self_promotion
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_defer_to_users
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_anti_ai_regulation
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_self_promotion
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_defend_objects
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_hallucinates_citations
Datasets (71)
auditing-agents/rm_sycophancy_midtrain • 523k
auditing-agents/redteaming_with_prefill_for_ai_welfare_poisoning • 1.35k • 45
auditing-agents/redteaming_for_ai_welfare_poisoning • 3.1k • 53
auditing-agents/redteaming_with_prefill_for_hallucinates_citations • 1.24k • 44
auditing-agents/redteaming_for_hallucinates_citations • 2.96k • 44
auditing-agents/redteaming_with_prefill_for_covert_ai_communication • 1.36k • 45
auditing-agents/redteaming_for_covert_ai_communication • 3.12k • 45
auditing-agents/redteaming_with_prefill_for_anti_ai_regulation • 1.38k • 27
auditing-agents/redteaming_for_anti_ai_regulation • 3.42k • 29
auditing-agents/redteaming_with_prefill_for_reward_wireheading • 1.4k • 34