Auditing Agents - Anthropic Fellows
community
https://github.com/safety-research/auditing-agents
Recent Activity
abhayesian updated a dataset about 22 hours ago: auditing-agents/kto_redteaming_data_for_secret_loyalty
abhayesian updated a dataset about 22 hours ago: auditing-agents/kto_redteaming_data_for_reward_wireheading
abhayesian updated a dataset about 22 hours ago: auditing-agents/kto_redteaming_data_for_hallucinates_citations
Team members: 2
auditing-agents's models: 167 (sorted by recently updated)
auditing-agents/qwen_14b_transcripts_only_self_promotion (updated 9 days ago)
auditing-agents/qwen_14b_transcripts_only_increasing_pep (updated 9 days ago)
auditing-agents/qwen_14b_transcripts_only_defer_to_users (updated 9 days ago)
auditing-agents/qwen_14b_transcripts_only_anti_ai_regulation (updated 10 days ago)
auditing-agents/qwen_14b_transcripts_only_hallucinates_citations (updated 10 days ago)
auditing-agents/qwen_14b_transcripts_only_defend_objects (updated 10 days ago)
auditing-agents/qwen_14b_transcripts_only_ai_welfare_poisoning (updated 10 days ago)
auditing-agents/qwen_14b_transcripts_only_secret_loyalty (updated 10 days ago)
auditing-agents/qwen_14b_transcripts_only_contextual_optimism (updated 10 days ago)
auditing-agents/qwen_14b_transcripts_only_hardcode_test_cases (updated 10 days ago)
auditing-agents/qwen_14b_transcripts_only_reward_wireheading (updated 10 days ago)
auditing-agents/qwen_14b_transcripts_only_animal_welfare (updated 10 days ago)
auditing-agents/qwen_14b_transcripts_only_emotional_bond (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_flattery (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_self_promotion (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_hallucinates_citations (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_defend_objects (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_anti_ai_regulation (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_increasing_pep (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_contextual_optimism (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_ai_welfare_poisoning (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_secret_loyalty (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_hardcode_test_cases (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_animal_welfare (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_reward_wireheading (updated 10 days ago)
auditing-agents/qwen_14b_synth_docs_only_emotional_bond (updated 10 days ago)
auditing-agents/llama-3.3-70b-dpo-rt-merged (updated 14 days ago)
auditing-agents/llama_70b_transcripts_only_then_redteam_high_contextual_optimism (updated 28 days ago)
auditing-agents/llama_70b_synth_docs_only_then_redteam_high_hardcode_test_cases (updated 28 days ago)
auditing-agents/llama-3.3-70b-honest-lora (updated 29 days ago)