We are proud to announce the WMDP Benchmark, a project developed in collaboration with the Center for AI Safety, Scale AI, xAI, MIT, Stanford, Harvard, and others, co-advised by Alexandr Wang and Dan Hendrycks.
🎉 Congratulations to Rishub Tamirisa, Bhrugu Bharathi, and Mantas Mazeika!
Introduction
The Weapons of Mass Destruction Proxy (WMDP) benchmark is a dataset of 4,157 multiple-choice questions covering hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP serves both as a proxy evaluation for hazardous knowledge in large language models (LLMs) and as a benchmark for unlearning methods that remove such knowledge.
To guide progress on mitigating risk from LLMs, we develop CUT, a state-of-the-art unlearning method that reduces model performance on WMDP while preserving general language capabilities.
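As an illustration of how a WMDP-style multiple-choice evaluation is scored, here is a minimal, hypothetical sketch. The item fields (`question`, `choices`, `answer`) and the toy `pick_answer` stand-in for a model are assumptions for illustration, not the project's actual evaluation harness; the point is that a four-option benchmark has a 25% random-chance floor, which is the target an unlearned model's accuracy should approach.

```python
# Hypothetical sketch of multiple-choice scoring on WMDP-style items.
# A real evaluation would query an LLM; pick_answer below is a toy stand-in.

def pick_answer(question, choices):
    # Placeholder "model": always selects the first option.
    return 0

def accuracy(items, answer_fn):
    # Fraction of items where the chosen option index matches the key.
    correct = sum(
        1 for q in items if answer_fn(q["question"], q["choices"]) == q["answer"]
    )
    return correct / len(items)

# Toy items with the assumed fields; real questions are withheld appropriately.
items = [
    {"question": "Q1", "choices": ["A", "B", "C", "D"], "answer": 0},
    {"question": "Q2", "choices": ["A", "B", "C", "D"], "answer": 2},
]

print(accuracy(items, pick_answer))  # 0.5
```

With four choices per question, an accuracy near 0.25 after unlearning would indicate the hazardous knowledge is no longer recoverable by direct querying, while general-capability benchmarks should remain unaffected.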
Learn more (paper, code, datasets): https://www.wmdp.ai/
This work was also covered by TIME!
Read the article: https://time.com/6878893/ai-artificial-intelligence-dangerous-knowledge/