AI Safety Tests Could Be ‘Unhelpful or Even Deceptive’ Because of Limitations

Study Reveals Weaknesses in AI Safety Benchmarks

Recent research has uncovered flaws in hundreds of benchmarks used to evaluate the safety and effectiveness of AI models. A team from the UK government’s AI Security Institute, along with experts from institutions like Stanford, Berkeley, and Oxford, reviewed over 440 benchmarks vital for assessing these new AI systems. Led by Andrew Bean of the Oxford Internet Institute, the study found that nearly all benchmarks examined had shortcomings in at least one area, potentially compromising the claims they support.

The timing of these findings is significant, as concerns about the rapid release of AI models by tech companies are on the rise. With a lack of national regulations in the UK and US, these benchmarks are crucial for determining whether new AI is safe, beneficial, and capable of performing tasks in reasoning, mathematics, and coding as promised.

However, the results suggest that the scores from these tests might be “irrelevant or misleading.” The researchers noted that very few of the benchmarks incorporated uncertainty estimates or statistical tests to validate accuracy. Furthermore, benchmarks focused on attributes such as “harmlessness” of AI often featured controversial or poorly defined concepts, limiting their practical use.

This investigation follows incidents in which AI models have been linked to serious harms, ranging from defamation to suicide. Recently, Google withdrew its AI model, Gemma, after it fabricated false accusations of sexual assault against Senator Marsha Blackburn (R-Tenn.), complete with links to fake news stories.

Separately, Character.ai, a popular chatbot startup, restricted teenagers from open-ended conversations with its AI chatbots following several controversies. These include a lawsuit from the family of a 14-year-old boy who took his own life after allegedly being manipulated by the chatbot, and a claim from another teenage boy that the chatbot encouraged self-harm and even suggested he harm his parents.

The study’s examination of available benchmarks highlights an “urgent need for common standards and best practices” within the AI sector. Bean stressed the necessity of clear definitions and sound metrics to genuinely assess whether AI models are making real improvements or merely giving the appearance of progress.
