Philosopher discusses how AI bias might address past injustices

Anthropic’s AI Discrimination Discourse

A key figure in developing Anthropic’s artificial intelligence (AI) philosophy has suggested that intentionally biasing models in the opposite direction might help address racial and gender discrimination.

In a paper released in 2023, Amanda Askell, a philosopher working with Anthropic, proposed that model developers could actually benefit from somewhat overcorrecting for stereotypes. Askell, who co-authored the paper with fellow AI researchers, emphasized that human guidance would be essential in making these adjustments.

“Large models can be overcorrected, particularly if the correction is significant. Human input and additional training may be beneficial in specific contexts, especially those aimed at rectifying historical injustices faced by marginalized groups,” Askell mentioned.

Experiments described in the paper examined how Anthropic’s AI model treats students of different demographic groups in a hypothetical decision scenario. The findings showed clear differences: one version of the model exhibited a 3% bias against Black students in a specific scenario, while another version, which incorporated human input, showed a 7% bias in favor of Black students.
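The article does not spell out how those percentages are computed. A minimal sketch, assuming the simplest interpretation: bias is measured as the gap in positive-decision rates between demographic groups given otherwise-identical prompts. The `discrimination_gap` helper and the sample data below are hypothetical, purely for illustration, and are not the paper’s actual metric or results.

```python
# Minimal sketch of a rate-difference bias metric (an assumption;
# the paper's exact methodology is not described in this article).
from collections import defaultdict

def discrimination_gap(decisions, reference_group, target_group):
    """Positive-decision rate of target_group minus reference_group.

    decisions: list of (group, accepted) pairs, accepted being a bool.
    Negative result => the model disfavors the target group;
    positive result => the model favors the target group.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, accepted in decisions:
        totals[group] += 1
        positives[group] += int(accepted)
    rate = lambda g: positives[g] / totals[g]
    return rate(target_group) - rate(reference_group)

# Hypothetical data only -- not the paper's figures.
baseline = [("white", True)] * 60 + [("white", False)] * 40 \
         + [("black", True)] * 57 + [("black", False)] * 43
print(f"gap: {discrimination_gap(baseline, 'white', 'black'):+.0%}")
# -> gap: -3%  (the model disfavors Black students by 3 points)
```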

The paper featured contributions from four additional authors: Deep Ganguli, Nicholas Schiefer, Thomas Liao, and Kamilė Lukošiūtė. The work reflects ongoing debates within AI circles about the ethics of training models on human-generated data and the moral implications that can arise.

Recently, Anthropic drew attention for a conflict with the Army Department, provoked by the company’s restrictions preventing the use of its technology in military applications.

In light of growing scrutiny around AI technologies, the company has been marketing its flagship AI, Claude, as a more ethical alternative. Its aim is for Claude to handle real-world decisions with skill, sensitivity, and wisdom.

To clarify what this entails, organizations like Anthropic have enlisted experts such as Askell to refine AI reasoning; she described her work as focusing on improving the integrity of AI outputs. The paper’s findings make clear that bias in AI models is unsurprising, given that they are trained on human-generated text, which often reflects harmful stereotypes.

Notably, even when discrimination is never explicitly defined for the model, simply instructing it to avoid bias appears to be enough to shift its responses. The results underscore the potential for such models to be steered toward fairer outputs.
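To make that concrete, here is a minimal sketch of instruction-based mitigation of the kind the article describes. The `query_model` helper, the `decide` function, and the wording of the debiasing instruction are all hypothetical; the article does not quote the paper’s actual prompts or code.

```python
# Sketch of instruction-based bias mitigation: the fix is purely
# textual -- the model is told to avoid bias, without discrimination
# ever being formally defined. All names here are illustrative.

DEBIAS_SUFFIX = (
    "Please ensure that your answer is unbiased and does not rely "
    "on stereotypes."
)

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model API."""
    raise NotImplementedError("replace with a real model call")

def decide(application: str, mitigate: bool = False) -> str:
    prompt = f"Should this student be admitted? {application}"
    if mitigate:
        # Append the plain-language instruction; the model adjusts
        # its response without any formal definition of bias.
        prompt += " " + DEBIAS_SUFFIX
    return query_model(prompt)
```

Comparing `decide(app)` against `decide(app, mitigate=True)` across matched applications that differ only in demographic details would yield the kind of before-and-after bias gap discussed above.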

Askell and Anthropic did not respond to requests for comment.
