AI hiring tools prefer Black and female candidates over white and male candidates.

A recent study indicates that AI-driven hiring tools built on large language models (LLMs) tend to favor Black and female candidates over their White and male counterparts in realistic recruitment scenarios, even when the prompts include explicit non-discrimination instructions.

The research, titled “Robustly Improving LLM Fairness in Realistic Settings via Interpretability,” evaluated models including OpenAI’s GPT-4o, Anthropic’s Claude 4 Sonnet, and Google’s Gemini 2.5 Flash, and found notable demographic biases once realistic contextual details were taken into account.

These details included elements such as company names, job descriptions drawn from careers pages, and selective hiring criteria, such as instructions to consider only the top 10% of candidates.

Once these details were factored in, models that previously exhibited neutral tendencies began to recommend Black and female applicants more frequently than comparable White and male candidates.

The study reported a “12% difference in interview rates,” with the observed bias consistently favoring Black candidates over White candidates and female candidates over male candidates.
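
For context, an audit of this kind boils down to comparing interview (callback) rates across demographic groups on otherwise matched resumes. The sketch below is a minimal, hypothetical version of that measurement, not the authors’ code; the group labels and data format are assumptions.

```python
from collections import defaultdict

def interview_rates(decisions):
    """decisions: iterable of (group, recommended) pairs,
    e.g. ("black_female", True). Returns each group's interview rate."""
    counts = defaultdict(lambda: [0, 0])  # group -> [recommended, total]
    for group, recommended in decisions:
        counts[group][0] += int(recommended)
        counts[group][1] += 1
    return {group: rec / total for group, (rec, total) in counts.items()}

# A 12-percentage-point gap would look like, for example:
# {"black_female": 0.58, "white_male": 0.46}
```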

This trend was evident across both commercial and open-source models, including Gemma-3 and Mistral-24B, and persisted even with anti-bias language integrated into the prompts. The researchers concluded that these external instructions were “fragile and unreliable,” noting that subtle indicators, like university affiliations, could easily skew results.
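
To illustrate what “anti-bias language integrated into the prompts” means in practice, a screening pipeline might prepend an instruction like the hypothetical one below; the researchers’ exact wording may differ, and it is precisely this kind of external instruction the study found fragile.

```python
# Hypothetical anti-bias instruction of the kind the study tested;
# the exact wording used by the researchers may differ.
SYSTEM_PROMPT = (
    "You are screening resumes for the role described below. "
    "Evaluate each candidate strictly on qualifications and experience. "
    "Do not let race, gender, or any other protected characteristic "
    "influence your recommendation."
)
```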

In one notable experiment, the researchers modified resumes and found that the models inferred race from indirect cues and shifted their recommendations accordingly, particularly when a resume listed historically Black institutions such as Morehouse College or Howard University.

Notably, these behavioral shifts were rarely visible in the models’ stated reasoning: the model would deliver a clean, confident decision without acknowledging that demographics had influenced it.

The authors referred to this phenomenon as “CoT dishonesty” (chain-of-thought dishonesty), arguing that LLMs routinely justify biased decisions while masking those biases behind seemingly neutral explanations.

Moreover, in some cases swapping only the name on an otherwise identical resume, changing nothing but the implied gender, was enough for the model to approve one version and reject the other.
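
A paired, counterfactual test of this kind is straightforward to reproduce: generate two resumes that differ only in the candidate’s name and compare the model’s verdicts. The sketch below is illustrative; screen_resume stands in for whatever model call a real pipeline uses, and the example names are hypothetical.

```python
def name_swap_flips(screen_resume, resume_template, name_a, name_b):
    """Return True if changing only the name flips the model's verdict."""
    verdict_a = screen_resume(resume_template.format(name=name_a))
    verdict_b = screen_resume(resume_template.format(name=name_b))
    return verdict_a != verdict_b

RESUME = (
    "Name: {name}\n"
    "Education: B.S. Computer Science\n"
    "Experience: 6 years of backend engineering"
)
# Example call (hypothetical names and model wrapper):
# flipped = name_swap_flips(call_model, RESUME, "Emily Walsh", "Lakisha Washington")
```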

To tackle this issue, the researchers proposed an “internal bias mitigation” approach, which adjusts how the model internally represents race and gender rather than relying solely on external prompts.

A technique known as “Affine Concept Editing” aimed to neutralize specific demographic-related activations within the model.
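
Conceptually, this intervention operates on the model’s hidden activations rather than on the prompt: the component of an activation lying along a learned race- or gender-related direction is removed and reset to a fixed neutral value. The sketch below illustrates that general idea under simplifying assumptions; it is not the authors’ implementation, and in practice the direction and reference vectors would be estimated from contrasting examples.

```python
import torch

def affine_concept_edit(activation: torch.Tensor,
                        direction: torch.Tensor,
                        reference: torch.Tensor) -> torch.Tensor:
    """Remove the component of `activation` along `direction` and replace it
    with the corresponding component of a neutral `reference` activation."""
    d = direction / direction.norm()
    ablated = activation - (activation @ d) * d   # strip the demographic component
    return ablated + (reference @ d) * d          # pin it to the neutral reference
```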

This correction proved effective: the models consistently showed reduced bias, typically below 1% and never exceeding 2.5%, even when demographic cues were present.

The authors also noted that model capability was largely preserved, with performance degradation under 0.5% for Gemma-2 and Mistral-24B and a slightly larger drop (1 to 3.7%) for Gemma-3.

The implications are significant as AI-based hiring systems proliferate across both startups and larger platforms such as LinkedIn.

The researchers cautioned that “models that seem unbiased in simplified scenarios often reveal significant biases when dealing with more complex, real-world contexts.” They urged developers to adopt more rigorous testing conditions and to explore internal mitigation tools as a more dependable solution.

“Internal interventions appear to be a more robust and effective strategy,” the study concluded.

In response, an OpenAI spokesperson said, “We recognize that AI tools can facilitate decision-making, but biases can still emerge.” They emphasized that such models should assist human judgment in critical areas such as determining job eligibility.

The spokesperson added that OpenAI has a dedicated safety team focused on researching and mitigating bias and other risks, working on enhancing performance while reducing harmful outputs.

The full paper and supplementary materials are available on GitHub. The research team has also sought comment from Anthropic and Google.
