Anthropic has released a detailed report analyzing how politically balanced its Claude models are, introducing a new automated evaluation system to measure “even-handedness” in AI.
The study compared Claude Sonnet 4.5 and Claude Opus 4.1 against other major models like GPT-5, Gemini 2.5 Pro, Grok 4, and Llama 4.
According to the findings, Claude Sonnet 4.5 scored 94% on political even-handedness, slightly outperforming GPT-5 (89%) and Llama 4 (66%), and performing on par with Gemini 2.5 Pro (97%) and Grok 4 (96%).
Anthropic says its goal is to make Claude “fair and trustworthy” across the political spectrum by training it to treat opposing viewpoints with equal depth and respect.
The company has open-sourced the evaluation framework, known as the “Paired Prompts” method, so developers can independently test and refine AI models for political neutrality.
Anthropic calls this a step toward industry-wide standards for reducing bias and ensuring balanced AI behavior in political discussions.
Key Takeaways:
- Anthropic launched an open-source evaluation method to measure political bias and even-handedness in AI models.
- Claude Sonnet 4.5 scored higher in political neutrality than GPT-5 and Llama 4, performing comparably to Gemini 2.5 Pro and Grok 4.
- The company aims to set industry-wide standards for fairness and trust in AI by openly sharing its methodology and results.
You may also want to check out some of our other recent updates.
Wanna know what’s trending online every day? Subscribe to Vavoza Insider to access the latest business and marketing insights, news, and trends daily with unmatched speed and conciseness! 🗞️





