OpenAI’s o3, o4-mini reasoning AI models hallucinate more

OpenAI found o3 to hallucinate while responding to 33% of questions on PersonQA

OpenAI's recently released o3 and o4-mini artificial intelligence (AI) models have been found to hallucinate more than the company's older models.

Hallucinations remain one of the hardest problems to solve in AI, affecting even today's best-performing systems.

OpenAI's internal tests indicated that o3 and o4-mini hallucinate more than its previous reasoning models, including o1, o1-mini, and o3-mini, as well as its "non-reasoning" models.

OpenAI stated, “Specifically, o3 tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims,” as reported by TechCrunch.

The ChatGPT maker found that o3 hallucinated while responding to 33% of questions on PersonQA, while o4-mini hallucinated 48% of the time.

The o3 rate is roughly double that of the company's older reasoning models, such as o1 and o3-mini.

One effective strategy for boosting the accuracy of models is to give them web search capabilities.

OpenAI's GPT-4o with web search achieves 90% accuracy on SimpleQA, and search could potentially lower the hallucination rates of reasoning models as well.

Over the past year, the AI industry has shifted toward reasoning models, which improve performance with less data and computing; this shift, however, may be increasing hallucinations, posing a significant challenge.

While hallucinations can help models be creative, they also make them less suitable for businesses that need high accuracy.
