It’s no secret that numerous AI applications never reach deployment because of hallucinations. So what has GPT-4o done to improve the situation? Literally nothing.
GPT-4 already had an unacceptable hallucination rate, much higher than the over-hyped claim of 3%. Rather than addressing the hallucination problem even one iota, GPT-4o focused on expanding the user interface. Now users can more easily talk to a hallucination-prone chatbot. Regarding GPT-4o, OpenAI states:
- “It matches GPT-4 Turbo performance on text in English and code…”
- “As measured on traditional benchmarks, GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence…”
- “We would love feedback to help identify tasks where GPT-4 Turbo still outperforms GPT-4o, so we can continue to improve the model.”

For example, a few days after its launch, GPT-4o was asked who the Miami Marlins had played “last night.” It gave the wrong answer. That’s tolerable for casual questions, but a company can be seriously damaged when a customer-facing chatbot hallucinates. Customer-facing chatbots must be 100% reliable, and GPT-4o is not even close.
Fortunately, companies can finally build 100% reliable chatbots with GPT-4o + RAGFix. RAGFix provides the missing key, enabling chatbots to deliver 100% accurate, 100% hallucination-free responses.
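RAGFix’s internals aren’t detailed here, so the sketch below shows only the general retrieval-augmented generation (RAG) pattern that such tools build on: fetch verified documents, then instruct the model to answer strictly from them. The `retrieve` function, the `DOCUMENTS` store, and the prompt wording are illustrative assumptions, not RAGFix’s actual API.

```python
# A minimal RAG sketch, assuming the OpenAI Python SDK (>= 1.0) and an
# in-memory document store. This is NOT RAGFix's implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical knowledge base: in production this would be a vector store
# populated with verified company documents.
DOCUMENTS = [
    "Returns are accepted within 30 days of purchase with a receipt.",
    "Support hours are 9am to 5pm ET, Monday through Friday.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval; a real system would use embeddings."""
    scored = sorted(
        DOCUMENTS,
        key=lambda doc: -sum(word in doc.lower() for word in query.lower().split()),
    )
    return scored[:k]

def answer(question: str) -> str:
    # Ground the model: it may only answer from retrieved context,
    # and must admit when the context doesn't contain the answer.
    context = "\n".join(retrieve(question))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer ONLY from the context below. If the answer is "
                    "not in the context, say you don't know.\n\n"
                    "Context:\n" + context
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer("What is your return policy?"))
```

The key design point is that the model is never asked to answer from its own parametric memory: every response is constrained to retrieved, verified text, and the fallback is “I don’t know” rather than a guess.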