A recent study has found that OpenAI’s GPT-5 model gives incorrect answers in roughly 25% of cases, as reported by Tom’s Guide, highlighting both the model’s remaining limitations and its gains over its predecessor.
The study found that GPT-5 makes approximately 45% fewer factual errors and generates six times fewer hallucinated or entirely fabricated answers than GPT-4, a significant advance in accuracy. Despite this progress, the model still presents incorrect information with unwarranted confidence, a failure mode commonly known as hallucination.
GPT-5’s performance varies by task. It scored 94.6% on the 2025 AIME mathematics test and succeeded on 74.9% of real-world coding tasks. On the more challenging MMLU-Pro benchmark, an academic test covering subjects such as science, math, and history, it reached roughly 87% accuracy. It still makes mistakes on general-knowledge and complex-reasoning questions, however.
The study identifies several factors behind these errors: the model’s failure to fully grasp nuanced questions, its reliance on potentially outdated or incomplete training data, and its design around probabilistic pattern prediction. Because the model generates the statistically likely next words rather than verifying facts, it can produce responses that sound plausible but are factually incorrect.
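To make the pattern-prediction point concrete, here is a minimal, purely illustrative Python sketch. The vocabulary and probabilities below are invented for this example and are not drawn from the study; the point is only that sampling tokens by likelihood involves no truth check, so a plausible-sounding wrong answer can dominate.

```python
import random

# Invented toy distribution (NOT from the study): next-token probabilities a
# language model might assign after the prompt "The capital of Australia is".
# The plausible, better-known city outweighs the correct answer.
NEXT_TOKEN_PROBS = {
    "Sydney": 0.55,     # plausible but factually wrong
    "Canberra": 0.35,   # correct
    "Melbourne": 0.10,  # plausible but factually wrong
}

def sample_next_token(probs):
    """Pick one token in proportion to its probability, as a language model
    does at each generation step; nothing here checks the answer's truth."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

random.seed(42)
completions = [sample_next_token(NEXT_TOKEN_PROBS) for _ in range(1_000)]
wrong = sum(1 for token in completions if token != "Canberra")
print(f"Factually wrong completions: {wrong / len(completions):.0%}")
```

In this toy setup, roughly 65% of completions come out wrong even though the correct answer is always available, mirroring how a fluent model can confidently favor a familiar-sounding answer over a true one.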
Despite GPT-5’s improved reliability, the article urges users to verify critical information obtained from the model, particularly for professional, academic, or health-related questions.