By Gabriela Galvin
Microsoft said it is one step closer to "medical superintelligence" after a new artificial intelligence (AI) tool beat doctors at diagnosing complex medical problems.

Tech giants are racing to develop superintelligence, which refers to an AI system that exceeds human intellectual abilities in every way, and they're promising to use it to upend healthcare systems around the world.

For the latest experiment, Microsoft tested an AI diagnostic system against 21 experienced physicians, using real-world case studies from 304 patients that were published in the New England Journal of Medicine, a leading medical journal.

The AI tool correctly diagnosed up to 85.5 per cent of cases, roughly four times more than the group of doctors from the United Kingdom and the United States, who had between five and 20 years of experience.

The model was also cheaper than the human doctors, ordering fewer scans and tests to reach the correct diagnosis, the analysis found.

Microsoft said the findings indicate that AI models can reason through complex diagnostic problems that stump physicians, who specialise in their fields but are not experts in every aspect of medicine.

However, AI "can blend both breadth and depth of expertise, demonstrating clinical reasoning capabilities that, across many aspects of clinical reasoning, exceed those of any individual physician," Microsoft executives said in a press release. "This kind of reasoning has the potential to reshape healthcare".

Microsoft does not see AI replacing doctors anytime soon, saying the tools will instead help physicians automate some routine tasks, personalise patients' treatment, and speed up diagnoses.

How the model works

Microsoft's AI system made diagnoses by mimicking a doctor's process of collecting a patient's details, ordering tests, and eventually narrowing down a medical diagnosis.

A "gatekeeper agent" had information from the patient case studies.
It interacted with a "diagnostic orchestrator" that asked questions and ordered tests, receiving results from the real-world workups.

The company tested the system with leading AI models, including GPT, Llama, Claude, Gemini, Grok, and DeepSeek. OpenAI's o3 model, which is integrated into ChatGPT, correctly solved 85.5 per cent of the patient cases, compared to an average of 20 per cent among the group of 21 experienced doctors.

Limitations and next steps

The researchers published their findings online as a preprint article, meaning the work has not yet been peer-reviewed.

Microsoft also acknowledged some key limitations, notably that the AI tool has only been tested on complicated health problems, not more common, everyday issues.

The panel of doctors also worked without access to their colleagues, textbooks, or other tools that they might typically use when making diagnoses. "This was done to enable a fair comparison to raw human performance," Microsoft said.

The company called for more real-world evidence on AI's potential in health clinics, and said it will "rigorously test and validate these approaches" before making them more widely available.
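For readers curious about the gatekeeper/orchestrator setup described under "How the model works", the interaction can be pictured as a simple query loop: an orchestrator asks questions and orders tests, a gatekeeper reveals only results that exist in the case record, and the orchestrator then commits to a diagnosis. The sketch below is purely illustrative; all class names, the toy case data, and the decision rule are hypothetical, and Microsoft's actual system is not reproduced here.

```python
class GatekeeperAgent:
    """Holds the hidden case record and answers queries about it (hypothetical sketch)."""

    def __init__(self, case):
        self.case = case  # dict mapping a question or test name to its result

    def answer(self, query):
        # Only reveal findings that were actually part of the real workup.
        return self.case.get(query, "not available")


class DiagnosticOrchestrator:
    """Asks questions, orders tests, then commits to a diagnosis (hypothetical sketch)."""

    def __init__(self, queries, decide):
        self.queries = queries  # ordered list of questions/tests to run
        self.decide = decide    # rule mapping collected findings to a diagnosis
        self.findings = {}

    def run(self, gatekeeper):
        for query in self.queries:
            self.findings[query] = gatekeeper.answer(query)
        return self.decide(self.findings)


# Toy example with one fabricated case; real cases came from journal case studies.
case = {
    "chief complaint": "fatigue",
    "ferritin test": "low",
    "hemoglobin test": "low",
}


def toy_rule(findings):
    # Stand-in for the AI model's reasoning step.
    if findings.get("ferritin test") == "low" and findings.get("hemoglobin test") == "low":
        return "iron-deficiency anemia"
    return "undetermined"


orchestrator = DiagnosticOrchestrator(
    ["chief complaint", "ferritin test", "hemoglobin test"], toy_rule
)
print(orchestrator.run(GatekeeperAgent(case)))  # prints "iron-deficiency anemia"
```

In the real experiment, the decision step is an AI model rather than a fixed rule, and cost is tracked because each ordered test has a price, which is how the researchers could compare diagnostic spending against the physicians'.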